Tag Archives: types

How to find numeric columns in Pandas?

Let’s say df is a pandas DataFrame. I would like to find all columns of numeric type. Something like:

isNumeric = is_numeric(df)

Answer 0

You could use the select_dtypes method of DataFrame. It takes two parameters, include and exclude. So isNumeric would look like:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

newdf = df.select_dtypes(include=numerics)
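One caveat with the hard-coded list above: it names only some signed integer and float widths. Columns with other numeric dtypes, such as uint8 or int8, are silently dropped. The sketch below uses an invented DataFrame to compare the list against np.number:

```python
import numpy as np
import pandas as pd

# Made-up frame: several numeric widths plus one text column
df = pd.DataFrame({
    "small": np.array([1, 2], dtype=np.uint8),
    "count": np.array([3, 4], dtype=np.int64),
    "ratio": np.array([0.5, 1.5], dtype=np.float32),
    "label": ["a", "b"],
})

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
by_list = df.select_dtypes(include=numerics).columns.tolist()
by_number = df.select_dtypes(include=np.number).columns.tolist()

print(by_list)    # ['count', 'ratio']  (uint8 is not in the list)
print(by_number)  # ['small', 'count', 'ratio']
```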

Answer 1

You can use the undocumented function _get_numeric_data() to filter only numeric columns:

df._get_numeric_data()

Example:

In [32]: data
Out[32]:
   A  B
0  1  s
1  2  s
2  3  s
3  4  s

In [33]: data._get_numeric_data()
Out[33]:
   A
0  1
1  2
2  3
3  4

Note that this is a “private method” (i.e., an implementation detail) and is subject to change or total removal in the future. Use with caution.


Answer 2

Simple one-line answer to create a new dataframe with only numeric columns:

df.select_dtypes(include=np.number)

If you want the names of numeric columns:

df.select_dtypes(include=np.number).columns.tolist()

Complete code:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': range(7, 10),
                   'B': np.random.rand(3),
                   'C': ['foo','bar','baz'],
                   'D': ['who','what','when']})
df
#    A         B    C     D
# 0  7  0.704021  foo   who
# 1  8  0.264025  bar  what
# 2  9  0.230671  baz  when

df_numerics_only = df.select_dtypes(include=np.number)
df_numerics_only
#    A         B
# 0  7  0.704021
# 1  8  0.264025
# 2  9  0.230671

colnames_numerics_only = df.select_dtypes(include=np.number).columns.tolist()
colnames_numerics_only
# ['A', 'B']
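One detail that may surprise: boolean columns are not counted as numeric by np.number, because NumPy's bool scalar type is not a subclass of numpy.number. If you want them too, one option is to list "bool" explicitly. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "n": [1, 2, 3],
    "x": [0.1, 0.2, 0.3],
    "flag": [True, False, True],
    "name": ["a", "b", "c"],
})

numeric_only = df.select_dtypes(include=np.number).columns.tolist()
with_bool = df.select_dtypes(include=[np.number, "bool"]).columns.tolist()

print(numeric_only)  # ['n', 'x']
print(with_bool)     # ['n', 'x', 'flag']
```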

Answer 3

df.select_dtypes(exclude=['object'])

Update

df.select_dtypes(include=np.number)
# or, with newer versions of pandas
df.select_dtypes('number')
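Note that exclude=['object'] keeps every non-object column, not only the numeric ones. Datetime and boolean columns pass through as well. Here is a quick sketch with invented data. The text column is forced to the object dtype so the result does not depend on the pandas version's default string dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "n": [1, 2],
    "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "flag": [True, False],
    "name": pd.Series(["a", "b"], dtype=object),
})

non_object = df.select_dtypes(exclude=["object"]).columns.tolist()
numeric = df.select_dtypes("number").columns.tolist()

print(non_object)  # ['n', 'when', 'flag']
print(numeric)     # ['n']
```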

Answer 4

Simple one-liner:

df.select_dtypes('number').columns

Answer 5

The following code returns a list of the names of the numeric columns of a data set.

cnames=list(marketing_train.select_dtypes(exclude=['object']).columns)

Here, marketing_train is my data set. select_dtypes() selects columns by data type using its include and exclude arguments, and .columns fetches the column names. The output of the above code is:

['custAge',
 'campaign',
 'pdays',
 'previous',
 'emp.var.rate',
 'cons.price.idx',
 'cons.conf.idx',
 'euribor3m',
 'nr.employed',
 'pmonths',
 'pastEmail']

Thanks


Answer 6

Here is another simple way to find the numeric columns in a pandas DataFrame:

numeric_clmns = df.dtypes[df.dtypes != "object"].index 

Answer 7

import numpy as np
import pandas as pd


def is_type(df, baseType):
    # True for each column whose NumPy scalar type derives from baseType
    # (assumes plain NumPy dtypes; np.dtype() rejects e.g. 'category')
    test = [issubclass(np.dtype(d).type, baseType) for d in df.dtypes]
    return pd.DataFrame(data=test, index=df.columns, columns=["test"])


def is_float(df):
    # np.float was removed in NumPy 1.24; np.floating covers all float widths
    return is_type(df, np.floating)


def is_number(df):
    return is_type(df, np.number)


def is_integer(df):
    return is_type(df, np.integer)

Answer 8

Adapting this answer, you could do

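df.loc[:, df.applymap(np.isreal).all(axis=0)]

Here, df.applymap(np.isreal) shows whether every cell in the data frame is numeric. Then .all(axis=0) checks whether all values in a column are True and returns a Series of Booleans that can be used to index the desired columns. (.ix has since been removed from pandas, so .loc is used here. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)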


Answer 9

Please see the below code:

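import numpy as np
from IPython.display import display  # in a plain script, print() works too

if dataset.select_dtypes(include=[np.number]).shape[1] > 0:
    display(dataset.select_dtypes(include=[np.number]).describe())
if dataset.select_dtypes(include=[object]).shape[1] > 0:
    # np.object was removed in NumPy 1.24; the builtin object works the same here
    display(dataset.select_dtypes(include=[object]).describe())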

This way you can check whether the columns hold numeric values (such as float and int) or string values. The second if statement handles the string values, which pandas stores under the object dtype.


Answer 10

We can include and exclude data types as required:

# Signature: DataFrame.select_dtypes(include=None, exclude=None); at least one must be given
train.select_dtypes(include='number')  # includes all the numeric types

(Quoted from the select_dtypes docstring, as shown in Jupyter Notebook.)

  • To select all numeric types, use np.number or 'number'

  • To select strings you must use the object dtype, but note that this will return all object dtype columns

  • See the NumPy dtype hierarchy: http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html

  • To select datetimes, use np.datetime64, 'datetime' or 'datetime64'

  • To select timedeltas, use np.timedelta64, 'timedelta' or 'timedelta64'

  • To select Pandas categorical dtypes, use 'category'

  • To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'
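The non-numeric selectors listed above can be tried the same way. This small sketch uses made-up column names and data, with one column of each kind:

```python
import pandas as pd

df = pd.DataFrame({
    "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "gap": pd.to_timedelta(["1 day", "2 days"]),
    "kind": pd.Series(["x", "y"], dtype="category"),
})

dates = df.select_dtypes(include="datetime").columns.tolist()
gaps = df.select_dtypes(include="timedelta").columns.tolist()
cats = df.select_dtypes(include="category").columns.tolist()

print(dates, gaps, cats)  # ['when'] ['gap'] ['kind']
```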


What is the most effective way to check if an object is a number?

Given an arbitrary Python object, what’s the best way to determine whether it is a number? Here, “is” is defined as “acts like a number in certain circumstances”.

For example, say you are writing a vector class. If given another vector, you want to find the dot product. If given a scalar, you want to scale the whole vector.

Checking whether something is an int, float, long, or bool is annoying and doesn’t cover user-defined objects that might act like numbers. But checking for __mul__, for example, isn’t good enough, because the vector class I just described would define __mul__, yet it wouldn’t be the kind of number I want.


Answer 0

Use Number from the numbers module to test isinstance(n, Number) (available since 2.6).

>>> from numbers import Number
... from decimal import Decimal
... from fractions import Fraction
... for n in [2, 2.0, Decimal('2.0'), complex(2, 0), Fraction(2, 1), '2']:
...     print(f'{n!r:>14} {isinstance(n, Number)}')
              2 True
            2.0 True
 Decimal('2.0') True
         (2+0j) True
 Fraction(2, 1) True
            '2' False

This is, of course, contrary to duck typing. If you are more concerned about how an object acts rather than what it is, perform your operations as if you have a number and use exceptions to tell you otherwise.
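Applied to the vector example from the question, the isinstance check might look like the sketch below. The Vector class is hypothetical and kept minimal. Returning NotImplemented for unsupported operands lets Python raise its usual TypeError.

```python
from numbers import Number


class Vector:
    """Hypothetical minimal vector class, for illustration only."""

    def __init__(self, *components):
        self.components = list(components)

    def __mul__(self, other):
        if isinstance(other, Vector):
            # Vector * Vector: dot product
            return sum(a * b for a, b in zip(self.components, other.components))
        if isinstance(other, Number):
            # Vector * scalar: scale every component
            return Vector(*(a * other for a in self.components))
        return NotImplemented  # anything else: let Python raise TypeError

    # scalar * Vector ends up here, because int/float return NotImplemented for Vector
    __rmul__ = __mul__


print(Vector(1, 2, 3) * Vector(4, 5, 6))  # 32
print((Vector(1, 2) * 3).components)      # [3, 6]
print((2.5 * Vector(2, 4)).components)    # [5.0, 10.0]
```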


Answer 1

You want to check if some object

acts like a number in certain circumstances

If you’re using Python 2.5 or older, the only real way is to check some of those “certain circumstances” and see.

In 2.6 or better, you can use isinstance with numbers.Number — an abstract base class (ABC) that exists exactly for this purpose (lots more ABCs exist in the collections module for various forms of collections/containers, again starting with 2.6; and, also only in those releases, you can easily add your own abstract base classes if you need to).

Back to 2.5 and earlier: “can be added to 0 and is not iterable” could be a good definition in some cases. But you really need to ask yourself what something you want to consider “a number” must definitely be able to do, and what it must absolutely be unable to do, and then check for exactly that.

This may also be needed in 2.6 or later, perhaps to make your own registrations and add types you care about that haven’t already been registered with numbers.Number. If you want to exclude some types that claim to be numbers but that you just can’t handle, that takes even more care, as ABCs have no unregister method. For example, you could make your own ABC WeirdNum and register all such weird-for-you types there. Then check isinstance against it first and bail out, before you proceed to check isinstance against the normal numbers.Number.
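The WeirdNum idea above could be sketched as follows. WeirdNum, is_supported_number, and the choice to exclude Decimal are illustrative assumptions, not anything from the standard library. numbers.Number.register(SomeType) would be the analogous call for adding types you do want accepted.

```python
import abc
import numbers
from decimal import Decimal


class WeirdNum(abc.ABC):
    """Hypothetical ABC for number types this code chooses not to handle."""


# Illustrative choice: pretend Decimal is one of the "weird for you" types.
WeirdNum.register(Decimal)


def is_supported_number(x):
    # Bail out first on registered weird types, then fall back to numbers.Number
    if isinstance(x, WeirdNum):
        return False
    return isinstance(x, numbers.Number)


print(is_supported_number(3))               # True
print(is_supported_number(2.5))             # True
print(is_supported_number(Decimal("1.5")))  # False
print(is_supported_number("3"))             # False
```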

BTW, if and when you need to check if x can or cannot do something, you generally have to try something like:

try: 0 + x
except TypeError: canadd=False
else: canadd=True

The presence of __add__ per se tells you nothing useful, since e.g. all sequences have it for concatenation with other sequences. This check is equivalent to the definition “a number is something such that a sequence of such things is a valid single argument to the builtin function sum”, for example. Totally weird types (e.g. ones that raise the “wrong” exception when summed to 0, such as a ZeroDivisionError or ValueError &c) will propagate the exception. That’s OK: let the user know ASAP that such crazy types are just not acceptable in good company ;-). However, a “vector” that’s summable to a scalar would also give the wrong result here. (Python’s standard library doesn’t have one, but of course they’re popular as third-party extensions.) So, for example, this check should come after the “not allowed to be iterable” one. That means checking that iter(x) raises TypeError, or, if you’re in 2.5 or earlier and thus need your own checks, checking for the presence of the special method __iter__.
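Here is a minimal sketch of the combined heuristic described above, “not iterable, and can be added to 0”. The iterability check runs first, so sequences and vector-like objects are rejected before the addition test. The function name looks_like_number is hypothetical.

```python
from decimal import Decimal
from fractions import Fraction


def looks_like_number(x):
    """Heuristic: x is not iterable and 0 + x does not raise TypeError."""
    try:
        iter(x)
    except TypeError:
        pass  # not iterable, which is what we want for a scalar
    else:
        return False  # strings, lists and vector-like objects are rejected here
    try:
        0 + x
    except TypeError:
        return False
    return True


for value in [3, 2.5, 1j, Decimal("2"), Fraction(1, 3), "3", [1], None]:
    print(repr(value), looks_like_number(value))
```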

A brief glimpse at such complications may be sufficient to motivate you to rely instead on abstract base classes whenever feasible…;-).


Answer 2

This is a good example where exceptions really shine. Just do what you would do with the numeric types and catch the TypeError from everything else.

But obviously, this only checks whether an operation works, not whether it makes sense! The only real solution for that is to never mix types and always know exactly what typeclass your values belong to.
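The sketch below illustrates both halves of this answer. The scale helper is hypothetical. It just multiplies and lets TypeError signal a bad factor. Yet a string factor slips through, because multiplying an int by a str is a valid operation that simply means something else.

```python
def scale(components, factor):
    """EAFP-style scaling: try the multiplication, translate TypeError."""
    try:
        return [c * factor for c in components]
    except TypeError as exc:
        raise TypeError(f"cannot scale by {factor!r}") from exc


print(scale([1, 2], 3))     # [3, 6]
print(scale([1, 2], "ab"))  # ['ab', 'abab']: no error, but not a meaningful scaling

try:
    scale([1, 2], None)
except TypeError as exc:
    print(exc)              # cannot scale by None
```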


Answer 3

Multiply the object by zero. Any number times zero is zero. Any other result (including an exception) means that the object is not a number.

def isNumber(x):
    try:
        return bool(0 == x*0)
    except Exception:  # any failure means "not a number"
        return False

Using isNumber thusly will give the following output:

class A: pass 

def foo(): return 1

for x in [1,1.4, A(), range(10), foo, foo()]:
    answer = isNumber(x)
    print('{answer} == isNumber({x})'.format(**locals()))

Output:

True == isNumber(1)
True == isNumber(1.4)
False == isNumber(<__main__.A instance at 0x7ff52c15d878>)
False == isNumber([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
False == isNumber(<function foo at 0x7ff52c121488>)
True == isNumber(1)

There are probably some non-number objects in the world that define __mul__ to return zero when multiplied by zero, but that is an extreme exception. This solution should cover all the normal, sane code that you generate or encounter.

numpy.array example:

import numpy as np

def isNumber(x):
    try:
        return bool(x*0 == 0)
    except Exception:  # bool() of a multi-element array raises ValueError
        return False

x = np.array([0,1])

answer = isNumber(x)
print('{answer} == isNumber({x})'.format(**locals()))

Output:

False == isNumber([0 1])

Answer 4

To rephrase your question, you are trying to determine whether something is a collection or a single value. Trying to compare whether something is a vector or a number is comparing apples to oranges – I can have a vector of strings or numbers, and I can have a single string or single number. You are interested in how many you have (1 or more), not what type you actually have.

My solution for this problem is to check whether the input is a single value or a collection by checking for the presence of __len__. For example:

def do_mult(foo, a_vector):
    if hasattr(foo, '__len__'):
        return sum([a*b for a,b in zip(foo, a_vector)])
    else:
        return [foo*b for b in a_vector]

Or, for the duck-typing approach, you can try iterating on foo first:

def do_mult(foo, a_vector):
    try:
        return sum([a*b for a,b in zip(foo, a_vector)])
    except TypeError:
        return [foo*b for b in a_vector]

Ultimately, it is easier to test whether something is vector-like than to test whether something is scalar-like. If you have values of different types (e.g. string, numeric, etc.) coming through, then the logic of your program may need some work: how did you end up trying to multiply a string by a numeric vector in the first place?


Answer 5

To summarize / evaluate existing methods:

Candidate    | type                      | delnan | mat | shrewmouse | ant6n
-------------------------------------------------------------------------
0            | <type 'int'>              |      1 |   1 |          1 |     1
0.0          | <type 'float'>            |      1 |   1 |          1 |     1
0j           | <type 'complex'>          |      1 |   1 |          1 |     0
Decimal('0') | <class 'decimal.Decimal'> |      1 |   0 |          1 |     1
True         | <type 'bool'>             |      1 |   1 |          1 |     1
False        | <type 'bool'>             |      1 |   1 |          1 |     1
''           | <type 'str'>              |      0 |   0 |          0 |     0
None         | <type 'NoneType'>         |      0 |   0 |          0 |     0
'0'          | <type 'str'>              |      0 |   0 |          0 |     1
'1'          | <type 'str'>              |      0 |   0 |          0 |     1
[]           | <type 'list'>             |      0 |   0 |          0 |     0
[1]          | <type 'list'>             |      0 |   0 |          0 |     0
[1, 2]       | <type 'list'>             |      0 |   0 |          0 |     0
(1,)         | <type 'tuple'>            |      0 |   0 |          0 |     0
(1, 2)       | <type 'tuple'>            |      0 |   0 |          0 |     0

(I came here via this question.)

Code

#!/usr/bin/env python

"""Check if a variable is a number."""

import decimal


def delnan_is_number(candidate):
    import numbers
    return isinstance(candidate, numbers.Number)


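try:
    INTEGER_TYPES = (int, long)  # Python 2 has a separate long type
except NameError:
    INTEGER_TYPES = (int,)  # Python 3 has no long; int is unbounded


def mat_is_number(candidate):
    return isinstance(candidate, INTEGER_TYPES + (float, complex))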


def shrewmouse_is_number(candidate):
    try:
        return 0 == candidate * 0
    except:
        return False


def ant6n_is_number(candidate):
    try:
        float(candidate)
        return True
    except:
        return False

# Test
candidates = (0, 0.0, 0j, decimal.Decimal(0),
              True, False, '', None, '0', '1', [], [1], [1, 2], (1, ), (1, 2))

methods = [delnan_is_number, mat_is_number, shrewmouse_is_number, ant6n_is_number]

print("Candidate    | type                      | delnan | mat | shrewmouse | ant6n")
print("-------------------------------------------------------------------------")
for candidate in candidates:
    results = [m(candidate) for m in methods]
    print("{:<12} | {:<25} | {:>6} | {:>3} | {:>10} | {:>5}"
          .format(repr(candidate), type(candidate), *results))

Answer 6

Probably it’s better to just do it the other way around: You check if it’s a vector. If it is, you do a dot product and in all other cases you attempt scalar multiplication.

Checking for the vector is easy, since it should be of your vector class type (or inherit from it). You could also just try the dot product first, and if that fails (= it wasn’t really a vector), fall back to scalar multiplication.


Answer 7

Just to add to this: perhaps we can use a combination of isinstance and isdigit, as follows, to find whether a value is a number (int, float, etc.):

if isinstance(num1, (int, float)) or (isinstance(num1, str) and num1.isdigit()):
    pass  # num1 is an int, a float, or a string made only of digits


Answer 8

For the hypothetical vector class:

Suppose v is a vector, and we are multiplying it by x. If it makes sense to multiply each component of v by x, we probably meant that, so try that first. If not, maybe we can dot? Otherwise it’s a type error.

EDIT: the code below doesn’t work, because 2*[0]==[0,0] instead of raising a TypeError. I’m leaving it because it was commented upon.

import itertools

def __mul__( self, x ):
    try:
        return [ comp * x for comp in self ]
    except TypeError:
        return [ a * b for a, b in itertools.zip_longest( self, x, fillvalue = 0 ) ]

Answer 9

I had a similar issue when implementing a sort of vector class. One way to check for a number is to just convert to one, i.e. by using

float(x)

This should reject cases where x cannot be converted to a number; but may also reject other kinds of number-like structures that could be valid, for example complex numbers.
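Here is a small sketch of that float()-based check; the helper name converts_to_float is hypothetical. Besides rejecting complex numbers, it also accepts numeric strings, which may or may not be what you want for a vector class.

```python
def converts_to_float(x):
    """Return True if float(x) succeeds."""
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True


print(converts_to_float(3))       # True
print(converts_to_float("2.5"))   # True: strings that parse as numbers pass
print(converts_to_float(1 + 2j))  # False: complex numbers are rejected
print(converts_to_float("abc"))   # False
print(converts_to_float(None))    # False
```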


Answer 10

If you want to call different methods depending on the argument type(s), look into multipledispatch.

For example, say you are writing a vector class. If given another vector, you want to find the dot product. If given a scalar, you want to scale the whole vector.

from multipledispatch import dispatch

class Vector(list):

    @dispatch(object)
    def __mul__(self, scalar):
        return Vector( x*scalar for x in self)

    @dispatch(list)
    def __mul__(self, other):
        return sum(x*y for x,y in zip(self, other))


>>> Vector([1,2,3]) * Vector([2,4,5])   # Vector times Vector is dot product
25
>>> Vector([1,2,3]) * 2                 # Vector times scalar is scaling
[2, 4, 6]

Unfortunately, (to my knowledge) we can’t write @dispatch(Vector) since we are still defining the type Vector, so that type name is not yet defined. Instead, I’m using the base type list, which allows you to even find the dot product of a Vector and a list.


Answer 11

Short and simple way:

obj = 12345
print(isinstance(obj,int))

Output:

True

If the object is a string, False will be returned:

obj = 'some string'
print(isinstance(obj,int))

Output:

False
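Two things are worth knowing about isinstance(obj, int). First, bool is a subclass of int, so True passes. Second, floats and other numeric types fail. If any kind of number should be accepted, numbers.Number from the standard library numbers module is the broader check. A quick sketch:

```python
import numbers

checks = {
    "True is int": isinstance(True, int),
    "2.5 is int": isinstance(2.5, int),
    "2.5 is Number": isinstance(2.5, numbers.Number),
    "'5' is Number": isinstance("5", numbers.Number),
}
for label, result in checks.items():
    print(label, result)
```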

Answer 12

You have a data item, say rec_day that when written to a file will be a float. But during program processing it can be either float, int or str type (the str is used when initializing a new record and contains a dummy flag value).

You can then check to see if you have a number with this

type(rec_day) != str

I’ve structured a Python program this way and just put in a ‘maintenance patch’ that uses this as a numeric check. Is it the Pythonic way? Most likely not, since I used to program in COBOL.


Answer 13

You could use the str.isdigit() method.

>>> x = "01234"
>>> x.isdigit()
True
>>> y = "1234abcd"
>>> y.isdigit()
False
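Keep in mind that isdigit() is a str method that only says whether every character is a digit. Signs, decimal points and surrounding spaces make it return False. A few Unicode digit characters, such as the superscript '²', return True even though int() cannot parse them. A short check:

```python
samples = ["01234", "-5", "1.5", " 7", "²"]
results = {s: s.isdigit() for s in samples}
print(results)
# {'01234': True, '-5': False, '1.5': False, ' 7': False, '²': True}

try:
    int("²")
except ValueError:
    print("int('²') raises ValueError")
```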

Strings in a DataFrame, but dtype is object

Why does Pandas tell me that I have objects, although every item in the selected column is a string — even after explicit conversion.

This is my DataFrame:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 56992 entries, 0 to 56991
Data columns (total 7 columns):
id            56992  non-null values
attr1         56992  non-null values
attr2         56992  non-null values
attr3         56992  non-null values
attr4         56992  non-null values
attr5         56992  non-null values
attr6         56992  non-null values
dtypes: int64(2), object(5)

Five of them are dtype object. I explicitly convert those objects to strings:

for c in df.columns:
    if df[c].dtype == object:
        print("convert", df[c].name, "to string")
        df[c] = df[c].astype(str)

Then, df["attr2"] still has dtype object, although type(df["attr2"].iloc[0]) reveals str, which is correct.

Pandas distinguishes between int64 and float64 and object. What is the logic behind it when there is no dtype str? Why is a str covered by object?


Answer 0

The dtype object comes from NumPy; it describes the type of the elements in an ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, that is 8 bytes. But the length of a string is not fixed. So instead of saving the bytes of the strings in the ndarray directly, pandas uses an object ndarray, which stores pointers to objects. Because of this, the dtype of this kind of ndarray is object.

Here is an example:

  • the int64 array contains 4 int64 values.
  • the object array contains 4 pointers to 3 string objects.
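The two arrays described above can be reproduced roughly as follows, in a sketch with made-up values. The object array stores references, so two of its four slots can point at the very same str object.

```python
import numpy as np

ints = np.array([1, 2, 3, 4], dtype=np.int64)

words = ["ab", "cde", "fghij"]
# 4 slots, 3 distinct string objects: the first word is referenced twice
objs = np.array([words[0], words[1], words[2], words[0]], dtype=object)

print(ints.dtype, ints.itemsize)  # int64 8
print(objs.dtype)                 # object
print(objs[0] is objs[3])         # True: both slots refer to the same object
```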


Answer 1

The accepted answer is good. Just wanted to provide an answer which referenced the documentation. The documentation says:

Pandas uses the object dtype for storing strings.

As the leading comment says, “Don’t worry about it; it’s supposed to be like this.” (The accepted answer does a great job of explaining the “why”: strings are variable-length.)

But for strings, the length of the string is not fixed.


Answer 2

@HYRY’s answer is great. I just want to provide a little more context.

Arrays store data as contiguous, fixed-size memory blocks. The combination of these properties together is what makes arrays lightning fast for data access. For example, consider how your computer might store an array of 32-bit integers, [3,0,1].

If you ask your computer to fetch the 3rd element in the array, it’ll start at the beginning and then jump across 64 bits to get to the 3rd element. Knowing exactly how many bits to jump across is what makes arrays fast.

Now consider the sequence of strings ['hello', 'i', 'am', 'a', 'banana']. Strings are objects that vary in size, so if you tried to store them in contiguous memory blocks, it’d end up looking like this.

Now your computer doesn’t have a fast way to access a randomly requested element. The key to overcoming this is to use pointers. Basically, store each string in some random memory location, and fill the array with the memory address of each string. (Memory addresses are just integers.) So now, things look like this

Now, if you ask your computer to fetch the 3rd element, just as before, it can jump across 64 bits (assuming the memory addresses are 32-bit integers) and then make one extra step to go fetch the string.
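The “jump” can be checked with NumPy's strides. The following minimal sketch assumes a contiguous int32 array like [3,0,1] above. It computes the byte offset of the 3rd element and reads the element straight from the raw bytes.

```python
import numpy as np

arr = np.array([3, 0, 1], dtype=np.int32)
index = 2                          # the 3rd element, counting from 0
offset = index * arr.strides[0]    # 2 elements * 4 bytes each
print(arr.strides, offset * 8)     # (4,) 64 -> jump across 64 bits

# Read the element directly from the raw bytes at that offset
raw = arr.tobytes()
value = np.frombuffer(raw, dtype=np.int32, count=1, offset=offset)[0]
print(value)  # 1
```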

The challenge for NumPy is that there’s no guarantee the pointers are actually pointing to strings. That’s why it reports the dtype as ‘object’.

Shamelessly gonna plug my own blog article where I originally discussed this.


Answer 3

As of version 1.0.0 (January 2020), pandas has introduced, as an experimental feature, first-class support for string types through pandas.StringDtype.

While you’ll still see object by default, the new type can be used by specifying a dtype of pd.StringDtype() or simply 'string':

>>> pd.Series(['abc', None, 'def'])
0     abc
1    None
2     def
dtype: object
>>> pd.Series(['abc', None, 'def'], dtype=pd.StringDtype())
0     abc
1    <NA>
2     def
dtype: string
>>> pd.Series(['abc', None, 'def']).astype('string')
0     abc
1    <NA>
2     def
dtype: string

What is dtype('O') in pandas?

I have a dataframe in pandas and I’m trying to figure out what the types of its values are. I am unsure what the type of column 'Test' is. However, when I run myFrame['Test'].dtype, I get:

dtype('O')

What does this mean?


Answer 0

It means:

'O'     (Python) objects

Source.

The first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where it is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised. The supported kinds are:

'b'       boolean
'i'       (signed) integer
'u'       unsigned integer
'f'       floating-point
'c'       complex-floating point
'O'       (Python) objects
'S', 'a'  (byte-)string
'U'       Unicode
'V'       raw data (void)
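You can see these codes in practice with np.dtype, which parses a code string. Its .kind and .itemsize attributes expose the kind character and the size in bytes. The following is a quick sketch, and the list of codes is just a sample. Note that for 'U' the number counts characters, and each character is stored in 4 bytes.

```python
import numpy as np

codes = ["b1", "i4", "u2", "f8", "c16", "O", "S5", "U5"]
for code in codes:
    dt = np.dtype(code)
    # kind character, then the size of one item in bytes
    print(code, dt.kind, dt.itemsize)

# 'S5' is 5 bytes; 'U5' is 5 characters stored as 4 bytes each
print(np.dtype("S5").itemsize, np.dtype("U5").itemsize)  # 5 20
```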

Another answer helps if you need to check types.


Answer 1

When you see dtype('O') inside a dataframe, it usually means the column holds Python strings (pandas stores strings with the object dtype).

What is dtype?

Is it something that belongs to pandas, to NumPy, to both, or to something else? Let's examine some pandas code:

import pandas as pd

df = pd.DataFrame({'float': [1.0],
                    'int': [1],
                    'datetime': [pd.Timestamp('20180310')],
                    'string': ['foo']})
print(df)
print(df['float'].dtype, df['int'].dtype, df['datetime'].dtype, df['string'].dtype)
df['string'].dtype

It will output like this:

   float  int   datetime string    
0    1.0    1 2018-03-10    foo
---
float64 int64 datetime64[ns] object
---
dtype('O')

You can interpret the last output as the pandas dtype('O'), i.e. the pandas object dtype, which here holds values of the Python type str. It corresponds to the NumPy string_ or unicode_ types.

Pandas dtype    Python type     NumPy type          Usage
object          str             string_, unicode_   Text

Like Sancho Panza on his donkey, pandas rides on top of NumPy. NumPy understands the underlying architecture of your system and uses the numpy.dtype class to describe it.

A data type object is an instance of the numpy.dtype class that describes the data type more precisely, including:

  • Type of the data (integer, float, Python object, etc.)
  • Size of the data (how many bytes are in, e.g., the integer)
  • Byte order of the data (little-endian or big-endian)
  • If the data type is structured, an aggregate of other data types (e.g., describing an array item consisting of an integer and a float):
      ◦ What are the names of the “fields” of the structure
      ◦ What is the data type of each field
      ◦ Which part of the memory block each field takes
  • If the data type is a sub-array, what is its shape and data type
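Most of the properties in this list are exposed as attributes of a numpy.dtype object. The following is a small sketch, and the field names id and score are made up for illustration:

```python
import numpy as np

# Kind, size and byte order of a simple type
big = np.dtype(">i4")                   # big-endian 32-bit signed integer
print(big.kind, big.itemsize, big.str)  # i 4 >i4

# A structured dtype: an aggregate of named fields
rec = np.dtype([("id", "<i4"), ("score", "<f8")])
print(rec.names)                        # ('id', 'score')
for name in rec.names:
    field_dtype, offset = rec.fields[name][:2]
    # offset = where the field starts inside each item's memory block
    print(name, field_dtype, offset)

# A sub-array dtype: a base type plus a shape
sub = np.dtype(("<f8", (2, 3)))
print(sub.base, sub.shape)
```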

In the context of this question, dtype belongs to both pandas and NumPy, and in particular dtype('O') means we should expect strings.


Here is some code for testing, with explanation. Suppose we have the dataset as a dictionary:

import pandas as pd
import numpy as np
from pandas import Timestamp

data={'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'date': {0: Timestamp('2018-12-12 00:00:00'), 1: Timestamp('2018-12-12 00:00:00'), 2: Timestamp('2018-12-12 00:00:00'), 3: Timestamp('2018-12-12 00:00:00'), 4: Timestamp('2018-12-12 00:00:00')}, 'role': {0: 'Support', 1: 'Marketing', 2: 'Business Development', 3: 'Sales', 4: 'Engineering'}, 'num': {0: 123, 1: 234, 2: 345, 3: 456, 4: 567}, 'fnum': {0: 3.14, 1: 2.14, 2: -0.14, 3: 41.3, 4: 3.14}}
df = pd.DataFrame.from_dict(data) #now we have a dataframe

print(df)
print(df.dtypes)

The last two lines inspect the dataframe; note the output:

   id       date                  role  num   fnum
0   1 2018-12-12               Support  123   3.14
1   2 2018-12-12             Marketing  234   2.14
2   3 2018-12-12  Business Development  345  -0.14
3   4 2018-12-12                 Sales  456  41.30
4   5 2018-12-12           Engineering  567   3.14
id               int64
date    datetime64[ns]
role            object
num              int64
fnum           float64
dtype: object

All kinds of different dtypes.

df.iloc[1,:] = np.nan
df.iloc[2,:] = None

If we set rows to np.nan or None, the datetime, object and float columns keep their original dtype, but the integer columns are upcast to float64 because int64 cannot hold NaN. The output will look like this:

print(df)
print(df.dtypes)

    id       date         role    num   fnum
0  1.0 2018-12-12      Support  123.0   3.14
1  NaN        NaT          NaN    NaN    NaN
2  NaN        NaT         None    NaN    NaN
3  4.0 2018-12-12        Sales  456.0  41.30
4  5.0 2018-12-12  Engineering  567.0   3.14
id             float64
date    datetime64[ns]
role            object
num            float64
fnum           float64
dtype: object

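So np.nan or None will not change the dtype of the datetime, object or float columns, while integer columns are upcast to float64 as soon as they contain NaN (see id and num above). If we set all rows of a column to np.nan or None, the column will become float64 or object, respectively.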
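The integer upcast can be reproduced in isolation. This minimal sketch uses the Series constructor instead of the iloc assignments above (the values are made up):

```python
import numpy as np
import pandas as pd

ints = pd.Series([123, 234, 345])
print(ints.dtype)                # int64

# int64 has no NaN, so pandas upcasts the whole column to float64
with_nan = pd.Series([123, np.nan, 345])
print(with_nan.dtype)            # float64

# A datetime column keeps its dtype and stores the gap as NaT
dates = pd.Series([pd.Timestamp("2018-12-12"), None])
print(dates.dtype.kind, dates.isna().tolist())  # M [False, True]
```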

You can also try setting single rows:

df.iloc[3,:] = 0 # will convert datetime to object only
df.iloc[4,:] = '' # will convert all columns to object

Note also that if we set a string inside a non-string column, that column becomes string (object) dtype.


Answer 2

It means “a Python object”, i.e. not one of the built-in scalar types supported by NumPy.

np.array([object()]).dtype
=> dtype('O')

Answer 3

‘O’ stands for object.

#Loading a csv file as a dataframe
import pandas as pd
train_df = pd.read_csv('train.csv')
col_name = 'Name of Employee'

#Checking the datatype of the column
train_df[col_name].dtype

#Instead try printing the same thing
print(train_df[col_name].dtype)

The first line returns: dtype('O')

The line with the print statement returns the following: object
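The two outputs differ only because the interactive prompt shows repr() of the dtype, while print() uses str(). The following sketch does not need train.csv:

```python
import numpy as np

dt = np.dtype("O")
print(repr(dt))                # dtype('O')  (what the prompt shows)
print(str(dt))                 # object      (what print() shows)
print(dt == np.dtype(object))  # True: 'O' and object name the same dtype
```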


How to compare the type of an object in Python?

Basically I want to do this:

obj = 'str'
type ( obj ) == string

I tried:

type ( obj ) == type ( string )

and it didn’t work.

Also, what about the other types? For example, I couldn’t replicate NoneType.


Answer 0

isinstance()

In your case, isinstance("this is a string", str) will return True.

You may also want to read this: http://www.canonical.org/~kragen/isinstance/
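One practical difference between isinstance() and an exact type() comparison is that isinstance() also accepts subclasses. It also takes a tuple of types. The following is a small sketch, and MyStr is a hypothetical class made up for this example:

```python
class MyStr(str):
    """A hypothetical str subclass, used only for this example."""

s = MyStr("this is a string")
print(isinstance(s, str))           # True: subclasses count
print(type(s) == str)               # False: the exact type is MyStr
print(isinstance(5, (int, float)))  # True: a tuple checks several types at once
```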


Answer 1

isinstance works:

if isinstance(obj, MyClass): do_foo(obj)

But keep in mind: if it looks like a duck and sounds like a duck, it is a duck.

EDIT: For the None type, you can simply do:

if obj is None: obj = MyClass()

Answer 2

First, avoid all type comparisons. They’re very, very rarely necessary. Sometimes they help to check parameter types in a function, but even that’s rare. Wrong type data will raise an exception, and that’s all you’ll ever need.

All of the basic conversion functions (int, float, str, complex, ...) are the type objects themselves, so they compare equal to what type() returns.

type(9) is int
type(2.5) is float
type('x') is str
type(u'x') is unicode  # Python 2 only; in Python 3, u'x' is a str
type(2+3j) is complex

There are a few other cases.

isinstance( 'x', basestring )  # Python 2 only
isinstance( u'u', basestring )  # Python 2 only
isinstance( 9, int )
isinstance( 2.5, float )
isinstance( (2+3j), complex )
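The snippets above are Python 2 code; unicode and basestring no longer exist in Python 3. The following sketch shows the Python 3 equivalents, assuming text values are always str:

```python
# Python 3: the built-in conversion functions are the type objects
print(type(9) is int)           # True
print(type(2.5) is float)       # True
print(type("x") is str)         # True
print(type(u"x") is str)        # True: the u prefix still yields str
print(type(2 + 3j) is complex)  # True

# isinstance(x, str) replaces the old basestring checks for text
print(isinstance("x", str))     # True
print(isinstance(b"x", str))    # False: bytes are a separate type
```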

None, BTW, never needs any of this kind of type checking. None is the only instance of NoneType; the None object is a singleton. Just check for None:

variable is None

BTW, do not use the above in general. Use ordinary exceptions and Python’s own natural polymorphism.


Answer 3

For other types, check out the types module:

>>> import types
>>> x = "mystring"
>>> isinstance(x, types.StringType)
True
>>> x = 5
>>> isinstance(x, types.IntType)
True
>>> x = None
>>> isinstance(x, types.NoneType)
True

P.S. Type checking is a bad idea.
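types.StringType and types.IntType do not exist in Python 3, and types.NoneType only came back in Python 3.10. The following sketch shows the same checks for Python 3, using the built-in types, with type(None) as the portable spelling for NoneType:

```python
import sys
import types

x = "mystring"
print(isinstance(x, str))                 # True
x = 5
print(isinstance(x, int))                 # True
x = None
print(isinstance(x, type(None)))          # True on any Python 3
if sys.version_info >= (3, 10):
    print(isinstance(x, types.NoneType))  # True (available again since 3.10)
```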


Answer 4

You can always use the type(x) == type(y) trick, where y is something with known type.

# check if x is a regular string
type(x) == type('')
# check if x is an integer
type(x) == type(1)
# check if x is a NoneType
type(x) == type(None)

Often there are better ways of doing that, particularly with any recent python. But if you only want to remember one thing, you can remember that.

In this case, the better ways would be:

# check if x is a regular string
type(x) == str
# check if x is either a regular string or a unicode string (Python 2 only)
type(x) in [str, unicode]
# alternatively (Python 2 only; in Python 3 use isinstance(x, str)):
isinstance(x, basestring)
# check if x is an integer
type(x) == int
# check if x is a NoneType
x is None

Note the last case: there is only one instance of NoneType in Python, and that is None. You’ll see NoneType a lot in exceptions (TypeError: 'NoneType' object is unsubscriptable happens to me all the time), but you’ll hardly ever need to refer to it in code.

Finally, as fengshaun points out, type checking in python is not always a good idea. It’s more pythonic to just use the value as though it is the type you expect, and catch (or allow to propagate) exceptions that result from it.


Answer 5

You’re very close! string is a module, not a type. You probably want to compare the type of obj against the type object for strings, namely str:

type(obj) == str  # this works because str is already a type

Alternatively:

type(obj) == type('')

Note that in Python 2, if obj is a unicode type, neither of the above will work. Nor will isinstance(). See John’s comments to this post for how to get around this… I’ve been trying to remember it for about 10 minutes now, but was having a memory block!


Answer 6

It is because you have to write

s="hello"
type(s) == type("")

type accepts an instance and returns its type. In this case you have to compare two instances’ types.

If you need to do preemptive checking, it is better to check for a supported interface than for the type.

The type does not really tell you much, apart from the fact that your code wants an instance of a specific type, even though an instance of a completely different type could work perfectly well because it implements the same interface.

For example, suppose you have this code

def firstElement(parameter):
    return parameter[0]

Now, suppose you say: I want this code to accept only a tuple.

def firstElement(parameter):
    if type(parameter) is not tuple:
        raise TypeError("function accepts only a tuple")
    return parameter[0]

This is reducing the reusability of this routine. It won’t work if you pass a list, or a string, or a numpy.array. Something better would be

def firstElement(parameter):
    if not (hasattr(parameter, "__getitem__") and callable(getattr(parameter,"__getitem__"))):
        raise TypeError("interface violation")
    return parameter[0]

But there’s no point in doing it: parameter[0] will raise an exception anyway if the protocol is not satisfied… unless, of course, you want to prevent side effects, or to avoid having to recover from calls that you would otherwise make before failing. A (stupid) example, just to make the point:

import os

def firstElement(parameter):
    if not (hasattr(parameter, "__getitem__") and callable(getattr(parameter,"__getitem__"))):
        raise TypeError("interface violation")
    os.system("rm file")
    return parameter[0]

In this case, your code will raise an exception before running the system() call. Without the interface check, you would have removed the file and then raised the exception.
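For comparison, the unchecked version from the start of this answer already works with every indexable type and fails with a TypeError for everything else. In the following small sketch, first_element is just firstElement with a PEP 8 name:

```python
def first_element(parameter):
    # No explicit type check: indexing itself enforces the protocol
    return parameter[0]

print(first_element((1, 2)))   # 1
print(first_element([3, 4]))   # 3
print(first_element("hello"))  # h

try:
    first_element(42)          # an int does not support indexing
except TypeError as exc:
    print("TypeError:", exc)
```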


Answer 7

Use str instead of string

type ( obj ) == str

Explanation

>>> a = "Hello"
>>> type(a)==str
True
>>> type(a)
<type 'str'>
>>>

Answer 8

I use type(x) == type(y)

For instance, if I want to check whether something is a list:

type( x ) == type( [] )

string check:

type( x ) == type( '' ) or type( x ) == type( u'' )

If you want to check against None, use is:

x is None

Answer 9

I think this should do it:

if isinstance(obj, str):
    pass  # obj is a str here

Answer 10

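type() doesn’t behave as expected with certain classes (in Python 2, instances of old-style classes all report the generic instance type). If you’re not sure of the object’s type, use the __class__ attribute, like so:

>>> obj = 'a string'
>>> obj.__class__ == str
True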

Also see this article – http://www.siafoo.net/article/56


Answer 11

To get the type, use the __class__ member, as in unknown_thing.__class__

Talk of duck-typing is useless here because it doesn’t answer a perfectly good question. In my application code I never need to know the type of something, but it’s still useful to have a way to learn an object’s type. Sometimes I need to get the actual class to validate a unit test. Duck typing gets in the way there because all possible objects have the same API, but only one is correct. Also, sometimes I’m maintaining somebody else’s code, and I have no idea what kind of object I’ve been passed. This is my biggest problem with dynamically typed languages like Python. Version 1 is very easy and quick to develop. Version 2 is a pain in the buns, especially if you didn’t write version 1. So sometimes, when I’m working with a function I didn’t write, I need to know the type of a parameter, just so I know what methods I can call on it.

That’s where the __class__ attribute comes in handy. That (as far as I can tell) is the best way (maybe the only way) to get an object’s type.


Answer 12

Use isinstance(object, type). As above this is easy to use if you know the correct type, e.g.,

isinstance('dog', str) ## gives bool True

But for more esoteric objects, this can be difficult to use. For example:

import numpy as np 
a = np.array([1,2,3]) 
isinstance(a,np.array) ## breaks with TypeError: np.array is a factory function, not a type

but you can do this trick:

y = type(np.array([1]))
isinstance(a,y) ## gives bool True 

So I recommend instantiating a variable (y in this case) with the type of the object you want to check (e.g., type(np.array([1]))), then using isinstance.
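Since type(np.array([1])) is simply np.ndarray, the check can also be written against np.ndarray directly. A short sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
y = type(np.array([1]))
print(y is np.ndarray)            # True: the trick recovers np.ndarray
print(isinstance(a, np.ndarray))  # True, no helper variable needed

try:
    isinstance(a, np.array)       # np.array is a function, not a type
except TypeError as exc:
    print("TypeError:", exc)
```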


Answer 13

You can compare classes to check the inheritance level.

#!/usr/bin/env python
#coding:utf8

class A(object):
    def t(self):
        print('A')
    def r(self):
        print('rA', end=' ')
        self.t()

class B(A):
    def t(self):
        print('B')

class C(A):
    def t(self):
        print('C')

class D(B, C):
    def t(self):
        print('D', end=' ')
        super(D, self).t()

class E(C, B):
    pass

d = D()
d.t()
d.r()

e = E()
e.t()
e.r()

print(isinstance(e, D)) # False
print(isinstance(e, E)) # True
print(isinstance(e, C)) # True
print(isinstance(e, B)) # True
print(isinstance(e, (A,))) # True
# Python 3 classes can't be ordered with < or >=; issubclass() compares them
print(issubclass(e.__class__, A)) # True
print(issubclass(e.__class__, C)) # True
print(issubclass(e.__class__, D)) # False
print(e.__class__ is E)  # True

Can pandas automatically recognize dates?

Today I was pleasantly surprised by the fact that, while reading data from a data file (for example), pandas is able to recognize the types of values:

df = pandas.read_csv('test.dat', delimiter=r"\s+", names=['col1','col2','col3'])

For example, it can be checked this way:

for i, r in df.iterrows():
    print(type(r['col1']), type(r['col2']), type(r['col3']))

In particular, integers, floats and strings were recognized correctly. However, I have a column that has dates in the following format: 2013-6-4. These dates were recognized as strings (not as Python date objects). Is there a way to “teach” pandas to recognize dates?


Answer 0

You should add parse_dates=True, or parse_dates=['column name'], when reading; that’s usually enough to magically parse it. But there are always weird formats that need to be defined manually. In such a case you can also add a date parser function, which is the most flexible way possible.

Suppose you have a column ‘datetime’ with your string, then:

from datetime import datetime
import pandas as pd

dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)

This way you can even combine multiple columns into a single datetime column, this merges a ‘date’ and a ‘time’ column into a single ‘datetime’ column:

dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)

You can find directives (i.e. the letters to be used for different formats) for strptime and strftime in this page.
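In newer pandas releases, date_parser (and, in some versions, combining columns through parse_dates) may be deprecated. A version-independent alternative is to read the date and time columns as text and combine them with pd.to_datetime afterwards. The following is a sketch, and the CSV content and column names are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV with separate 'date' and 'time' columns
csv_text = (
    "date,time,value\n"
    "2013-06-04,10:30:00,1\n"
    "2013-06-05,11:45:10,2\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Join the two text columns, then parse them with an explicit format
df["datetime"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%Y-%m-%d %H:%M:%S"
)
df = df.drop(columns=["date", "time"])
print(df.dtypes)
```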


Answer 1

Perhaps the pandas interface has changed since @Rutger answered, but in the version I’m using (0.15.2), the date_parser function receives a list of dates instead of a single value. In this case, his code should be updated like so:

from datetime import datetime

dateparse = lambda dates: [datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in dates]

df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)

Answer 2

The pandas read_csv method is great for parsing dates. Complete documentation at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html

You can even have the different date parts in different columns and pass the parameter:

parse_dates : boolean, list of ints or names, list of lists, or dict
If True -> try parsing the index. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a
separate date column. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date
column. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’

The default date detection works great, but it seems to be biased towards North American date formats. If you live elsewhere, you might occasionally be caught out by the results. As far as I can remember, 1/6/2000 means 6 January in the USA, as opposed to 1 June where I live. It is smart enough to swap them around if dates like 23/6/2000 are used. It’s probably safer to stick with YYYYMMDD variations of dates, though. Apologies to the pandas developers here, but I have not tested it with local dates recently.
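The day/month ambiguity can also be resolved explicitly with the dayfirst flag, which both pd.to_datetime and pandas.read_csv accept. The following minimal sketch uses the 1/6/2000 example from above:

```python
import pandas as pd

us = pd.to_datetime("1/6/2000")                 # month first by default
eu = pd.to_datetime("1/6/2000", dayfirst=True)  # day first
print(us.date(), eu.date())                     # 2000-01-06 2000-06-01

# An unambiguous ISO 8601 string needs no hint at all
print(pd.to_datetime("2000-06-01").date())      # 2000-06-01
```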

You can use the date_parser parameter to pass a function that converts your format.

date_parser : function
Function to use for converting a sequence of string columns to an array of datetime
instances. The default uses dateutil.parser.parser to do the conversion.

Answer 3

You could use pandas.to_datetime() as recommended in the documentation for pandas.read_csv():

If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv.

Demo:

>>> D = {'date': '2013-6-4'}
>>> df = pd.DataFrame(D, index=[0])
>>> df
       date
0  2013-6-4
>>> df.dtypes
date    object
dtype: object
>>> df['date'] = pd.to_datetime(df.date, format='%Y-%m-%d')
>>> df
        date
0 2013-06-04
>>> df.dtypes
date    datetime64[ns]
dtype: object

Answer 4

When merging two columns into a single datetime column, the accepted answer generates an error (pandas version 0.20.3), since the columns are sent to the date_parser function separately.

The following works:

from datetime import datetime

def dateparse(d, t):
    dt = d + " " + t
    return datetime.strptime(dt, '%d/%m/%Y %H:%M:%S')

df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)

Answer 5

Yes – according to the pandas.read_csv documentation:

Note: A fast-path exists for iso8601-formatted dates.

So if your CSV has a column named datetime and the dates look like 2013-01-01T01:01, for example, running this will make pandas (I’m on v0.19.2) pick up the date and time automatically:

df = pd.read_csv('test.csv', parse_dates=['datetime'])

Note that you need to pass parse_dates explicitly; it doesn’t work without it.

Verify with:

df.dtypes

You should see that the column’s dtype is datetime64[ns].
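You can try this without a test.csv file by reading from an in-memory buffer instead. The two rows below are invented. The second read_csv call shows that, without parse_dates, the column stays as strings:

```python
import io

import pandas as pd

# Invented ISO 8601 timestamps standing in for test.csv
csv_text = "id,datetime\n1,2013-01-01T01:01\n2,2013-01-02T13:30\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])
plain = pd.read_csv(io.StringIO(csv_text))  # same data, no parse_dates

print(df.dtypes)     # 'datetime' should now be a datetime64 column
print(plain.dtypes)  # without parse_dates it stays a string column
```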


Answer 6

If performance matters to you, make sure you time the alternatives:

import sys
import timeit
import pandas as pd

print('Python %s on %s' % (sys.version, sys.platform))
print('Pandas version %s' % pd.__version__)

repeat = 3
numbers = 100

def time(statement, _setup=None):
    print (min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))

print("Format %m/%d/%y")
setup = """import pandas as pd
import io
import datetime  # datetime.datetime replaces pd.datetime, which newer pandas versions no longer provide

data = io.StringIO('''\
ProductCode,Date
''' + '''\
x1,07/29/15
x2,07/29/15
x3,07/29/15
x4,07/30/15
x5,07/29/15
x6,07/29/15
x7,07/29/15
y7,08/05/15
x8,08/05/15
z3,08/05/15
''' * 100)"""

time('pd.read_csv(data); data.seek(0)')
time('pd.read_csv(data, parse_dates=["Date"]); data.seek(0)')
time('pd.read_csv(data, parse_dates=["Date"],'
     'infer_datetime_format=True); data.seek(0)')
time('pd.read_csv(data, parse_dates=["Date"],'
     'date_parser=lambda x: datetime.datetime.strptime(x, "%m/%d/%y")); data.seek(0)')

print("Format %Y-%m-%d %H:%M:%S")
setup = """import pandas as pd
import io
import datetime

data = io.StringIO('''\
ProductCode,Date
''' + '''\
x1,2016-10-15 00:00:43
x2,2016-10-15 00:00:56
x3,2016-10-15 00:00:56
x4,2016-10-15 00:00:12
x5,2016-10-15 00:00:34
x6,2016-10-15 00:00:55
x7,2016-10-15 00:00:06
y7,2016-10-15 00:00:01
x8,2016-10-15 00:00:00
z3,2016-10-15 00:00:02
''' * 1000)"""

time('pd.read_csv(data); data.seek(0)')
time('pd.read_csv(data, parse_dates=["Date"]); data.seek(0)')
time('pd.read_csv(data, parse_dates=["Date"],'
     'infer_datetime_format=True); data.seek(0)')
time('pd.read_csv(data, parse_dates=["Date"],'
     'date_parser=lambda x: datetime.datetime.strptime(x, "%Y-%m-%d %H:%M:%S")); data.seek(0)')

prints:

Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28) 
[Clang 6.0 (clang-600.0.57)] on darwin
Pandas version 0.23.4
Format %m/%d/%y
0.19123052499999993
8.20691274
8.143124389
1.2384357139999977
Format %Y-%m-%d %H:%M:%S
0.5238807110000039
0.9202787830000005
0.9832778819999959
12.002349824999996

So with ISO 8601-formatted dates (%Y-%m-%d %H:%M:%S apparently counts as ISO 8601; I guess the T can be dropped and replaced by a space), you should not specify infer_datetime_format (it apparently makes no difference with the more common formats either). Passing in your own parser just cripples performance. On the other hand, date_parser does make a big difference with less standard formats such as %m/%d/%y. As usual, be sure to time before you optimize.


Answer 7

When loading a CSV file that contains a date column, there are two ways to make pandas recognize the date column:

  1. Pandas recognizes the format explicitly, via the argument date_parser=mydateparser

  2. Pandas infers the format implicitly, via the argument infer_datetime_format=True

Some sample data from the date column:

01/01/18

01/02/18

Here we can’t tell whether the first two digits are the month or the day, so in this case we have to use Method 1 and pass the format explicitly:

from datetime import datetime

import pandas as pd

mydateparser = lambda x: datetime.strptime(x, "%m/%d/%y")
df = pd.read_csv(file_name, parse_dates=['date_col_name'], date_parser=mydateparser)

Method 2: let pandas recognize the format implicitly (automatically):

df = pd.read_csv(file_name, parse_dates=['date_col_name'], infer_datetime_format=True)
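The ambiguity behind Method 1 can be shown with the standard library alone. The same string from the sample data parses to two different dates, depending on which format you assume:

```python
from datetime import datetime

sample = "01/02/18"  # one of the sample values above

# The same text gives two different dates, depending on the assumed format
month_first = datetime.strptime(sample, "%m/%d/%y")
day_first = datetime.strptime(sample, "%d/%m/%y")

print(month_first.date())  # 2018-01-02
print(day_first.date())    # 2018-02-01
```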

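How does Python 2 compare string and int? Why do lists compare as greater than numbers, and tuples greater than lists?

Question: How does Python 2 compare string and int? Why do lists compare as greater than numbers, and tuples greater than lists?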

The following snippet is annotated with the output (as seen on ideone.com):

print "100" < "2"      # True
print "5" > "9"        # False

print "100" < 2        # False
print 100 < "2"        # True

print 5 > "9"          # False
print "5" > 9          # True

print [] > float('inf') # True
print () > []          # True

Can someone explain why the output is as such?


Implementation details

  • Is this behavior mandated by the language spec, or is it up to implementors?
  • Are there differences between any of the major Python implementations?
  • Are there differences between versions of the Python language?

Answer 0

From the Python 2 manual:

CPython implementation detail: Objects of different types except numbers are ordered by their type names; objects of the same types that don’t support proper comparison are ordered by their address.

When you order two strings or two numeric types, the ordering is done in the expected way (lexicographic ordering for strings, numeric ordering for numbers).

When you order a numeric and a non-numeric type, the numeric type comes first.

>>> 5 < 'foo'
True
>>> 5 < (1, 2)
True
>>> 5 < {}
True
>>> 5 < [1, 2]
True

When you order two incompatible types where neither is numeric, they are ordered by the alphabetical order of their type names:

>>> [1, 2] > 'foo'   # 'list' < 'str' 
False
>>> (1, 2) > 'foo'   # 'tuple' > 'str'
True

>>> class Foo(object): pass
>>> class Bar(object): pass
>>> Bar() < Foo()
True

One exception is old-style classes, whose instances always come before instances of new-style classes.

>>> class Foo: pass           # old-style
>>> class Bar(object): pass   # new-style
>>> Bar() < Foo()
False
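The following Python 3 sketch makes the rule concrete. It emulates the CPython 2 cross-type ordering with a sort key: numbers come first, then other objects grouped by type name, compared normally within a type. The py2_style_key helper is hypothetical, not part of any library. It does not model special cases such as old-style classes or ordering by address. Values of the same type must also still be orderable in Python 3 (two dicts, for instance, are not).

```python
import numbers

def py2_style_key(value):
    """Sort key approximating CPython 2's default ordering of mixed types."""
    if isinstance(value, numbers.Number):
        # Numeric types come before all other types
        return (0, "", value)
    # Everything else is grouped by type name, e.g. 'list' < 'str' < 'tuple'
    return (1, type(value).__name__, value)

mixed = ["foo", (1, 2), 5, [1, 2], 2.5]
print(sorted(mixed, key=py2_style_key))  # [2.5, 5, [1, 2], 'foo', (1, 2)]
```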

Is this behavior mandated by the language spec, or is it up to implementors?

There is no formal language specification. The language reference says:

Otherwise, objects of different types always compare unequal, and are ordered consistently but arbitrarily.

So it is an implementation detail.

Are there differences between any of the major Python implementations?

I can’t answer this one because I have only used the official CPython implementation, but there are other implementations of Python such as PyPy.

Are there differences between versions of the Python language?

In Python 3.x the behaviour has been changed so that attempting to order an integer and a string will raise an error:

>>> '10' > 5
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    '10' > 5
TypeError: unorderable types: str() > int()

Answer 1

Strings are compared lexicographically, and dissimilar types are compared by the name of their type ("int" < "str"). Python 3 fixes the second point by making them unorderable.


Test if a variable is a list or tuple

Question: Test if a variable is a list or tuple

In Python, what’s the best way to test if a variable contains a list or a tuple (i.e. a collection)?

Is isinstance() as evil as suggested here? http://www.canonical.org/~kragen/isinstance/

Update: the most common reason I want to distinguish a list from a string is when I have some indefinitely deep nested tree / data-structure of lists of lists of lists of strings etc. which I’m exploring with a recursive algorithm and I need to know when I’ve hit the “leaf” nodes.


Answer 0

Go ahead and use isinstance if you need it. It is somewhat evil, as it excludes custom sequences, iterators, and other things that you might actually need. However, sometimes you need to behave differently if someone, for instance, passes a string. My preference there would be to explicitly check for str or unicode like so:

import types
isinstance(var, types.StringTypes)

N.B. Don’t mistake types.StringType for types.StringTypes. The latter incorporates str and unicode objects.

The types module is considered by many to be obsolete in favor of just checking directly against the object’s type, so if you’d rather not use the above, you can alternatively check explicitly against str and unicode, like this:

isinstance(var, (str, unicode))

Edit:

Better still is:

isinstance(var, basestring)

End edit

After either of these, you can fall back to behaving as if you’re getting a normal sequence, letting non-sequences raise appropriate exceptions.

See, the thing that’s “evil” about type checking is not that you might want to behave differently for a certain type of object. It’s that you artificially restrict your function from doing the right thing with unexpected object types that would otherwise do the right thing. If you have a final fallback that is not type-checked, you remove this restriction. It should be noted that too much type checking is a code smell that indicates you might want to do some refactoring, but that doesn’t necessarily mean you should avoid it from the get-go.
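Applied to the nested-tree case from the question’s update, the pattern might look like this in Python 3, where str takes the place of basestring. flatten_leaves is a hypothetical name. The only type check is for strings; everything else falls back to being treated as a sequence, so non-iterables raise TypeError:

```python
def flatten_leaves(node):
    """Yield the string leaves of an arbitrarily nested structure."""
    # Explicit special case: strings are iterable, but here they are leaves
    if isinstance(node, str):
        yield node
        return
    # Untyped fallback: treat anything else as a sequence;
    # a non-iterable node raises TypeError here
    for child in node:
        yield from flatten_leaves(child)

tree = ["a", ["b", ("c", ["d"])], "e"]
print(list(flatten_leaves(tree)))  # ['a', 'b', 'c', 'd', 'e']
```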


Answer 1

if type(x) is list:
    print('a list')
elif type(x) is tuple:
    print('a tuple')
else:
    print('neither a tuple nor a list')

Answer 2

There’s nothing wrong with using isinstance as long as it’s not redundant. If a variable should only be a list/tuple then document the interface and just use it as such. Otherwise a check is perfectly reasonable:

import collections.abc

if isinstance(a, collections.abc.Iterable):
    pass  # use as a container
else:
    pass  # not a container!

This type of check does have some good use-cases, such as with the standard string startswith / endswith methods (although to be accurate these are implemented in C in CPython using an explicit check to see if it’s a tuple – there’s more than one way to solve this problem, as mentioned in the article you link to).

An explicit check is often better than trying to use the object as a container and handling the exception – that can cause all sorts of problems with code being run partially or unnecessarily.
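The startswith/endswith case mentioned above is easy to observe. Both methods accept a tuple of alternatives but reject a list, which is the visible effect of that explicit tuple check:

```python
filename = "report.csv"

# A tuple of suffixes is accepted and each one is tried
print(filename.endswith((".csv", ".tsv")))  # True

# A list is rejected: only tuples are special-cased
try:
    filename.endswith([".csv", ".tsv"])
except TypeError as exc:
    print("TypeError:", exc)
```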


Answer 3

Document the argument as needing to be a sequence, and use it as a sequence. Don’t check the type.


Answer 4

How about hasattr(a, "__iter__")?

It tells you whether the object can be iterated over. By default, tuples and lists can, but not the string types (this holds for Python 2 only; in Python 3, str also defines __iter__).
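A quick way to see what the test accepts on your interpreter is to run it on a few values. The comments show the output under Python 3:

```python
# Which of these values define __iter__?
for value in ([1, 2], (1, 2), "abc", 42):
    print(repr(value), hasattr(value, "__iter__"))
# [1, 2] True
# (1, 2) True
# 'abc' True
# 42 False
```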


Answer 5

On Python 2.8, type(list) is list returns False, so I would suggest comparing the type in this horrible way:

if type(a) == type([]):
    print("variable a is a list")

(well at least on my system, using anaconda on Mac OS X Yosemite)
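The False result does not indicate a platform quirk. type(list) is the metaclass type, because list is itself a class; the comparison needs type() of an instance instead. Here is a short Python 3 check (MyList is a throwaway example class):

```python
a = [1, 2, 3]

print(type(list) is list)   # False: type(list) is the metaclass `type`
print(type(list) is type)   # True
print(type(a) is list)      # True: check the type of an instance, not of the class
print(type(a) == type([]))  # True, the same comparison as the line above

class MyList(list):  # throwaway subclass for the comparison below
    pass

print(type(MyList()) is list)      # False: exact type checks exclude subclasses
print(isinstance(MyList(), list))  # True
```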


Answer 6

Python uses “duck typing”, i.e. if a variable quacks like a duck, it must be a duck. In your case, you probably want it to be iterable, or you want to access the item at a certain index. So just do that: use the object in for x in var: or var[idx] inside a try block, and if you get an exception, it wasn’t a duck…
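Here is a minimal sketch of that try-first style (often called EAFP). first_item is a hypothetical helper that treats anything indexable as a sequence:

```python
def first_item(var):
    """Return var[0] if var supports indexing, otherwise None."""
    try:
        return var[0]
    except (TypeError, IndexError, KeyError):
        # Not indexable, empty, or a mapping without key 0: not the duck we want
        return None

print(first_item([3, 4]))  # 3
print(first_item((5,)))    # 5
print(first_item(42))      # None
print(first_item("abc"))   # a  (strings quack like sequences too)
```

Note that strings pass this test as well, which is the case several of the other answers single out.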


Answer 7

>>> l = []
>>> l.__class__.__name__ in ('list', 'tuple')
True

Answer 8

If you just need to know whether you can use the foo[123] notation with the variable, you can check for the existence of a __getitem__ attribute (which is what Python calls when you access by index) with hasattr(foo, '__getitem__').


Answer 9

The test has to be more complex if you really want to handle just about anything as a function argument:

type(a) != type('') and hasattr(a, "__iter__")

Usually, though, it’s enough to spell out that a function expects an iterable and then check only type(a) != type('').

It may also happen that you have a simple processing path for strings, or that you want to be nice and split them, etc. In that case you don’t want to yell at strings, and if someone sends you something weird, just let them have an exception.


Answer 10

Another easy way to find out whether a variable is a list or tuple, or to check a variable’s type in general, would be:

def islist(obj):
    return "list" in str(type(obj))

Answer 11

In principle, I agree with Ignacio, above, but you can also use type to check if something is a tuple or a list.

>>> a = (1,)
>>> type(a)
<type 'tuple'>
>>> a = [1]
>>> type(a)
<type 'list'>

How to check if an object is a list or tuple (but not a string)?

Question: How to check if an object is a list or tuple (but not a string)?

This is what I normally do in order to ascertain that the input is a list/tuple, but not a str. Many times I have stumbled upon bugs where a function passes a str object by mistake, and the target function does for x in lst assuming that lst is actually a list or tuple.

assert isinstance(lst, (list, tuple))

My question is: is there a better way of achieving this?


Answer 0

In Python 2 only (not Python 3):

assert not isinstance(lst, basestring)

is actually what you want; otherwise you’ll miss out on a lot of things that act like lists but aren’t subclasses of list or tuple.
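Python 3 no longer has basestring. Python 2’s basestring covered both byte strings (str) and text (unicode), which correspond to bytes and str in Python 3, so a close counterpart excludes both. The sketch below uses a hypothetical helper name and shows that list-like objects which are not list or tuple subclasses still pass:

```python
from collections import deque

def check_not_string(lst):
    # Python 3 counterpart of `assert not isinstance(lst, basestring)`
    assert not isinstance(lst, (str, bytes)), "expected a sequence, not a string"
    return lst

check_not_string(deque([1, 2]))  # passes, although deque is not a list subclass
check_not_string(range(3))       # passes as well

try:
    check_not_string("abc")
except AssertionError as exc:
    print("rejected:", exc)
```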


Answer 1

Remember that in Python we want to use “duck typing”. So, anything that acts like a list can be treated as a list. So, don’t check for the type of a list, just see if it acts like a list.

But strings act like a list too, and often that is not what we want. There are times when it is even a problem! So, check explicitly for a string, but then use duck typing.

Here is a function I wrote for fun. It is a special version of repr() that prints any sequence in angle brackets ('<', '>').

def srepr(arg):
    if isinstance(arg, basestring): # Python 3: isinstance(arg, str)
        return repr(arg)
    try:
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    except TypeError: # catch when for loop fails
        return repr(arg) # not a sequence so just return repr

This is clean and elegant, overall. But what’s that isinstance() check doing there? That’s kind of a hack. But it is essential.

This function calls itself recursively on anything that acts like a list. If we didn’t handle the string specially, then it would be treated like a list, and split up one character at a time. But then the recursive call would try to treat each character as a list, and it would work! Even a one-character string works as a list! The function would keep on calling itself recursively until the stack overflows.

Functions like this one, which depend on each recursive call breaking down the work to be done, have to special-case strings. You can’t break down a string below the level of a one-character string, and even a one-character string acts like a list.

Note: the try/except is the cleanest way to express our intentions. But if this code were somehow time-critical, we might want to replace it with some sort of test to see if arg is a sequence. Rather than testing the type, we should probably test behaviors. If it has a .strip() method, it’s a string, so don’t consider it a sequence; otherwise, if it is indexable or iterable, it’s a sequence:

def is_sequence(arg):
    # Anything with .strip is treated as a string; otherwise indexable or iterable counts
    return (not hasattr(arg, "strip") and
            (hasattr(arg, "__getitem__") or
             hasattr(arg, "__iter__")))

def srepr(arg):
    if is_sequence(arg):
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    return repr(arg)

EDIT: I originally wrote the above with a check for __getslice__() but I noticed that in the collections module documentation, the interesting method is __getitem__(); this makes sense, that’s how you index an object. That seems more fundamental than __getslice__() so I changed the above.
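Here is the first (try/except) version ported to Python 3, with str in place of basestring as its code comment suggests. It can be exercised like this:

```python
def srepr(arg):
    if isinstance(arg, str):
        return repr(arg)
    try:
        # Creating the generator calls iter(arg), which raises TypeError for non-iterables
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    except TypeError:
        return repr(arg)  # not a sequence, so just return repr

print(srepr([1, "ab", (2, 3)]))  # <1, 'ab', <2, 3>>
```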


Answer 2

H = "Hello"

if type(H) is list or type(H) is tuple:
    pass  # Do something.
else:
    pass  # Do something.

Answer 3

For Python 3:

import collections.abc

if isinstance(obj, collections.abc.Sequence) and not isinstance(obj, str):
    print("obj is a sequence (list, tuple, etc) but not a string")

Changed in version 3.3: Moved Collections Abstract Base Classes to the collections.abc module. For backwards compatibility, they will continue to be visible in this module as well until version 3.8 where it will stop working. (In the end, the aliases were removed from collections in Python 3.10.)

For Python 2:

import collections

if isinstance(obj, collections.Sequence) and not isinstance(obj, basestring):
    print "obj is a sequence (list, tuple, etc) but not a string or unicode"
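Here is the Python 3 check wrapped in a small function (is_non_string_sequence is a hypothetical name) and run against a few built-in types:

```python
from collections.abc import Sequence

def is_non_string_sequence(obj):
    """True for sequences such as list, tuple or range, but not for str."""
    return isinstance(obj, Sequence) and not isinstance(obj, str)

for value in ([1, 2], (1, 2), range(3), "abc", {"a": 1}, {1, 2}):
    print(type(value).__name__, is_non_string_sequence(value))
# list True
# tuple True
# range True
# str False
# dict False
# set False
```

Note that bytes and bytearray are also Sequence instances, so they pass this check; add them to the exclusion if that matters for your code.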

Answer 4

Python with PHP flavor:

def is_array(var):
    return isinstance(var, (list, tuple))

Answer 5

Generally speaking, the fact that a function which iterates over an object works on strings as well as tuples and lists is more feature than bug. You certainly can use isinstance or duck typing to check an argument, but why should you?

That sounds like a rhetorical question, but it isn’t. The answer to “why should I check the argument’s type?” is probably going to suggest a solution to the real problem, not the perceived problem. Why is it a bug when a string is passed to the function? Also: if it’s a bug when a string is passed to this function, is it also a bug if some other non-list/tuple iterable is passed to it? Why, or why not?

I think that the most common answer to the question is likely to be that developers who write f("abc") are expecting the function to behave as though they’d written f(["abc"]). There are probably circumstances where it makes more sense to protect developers from themselves than it does to support the use case of iterating across the characters in a string. But I’d think long and hard about it first.


Answer 6

Try this for readability and best practices:

Python2

import types
if isinstance(lst, types.ListType) or isinstance(lst, types.TupleType):
    pass  # Do something

Python3

import typing
# typing.List and typing.Tuple are deprecated aliases of list and tuple since Python 3.9
if isinstance(lst, typing.List) or isinstance(lst, typing.Tuple):
    pass  # Do something

Hope it helps.


Answer 7

The str object doesn’t have an __iter__ attribute

>>> hasattr('', '__iter__')
False 

so you can do a check

assert hasattr(x, '__iter__')

and this will also raise a nice AssertionError for any other non-iterable object too.

Edit: As Tim mentions in the comments, this will only work in python 2.x, not 3.x


Answer 8

This is not intended to directly answer the OP, but I wanted to share some related ideas.

I was very interested in @steveha’s answer above, which seemed to give an example where duck typing breaks. On second thought, however, his example suggests that duck typing is hard to conform to, but it does not suggest that str deserves any special handling.

After all, a non-str type (e.g., a user-defined type that maintains some complicated recursive structures) may cause @steveha’s srepr function to recurse infinitely. While this is admittedly rather unlikely, we can’t ignore this possibility. Therefore, rather than special-casing str in srepr, we should clarify what we want srepr to do when an infinite recursion results.

It may seem that one reasonable approach is to simply break the recursion in srepr the moment list(arg) == [arg]. This would, in fact, completely solve the problem with str, without any isinstance.

However, a really complicated recursive structure may cause an infinite loop where list(arg) == [arg] never happens. Therefore, while the above check is useful, it’s not sufficient. We need something like a hard limit on the recursion depth.

My point is that if you plan to handle arbitrary argument types, handling str via duck typing is far, far easier than handling the more general types you may (theoretically) encounter. So if you feel the need to exclude str instances, you should instead demand that the argument is an instance of one of the few types that you explicitly specify.
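Here is a rough sketch of the idea described above (not code from either answer). It stops recursing when list(arg) == [arg], which catches one-character strings without any isinstance check, and adds a hard depth limit as a backstop. The max_depth parameter and its default of 50 are arbitrary choices for illustration.

```python
def srepr(arg, depth=0, max_depth=50):
    """repr() variant that prints sequences in angle brackets, with no str special case."""
    try:
        items = list(arg)
    except TypeError:
        return repr(arg)  # not iterable at all
    # list('a') == ['a']: a one-character string "contains itself", so stop here;
    # the depth limit is a backstop for other self-similar structures
    if items == [arg] or depth >= max_depth:
        return repr(arg)
    return '<' + ", ".join(srepr(x, depth + 1, max_depth) for x in items) + '>'

print(srepr([1, "ab"]))  # <1, <'a', 'b'>>
```

Note that multi-character strings are still expanded character by character. The check only guarantees that the recursion stops.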


Answer 9

I found a function named is_sequence in TensorFlow:

import collections.abc

import six

def is_sequence(seq):
  """Returns True if its input is a collections.abc.Sequence (except strings).
  Args:
    seq: an input sequence.
  Returns:
    True if the sequence is not a string and is a collections.abc.Sequence.
  """
  # collections.abc.Sequence: the old collections.Sequence alias is gone in Python 3.10+
  return (isinstance(seq, collections.abc.Sequence)
          and not isinstance(seq, six.string_types))

And I have verified that it meets your needs.


Answer 10

I do this in my test cases.

def assertIsIterable(self, item):
    # Add types here that you don't want to mistake for iterables
    if isinstance(item, basestring):  # Python 3: isinstance(item, str)
        raise AssertionError("type %s is not iterable" % type(item))

    # Fake an iteration.
    try:
        for x in item:
            break
    except TypeError:
        raise AssertionError("type %s is not iterable" % type(item))

I haven’t tested this with generators. I think that if a generator is passed in, it is left at the next ‘yield’, which may screw things up downstream. But then again, this is a ‘unittest’.
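The generator concern is easy to check. The fake iteration consumes the first item, so whoever uses the generator afterwards only sees the rest. numbers is a throwaway example generator:

```python
def numbers():
    yield 1
    yield 2
    yield 3

gen = numbers()

# Same "fake iteration" as in assertIsIterable
for x in gen:
    break

print(list(gen))  # [2, 3]: the first item is gone
```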


Answer 11

In a “duck typing” manner, how about:

try:
    lst = lst + []
except TypeError:
    pass  # it's not a list

or

try:
    lst = lst + ()
except TypeError:
    pass  # it's not a tuple

respectively. This avoids the isinstance / hasattr introspection stuff.

You could also check vice versa:

try:
    lst = lst + ''
except TypeError:
    pass  # it's not a (base)string

None of the variants actually changes the content of the variable, but they do imply a reassignment. I’m unsure whether this might be undesirable under some circumstances.

Interestingly, with the “in place” assignment +=, no TypeError would be raised in any case if lst is a list (not a tuple). That’s why the assignment is done this way. Maybe someone can shed light on why that is.
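The asymmetry is easy to reproduce. List + only accepts another list, while list += behaves like list.extend() and accepts any iterable, so the augmented form could not detect a non-list here. A short Python 3 demonstration:

```python
lst = [1, 2]

try:
    lst = lst + ()  # list + tuple raises TypeError
    plus_accepts_tuple = True
except TypeError:
    plus_accepts_tuple = False

lst += ()    # no error: += on a list works like lst.extend(())
lst += "ab"  # any iterable is accepted, even a string (split into characters)
print(plus_accepts_tuple, lst)  # False [1, 2, 'a', 'b']
```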


Answer 12

The simplest way is to use any and isinstance:

>>> console_routers = 'x'
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
False
>>>
>>> console_routers = ('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True
>>> console_routers = list('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True
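Because isinstance itself accepts a tuple of types, the any([...]) wrapper can be folded into a single call. A minimal equivalent (is_list_or_tuple is just an illustrative name):

```python
def is_list_or_tuple(obj):
    # isinstance accepts a tuple of types and returns True if obj is an
    # instance of any of them, so no any([...]) wrapper is needed.
    return isinstance(obj, (list, tuple))

print(is_list_or_tuple('x'))        # False
print(is_list_or_tuple(('x',)))     # True
print(is_list_or_tuple(list('x')))  # True
```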

Answer 13

Another version of duck-typing to help distinguish string-like objects from other sequence-like objects.

The string representation of string-like objects is the string itself, so you can check if you get an equal object back from the str constructor:

# If a string was passed, convert it to a single-element sequence
if var == str(var):
    my_list = [var]

# All other iterables
else: 
    my_list = list(var)

This should work for all objects compatible with str and for all kinds of iterable objects.
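Wrapped in a helper, the same check might look like this (as_list is a hypothetical name). Note that under Python 3 a bytes object does not compare equal to its str() form, so it is treated as an ordinary iterable of integers:

```python
def as_list(var):
    # A string equals its own str() conversion, so wrap it in a list
    # instead of splitting it into characters.
    if var == str(var):
        return [var]
    # All other iterables are converted element by element.
    return list(var)

print(as_list('abc'))       # ['abc']
print(as_list(('a', 'b')))  # ['a', 'b']
print(as_list(b'ab'))       # [97, 98]
```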


Answer 14

In Python 3, the aliases from the typing module can be used with isinstance:

from typing import List

def isit(value):
    return isinstance(value, List)

isit([1, 2, 3])  # True
isit("test")  # False
isit({"Hello": "Mars"})  # False
isit((1, 2))  # False

So to check for both lists and tuples, it would be:

from typing import List, Tuple

def isit(value):
    return isinstance(value, List) or isinstance(value, Tuple)
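One caveat: only the bare aliases work here. A subscripted generic such as List[int] raises TypeError when passed to isinstance, and since Python 3.9 (PEP 585) the typing aliases are deprecated in favor of the built-in list and tuple, which behave the same in these checks. A quick sketch, assuming Python 3.7 or later:

```python
from typing import List

# The unsubscripted alias gives the same answer as the built-in list.
print(isinstance([1, 2], List) == isinstance([1, 2], list))  # True

# A subscripted generic such as List[int] cannot be used with isinstance.
try:
    isinstance([1, 2], List[int])
    subscripted_ok = True
except TypeError:
    subscripted_ok = False

print(subscripted_ok)  # False
```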

Answer 15

assert type(lst) == list or type(lst) == tuple, "Not a valid lst type, cannot be string"

Answer 16

Just do this:

if type(lst) in (list, tuple):
    pass  # Do stuff

Answer 17

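In Python > 3.6:

>>> import collections.abc
>>> isinstance(set(), collections.abc.Container)
True
>>> isinstance([], collections.abc.Container)
True
>>> isinstance({}, collections.abc.Container)
True
>>> isinstance((), collections.abc.Container)
True
>>> isinstance(str, collections.abc.Container)
False

Note that the last check passes the type str itself, not a string instance. An actual string such as 'abc' implements __contains__, so isinstance('abc', collections.abc.Container) returns True; this check alone does not separate strings from lists and tuples.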
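If the goal is to accept lists and tuples but reject strings, one common variant uses collections.abc.Sequence and excludes the string types explicitly, since str, bytes and bytearray are registered as Sequences too. A sketch (is_non_string_sequence is an illustrative name):

```python
from collections.abc import Sequence

def is_non_string_sequence(obj):
    # str, bytes and bytearray are Sequences as well, so exclude them.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes, bytearray))

print(is_non_string_sequence([1, 2]))  # True
print(is_non_string_sequence((1, 2)))  # True
print(is_non_string_sequence('abc'))   # False
print(is_non_string_sequence({1, 2}))  # False: sets are not Sequences
```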

Answer 18

I tend to do this (if I really, really had to):

for i in some_var:
    if type(i) is list:
        pass  # do something with a list
    elif type(i) is tuple:
        pass  # do something with a tuple
    elif type(i) is str:
        pass  # here's your string

How to find out if a Python object is a string?

Question: How to find out if a Python object is a string?

How can I check if a Python object is a string (either regular or Unicode)?


Answer 0

Python 2

Use isinstance(obj, basestring) to test an object obj.


Answer 1

Python 2

To check whether an object o is a string, that is, an instance of a string type or of a subclass of a string type:

isinstance(o, basestring)

because both str and unicode are subclasses of basestring.

To check if the type of o is exactly str:

type(o) is str

To check if o is an instance of str or any subclass of str:

isinstance(o, str)

The above also works for Unicode strings if you replace str with unicode.

However, you may not need to do explicit type checking at all. “Duck typing” may fit your needs. See http://docs.python.org/glossary.html#term-duck-typing.

See also What’s the canonical way to check for type in python?
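The difference between the exact-type check and the isinstance check shows up with subclasses. A quick illustration (MyStr is a hypothetical subclass; the two checks behave the same way in Python 2 and 3):

```python
class MyStr(str):
    # A hypothetical str subclass, used only for illustration.
    pass

o = MyStr('hello')
print(type(o) is str)      # False: the exact type is MyStr
print(isinstance(o, str))  # True: MyStr is a subclass of str
```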


Answer 2

Python 3

In Python 3.x, basestring is no longer available, because str is the sole text string type (with the semantics of Python 2.x’s unicode).

So the check in Python 3.x is just:

isinstance(obj_to_test, str)

This follows the fix of the official 2to3 conversion tool: converting basestring to str.
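If, as the question asks, you want to accept both kinds of Python 2 string, the closest Python 3 analogue is to check for str and bytes together. A minimal sketch (is_text_or_bytes is an illustrative name):

```python
def is_text_or_bytes(obj):
    # In Python 3, str holds text (Python 2's unicode) and bytes holds raw
    # byte strings (roughly Python 2's str); accept either one.
    return isinstance(obj, (str, bytes))

print(is_text_or_bytes('abc'))    # True
print(is_text_or_bytes(b'abc'))   # True
print(is_text_or_bytes(['abc']))  # False
```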


Answer 3

Python 2 and 3

(cross-compatible)

If you want a check that works regardless of Python version (2.x vs 3.x), use six (PyPI) and its string_types attribute:

import six

if isinstance(obj, six.string_types):
    print('obj is a string!')

Within six (a very lightweight single-file module), it’s essentially doing this:

import sys
PY3 = sys.version_info[0] == 3

if PY3:
    string_types = (str,)
else:
    string_types = (basestring,)

Answer 4

I find this more Pythonic:

if type(aObject) is str:
    #do your stuff here
    pass

Since type objects are singletons, is can be used to compare the object’s type with the str type.


Answer 5

If one wants to stay away from explicit type-checking (and there are good reasons to stay away from it), probably the safest part of the string protocol to check is:

str(maybe_string) == maybe_string

It won’t iterate through an iterable or iterator, it won’t mistake a list of strings for a string, and it correctly detects a string-like object as a string.

Of course there are drawbacks. For example, str(maybe_string) may be a heavy calculation. As so often, the answer is it depends.

EDIT: As @Tcll points out in the comments, the question actually asks for a way to detect both unicode strings and bytestrings. On Python 2 this answer will fail with an exception for unicode strings that contain non-ASCII characters, and on Python 3 it will return False for all bytestrings.
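The check can be packaged as a small predicate to see how it treats the cases mentioned above under Python 3 (looks_like_string and Tag are illustrative names):

```python
def looks_like_string(maybe_string):
    # A str (or str subclass) compares equal to its own str() conversion;
    # lists, numbers and, on Python 3, bytes do not.
    return str(maybe_string) == maybe_string

class Tag(str):
    # A hypothetical str subclass, used only for illustration.
    pass

print(looks_like_string('abc'))       # True
print(looks_like_string(Tag('abc')))  # True: str subclasses pass too
print(looks_like_string(['abc']))     # False
print(looks_like_string(42))          # False
print(looks_like_string(b'abc'))      # False on Python 3
```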


Answer 6

To check whether your variable is of a given type, you could do:

s = 'Hello World'
if isinstance(s, str):
    pass  # do something here

isinstance returns a boolean, True or False, so you can branch on it accordingly. If you are not sure which type to check for, call type(s) first: it returns <class 'str'>, which tells you to pass str to isinstance.


Answer 7

I might deal with this in the duck-typing style, as others mention. How do I know a string is really a string? Well, obviously by converting it to a string!

def myfunc(word):
    word = unicode(word)
    ...

If the argument is already a string or unicode type, word will hold its value unmodified. If the object passed implements a __unicode__ method, that is used to get its unicode representation. If the object passed cannot be used as a string, the unicode builtin raises an exception. (This is Python 2 code; unicode is not a builtin in Python 3.)


Answer 8

isinstance(your_object, basestring)

will be True if your object is indeed a string type. Note that str is a built-in type name, not a reserved word.

My apologies: the correct answer is to use basestring instead of str, so that unicode strings are included as well, as one of the other responders noted above.


Answer 9

This evening I ran into a situation in which I thought I was going to have to check against the str type, but it turned out I did not.

My approach to solving the problem will probably work in many situations, so I offer it below in case others reading this question are interested (Python 3 only).

# NOTE: fields is an object that COULD be any number of things, including:
# - a single string-like object
# - a string-like object that needs to be converted to a sequence of
#   string-like objects at some separator, sep
# - a sequence of string-like objects
def getfields(*fields, sep=' ', validator=lambda f: True):
    '''Take a field sequence definition and yield from a validated
    field sequence. Accepts a string, a string with separators,
    or a sequence of strings'''
    if not fields:
        raise ValueError('No fields were provided')
    try:
        # single unpack in the case of a single argument
        fieldseq, = fields
        try:
            # convert to string sequence if string
            fieldseq = fieldseq.split(sep)
        except AttributeError:
            # not a string; assume other iterable
            pass
    except ValueError:
        # not a single argument and not a string
        fieldseq = fields
    try:
        # materialize once so one-shot iterators (e.g. generators)
        # are not exhausted by the validation pass below
        fieldseq = list(fieldseq)
    except TypeError as e:
        raise ValueError('Single field argument must be a string '
                         'or an iterable') from e
    invalid_fields = [field for field in fieldseq if not validator(field)]
    if invalid_fields:
        raise ValueError('One or more field names is invalid:\n'
                         '{!r}'.format(invalid_fields))
    yield from fieldseq

Some tests:

from . import getfields

def test_getfields_novalidation():
    result = ['a', 'b']
    assert list(getfields('a b')) == result
    assert list(getfields('a,b', sep=',')) == result
    assert list(getfields('a', 'b')) == result
    assert list(getfields(['a', 'b'])) == result

Answer 10

It’s simple: use the following code (assuming the object is named obj):

if type(obj) == str:
    print('It is a string')
else:
    print('It is not a string.')

Answer 11

You can test it by concatenating with an empty string:

def is_string(s):
    try:
        s += ''
    except TypeError:
        return False
    return True

Edit:

Correcting my answer after comments pointed out that this fails with lists (a list accepts += '' without error, so it is wrongly reported as a string). In Python 2:

def is_string(s):
    return isinstance(s, basestring)

Answer 12

For a nice duck-typing approach for string-likes that has the bonus of working with both Python 2.x and 3.x:

def is_string(obj):
    try:
        obj + ''
        return True
    except TypeError:
        return False

wisefish was close with the duck-typing before he switched to the isinstance approach, except that += has a different meaning for lists than + does.
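A quick check of the + based version against the cases discussed above, run under Python 3; each non-string operand makes obj + '' raise TypeError:

```python
def is_string(obj):
    try:
        obj + ''
        return True
    except TypeError:
        return False

print(is_string('abc'))   # True
print(is_string(['a']))   # False: list + str raises TypeError
print(is_string(42))      # False: int + str raises TypeError
print(is_string(b'abc'))  # False on Python 3: bytes + str raises TypeError
```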


Answer 13

if type(varA) == str or type(varB) == str:
    print('string involved')

From the edX online course MITx 6.00.1x: Introduction to Computer Science and Programming Using Python