Tag archive: conditional

Why does Pylint consider using len(SEQUENCE) as a condition value incorrect?

Question: Why does Pylint consider using len(SEQUENCE) as a condition value incorrect?

Consider this code snippet:

from os import walk

files = []
for (dirpath, _, filenames) in walk(mydir):
    # more code that modifies files
if len(files) == 0: # <-- C1801
    return None

I was alarmed by Pylint with this message regarding the line with the if statement:

[pylint] C1801: Do not use len(SEQUENCE) as condition value

The rule C1801, at first glance, did not sound very reasonable to me, and the definition on the reference guide does not explain why this is a problem. In fact, it downright calls it an incorrect use.

len-as-condition (C1801): Do not use len(SEQUENCE) as condition value. Used when Pylint detects incorrect use of len(sequence) inside conditions.

My search attempts have also failed to provide me with a deeper explanation. I do understand that a sequence’s length property may be lazily evaluated, and that __len__ can be programmed to have side effects, but it is questionable whether that alone is problematic enough for Pylint to call such a use incorrect. Hence, before I simply configure my project to ignore the rule, I would like to know whether I am missing something in my reasoning.

When is the use of len(SEQ) as a condition value problematic? What major situations is Pylint attempting to avoid with C1801?


Answer 0

When is the use of len(SEQ) as a condition value problematic? What major situations is Pylint attempting to avoid with C1801?

It’s not really problematic to use len(SEQUENCE) – though it may not be as efficient (see chepner’s comment). Regardless, Pylint checks code for compliance with the PEP 8 style guide which states that

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.

Yes: if not seq:
     if seq:

No:  if len(seq):
     if not len(seq):

As an occasional Python programmer who flits between languages, I’d consider the len(SEQUENCE) construct to be more readable and explicit (“Explicit is better than implicit”). However, using the fact that an empty sequence evaluates to False in a Boolean context is considered more “Pythonic”.
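
To make the PEP 8 recommendation concrete, here is a minimal sketch showing that the implicit check and the explicit length check agree for the built-in sequence types. The helper name is_empty is made up for illustration and is not part of Pylint or PEP 8.

```python
def is_empty(seq):
    """Implicit emptiness check, in the style PEP 8 recommends."""
    return not seq


# For built-in sequences, `not seq` agrees with `len(seq) == 0`.
samples = ["", "abc", [], [1, 2], (), (0,)]
for seq in samples:
    assert is_empty(seq) == (len(seq) == 0)
    print(repr(seq), "is empty" if is_empty(seq) else "is not empty")
```

Note that (0,) counts as non-empty: truthiness depends on the number of elements, not on their values.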


Answer 1

Note that the use of len(seq) is in fact required (instead of just checking the bool value of seq) when using NumPy arrays.

import numpy

a = numpy.array(range(10))
if a:  # raises ValueError here
    print("a is not empty")

results in an exception: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

And hence for code that uses both Python lists and NumPy arrays, the C1801 message is less than helpful.
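
If it helps, here is a rough sketch of a length-based check that works for both kinds of object. To keep it runnable without NumPy installed, it uses a made-up stand-in class, AmbiguousTruth, that always raises on truth testing. That is a simplification of how real arrays behave, and the helper name is_empty is also hypothetical.

```python
class AmbiguousTruth:
    """Hypothetical stand-in for an array type whose truth value is ambiguous."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __bool__(self):
        # Simplified: this stand-in refuses truth testing for any length.
        raise ValueError("truth value is ambiguous")


def is_empty(seq):
    """Explicit length comparison; never calls __bool__ on seq."""
    return len(seq) == 0


print(is_empty([]))                         # plain list
print(is_empty(AmbiguousTruth(range(10))))  # array-like stand-in
```

The explicit comparison len(seq) == 0 is also the form that newer Pylint versions accept, according to another answer here.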


Answer 2

This was an issue in pylint, and it no longer considers len(x) == 0 as incorrect.

You should not use a bare len(x) as a condition. Comparing len(x) against an explicit value, such as if len(x) == 0 or if len(x) > 0, is totally fine and not prohibited by PEP 8.

From PEP 8:

# Correct:
if not seq:
if seq:

# Wrong:
if len(seq):
if not len(seq):

Note that explicitly testing for the length is not prohibited. The Zen of Python states:

Explicit is better than implicit.

Both if not seq and if not len(seq) are implicit, but they behave differently. By contrast, if len(seq) == 0 and if len(seq) > 0 are explicit comparisons, and in many contexts they are the correct behaviour.

In pylint, PR 2815 has fixed this bug, first reported as issue 2684. It will continue to complain about if len(seq), but it will no longer complain about if len(seq) > 0. The PR was merged 2019-03-19 so if you are using pylint 2.4 (released 2019-09-14) you should not see this problem.
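
One way the implicit and explicit checks can differ (my assumption about the kind of difference meant above) is with values that are falsy but have no length, such as None. A small sketch, with describe being a made-up helper name:

```python
def describe(value):
    """Return (result of `not value`, result of `len(value) == 0`)."""
    implicit = not value
    try:
        explicit = len(value) == 0
    except TypeError:
        # Objects without __len__ cannot be length-tested at all.
        explicit = "TypeError"
    return implicit, explicit


print(describe([]))    # (True, True)
print(describe([0]))   # (False, False)
print(describe(None))  # (True, 'TypeError')
```

So if not seq silently treats None like an empty list, while the length comparison fails loudly. Which of the two is preferable depends on the surrounding code.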


Answer 3

Pylint was failing for my code and research led me to this post:

../filename.py:49:11: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)
../filename.py:49:34: C1801: Do not use `len(SEQUENCE)` to determine if a sequence is empty (len-as-condition)

This was my code before:

import os

def list_empty_folders(directory):
    """The Module Has Been Built to list empty Mac Folders."""
    for (fullpath, dirnames, filenames) in os.walk(directory):
        if len(dirnames) == 0 and len(filenames) == 0:
            print("Exists: {} : Absolute Path: {}".format(
                os.path.exists(fullpath), os.path.abspath(fullpath)))

This is my code after the fix. By calling the int method __trunc__() on the result of len(), I seem to have satisfied PEP 8/Pylint, and it doesn’t seem to have a negative impact on my code:

import os

def list_empty_folders(directory):
    """The Module Has Been Built to list empty Mac Folders."""
    for (fullpath, dirnames, filenames) in os.walk(directory):
        if len(dirnames).__trunc__() == 0 and len(filenames).__trunc__() == 0:
            print("Exists: {} : Absolute Path: {}".format(
                os.path.exists(fullpath), os.path.abspath(fullpath)))

My Fix

Adding .__trunc__() to each len() call seems to have settled the need.

I do not see a difference in the behaviour, but if anyone knows specifics that I am missing, please let me know.
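
For comparison, the check PEP 8 suggests (quoted in the other answers) would avoid len() entirely. This is only a sketch of that alternative, not the code from this answer. The function name find_empty_folders and the temporary folder layout are made up for the demonstration.

```python
import os
import tempfile


def find_empty_folders(directory):
    """Return absolute paths of folders with no subfolders and no files."""
    empty = []
    for fullpath, dirnames, filenames in os.walk(directory):
        # Empty lists are falsy, so no len() call is needed.
        if not dirnames and not filenames:
            empty.append(os.path.abspath(fullpath))
    return empty


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "empty"))
    os.mkdir(os.path.join(root, "full"))
    with open(os.path.join(root, "full", "note.txt"), "w") as handle:
        handle.write("x")
    print(find_empty_folders(root))
```

Returning the paths instead of printing them is a design choice made here so that the result can be checked. The original printed each match.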


Check whether something is in a Python list

Question: Check whether something is in a Python list

I have a list of tuples in Python, and I have a conditional where I want to take the branch ONLY if the tuple is not in the list (if it is in the list, then I don’t want to take the if branch)

if curr_x - 1 > 0 and (curr_x - 1, curr_y) not in myList:
    # Do Something

This is not really working for me though. What have I done wrong?


Answer 0

The bug is probably somewhere else in your code, because it should work fine:

>>> 3 not in [2, 3, 4]
False
>>> 3 not in [4, 5, 6]
True

Or with tuples:

>>> (2, 3) not in [(2, 3), (5, 6), (9, 1)]
False
>>> (2, 3) not in [(2, 7), (7, 3), "hi"]
True
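
Applied to a condition shaped like the one in the question, it might look like the sketch below. The values are invented. The second half shows one possible way such a check can appear to fail, namely when the list actually holds lists rather than tuples. That is purely a guess and not something the question confirms.

```python
my_list = [(0, 1), (2, 3)]
curr_x, curr_y = 3, 3

# (2, 3) is in the list, so the branch is not taken.
took_branch = curr_x - 1 > 0 and (curr_x - 1, curr_y) not in my_list
print(took_branch)

# A tuple never compares equal to a list, so this lookup misses.
as_lists = [[0, 1], [2, 3]]
missed = (2, 3) not in as_lists
print(missed)
```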

Answer 1

How do I check if something is (not) in a list in Python?

The cheapest and most readable solution is using the in operator (or in your specific case, not in). As mentioned in the documentation,

The operators in and not in test for membership. x in s evaluates to True if x is a member of s, and False otherwise. x not in s returns the negation of x in s.

Additionally,

The operator not in is defined to have the inverse truth value of in.

y not in x is logically the same as not y in x.

Here are a few examples:

'a' in [1, 2, 3]
# False

'c' in ['a', 'b', 'c']
# True

'a' not in [1, 2, 3]
# True

'c' not in ['a', 'b', 'c']
# False

This also works with tuples. Membership in a list is decided by comparing elements for equality, so hashability is not required here:

(1, 2) in [(3, 4), (1, 2)]
#  True

If the object on the RHS defines a __contains__() method, in will internally call it, as noted in the last paragraph of the Comparisons section of the docs.

in and not in are supported by types that are iterable or implement the __contains__() method. For example, you could (but shouldn’t) do this:

[3, 2, 1].__contains__(1)
# True
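
To illustrate both routes, here is a small sketch with two made-up classes: EvenNumbers defines __contains__, while Countdown only defines __iter__, so in falls back to iterating over it.

```python
class EvenNumbers:
    """Hypothetical container that 'contains' every even integer."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


class Countdown:
    """Defines no __contains__; `in` falls back to iteration."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


evens = EvenNumbers()
print(4 in evens)         # True
print(7 not in evens)     # True
print(2 in Countdown(3))  # True, found while iterating 3, 2, 1
print(5 in Countdown(3))  # False
```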

in short-circuits, so if your element is at the start of the list, in evaluates faster:

lst = list(range(10001))
%timeit 1 in lst
%timeit 10000 in lst  # Expected to take longer time.

68.9 ns ± 0.613 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
178 µs ± 5.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

If you want to do more than just check whether an item is in a list, there are options:

  • list.index can be used to retrieve the index of an item. If that element does not exist, a ValueError is raised.
  • list.count can be used if you want to count the occurrences.

The XY Problem: Have you considered sets?

Ask yourself these questions:

  • Do you need to check whether an item is in a list more than once?
  • Is this check done inside a loop, or a function called repeatedly?
  • Are the items you’re storing on your list hashable? IOW, can you call hash on them?

If you answered “yes” to these questions, you should be using a set instead. An in membership test on a list has O(n) time complexity. This means that Python has to do a linear scan of your list, visiting each element and comparing it against the search item. If you’re doing this repeatedly, or if the lists are large, this operation will incur an overhead.

set objects, on the other hand, hash their values, which gives constant-time membership checks on average. The check is also done using in:

1 in {1, 2, 3} 
# True

'a' not in {'a', 'b', 'c'}
# False

(1, 2) in {('a', 'c'), (1, 2)}
# True

If you’re unfortunate enough that the element you’re searching for (or not searching for) is at the end of your list, Python will have scanned the list up to the end. This is evident from the timings below:

l = list(range(100001))
s = set(l)

%timeit 100000 in l
%timeit 100000 in s

2.58 ms ± 58.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
101 ns ± 9.53 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

As a reminder, this is a suitable option as long as the elements you’re storing and looking up are hashable. IOW, they would either have to be immutable types, or objects that implement __hash__.
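
Outside IPython, a similar comparison can be sketched with the standard timeit module. The absolute numbers depend on the machine, so none are shown here, and the variable names are arbitrary. The last part shows what happens when an element is not hashable.

```python
import timeit

items = list(range(100_001))
lookup = set(items)

# Worst case for the list: the target is the last element.
list_time = timeit.timeit(lambda: 100_000 in items, number=100)
set_time = timeit.timeit(lambda: 100_000 in lookup, number=100)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")

# Unhashable elements cannot be stored in a set at all.
try:
    {[1, 2]}
except TypeError as exc:
    print("unhashable:", exc)
```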