标签归档:find

Python ElementTree模块:使用方法“ find”,“ findall”时,如何忽略XML文件的命名空间以找到匹配的元素

问题:Python ElementTree模块:使用方法“ find”,“ findall”时,如何忽略XML文件的命名空间以找到匹配的元素

我想使用“ findall”方法在ElementTree模块中找到源xml文件的某些元素。

但是,源xml文件(test.xml)具有命名空间。我截断一部分xml文件作为示例:

<?xml version="1.0" encoding="iso-8859-1"?>
<XML_HEADER xmlns="http://www.test.com">
    <TYPE>Updates</TYPE>
    <DATE>9/26/2012 10:30:34 AM</DATE>
    <COPYRIGHT_NOTICE>All Rights Reserved.</COPYRIGHT_NOTICE>
    <LICENSE>newlicense.htm</LICENSE>
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
        </DEAL_LEVEL>
</XML_HEADER>

示例python代码如下:

from xml.etree import ElementTree as ET
tree = ET.parse(r"test.xml")
el1 = tree.findall("DEAL_LEVEL/PAID_OFF") # Return None
el2 = tree.findall("{http://www.test.com}DEAL_LEVEL/{http://www.test.com}PAID_OFF") # Return <Element '{http://www.test.com}DEAL_LEVEL/PAID_OFF' at 0xb78b90>

尽管它可以工作,但是因为有一个命名空间“ {http://www.test.com}”,但是在每个标签前面添加一个命名空间非常不方便。

使用“ find”,“ findall”等方法时,如何忽略命名空间?

I want to use the method of “findall” to locate some elements of the source xml file in the ElementTree module.

However, the source xml file (test.xml) has namespace. I truncate part of xml file as sample:

<?xml version="1.0" encoding="iso-8859-1"?>
<XML_HEADER xmlns="http://www.test.com">
    <TYPE>Updates</TYPE>
    <DATE>9/26/2012 10:30:34 AM</DATE>
    <COPYRIGHT_NOTICE>All Rights Reserved.</COPYRIGHT_NOTICE>
    <LICENSE>newlicense.htm</LICENSE>
    <DEAL_LEVEL>
        <PAID_OFF>N</PAID_OFF>
        </DEAL_LEVEL>
</XML_HEADER>

The sample python code is below:

from xml.etree import ElementTree as ET
tree = ET.parse(r"test.xml")
el1 = tree.findall("DEAL_LEVEL/PAID_OFF") # Return None
el2 = tree.findall("{http://www.test.com}DEAL_LEVEL/{http://www.test.com}PAID_OFF") # Return <Element '{http://www.test.com}DEAL_LEVEL/PAID_OFF' at 0xb78b90>

Although it can works, because there is a namespace “{http://www.test.com}”, it’s very inconvenient to add a namespace in front of each tag.

How can I ignore the namespace when using the method of “find”, “findall” and so on?


回答 0

最好不要解析XML文档本身,而是先解析它,然后修改结果中的标记。这样,您可以处理多个命名空间和命名空间别名:

from io import StringIO  # for Python 2 import from StringIO instead
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    prefix, has_namespace, postfix = el.tag.partition('}')
    if has_namespace:
        el.tag = postfix  # strip all namespaces
root = it.root

这是基于此处的讨论:http : //bugs.python.org/issue18304

更新: rpartition而不是partition确保你得到的标签名postfix,即使没有命名空间。因此,您可以将其压缩:

for _, el in it:
    _, _, el.tag = el.tag.rpartition('}') # strip ns

Instead of modifying the XML document itself, it’s best to parse it and then modify the tags in the result. This way you can handle multiple namespaces and namespace aliases:

from io import StringIO  # for Python 2 import from StringIO instead
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    prefix, has_namespace, postfix = el.tag.partition('}')
    if has_namespace:
        el.tag = postfix  # strip all namespaces
root = it.root

This is based on the discussion here: http://bugs.python.org/issue18304

Update: rpartition instead of partition makes sure you get the tag name in postfix even if there is no namespace. Thus you could condense it:

for _, el in it:
    _, _, el.tag = el.tag.rpartition('}') # strip ns

回答 1

如果您在解析前从xml中删除xmlns属性,则树中的每个标记都将没有命名空间。

import re

xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)

If you remove the xmlns attribute from the xml before parsing it then there won’t be a namespace prepended to each tag in the tree.

import re

xmlstring = re.sub(' xmlns="[^"]+"', '', xmlstring, count=1)

回答 2

到目前为止,答案明确地将命名空间值放在脚本中。对于更通用的解决方案,我宁愿从xml中提取命名空间:

import re
def get_namespace(element):
  m = re.match('\{.*\}', element.tag)
  return m.group(0) if m else ''

并在查找方法中使用它:

namespace = get_namespace(tree.getroot())
print tree.find('./{0}parent/{0}version'.format(namespace)).text

The answers so far explicitely put the namespace value in the script. For a more generic solution, I would rather extract the namespace from the xml:

import re
def get_namespace(element):
  m = re.match('\{.*\}', element.tag)
  return m.group(0) if m else ''

And use it in find method:

namespace = get_namespace(tree.getroot())
print tree.find('./{0}parent/{0}version'.format(namespace)).text

回答 3

这是对nonagon答案的扩展,它也剥离了命名空间的属性:

from StringIO import StringIO
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib.keys()): # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

UPDATE:已添加,list()以便迭代器可以工作(Python 3所需)

Here’s an extension to nonagon’s answer, which also strips namespaces off attributes:

from StringIO import StringIO
import xml.etree.ElementTree as ET

# instead of ET.fromstring(xml)
it = ET.iterparse(StringIO(xml))
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib.keys()): # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

UPDATE: added list() so the iterator works (needed for Python 3)


回答 4

改善ericspod的答案:

无需全局更改解析模式,我们可以将其包装在支持with构造的对象中。

from xml.parsers import expat

class DisableXmlNamespaces:
    def __enter__(self):
            self.oldcreate = expat.ParserCreate
            expat.ParserCreate = lambda encoding, sep: self.oldcreate(encoding, None)
    def __exit__(self, type, value, traceback):
            expat.ParserCreate = self.oldcreate

然后可以按如下方式使用

import xml.etree.ElementTree as ET
with DisableXmlNamespaces():
     tree = ET.parse("test.xml")

这种方式的优点在于,它不会更改with块之外无关代码的任何行为。我使用了ericspod的版本(在此同时也使用了expat)在不相关的库中出现错误之后,最终创建了该代码。

Improving on the answer by ericspod:

Instead of changing the parse mode globally we can wrap this in an object supporting the with construct.

from xml.parsers import expat

class DisableXmlNamespaces:
    def __enter__(self):
            self.oldcreate = expat.ParserCreate
            expat.ParserCreate = lambda encoding, sep: self.oldcreate(encoding, None)
    def __exit__(self, type, value, traceback):
            expat.ParserCreate = self.oldcreate

This can then be used as follows

import xml.etree.ElementTree as ET
with DisableXmlNamespaces():
     tree = ET.parse("test.xml")

The beauty of this way is that it does not change any behaviour for unrelated code outside the with block. I ended up creating this after getting errors in unrelated libraries after using the version by ericspod which also happened to use expat.


回答 5

您也可以使用优雅的字符串格式构造:

ns='http://www.test.com'
el2 = tree.findall("{%s}DEAL_LEVEL/{%s}PAID_OFF" %(ns,ns))

或者,如果您确定PAID_OFF仅出现在树的一级中:

el2 = tree.findall(".//{%s}PAID_OFF" % ns)

You can use the elegant string formatting construct as well:

ns='http://www.test.com'
el2 = tree.findall("{%s}DEAL_LEVEL/{%s}PAID_OFF" %(ns,ns))

or, if you’re sure that PAID_OFF only appears in one level in tree:

el2 = tree.findall(".//{%s}PAID_OFF" % ns)

回答 6

如果不使用ElementTree,则cElementTree可以通过替换来强制Expat忽略命名空间处理ParserCreate()

from xml.parsers import expat
oldcreate = expat.ParserCreate
expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

ElementTree尝试通过调用来使用Expat,ParserCreate()但没有提供不提供命名空间分隔符字符串的选项,以上代码将导致其被忽略,但被警告可能会破坏其他情况。

If you’re using ElementTree and not cElementTree you can force Expat to ignore namespace processing by replacing ParserCreate():

from xml.parsers import expat
oldcreate = expat.ParserCreate
expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

ElementTree tries to use Expat by calling ParserCreate() but provides no option to not provide a namespace separator string, the above code will cause it to be ignore but be warned this could break other things.


回答 7

我为此可能会迟到,但我认为这re.sub不是一个好的解决方案。

但是,该重写xml.parsers.expat不适用于Python 3.x版本,

罪魁祸首是xml/etree/ElementTree.py源代码的底部

# Import the C accelerators
try:
    # Element is going to be shadowed by the C implementation. We need to keep
    # the Python version of it accessible for some "creative" by external code
    # (see tests)
    _Element_Py = Element

    # Element, SubElement, ParseError, TreeBuilder, XMLParser
    from _elementtree import *
except ImportError:
    pass

真是可悲。

解决的办法是先摆脱它。

import _elementtree
try:
    del _elementtree.XMLParser
except AttributeError:
    # in case deleted twice
    pass
else:
    from xml.parsers import expat  # NOQA: F811
    oldcreate = expat.ParserCreate
    expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

在Python 3.6上测试。

try如果在代码的某处重新加载或导入模块两次而遇到一些奇怪的错误,例如try 语句,则很有用

  • 超过最大递归深度
  • AttributeError:XMLParser

顺便说一句,etree源代码看起来真的很乱。

I might be late for this but I dont think re.sub is a good solution.

However the rewrite xml.parsers.expat does not work for Python 3.x versions,

The main culprit is the xml/etree/ElementTree.py see bottom of the source code

# Import the C accelerators
try:
    # Element is going to be shadowed by the C implementation. We need to keep
    # the Python version of it accessible for some "creative" by external code
    # (see tests)
    _Element_Py = Element

    # Element, SubElement, ParseError, TreeBuilder, XMLParser
    from _elementtree import *
except ImportError:
    pass

Which is kinda sad.

The solution is to get rid of it first.

import _elementtree
try:
    del _elementtree.XMLParser
except AttributeError:
    # in case deleted twice
    pass
else:
    from xml.parsers import expat  # NOQA: F811
    oldcreate = expat.ParserCreate
    expat.ParserCreate = lambda encoding, sep: oldcreate(encoding, None)

Tested on Python 3.6.

Try try statement is useful in case somewhere in your code you reload or import a module twice you get some strange errors like

  • maximum recursion depth exceeded
  • AttributeError: XMLParser

btw damn the etree source code looks really messy.


回答 8

让我们结合nonagon的答案mzjn对一个相关问题的答案

def parse_xml(xml_path: Path) -> Tuple[ET.Element, Dict[str, str]]:
    xml_iter = ET.iterparse(xml_path, events=["start-ns"])
    xml_namespaces = dict(prefix_namespace_pair for _, prefix_namespace_pair in xml_iter)
    return xml_iter.root, xml_namespaces

使用此功能,我们:

  1. 创建一个迭代器以获取命名空间和已解析的树对象

  2. 遍历创建的迭代器以获取命名空间命令,我们以后可以传入每个命名空间find()findall()调用iMom0的命名空间。

  3. 返回解析树的根元素对象和命名空间。

我认为这是最好的方法,因为无论源XML还是解析后的xml.etree.ElementTree输出都不会受到任何操纵。

我还要感谢Barny的回答,因为它提供了这个难题的重要组成部分(您可以从迭代器获得解析的根)。在此之前,我实际上在应用程序中遍历了两次XML树(一次获取命名空间,第二次获取根)。

Let’s combine nonagon’s answer with mzjn’s answer to a related question:

def parse_xml(xml_path: Path) -> Tuple[ET.Element, Dict[str, str]]:
    xml_iter = ET.iterparse(xml_path, events=["start-ns"])
    xml_namespaces = dict(prefix_namespace_pair for _, prefix_namespace_pair in xml_iter)
    return xml_iter.root, xml_namespaces

Using this function we:

  1. Create an iterator to get both namespaces and a parsed tree object.

  2. Iterate over the created iterator to get the namespaces dict that we can later pass in each find() or findall() call as sugested by iMom0.

  3. Return the parsed tree’s root element object and namespaces.

I think this is the best approach all around as there’s no manipulation either of a source XML or resulting parsed xml.etree.ElementTree output whatsoever involved.

I’d like also to credit barny’s answer with providing an essential piece of this puzzle (that you can get the parsed root from the iterator). Until that I actually traversed XML tree twice in my application (once to get namespaces, second for a root).


如何检查字符串中的字符是否为字母?(Python)

问题:如何检查字符串中的字符是否为字母?(Python)

我知道islowerisupper,但是您可以检查该字符是否是字母?例如:

>>> s = 'abcdefg'
>>> s2 = '123abcd'
>>> s3 = 'abcDEFG'
>>> s[0].islower()
True

>>> s2[0].islower()
False

>>> s3[0].islower()
True

除了做.islower()还是,有什么办法可以问它是否是一个角色.isupper()

I know about islower and isupper, but can you check whether or not that character is a letter? For Example:

>>> s = 'abcdefg'
>>> s2 = '123abcd'
>>> s3 = 'abcDEFG'
>>> s[0].islower()
True

>>> s2[0].islower()
False

>>> s3[0].islower()
True

Is there any way to just ask if it is a character besides doing .islower() or .isupper()?


回答 0

您可以使用str.isalpha()

例如:

s = 'a123b'

for char in s:
    print(char, char.isalpha())

输出:

a True
1 False
2 False
3 False
b True

You can use str.isalpha().

For example:

s = 'a123b'

for char in s:
    print(char, char.isalpha())

Output:

a True
1 False
2 False
3 False
b True

回答 1

str.isalpha()

如果字符串中的所有字符都是字母并且至少包含一个字符,则返回true,否则返回false。字母字符是在Unicode字符数据库中定义为“字母”的那些字符,即,具有一般类别属性为“ Lm”,“ Lt”,“ Lu”,“ Ll”或“ Lo”之一的那些字符。请注意,这与Unicode标准中定义的“字母”属性不同。

在python2.x中:

>>> s = u'a1中文'
>>> for char in s: print char, char.isalpha()
...
a True
1 False
 True
 True
>>> s = 'a1中文'
>>> for char in s: print char, char.isalpha()
...
a True
1 False
 False
 False
 False
 False
 False
 False
>>>

在python3.x中:

>>> s = 'a1中文'
>>> for char in s: print(char, char.isalpha())
...
a True
1 False
 True
 True
>>>

此代码的工作原理:

>>> def is_alpha(word):
...     try:
...         return word.encode('ascii').isalpha()
...     except:
...         return False
...
>>> is_alpha('中国')
False
>>> is_alpha(u'中国')
False
>>>

>>> a = 'a'
>>> b = 'a'
>>> ord(a), ord(b)
(65345, 97)
>>> a.isalpha(), b.isalpha()
(True, True)
>>> is_alpha(a), is_alpha(b)
(False, True)
>>>
str.isalpha()

Return true if all characters in the string are alphabetic and there is at least one character, false otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different from the “Alphabetic” property defined in the Unicode Standard.

In python2.x:

>>> s = u'a1中文'
>>> for char in s: print char, char.isalpha()
...
a True
1 False
中 True
文 True
>>> s = 'a1中文'
>>> for char in s: print char, char.isalpha()
...
a True
1 False
� False
� False
� False
� False
� False
� False
>>>

In python3.x:

>>> s = 'a1中文'
>>> for char in s: print(char, char.isalpha())
...
a True
1 False
中 True
文 True
>>>

This code work:

>>> def is_alpha(word):
...     try:
...         return word.encode('ascii').isalpha()
...     except:
...         return False
...
>>> is_alpha('中国')
False
>>> is_alpha(u'中国')
False
>>>

>>> a = 'a'
>>> b = 'a'
>>> ord(a), ord(b)
(65345, 97)
>>> a.isalpha(), b.isalpha()
(True, True)
>>> is_alpha(a), is_alpha(b)
(False, True)
>>>

回答 2

我发现使用函数和基本代码可以实现此目的。这是一个接受字符串并计算大写字母,小写字母以及“其他”数量的代码。其他分类为空格,标点符号,甚至日语和中文字符。

def check(count):

    lowercase = 0
    uppercase = 0
    other = 0

    low = 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
    upper = 'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'



    for n in count:
        if n in low:
            lowercase += 1
        elif n in upper:
            uppercase += 1
        else:
            other += 1

    print("There are " + str(lowercase) + " lowercase letters.")
    print("There are " + str(uppercase) + " uppercase letters.")
    print("There are " + str(other) + " other elements to this sentence.")

I found a good way to do this with using a function and basic code. This is a code that accepts a string and counts the number of capital letters, lowercase letters and also ‘other’. Other is classed as a space, punctuation mark or even Japanese and Chinese characters.

def check(count):

    lowercase = 0
    uppercase = 0
    other = 0

    low = 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'
    upper = 'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'



    for n in count:
        if n in low:
            lowercase += 1
        elif n in upper:
            uppercase += 1
        else:
            other += 1

    print("There are " + str(lowercase) + " lowercase letters.")
    print("There are " + str(uppercase) + " uppercase letters.")
    print("There are " + str(other) + " other elements to this sentence.")

回答 3

data = "abcdefg hi j 12345"

digits_count = 0
letters_count = 0
others_count = 0

for i in userinput:

    if i.isdigit():
        digits_count += 1 
    elif i.isalpha():
        letters_count += 1
    else:
        others_count += 1

print("Result:")        
print("Letters=", letters_count)
print("Digits=", digits_count)

输出:

Please Enter Letters with Numbers:
abcdefg hi j 12345
Result:
Letters = 10
Digits = 5

通过使用str.isalpha()您可以检查它是否是字母。

data = "abcdefg hi j 12345"

digits_count = 0
letters_count = 0
others_count = 0

for i in userinput:

    if i.isdigit():
        digits_count += 1 
    elif i.isalpha():
        letters_count += 1
    else:
        others_count += 1

print("Result:")        
print("Letters=", letters_count)
print("Digits=", digits_count)

Output:

Please Enter Letters with Numbers:
abcdefg hi j 12345
Result:
Letters = 10
Digits = 5

By using str.isalpha() you can check if it is a letter.


回答 4

这有效:

any(c.isalpha() for c in 'string')

This works:

any(c.isalpha() for c in 'string')

回答 5

这有效:

word = str(input("Enter string:"))
notChar = 0
isChar = 0
for char in word:
    if not char.isalpha():
        notChar += 1
    else:
        isChar += 1
print(isChar, " were letters; ", notChar, " were not letters.")

This works:

word = str(input("Enter string:"))
notChar = 0
isChar = 0
for char in word:
    if not char.isalpha():
        notChar += 1
    else:
        isChar += 1
print(isChar, " were letters; ", notChar, " were not letters.")

在python中处理list.index(可能不存在)的最佳方法?

问题:在python中处理list.index(可能不存在)的最佳方法?

我有看起来像这样的代码:

thing_index = thing_list.index(thing)
otherfunction(thing_list, thing_index)

好的,所以简化了,但是您知道了。现在thing可能实际上不在列表中,在这种情况下,我想通过-1 thing_index。在其他语言中index(),如果找不到该元素,这就是您期望返回的结果。实际上,它引发了一个错误ValueError

我可以这样做:

try:
    thing_index = thing_list.index(thing)
except ValueError:
    thing_index = -1
otherfunction(thing_list, thing_index)

但这感觉很脏,而且我不知道是否ValueError可以出于其他原因而提出该提议。我根据生成器函数提出了以下解决方案,但似乎有点复杂:

thing_index = ( [(i for i in xrange(len(thing_list)) if thing_list[i]==thing)] or [-1] )[0]

有没有一种更清洁的方法来实现同一目标?假设列表未排序。

I have code which looks something like this:

thing_index = thing_list.index(thing)
otherfunction(thing_list, thing_index)

ok so that’s simplified but you get the idea. Now thing might not actually be in the list, in which case I want to pass -1 as thing_index. In other languages this is what you’d expect index() to return if it couldn’t find the element. In fact it throws a ValueError.

I could do this:

try:
    thing_index = thing_list.index(thing)
except ValueError:
    thing_index = -1
otherfunction(thing_list, thing_index)

But this feels dirty, plus I don’t know if ValueError could be raised for some other reason. I came up with the following solution based on generator functions, but it seems a little complex:

thing_index = ( [(i for i in xrange(len(thing_list)) if thing_list[i]==thing)] or [-1] )[0]

Is there a cleaner way to achieve the same thing? Let’s assume the list isn’t sorted.


回答 0

使用try-except子句没有任何“肮脏”。这是Python方式。ValueError只会由.index方法引发,因为这是您唯一的代码!

要回答的评论:
在Python,容易请求原谅比获得许可的理念已经非常成熟,并没有 index不会提高这种类型的错误的任何其他问题。并不是我能想到的。

There is nothing “dirty” about using try-except clause. This is the pythonic way. ValueError will be raised by the .index method only, because it’s the only code you have there!

To answer the comment:
In Python, easier to ask forgiveness than to get permission philosophy is well established, and no index will not raise this type of error for any other issues. Not that I can think of any.


回答 1

thing_index = thing_list.index(elem) if elem in thing_list else -1

一条线。简单。没有exceptions。

thing_index = thing_list.index(elem) if elem in thing_list else -1

One line. Simple. No exceptions.


回答 2

dict类型具有一个get函数,如果字典中不存在该键,则to的第二个参数get是它应返回的值。同样setdefaultdict存在,如果键存在,则返回值;否则,它将根据您的默认参数设置值,然后返回您的默认参数。

您可以扩展list类型以具有getindexdefault方法。

class SuperDuperList(list):
    def getindexdefault(self, elem, default):
        try:
            thing_index = self.index(elem)
            return thing_index
        except ValueError:
            return default

然后可以这样使用:

mylist = SuperDuperList([0,1,2])
index = mylist.getindexdefault( 'asdf', -1 )

The dict type has a get function, where if the key doesn’t exist in the dictionary, the 2nd argument to get is the value that it should return. Similarly there is setdefault, which returns the value in the dict if the key exists, otherwise it sets the value according to your default parameter and then returns your default parameter.

You could extend the list type to have a getindexdefault method.

class SuperDuperList(list):
    def getindexdefault(self, elem, default):
        try:
            thing_index = self.index(elem)
            return thing_index
        except ValueError:
            return default

Which could then be used like:

mylist = SuperDuperList([0,1,2])
index = mylist.getindexdefault( 'asdf', -1 )

回答 3

使用的代码没有错ValueError。如果您想避免exceptions,这里还有另外一种说法:

thing_index = next((i for i, x in enumerate(thing_list) if x == thing), -1)

There is nothing wrong with your code that uses ValueError. Here’s yet another one-liner if you’d like to avoid exceptions:

thing_index = next((i for i, x in enumerate(thing_list) if x == thing), -1)

回答 4

这个问题是语言哲学之一。例如,在Java中,一直存在这样一种传统,即异常仅应仅在发生错误的“exceptions情况”中使用,而不是用于流控制。最初,这是出于性能原因,因为Java异常的速度很慢,但是现在这已成为公认的样式。

相反,Python一直使用异常来指示正常的程序流,就像ValueError我们在这里讨论的那样抛出异常。Python风格对此没有任何“污秽”,并且还有更多的来源。一个更常见的例子是StopIteration异常,它是由迭代器的next()方法引发的,以表示没有其他值。

This issue is one of language philosophy. In Java for example there has always been a tradition that exceptions should really only be used in “exceptional circumstances” that is when errors have happened, rather than for flow control. In the beginning this was for performance reasons as Java exceptions were slow but now this has become the accepted style.

In contrast Python has always used exceptions to indicate normal program flow, like raising a ValueError as we are discussing here. There is nothing “dirty” about this in Python style and there are many more where that came from. An even more common example is StopIteration exception which is raised by an iterator‘s next() method to signal that there are no further values.


回答 5

如果您经常这样做,那么最好在辅助功能中将其烘烤掉:

def index_of(val, in_list):
    try:
        return in_list.index(val)
    except ValueError:
        return -1 

If you are doing this often then it is better to stove it away in a helper function:

def index_of(val, in_list):
    try:
        return in_list.index(val)
    except ValueError:
        return -1 

回答 6

那😃呢?

li = [1,2,3,4,5] # create list 

li = dict(zip(li,range(len(li)))) # convert List To Dict 
print( li ) # {1: 0, 2: 1, 3: 2, 4:3 , 5: 4}
li.get(20) # None 
li.get(1)  # 0 

What about this 😃 :

li = [1,2,3,4,5] # create list 

li = dict(zip(li,range(len(li)))) # convert List To Dict 
print( li ) # {1: 0, 2: 1, 3: 2, 4:3 , 5: 4}
li.get(20) # None 
li.get(1)  # 0 

回答 7

那这个呢:

otherfunction(thing_collection, thing)

而不是在函数接口中公开一些依赖实现的东西(如列表索引),而是传递集合和东西,然后让其他函数处理“成员资格测试”问题。如果将其他函数编写为与集合类型无关,则可能以以下内容开头:

if thing in thing_collection:
    ... proceed with operation on thing

如果thing_collection是一个列表,元组,集合或字典,它将起作用。

这可能比以下更清楚:

if thing_index != MAGIC_VALUE_INDICATING_NOT_A_MEMBER:

这是您在其他功能中已经拥有的代码。

What about this:

otherfunction(thing_collection, thing)

Rather than expose something so implementation-dependent like a list index in a function interface, pass the collection and the thing and let otherfunction deal with the “test for membership” issues. If otherfunction is written to be collection-type-agnostic, then it would probably start with:

if thing in thing_collection:
    ... proceed with operation on thing

which will work if thing_collection is a list, tuple, set, or dict.

This is possibly clearer than:

if thing_index != MAGIC_VALUE_INDICATING_NOT_A_MEMBER:

which is the code you already have in otherfunction.


回答 8

像这样:

temp_inx = (L + [x]).index(x) 
inx = temp_inx if temp_inx < len(L) else -1

What about like this:

temp_inx = (L + [x]).index(x) 
inx = temp_inx if temp_inx < len(L) else -1

回答 9

列表中的“ .index()”方法存在相同的问题。我对它引发异常的事实没有任何疑问,但是我强烈不同意它是一个非描述性ValueError的事实。我可以理解是否会是IndexError。

我可以看到为什么返回“ -1”也是一个问题,因为它是Python中的有效索引。但实际上,我从不期望“ .index()”方法返回负数。

这是一行代码(好吧,这是一条相当长的线…),仅遍历列表一次,如果找不到该项目,则返回“无”。如果您愿意,将其重写为返回-1将是微不足道的。

indexOf = lambda list, thing: \
            reduce(lambda acc, (idx, elem): \
                   idx if (acc is None) and elem == thing else acc, list, None)

如何使用:

>>> indexOf([1,2,3], 4)
>>>
>>> indexOf([1,2,3], 1)
0
>>>

I have the same issue with the “.index()” method on lists. I have no issue with the fact that it throws an exception but I strongly disagree with the fact that it’s a non-descriptive ValueError. I could understand if it would’ve been an IndexError, though.

I can see why returning “-1” would be an issue too because it’s a valid index in Python. But realistically, I never expect a “.index()” method to return a negative number.

Here goes a one liner (ok, it’s a rather long line …), goes through the list exactly once and returns “None” if the item isn’t found. It would be trivial to rewrite it to return -1, should you so desire.

indexOf = lambda list, thing: \
            reduce(lambda acc, (idx, elem): \
                   idx if (acc is None) and elem == thing else acc, list, None)

How to use:

>>> indexOf([1,2,3], 4)
>>>
>>> indexOf([1,2,3], 1)
0
>>>

回答 10

我不知道为什么你应该认为它很脏…因为异常?如果您想要一个内衬,则为:

thing_index = thing_list.index(elem) if thing_list.count(elem) else -1

但是我建议不要使用它。我认为Ross Rogers解决方案是最好的解决方案,请使用对象封装您的喜好行为,不要尝试以牺牲可读性为代价将语言推向极限。

I don’t know why you should think it is dirty… because of the exception? if you want a oneliner, here it is:

thing_index = thing_list.index(elem) if thing_list.count(elem) else -1

but i would advise against using it; I think Ross Rogers solution is the best, use an object to encapsulate your desiderd behaviour, don’t try pushing the language to its limits at the cost of readability.


回答 11

我建议:

if thing in thing_list:
  list_index = -1
else:
  list_index = thing_list.index(thing)

I’d suggest:

if thing in thing_list:
  list_index = -1
else:
  list_index = thing_list.index(thing)

脾气暴躁:快速找到价值的第一指标

问题:脾气暴躁:快速找到价值的第一指标

如何找到Numpy数组中数字首次出现的索引?速度对我很重要。我对以下答案不感兴趣,因为它们会扫描整个数组,并且在发现第一个匹配项时不会停止:

itemindex = numpy.where(array==item)[0][0]
nonzero(array == item)[0][0]

注1:这个问题的答案似乎都不相关。是否有一个Numpy函数返回数组中某个对象的第一个索引?

注意2:使用C编译方法优于Python循环。

How can I find the index of the first occurrence of a number in a Numpy array? Speed is important to me. I am not interested in the following answers because they scan the whole array and don’t stop when they find the first occurrence:

itemindex = numpy.where(array==item)[0][0]
nonzero(array == item)[0][0]

Note 1: none of the answers from that question seem relevant Is there a Numpy function to return the first index of something in an array?

Note 2: using a C-compiled method is preferred to a Python loop.


回答 0

为此,计划在Numpy 2.0.0中进行功能请求:https : //github.com/numpy/numpy/issues/2269

There is a feature request for this scheduled for Numpy 2.0.0: https://github.com/numpy/numpy/issues/2269


回答 1

尽管对您来说太晚了,但是供以后参考:在numpy实现之前,使用numba(1)是最简单的方法。如果您使用anaconda python发行版,则应该已经安装了它。该代码将被编译,因此将很快。

@jit(nopython=True)
def find_first(item, vec):
    """return the index of the first occurence of item in vec"""
    for i in xrange(len(vec)):
        if item == vec[i]:
            return i
    return -1

然后:

>>> a = array([1,7,8,32])
>>> find_first(8,a)
2

Although it is way too late for you, but for future reference: Using numba (1) is the easiest way until numpy implements it. If you use anaconda python distribution it should already be installed. The code will be compiled so it will be fast.

@jit(nopython=True)
def find_first(item, vec):
    """return the index of the first occurence of item in vec"""
    for i in xrange(len(vec)):
        if item == vec[i]:
            return i
    return -1

and then:

>>> a = array([1,7,8,32])
>>> find_first(8,a)
2

回答 2

我已经为几种方法设定了基准:

  • argwhere
  • nonzero 如问题
  • .tostring() 就像@Rob Reilink的答案一样
  • python循环
  • Fortran循环

Python中的Fortran代码是可用的。我跳过了那些没有希望的事情,例如转换为列表。

结果以对数刻度表示。X轴是针的位置(查找针是否在阵列的下方需要更长的时间);最后一个值是不在数组中的指针。Y轴是找到它的时间。

该阵列具有100万个元素,并且运行了100次测试。结果仍然有些波动,但是定性趋势很明显:Python和f2py在第一个元素退出,因此它们的缩放比例不同。如果针头不在前1%,Python会变得太慢,而f2py速度会很快(但是您需要对其进行编译)。

总而言之,f2py是最快的解决方案,尤其是当针头出现得很早时。

它不是内置在其中的,但实际上只是2分钟的工作。将此添加到名为search.f90

subroutine find_first(needle, haystack, haystack_length, index)
    implicit none
    integer, intent(in) :: needle
    integer, intent(in) :: haystack_length
    integer, intent(in), dimension(haystack_length) :: haystack
!f2py intent(inplace) haystack
    integer, intent(out) :: index
    integer :: k
    index = -1
    do k = 1, haystack_length
        if (haystack(k)==needle) then
            index = k - 1
            exit
        endif
    enddo
end

如果您要查找的不是integer,请更改类型。然后使用以下命令进行编译:

f2py -c -m search search.f90

之后您可以执行(从Python):

import search
print(search.find_first.__doc__)
a = search.find_first(your_int_needle, your_int_array)

I’ve made a benchmark for several methods:

  • argwhere
  • nonzero as in the question
  • .tostring() as in @Rob Reilink’s answer
  • python loop
  • Fortran loop

The Python and Fortran code are available. I skipped the unpromising ones like converting to a list.

The results on log scale. X-axis is the position of the needle (it takes longer to find if it’s further down the array); last value is a needle that’s not in the array. Y-axis is the time to find it.

The array had 1 million elements and tests were run 100 times. Results still fluctuate a bit, but the qualitative trend is clear: Python and f2py quit at the first element so they scale differently. Python gets too slow if the needle is not in the first 1%, whereas f2py is fast (but you need to compile it).

To summarize, f2py is the fastest solution, especially if the needle appears fairly early.

It’s not built in which is annoying, but it’s really just 2 minutes of work. Add this to a file called search.f90:

subroutine find_first(needle, haystack, haystack_length, index)
    implicit none
    integer, intent(in) :: needle
    integer, intent(in) :: haystack_length
    integer, intent(in), dimension(haystack_length) :: haystack
!f2py intent(inplace) haystack
    integer, intent(out) :: index
    integer :: k
    index = -1
    do k = 1, haystack_length
        if (haystack(k)==needle) then
            index = k - 1
            exit
        endif
    enddo
end

If you’re looking for something other than integer, just change the type. Then compile using:

f2py -c -m search search.f90

after which you can do (from Python):

import search
print(search.find_first.__doc__)
a = search.find_first(your_int_needle, your_int_array)

回答 3

您可以使用array.tostring(),然后使用find()方法将布尔数组转换为Python字符串:

(array==item).tostring().find('\x01')

但是,这确实涉及到复制数据,因为Python字符串需要是不可变的。一个优点是您还可以通过查找以下内容来搜索例如上升沿\x00\x01

You can convert a boolean array to a Python string using array.tostring() and then using the find() method:

(array==item).tostring().find('\x01')

This does involve copying the data, though, since Python strings need to be immutable. An advantage is that you can also search for e.g. a rising edge by finding \x00\x01


回答 4

在排序数组的情况下np.searchsorted工作。

In case of sorted arrays np.searchsorted works.


回答 5

我认为您遇到了一个问题,在此问题上,可以使用其他方法和一些先验知识来真正帮助您。您有X概率在数据的前Y个百分比中找到答案的情况。希望将问题分解为幸运的事物,然后在python中使用嵌套列表理解或类似的方法进行处理。

使用ctypes编写C函数来实现这种蛮力也并不难。

我一起砍的C代码(index.c):

long index(long val, long *data, long length){
    long ans, i;
    for(i=0;i<length;i++){
        if (data[i] == val)
            return(i);
    }
    return(-999);
}

和python:

# to compile (mac)
# gcc -shared index.c -o index.dylib
import ctypes
lib = ctypes.CDLL('index.dylib')
lib.index.restype = ctypes.c_long
lib.index.argtypes = (ctypes.c_long, ctypes.POINTER(ctypes.c_long), ctypes.c_long)

import numpy as np
np.random.seed(8675309)
a = np.random.random_integers(0, 100, 10000)
print lib.index(57, a.ctypes.data_as(ctypes.POINTER(ctypes.c_long)), len(a))

我得到92。

将python包装成适当的函数,然后就可以了。

对于这个种子,C版本要快很多(〜20倍)(警告我使用timeit不好)

import timeit
t = timeit.Timer('np.where(a==57)[0][0]', 'import numpy as np; np.random.seed(1); a = np.random.random_integers(0, 1000000, 10000000)')
t.timeit(100)/100
# 0.09761879920959472
t2 = timeit.Timer('lib.index(57, a.ctypes.data_as(ctypes.POINTER(ctypes.c_long)), len(a))', 'import numpy as np; np.random.seed(1); a = np.random.random_integers(0, 1000000, 10000000); import ctypes; lib = ctypes.CDLL("index.dylib"); lib.index.restype = ctypes.c_long; lib.index.argtypes = (ctypes.c_long, ctypes.POINTER(ctypes.c_long), ctypes.c_long) ')
t2.timeit(100)/100
# 0.005288000106811523

I think you have hit a problem where a different method and some a priori knowledge of the array would really help. The kind of thing where you have a X probability of finding your answer in the first Y percent of the data. The splitting up the problem with the hope of getting lucky then doing this in python with a nested list comprehension or something.

Writing a C function to do this brute force isn’t too hard using ctypes either.

The C code I hacked together (index.c):

long index(long val, long *data, long length){
    long ans, i;
    for(i=0;i<length;i++){
        if (data[i] == val)
            return(i);
    }
    return(-999);
}

and the python:

# to compile (mac)
# gcc -shared index.c -o index.dylib
import ctypes
lib = ctypes.CDLL('index.dylib')
lib.index.restype = ctypes.c_long
lib.index.argtypes = (ctypes.c_long, ctypes.POINTER(ctypes.c_long), ctypes.c_long)

import numpy as np
np.random.seed(8675309)
a = np.random.random_integers(0, 100, 10000)
print lib.index(57, a.ctypes.data_as(ctypes.POINTER(ctypes.c_long)), len(a))

and I get 92.

Wrap up the python into a proper function and there you go.

The C version is a lot (~20x) faster for this seed (warning I am not good with timeit)

import timeit
t = timeit.Timer('np.where(a==57)[0][0]', 'import numpy as np; np.random.seed(1); a = np.random.random_integers(0, 1000000, 10000000)')
t.timeit(100)/100
# 0.09761879920959472
t2 = timeit.Timer('lib.index(57, a.ctypes.data_as(ctypes.POINTER(ctypes.c_long)), len(a))', 'import numpy as np; np.random.seed(1); a = np.random.random_integers(0, 1000000, 10000000); import ctypes; lib = ctypes.CDLL("index.dylib"); lib.index.restype = ctypes.c_long; lib.index.argtypes = (ctypes.c_long, ctypes.POINTER(ctypes.c_long), ctypes.c_long) ')
t2.timeit(100)/100
# 0.005288000106811523

回答 6

@tal已经提供了numba查找第一个索引的函数,但是仅适用于一维数组。使用,np.ndenumerate您还可以在任意维数组中找到第一个索引:

from numba import njit
import numpy as np

@njit
def index(array, item):
    for idx, val in np.ndenumerate(array):
        if val == item:
            return idx
    return None

样品盒:

>>> arr = np.arange(9).reshape(3,3)
>>> index(arr, 3)
(1, 0)

时间显示,其性能与tals解决方案相似:

arr = np.arange(100000)
%timeit index(arr, 5)           # 1000000 loops, best of 3: 1.88 µs per loop
%timeit find_first(5, arr)      # 1000000 loops, best of 3: 1.7 µs per loop

%timeit index(arr, 99999)       # 10000 loops, best of 3: 118 µs per loop
%timeit find_first(99999, arr)  # 10000 loops, best of 3: 96 µs per loop

@tal already presented a numba function to find the first index but that only works for 1D arrays. With np.ndenumerate you can also find the first index in an arbitarly dimensional array:

from numba import njit
import numpy as np

@njit
def index(array, item):
    for idx, val in np.ndenumerate(array):
        if val == item:
            return idx
    return None

Sample case:

>>> arr = np.arange(9).reshape(3,3)
>>> index(arr, 3)
(1, 0)

Timings show that it’s similar in performance to tals solution:

arr = np.arange(100000)
%timeit index(arr, 5)           # 1000000 loops, best of 3: 1.88 µs per loop
%timeit find_first(5, arr)      # 1000000 loops, best of 3: 1.7 µs per loop

%timeit index(arr, 99999)       # 10000 loops, best of 3: 118 µs per loop
%timeit find_first(99999, arr)  # 10000 loops, best of 3: 96 µs per loop

回答 7

如果您的列表已排序,则可以使用“ bisect”包非常快速地搜索索引。它是O(log(n))而不是O(n)。

bisect.bisect(a, x)

在数组a中找到x,在排序情况下肯定比通过所有第一个元素的C例程更快(对于足够长的列表)。

有时候很高兴知道。

If your list is sorted, you can achieve very quick search of index with the ‘bisect’ package. It’s O(log(n)) instead of O(n).

bisect.bisect(a, x)

finds x in the array a, definitely quicker in the sorted case than any C-routine going through all the first elements (for long enough lists).

It’s good to know sometimes.


回答 8

据我所知,布尔数组上只有np.any和np.all是短路的。

在您的情况下,numpy必须遍历整个数组两次,一次创建布尔条件,第二次查找索引。

在这种情况下,我的建议是使用cython。我认为为这种情况调整示例应该很容易,尤其是对于不同的dtype和形状不需要很大的灵活性的情况下。

As far as I know only np.any and np.all on boolean arrays are short-circuited.

In your case, numpy has to go through the entire array twice, once to create the boolean condition and a second time to find the indices.

My recommendation in this case would be to use cython. I think it should be easy to adjust an example for this case, especially if you don’t need much flexibility for different dtypes and shapes.


回答 9

我的工作需要这个,所以我自学了Python和Numpy的C接口并编写了自己的C。 http://pastebin.com/GtcXuLyd它仅适用于一维数组,但适用于大多数数据类型(int,float或字符串),并且测试表明它再次比纯Python-中预期的方法快约20倍。麻木

I needed this for my job so I taught myself Python and Numpy’s C interface and wrote my own. http://pastebin.com/GtcXuLyd It’s only for 1-D arrays, but works for most data types (int, float, or strings) and testing has shown it is again about 20 times faster than the expected approach in pure Python-numpy.


回答 10

通过分块处理数组,可以在纯numpy中有效解决此问题:

def find_first(x):
    idx, step = 0, 32
    while idx < x.size:
        nz, = x[idx: idx + step].nonzero()
        if len(nz): # found non-zero, return it
            return nz[0] + idx
        # move to the next chunk, increase step
        idx += step
        step = min(9600, step + step // 2)
    return -1

数组以size的块进行处理step。的step时间越长,步骤是,更快的正在处理归零阵列(最坏情况)的。它越小,开始处理非零数组的速度就越快。诀窍是从小开始,step然后按指数增加。此外,由于收益有限,无需将其增加到某个阈值以上。

我已经将纯ndarary.nonzero和numba解决方案与1000万个浮点数组进行了比较。

import numpy as np
from numba import jit
from timeit import timeit

def find_first(x):
    idx, step = 0, 32
    while idx < x.size:
        nz, = x[idx: idx + step].nonzero()
        if len(nz):
            return nz[0] + idx
        idx += step
        step = min(9600, step + step // 2)
    return -1

@jit(nopython=True)
def find_first_numba(vec):
    """return the index of the first occurence of item in vec"""
    for i in range(len(vec)):
        if vec[i]:
            return i
    return -1


SIZE = 10_000_000
# First only
x = np.empty(SIZE)

find_first_numba(x[:10])

print('---- FIRST ----')
x[:] = 0
x[0] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=1000), 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=1000), 'ms')

print('---- LAST ----')
x[:] = 0
x[-1] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

print('---- NONE ----')
x[:] = 0
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

print('---- ALL ----')
x[:] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

结果在我的机器上:

---- FIRST ----
ndarray.nonzero 54.733994480002366 ms
find_first 0.0013148509997336078 ms
find_first_numba 0.0002839310000126716 ms
---- LAST ----
ndarray.nonzero 54.56336712999928 ms
find_first 25.38929685000312 ms
find_first_numba 8.022820680002951 ms
---- NONE ----
ndarray.nonzero 24.13432420999925 ms
find_first 25.345200140000088 ms
find_first_numba 8.154927100003988 ms
---- ALL ----
ndarray.nonzero 55.753537260002304 ms
find_first 0.0014760300018679118 ms
find_first_numba 0.0004358099977253005 ms

ndarray.nonzero是绝对宽松的。在最佳情况下,numba解决方案的速度提高了约5倍。在最坏的情况下,速度快大约3倍。

This problem can be effectively solved in pure numpy by processing the array in chunks:

def find_first(x):
    idx, step = 0, 32
    while idx < x.size:
        nz, = x[idx: idx + step].nonzero()
        if len(nz): # found non-zero, return it
            return nz[0] + idx
        # move to the next chunk, increase step
        idx += step
        step = min(9600, step + step // 2)
    return -1

The array is processed in chunk of size step. The step longer the step is, the faster is processing of zeroed-array (worst case). The smaller it is, the faster processing of array with non-zero at the start. The trick is to start with a small step and increase it exponentially. Moreover, there is no need to increment it above some threshold due to limited benefits.

I’ve compared the solution with pure ndarary.nonzero and numba solution against 10 million array of floats.

import numpy as np
from numba import jit
from timeit import timeit

def find_first(x):
    idx, step = 0, 32
    while idx < x.size:
        nz, = x[idx: idx + step].nonzero()
        if len(nz):
            return nz[0] + idx
        idx += step
        step = min(9600, step + step // 2)
    return -1

@jit(nopython=True)
def find_first_numba(vec):
    """return the index of the first occurence of item in vec"""
    for i in range(len(vec)):
        if vec[i]:
            return i
    return -1


SIZE = 10_000_000
# First only
x = np.empty(SIZE)

find_first_numba(x[:10])

print('---- FIRST ----')
x[:] = 0
x[0] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=1000), 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=1000), 'ms')

print('---- LAST ----')
x[:] = 0
x[-1] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

print('---- NONE ----')
x[:] = 0
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

print('---- ALL ----')
x[:] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

And results on my machine:

---- FIRST ----
ndarray.nonzero 54.733994480002366 ms
find_first 0.0013148509997336078 ms
find_first_numba 0.0002839310000126716 ms
---- LAST ----
ndarray.nonzero 54.56336712999928 ms
find_first 25.38929685000312 ms
find_first_numba 8.022820680002951 ms
---- NONE ----
ndarray.nonzero 24.13432420999925 ms
find_first 25.345200140000088 ms
find_first_numba 8.154927100003988 ms
---- ALL ----
ndarray.nonzero 55.753537260002304 ms
find_first 0.0014760300018679118 ms
find_first_numba 0.0004358099977253005 ms

Pure ndarray.nonzero is definite looser. The numba solution is circa 5 times faster for the best case. It is circa 3 times faster in the worst case.


回答 11

如果您正在寻找第一个非零元素,则可以使用以下技巧:

idx = x.view(bool).argmax() // x.itemsize
idx = idx if x[idx] else -1

这是一个非常快速的 “ numpy-pure”解决方案,但在下面讨论的某些情况下失败。

该解决方案利用了以下事实:数字类型的几乎所有零表示都由0字节组成。它也适用于numpy bool。在最新版本的numpy中,argmax()函数在处理bool类型时使用短路逻辑。的大小bool为1个字节。

因此,需要:

  • 创建数组的视图为bool。没有创建副本
  • 用于argmax()使用短路逻辑查找第一个非零字节
  • 通过将偏移量的整数除(运算符//)除以以字节(x.itemsize)表示的单个元素的大小,来重新计算此字节与第一个非零元素的索引的偏移量
  • 检查是否x[idx]实际上非零,以识别不存在非零时的情况

我已经针对numba解决方案建立了一些基准,并进行了构建np.nonzero

import numpy as np
from numba import jit
from timeit import timeit

def find_first(x):
    idx = x.view(bool).argmax() // x.itemsize
    return idx if x[idx] else -1

@jit(nopython=True)
def find_first_numba(vec):
    """return the index of the first occurence of item in vec"""
    for i in range(len(vec)):
        if vec[i]:
            return i
    return -1


SIZE = 10_000_000
# First only
x = np.empty(SIZE)

find_first_numba(x[:10])

print('---- FIRST ----')
x[:] = 0
x[0] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=1000), 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=1000), 'ms')

print('---- LAST ----')
x[:] = 0
x[-1] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

print('---- NONE ----')
x[:] = 0
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

print('---- ALL ----')
x[:] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

我的机器上的结果是:

---- FIRST ----
ndarray.nonzero 57.63976670001284 ms
find_first 0.0010841979965334758 ms
find_first_numba 0.0002308919938514009 ms
---- LAST ----
ndarray.nonzero 58.96685277999495 ms
find_first 5.923203580023255 ms
find_first_numba 8.762269750004634 ms
---- NONE ----
ndarray.nonzero 25.13398071998381 ms
find_first 5.924289370013867 ms
find_first_numba 8.810063839919167 ms
---- ALL ----
ndarray.nonzero 55.181210660084616 ms
find_first 0.001246920000994578 ms
find_first_numba 0.00028766007744707167 ms

该解决方案比numba 33%,并且是“ numpy-pure”。

缺点:

  • 不适用于numpy可接受的类型,例如 object
  • 因偶尔出现在floatdouble计算中的负零而失败

If you are looking for the first non-zero element you can use a following hack:

idx = x.view(bool).argmax() // x.itemsize
idx = idx if x[idx] else -1

It is a very fast “numpy-pure” solution but it fails for some cases discussed below.

The solution takes advantage from the fact that pretty much all representation of zero for numeric types consists of 0 bytes. It applies to numpy’s bool as well. In recent versions of numpy, argmax() function uses short-circuit logic when processing the bool type. The size of bool is 1 byte.

So one needs to:

  • create a view of the array as bool. No copy is created
  • use argmax() to find the first non-zero byte using short-circuit logic
  • recalculate the offset of this byte to the index of the first non-zero element by integer division (operator //) of the offset by a size of a single element expressed in bytes (x.itemsize)
  • check if x[idx] is actually non-zero to identify the case when no non-zero is present

I’ve made some benchmark against numba solution and build it np.nonzero.

import numpy as np
from numba import jit
from timeit import timeit

def find_first(x):
    idx = x.view(bool).argmax() // x.itemsize
    return idx if x[idx] else -1

@jit(nopython=True)
def find_first_numba(vec):
    """return the index of the first occurence of item in vec"""
    for i in range(len(vec)):
        if vec[i]:
            return i
    return -1


SIZE = 10_000_000
# First only
x = np.empty(SIZE)

find_first_numba(x[:10])

print('---- FIRST ----')
x[:] = 0
x[0] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=1000), 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=1000), 'ms')

print('---- LAST ----')
x[:] = 0
x[-1] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

print('---- NONE ----')
x[:] = 0
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

print('---- ALL ----')
x[:] = 1
print('ndarray.nonzero', timeit(lambda: x.nonzero()[0][0], number=100)*10, 'ms')
print('find_first', timeit(lambda: find_first(x), number=100)*10, 'ms')
print('find_first_numba', timeit(lambda: find_first_numba(x), number=100)*10, 'ms')

The result on my machine are:

---- FIRST ----
ndarray.nonzero 57.63976670001284 ms
find_first 0.0010841979965334758 ms
find_first_numba 0.0002308919938514009 ms
---- LAST ----
ndarray.nonzero 58.96685277999495 ms
find_first 5.923203580023255 ms
find_first_numba 8.762269750004634 ms
---- NONE ----
ndarray.nonzero 25.13398071998381 ms
find_first 5.924289370013867 ms
find_first_numba 8.810063839919167 ms
---- ALL ----
ndarray.nonzero 55.181210660084616 ms
find_first 0.001246920000994578 ms
find_first_numba 0.00028766007744707167 ms

The solution is 33% faster than numba and it is “numpy-pure”.

The disadvantages:

  • does not work for numpy acceptable types like object
  • fails for negative zero that occasionally appears in float or double computations

回答 12

作为matlab的长期用户,我很久以来一直在寻找有效解决此问题的方法。最后,在此讨论一个命题动机线程我试图拿出与正在实施类似建议什么的API的解决方案在这里,配套目前只有一维数组。

你会这样使用

import numpy as np
import utils_find_1st as utf1st
array = np.arange(100000)
item = 1000
ind = utf1st.find_1st(array, item, utf1st.cmp_larger_eq)

支持的条件运算符为:cmp_equal,cmp_not_equal,cmp_larger,cmp_smaller,cmp_larger_eq,cmp_smaller_eq。为了提高效率,扩展名用c编写。

您可以在此处找到来源,基准和其他详细信息:

https://pypi.python.org/pypi?name=py_find_1st&:action=display

为了供我们团队使用(在Linux和MacOS上使用anaconda),我制作了一个anaconda安装程序以简化安装,您可以按此处所述使用它

https://anaconda.org/roebel/py_find_1st

As a longtime matlab user I a have been searching for an efficient solution to this problem for quite a while. Finally, motivated by discussions a propositions in this thread I have tried to come up with a solution that is implementing an API similar to what was suggested here, supporting for the moment only 1D arrays.

You would use it like this

import numpy as np
import utils_find_1st as utf1st
array = np.arange(100000)
item = 1000
ind = utf1st.find_1st(array, item, utf1st.cmp_larger_eq)

The condition operators supported are: cmp_equal, cmp_not_equal, cmp_larger, cmp_smaller, cmp_larger_eq, cmp_smaller_eq. For efficiency the extension is written in c.

You find the source, benchmarks and other details here:

https://pypi.python.org/pypi?name=py_find_1st&:action=display

for the use in our team (anaconda on linux and macos) I have made an anaconda installer that simplifies installation, you may use it as described here

https://anaconda.org/roebel/py_find_1st


回答 13

请注意,如果您要执行一系列搜索,则如果搜索维不够大,则通过执行诸如转换为字符串之类的聪明操作而获得的性能提升可能会在外循环中丢失。了解使用上述提议的字符串转换技巧的find1和沿着内轴使用argmax的find2的性能(以及进行调整以确保不匹配返回-1的性能)

import numpy,time
def find1(arr,value):
    return (arr==value).tostring().find('\x01')

def find2(arr,value): #find value over inner most axis, and return array of indices to the match
    b = arr==value
    return b.argmax(axis=-1) - ~(b.any())


for size in [(1,100000000),(10000,10000),(1000000,100),(10000000,10)]:
    print(size)
    values = numpy.random.choice([0,0,0,0,0,0,0,1],size=size)
    v = values>0

    t=time.time()
    numpy.apply_along_axis(find1,-1,v,1)
    print('find1',time.time()-t)

    t=time.time()
    find2(v,1)
    print('find2',time.time()-t)

输出

(1, 100000000)
('find1', 0.25300002098083496)
('find2', 0.2780001163482666)
(10000, 10000)
('find1', 0.46200013160705566)
('find2', 0.27300000190734863)
(1000000, 100)
('find1', 20.98099994659424)
('find2', 0.3040001392364502)
(10000000, 10)
('find1', 206.7590000629425)
('find2', 0.4830000400543213)

也就是说,用C语言编写的查找至少比这两种方法都快一点

Just a note that if you are doing a sequence of searches, the performance gain from doing something clever like converting to string, might be lost in the outer loop if the search dimension isn’t big enough. See how the performance of iterating find1 that uses the string conversion trick proposed above and find2 that uses argmax along the inner axis (plus an adjustment to ensure a non-match returns as -1)

import numpy,time
def find1(arr,value):
    return (arr==value).tostring().find('\x01')

def find2(arr,value): #find value over inner most axis, and return array of indices to the match
    b = arr==value
    return b.argmax(axis=-1) - ~(b.any())


for size in [(1,100000000),(10000,10000),(1000000,100),(10000000,10)]:
    print(size)
    values = numpy.random.choice([0,0,0,0,0,0,0,1],size=size)
    v = values>0

    t=time.time()
    numpy.apply_along_axis(find1,-1,v,1)
    print('find1',time.time()-t)

    t=time.time()
    find2(v,1)
    print('find2',time.time()-t)

outputs

(1, 100000000)
('find1', 0.25300002098083496)
('find2', 0.2780001163482666)
(10000, 10000)
('find1', 0.46200013160705566)
('find2', 0.27300000190734863)
(1000000, 100)
('find1', 20.98099994659424)
('find2', 0.3040001392364502)
(10000000, 10)
('find1', 206.7590000629425)
('find2', 0.4830000400543213)

That said, a find written in C would be at least a little faster than either of these approaches


回答 14

这个怎么样

import numpy as np
np.amin(np.where(array==item))

how about this

import numpy as np
np.amin(np.where(array==item))

回答 15

您可以将数组隐藏为list并使用其index()方法:

i = list(array).index(item)

据我所知,这是C编译方法。

You can covert your array into a list and use it’s index() method:

i = list(array).index(item)

As far as I’m aware, this is a C compiled method.


如何检查字典中是否存在值(Python)

问题:如何检查字典中是否存在值(Python)

我在python中有以下字典:

d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}

我需要一种方法来查找此字典中是否存在诸如“一个”或“两个”的值。

例如,如果我想知道索引“ 1”是否存在,我只需要键入:

"1" in d

然后python会告诉我这是对还是错,但是我需要做同样的事情,只是要找到一个值是否存在。

I have the following dictionary in python:

d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}

I need a way to find if a value such as “one” or “two” exists in this dictionary.

For example, if I wanted to know if the index “1” existed I would simply have to type:

"1" in d

And then python would tell me if that is true or false, however I need to do that same exact thing except to find if a value exists.


回答 0

>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> 'one' in d.values()
True

出于好奇,一些比较时机:

>>> T(lambda : 'one' in d.itervalues()).repeat()
[0.28107285499572754, 0.29107213020324707, 0.27941107749938965]
>>> T(lambda : 'one' in d.values()).repeat()
[0.38303399085998535, 0.37257885932922363, 0.37096405029296875]
>>> T(lambda : 'one' in d.viewvalues()).repeat()
[0.32004380226135254, 0.31716084480285645, 0.3171098232269287]

编辑:并且,如果您想知道为什么…原因是上述每种方法返回的对象类型不同,则该对象可能适合或可能不适合于查找操作:

>>> type(d.viewvalues())
<type 'dict_values'>
>>> type(d.values())
<type 'list'>
>>> type(d.itervalues())
<type 'dictionary-valueiterator'>

EDIT2:根据注释中的请求…

>>> T(lambda : 'four' in d.itervalues()).repeat()
[0.41178202629089355, 0.3959040641784668, 0.3970959186553955]
>>> T(lambda : 'four' in d.values()).repeat()
[0.4631338119506836, 0.43541407585144043, 0.4359898567199707]
>>> T(lambda : 'four' in d.viewvalues()).repeat()
[0.43414998054504395, 0.4213531017303467, 0.41684913635253906]
>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> 'one' in d.values()
True

Out of curiosity, some comparative timing:

>>> T(lambda : 'one' in d.itervalues()).repeat()
[0.28107285499572754, 0.29107213020324707, 0.27941107749938965]
>>> T(lambda : 'one' in d.values()).repeat()
[0.38303399085998535, 0.37257885932922363, 0.37096405029296875]
>>> T(lambda : 'one' in d.viewvalues()).repeat()
[0.32004380226135254, 0.31716084480285645, 0.3171098232269287]

EDIT: And in case you wonder why… the reason is that each of the above returns a different type of object, which may or may not be well suited for lookup operations:

>>> type(d.viewvalues())
<type 'dict_values'>
>>> type(d.values())
<type 'list'>
>>> type(d.itervalues())
<type 'dictionary-valueiterator'>

EDIT2: As per request in comments…

>>> T(lambda : 'four' in d.itervalues()).repeat()
[0.41178202629089355, 0.3959040641784668, 0.3970959186553955]
>>> T(lambda : 'four' in d.values()).repeat()
[0.4631338119506836, 0.43541407585144043, 0.4359898567199707]
>>> T(lambda : 'four' in d.viewvalues()).repeat()
[0.43414998054504395, 0.4213531017303467, 0.41684913635253906]

回答 1

您可以使用

"one" in d.itervalues()

测试是否"one"在字典的值中。

You can use

"one" in d.itervalues()

to test if "one" is among the values of your dictionary.


回答 2

Python字典具有get(key)函数

>>> d.get(key)

例如,

>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> d.get('3')
'three'
>>> d.get('10')
none

如果您的密钥不存在,将返回none值。

foo = d[key] # raise error if key doesn't exist
foo = d.get(key) # return none if key doesn't exist

与小于3.0和大于5.0的版本相关的内容。。

Python dictionary has get(key) funcion

>>> d.get(key)

For Example,

>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> d.get('3')
'three'
>>> d.get('10')
none

If your key does’nt exist, will return none value.

foo = d[key] # raise error if key doesn't exist
foo = d.get(key) # return none if key doesn't exist

Content relevant to versions less than 3.0 and greater than 5.0. .


回答 3

使用字典视图:

if x in d.viewvalues():
    dosomething()..

Use dictionary views:

if x in d.viewvalues():
    dosomething()..

回答 4

不同类型检查值是否存在

d = {"key1":"value1", "key2":"value2"}
"value10" in d.values() 
>> False

如果值列表怎么办

test = {'key1': ['value4', 'value5', 'value6'], 'key2': ['value9'], 'key3': ['value6']}
"value4" in [x for v in test.values() for x in v]
>>True

如果带有字符串值的值列表怎么办

test = {'key1': ['value4', 'value5', 'value6'], 'key2': ['value9'], 'key3': ['value6'], 'key5':'value10'}
values = test.values()
"value10" in [x for v in test.values() for x in v] or 'value10' in values
>>True

Different types to check the values exists

d = {"key1":"value1", "key2":"value2"}
"value10" in d.values() 
>> False

What if list of values

test = {'key1': ['value4', 'value5', 'value6'], 'key2': ['value9'], 'key3': ['value6']}
"value4" in [x for v in test.values() for x in v]
>>True

What if list of values with string values

test = {'key1': ['value4', 'value5', 'value6'], 'key2': ['value9'], 'key3': ['value6'], 'key5':'value10'}
values = test.values()
"value10" in [x for v in test.values() for x in v] or 'value10' in values
>>True

回答 5

在Python 3中,您可以使用values()字典的功能。它返回值的视图对象。这又可以传递给iter返回迭代器对象的函数。可以使用来检查迭代器in,如下所示:

'one' in iter(d.values())

或者您可以直接使用视图对象,因为它类似于列表

'one' in d.values()

In Python 3 you can use the values() function of the dictionary. It returns a view object of the values. This, in turn, can be passed to the iter function which returns an iterator object. The iterator can be checked using in, like this,

'one' in iter(d.values())

Or you can use the view object directly since it is similar to a list

'one' in d.values()

Python:在列表中查找

问题:Python:在列表中查找

我遇到了这个:

item = someSortOfSelection()
if item in myList:
    doMySpecialFunction(item)

但有时它不适用于我的所有物品,好像它们在列表中没有被识别(当它是字符串列表时)。

这是在列表中查找项目的最“ pythonic”方式if x in l:吗?

I have come across this:

item = someSortOfSelection()
if item in myList:
    doMySpecialFunction(item)

but sometimes it does not work with all my items, as if they weren’t recognized in the list (when it’s a list of string).

Is this the most ‘pythonic’ way of finding an item in a list: if x in l:?


回答 0

关于您的第一个问题:该代码非常好,并且如果与item其中的一个元素相等就应该可以工作myList。也许您尝试找到与其中一项不完全匹配的字符串,或者您使用的浮点值会导致不正确。

至于第二个问题:如果“查找”列表中的内容,实际上有几种可能的方法。

检查里面是否有东西

这是您描述的用例:检查列表中是否包含某些内容。如您所知,您可以使用in运算符:

3 in [1, 2, 3] # => True

过滤集合

即,找到满足特定条件的序列中的所有元素。您可以为此使用列表推导或生成器表达式:

matches = [x for x in lst if fulfills_some_condition(x)]
matches = (x for x in lst if x > 6)

后者将返回一个生成器,您可以将其想象为一种懒惰列表,该列表只有在您对其进行迭代时才会被构建。顺便说一句,第一个完全等于

matches = filter(fulfills_some_condition, lst)

在Python 2中。在这里您可以看到工作中的高阶函数。在Python 3中,filter不返回列表,而是返回类似生成器的对象。

寻找第一次出现

如果您只想匹配条件的第一件事(但是您还不知道它是什么),那么可以使用for循环(可能也使用该else子句,这并不是很知名)。您也可以使用

next(x for x in lst if ...)

StopIteration如果没有找到任何匹配项,则将返回第一个匹配项或引发a 。或者,您可以使用

next((x for x in lst if ...), [default value])

查找物品的位置

对于列表,index如果您想知道某个元素在列表中的何处,还有一种方法有时会很有用:

[1,2,3].index(2) # => 1
[1,2,3].index(4) # => ValueError

但是,请注意,如果有重复项,则.index始终返回最低索引:……

[1,2,3,2].index(2) # => 1

如果有重复项,并且想要所有索引,则可以enumerate()改用:

[i for i,x in enumerate([1,2,3,2]) if x==2] # => [1, 3]

As for your first question: that code is perfectly fine and should work if item equals one of the elements inside myList. Maybe you try to find a string that does not exactly match one of the items or maybe you are using a float value which suffers from inaccuracy.

As for your second question: There’s actually several possible ways if “finding” things in lists.

Checking if something is inside

This is the use case you describe: Checking whether something is inside a list or not. As you know, you can use the in operator for that:

3 in [1, 2, 3] # => True

Filtering a collection

That is, finding all elements in a sequence that meet a certain condition. You can use list comprehension or generator expressions for that:

matches = [x for x in lst if fulfills_some_condition(x)]
matches = (x for x in lst if x > 6)

The latter will return a generator which you can imagine as a sort of lazy list that will only be built as soon as you iterate through it. By the way, the first one is exactly equivalent to

matches = filter(fulfills_some_condition, lst)

in Python 2. Here you can see higher-order functions at work. In Python 3, filter doesn’t return a list, but a generator-like object.

Finding the first occurrence

If you only want the first thing that matches a condition (but you don’t know what it is yet), it’s fine to use a for loop (possibly using the else clause as well, which is not really well-known). You can also use

next(x for x in lst if ...)

which will return the first match or raise a StopIteration if none is found. Alternatively, you can use

next((x for x in lst if ...), [default value])

Finding the location of an item

For lists, there’s also the index method that can sometimes be useful if you want to know where a certain element is in the list:

[1,2,3].index(2) # => 1
[1,2,3].index(4) # => ValueError

However, note that if you have duplicates, .index always returns the lowest index:……

[1,2,3,2].index(2) # => 1

If there are duplicates and you want all the indexes then you can use enumerate() instead:

[i for i,x in enumerate([1,2,3,2]) if x==2] # => [1, 3]

回答 1

如果要查找一个元素或在中None使用default next,则StopIteration在列表中未找到该元素时不会提高:

first_or_default = next((x for x in lst if ...), None)

If you want to find one element or None use default in next, it won’t raise StopIteration if the item was not found in the list:

first_or_default = next((x for x in lst if ...), None)

回答 2

虽然Niklas B.的答案非常全面,但是当我们想在列表中查找某项时,有时获得其索引很有用:

next((i for i, x in enumerate(lst) if [condition on x]), [default value])

While the answer from Niklas B. is pretty comprehensive, when we want to find an item in a list it is sometimes useful to get its index:

next((i for i, x in enumerate(lst) if [condition on x]), [default value])

回答 3

寻找第一次出现

在其中有一个配方itertools

def first_true(iterable, default=False, pred=None):
    """Returns the first true value in the iterable.

    If no true value is found, returns *default*

    If *pred* is not None, returns the first item
    for which pred(item) is true.

    """
    # first_true([a,b,c], x) --> a or b or c or x
    # first_true([a,b], x, f) --> a if f(a) else b if f(b) else x
    return next(filter(pred, iterable), default)

例如,以下代码查找列表中的第一个奇数:

>>> first_true([2,3,4,5], None, lambda x: x%2==1)
3  

Finding the first occurrence

There’s a recipe for that in itertools:

def first_true(iterable, default=False, pred=None):
    """Returns the first true value in the iterable.

    If no true value is found, returns *default*

    If *pred* is not None, returns the first item
    for which pred(item) is true.

    """
    # first_true([a,b,c], x) --> a or b or c or x
    # first_true([a,b], x, f) --> a if f(a) else b if f(b) else x
    return next(filter(pred, iterable), default)

For example, the following code finds the first odd number in a list:

>>> first_true([2,3,4,5], None, lambda x: x%2==1)
3  

回答 4

另一种选择:您可以使用来检查项目是否在列表中if item in list:,但这是订单O(n)。如果您要处理大量项目,而您只需要知道某项是否是列表的成员,则可以先将列表转换为集合,然后利用恒定时间集查找

my_set = set(my_list)
if item in my_set:  # much faster on average than using a list
    # do something

并非在每种情况下都是正确的解决方案,但是在某些情况下,这可能会为您带来更好的性能。

请注意,使用来创建集合set(my_list)也是O(n),因此,如果您只需要执行一次此操作,则以这种方式进行操作不会更快。但是,如果您需要反复检查成员资格,则在初始集创建之后,每次查找将为O(1)。

Another alternative: you can check if an item is in a list with if item in list:, but this is order O(n). If you are dealing with big lists of items and all you need to know is whether something is a member of your list, you can convert the list to a set first and take advantage of constant time set lookup:

my_set = set(my_list)
if item in my_set:  # much faster on average than using a list
    # do something

Not going to be the correct solution in every case, but for some cases this might give you better performance.

Note that creating the set with set(my_list) is also O(n), so if you only need to do this once then it isn’t any faster to do it this way. If you need to repeatedly check membership though, then this will be O(1) for every lookup after that initial set creation.


回答 5

在处理字符串列表时,您可能想使用两种可能的搜索之一:

  1. 如果list元素等于一个项目(’example’在[‘one’,’example’,’two’]中):

    if item in your_list: some_function_on_true()

    [‘one’,’ex’,’two’]中的’ex’=>真

    [‘one’,’ex’,’two’]中的’ex_1’=>否

  2. 如果list元素就像一个项目(“ ex”在[‘one,’example’,’two’]中,或者’example_1’在[‘one’,’example’,’two’]中):

    matches = [el for el in your_list if item in el]

    要么

    matches = [el for el in your_list if el in item]

    然后只需要检查len(matches)或阅读即可。

You may want to use one of two possible searches while working with list of strings:

  1. if list element is equal to an item (‘example’ is in [‘one’,’example’,’two’]):

    if item in your_list: some_function_on_true()

    ‘ex’ in [‘one’,’ex’,’two’] => True

    ‘ex_1’ in [‘one’,’ex’,’two’] => False

  2. if list element is like an item (‘ex’ is in [‘one,’example’,’two’] or ‘example_1’ is in [‘one’,’example’,’two’]):

    matches = [el for el in your_list if item in el]

    or

    matches = [el for el in your_list if el in item]

    then just check len(matches) or read them if needed.


回答 6

定义和用法

count()方法返回具有指定值的元素数。

句法

list.count(value)

例:

fruits = ['apple', 'banana', 'cherry']

x = fruits.count("cherry")

问题的例子:

item = someSortOfSelection()

if myList.count(item) >= 1 :

    doMySpecialFunction(item)

Definition and Usage

the count() method returns the number of elements with the specified value.

Syntax

list.count(value)

example:

fruits = ['apple', 'banana', 'cherry']

x = fruits.count("cherry")

Question’s example:

item = someSortOfSelection()

if myList.count(item) >= 1 :

    doMySpecialFunction(item)

回答 7

而不是使用的list.index(x)如果在列表中找到返回x的指数或返回#ValueError,如果没有找到X,你可以使用邮件list.count(x)返回列表x的发生次数(验证x是确实在列表中),或者它否则返回0(在没有x的情况下)。有趣的count()是,它不会破坏您的代码,也不会要求您在找不到x时抛出异常

Instead of using list.index(x) which returns the index of x if it is found in list or returns a #ValueError message if x is not found, you could use list.count(x) which returns the number of occurrences of x in list (validation that x is indeed in the list) or it returns 0 otherwise (in the absence of x). The cool thing about count() is that it doesn’t break your code or require you to throw an exception for when x is not found


回答 8

如果您要检查一次收藏品中是否存在值,则可以使用“ in”运算符。但是,如果要检查一次以上,则建议使用bisect模块。请记住,使用bisect模块的数据必须进行排序。因此,您可以对数据进行一次排序,然后可以使用二等分。在我的机器上使用bisect模块比使用“ in”运算符快12倍。

这是使用Python 3.8及更高版本语法的代码示例:

import bisect
from timeit import timeit

def bisect_search(container, value):
    return (
      (index := bisect.bisect_left(container, value)) < len(container) 
      and container[index] == value
    )

data = list(range(1000))
# value to search
true_value = 666
false_value = 66666

# times to test
ttt = 1000

print(f"{bisect_search(data, true_value)=} {bisect_search(data, false_value)=}")

t1 = timeit(lambda: true_value in data, number=ttt)
t2 = timeit(lambda: bisect_search(data, true_value), number=ttt)

print("Performance:", f"{t1=:.4f}, {t2=:.4f}, diffs {t1/t2=:.2f}")

输出:

bisect_search(data, true_value)=True bisect_search(data, false_value)=False
Performance: t1=0.0220, t2=0.0019, diffs t1/t2=11.71

If you are going to check if value exist in the collectible once then using ‘in’ operator is fine. However, if you are going to check for more than once then I recommend using bisect module. Keep in mind that using bisect module data must be sorted. So you sort data once and then you can use bisect. Using bisect module on my machine is about 12 times faster than using ‘in’ operator.

Here is an example of code using Python 3.8 and above syntax:

import bisect
from timeit import timeit

def bisect_search(container, value):
    return (
      (index := bisect.bisect_left(container, value)) < len(container) 
      and container[index] == value
    )

data = list(range(1000))
# value to search
true_value = 666
false_value = 66666

# times to test
ttt = 1000

print(f"{bisect_search(data, true_value)=} {bisect_search(data, false_value)=}")

t1 = timeit(lambda: true_value in data, number=ttt)
t2 = timeit(lambda: bisect_search(data, true_value), number=ttt)

print("Performance:", f"{t1=:.4f}, {t2=:.4f}, diffs {t1/t2=:.2f}")

Output:

bisect_search(data, true_value)=True bisect_search(data, false_value)=False
Performance: t1=0.0220, t2=0.0019, diffs t1/t2=11.71

回答 9

检查字符串列表中的项目是否没有其他多余的空格。这就是可能无法解释项目的原因。

Check there are no additional/unwanted whites space in the items of the list of strings. That’s a reason that can be interfering explaining the items cannot be found.