标签归档:search

最好的Django搜索应用是什么?[关闭]

问题:最好的Django搜索应用是什么?[关闭]

我正在建立一个需要搜索功能的Django项目,直到有一个 django.contrib.search,我必须选择一个搜索应用程序。那么,哪个最好?“最好”是指…

  • 易于安装/设置
  • 具有Django或至少Python友好的API
  • 可以执行相当复杂的搜索

这是我听说过的一些应用程序,如果您知道其中任何一个,请建议其他应用程序:

我还想避免使用第三方搜索引擎(例如Google SiteSearch),因为我想索引的某些数据仅适用于网站成员,不应公开。

I’m building a Django project that needs search functionality, and until there’s a django.contrib.search, I have to choose a search app. So, which is the best? By “best” I mean…

  • easy to install / set up
  • has a Django- or at least Python-friendly API
  • can perform reasonably complex searches

Here are some apps I’ve heard of, please suggest others if you know of any:

I’d also like to avoid using a third-party search engine (like Google SiteSearch), because some of the data I’d like to index is for site members only and should not be public.


回答 0

查阅Haystack Search-一个基于模型的新模型,该抽象层目前支持XapianSolrWhoosh。看起来它得到了很好的支持和记录。

Check out Haystack Search – a new model based search abstraction layer that currently supports Xapian, Solr and Whoosh. Looks like it’s well supported and documented.


回答 1

贾斯汀,我先尝试djangosearch:Jacob Kaplan-Moss(Django的首席开发人员)正在研究它。

潜在危险:

  • 主页警告该API可能并不完全稳定

潜在的好处:

  • “长远的目标是要实现这一目标django.contrib.search。”

Justin, I’d try djangosearch first: Jacob Kaplan-Moss (Django’s lead developer) is working on it.

Potential hazards:

  • The home page warns the API might not be entirely stable

Potential benefits:

  • “The long term goal is for this to become django.contrib.search.”

回答 2

我正在寻找同一件事,就像其他许多人一样。希望很快会添加django.contrib.search

同时,这是我发现的内容:

对我来说,大多数看起来很复杂,坦率地说,实施起来有些艰巨。我很想了解您对这些的看法。

I am searching for the same thing, as are a lot of other people. Let’s hope that django.contrib.search will be added soon.

In the meantime, this is what I found:

To me, most look quite complicated and, frankly, a little daunting to implement. I’d be interested to learn what you think of these.


回答 3

djangosearch的google代码页表明它不再处于积极开发中,并建议haystacksolango

The google code page for djangosearch indicates that it is no longer under active development, and suggests haystack or solango.


回答 4

我建议使用Sphinx进行全文搜索和聚合,而django-sphinx足以用于生产环境。我们发现Sphinx是索引和搜索文档的资源最少,最快的方式,而django-sphinx是sphinx客户端之上的一个很好的包装器。

如果您想显示有多少个带有特定标签或由某个作者(或两者兼有)匹配的文档,则按聚合分组特别好。在内存中,属性更新也很方便,特别是对于立即删除已删除的文章。

I’d recommend Sphinx for full-text search and aggregation, and django-sphinx is good enough for production use. We found that Sphinx was the least resource-intensive and fastest way to index and search our documents and that django-sphinx was a nice wrapper on top of the sphinx client.

The group by aggregation is particularly nice, if for example you want to display how many documents with a certain tag or by a certain author (or both) matched a search. In memory attribute updates were convenient too, especially for removing deleted articles immediately.


回答 5

谢谢加思。我曾经见过djangosearch想成为正式的Django搜索,但是由于无法找到任何文档,我还是犹豫使用它!幸运的是,在Subversion中有一个自述文件,我以前从未见过,它使API看起来很酷:

# set up the model
class Event(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateField()
    is_outdoors = models.BooleanField()

    index = djangosearch.ModelIndex(text=['title'], 
                                    additional=['date', 'is_outdoors'])

# run a search
results = Event.index.search("django conference")

Thanks Garth. I had seen that djangosearch wanted to become the official Django search, but I was hesitant to use it because I couldn’t find any documentation! Luckily, there’s a README in subversion that I hadn’t seen before, and it makes the API look very cool:

# set up the model
class Event(models.Model):
    title = models.CharField(max_length=255)
    date = models.DateField()
    is_outdoors = models.BooleanField()

    index = djangosearch.ModelIndex(text=['title'], 
                                    additional=['date', 'is_outdoors'])

# run a search
results = Event.index.search("django conference")

回答 6

我只需要一个非常快速的解决方案,对于内部应用程序来说就算是小菜一碟。

我很快找到了文章“ 将搜索添加到Django”,这对我来说非常出色!

显然,它缺乏像Haystack这样的真实项目的速度,可伸缩性和功能,但是此项目更易于设置,除关键字AND-search外,我真的不需要其他任何东西。

I just needed a very quick solution that was no-fuss for an internal app.

I found the article Adding search to Django in a snap, and that worked splendid for me!

Obviously it lacks the speed, scalability and features of the real projects like Haystack, but this one is easier to set up, and I don’t really need anything else than keyword AND-search.


回答 7

您可能需要考虑让Yahoo借助其构建自己的搜索服务(BOSS)来完成所有艰苦的工作。这是一篇很棒的博客文章,它将引导您完成整个过程:http : //www.peterkrantz.com/2008/yahoo-search-in-django/

You might want to consider letting Yahoo do all the hard work with their Build your own Search Service (BOSS). Here is a great blog post that walks you through the process: http://www.peterkrantz.com/2008/yahoo-search-in-django/


回答 8

好像这里的每个人都错过了django-xappy

在对Django的所有现有搜索插件进行了快速评估之后,我发现这是最灵活,最易于使用的插件。在几个地方的边缘都很粗糙,但这仍然是在Django项目中使用Xapian搜索引擎功能的最佳方法。

It looks like everyone here missed django-xappy

After quick evaluation of all existing search addons for Django, I found this one as most flexible and easiest to use. It’s rough on the edges in few places, but it’s still the best way to use power of Xapian search engine inside Django projects.


回答 9

您可能想看一下Django Solr搜索(又名“ Solango”),它附带了一些不错的文档来帮助您入门…

You might want to look at Django Solr search (aka “Solango”) which comes with some nice documentation to get you started…


回答 10

如果您要对大量数据建立索引或期望获得高流量,建议您使用一些外部搜索引擎,例如Solr。这样,您将保持无共享方式,并能够独立扩展站点组件。

If you have large amount of data to be indexed or you expect high traffic, I’d suggest using some external search engine, like Solr. This way, you’ll keep shared-nothing approach and be able to scale your site components independently.


回答 11

我想我将不得不向贾比安大喊大叫。

坚如磐石…只需拉下源代码分布并查看内部即可。一流的代码,评论不多。

它仍然是一个年轻的软件项目,但是我认为django社区应该将它的精力放在这个项目上。

I think I am going to have to give a shout out to Djapian.

It is rock-solid…just pull down a source distribution and peek inside. Top notch code, not very many comments tho..

It’s still a young software project, but I think the django community should throw it’s weight behind this one.


回答 12

谢谢乔,

我们决定使用Tsearch2和自定义的postgres适配器。Tsearch2不需要额外的进程来运行,这很方便,因为我们是在内存有限的WebFaction主机上进行的……它尚未完全完成,但似乎是一个很好的解决方案……

Thanks Joe,

We decided to go with Tsearch2 and a custom postgres adaptor. Tsearch2 does not need an extra process to run, which was convenient since we are on a WebFaction hosting with limited memory… It’s not completely done yet, but seems to be a good solution…


回答 13

我发现Djoosh依靠纯Python外部搜索引擎Whoosh与我的“ Python”大脑很好地配合。

I found Djoosh which relies on the pure-python external search engine Whoosh to work well with my ‘Python’ brain.


回答 14

如果您愿意使用第三方搜索引擎,我可以推荐Yahoo BOSSdjango-bosssearch

Yahoo BOSS是一项付费服务​​,但是它可以节省您在服务器上设置和维护其他搜索软件的费用。

If you are willing to use a 3rd party search engine I can recommend Yahoo BOSS and django-bosssearch.

Yahoo BOSS is a paid service, but it saves you setting up and maintaining other search software on your server.


将Python搜索路径扩展到其他来源

问题:将Python搜索路径扩展到其他来源

我刚刚加入了一个具有相当大的现有代码库的项目。我们用linux开发,不使用和IDE。我们通过命令行运行。我试图弄清楚如何在运行项目模块时让python搜索正确的路径。例如,当我运行类似的内容时:

python someprojectfile.py

我懂了

ImportError: no module named core.'somemodule'

我认为所有导入的内容都是这样,我认为这是路径问题。

TLDR:

如何~/codez/project/在导入语句期间让Python搜索* .py文件以及所有文件和文件夹。

I have just joined a project with a rather large existing code base. We develop in linux and do not use and IDE. We run through the command line. I’m trying to figure out how to get python to search for the right path when I run project modules. For instance, when I run something like:

python someprojectfile.py

I get

ImportError: no module named core.'somemodule'

I get this for all of my imports to I assume it’s an issue with the path.

TLDR:

How do I get Python to search ~/codez/project/ and all the files and folders for *.py files during import statements.


回答 0

有几种方法可以做到这一点:

  • 将环境变量设置PYTHONPATH为用冒号分隔的目录列表,以搜索导入的模块。
  • 在您的程序中,用于sys.path.append('/path/to/search')添加您希望Python搜索导入的模块的目录名称。sys.path只是Python在每次被要求导入模块时搜索的目录列表,您可以根据需要进行更改(尽管我不建议删除任何标准目录!)。Python启动时,您放入环境变量中的所有目录PYTHONPATH都将插入sys.path其中。
  • 用于site.addsitedir向添加目录sys.path。此和纯附加的区别在于,当您使用时addsitedir,它还会.pth在该目录中查找文件,并使用它们sys.path根据文件的内容向其中添加其他目录。有关更多详细信息,请参见文档。

您要使用哪一种取决于您的情况。请记住,当您将项目分发给其他用户时,他们通常会以某种方式安装该项目,以便Python的导入程序会自动检测到Python代码文件(即,程序包通常安装在site-packages目录中),因此,如果您弄乱了sys.path代码,当该代码在另一台计算机上运行时,可能是不必要的,甚至可能产生不利影响。对于开发而言,我会大胆猜测设置PYTHONPATH通常是最好的选择。

但是,当您使用仅在自己的计算机上运行的内容时(或当您进行非标准设置时(例如有时在Web应用程序框架中)),执行诸如此类的操作并非完全不常见。

import sys
from os.path import dirname
sys.path.append(dirname(__file__))

There are a few possible ways to do this:

  • Set the environment variable PYTHONPATH to a colon-separated list of directories to search for imported modules.
  • In your program, use sys.path.append('/path/to/search') to add the names of directories you want Python to search for imported modules. sys.path is just the list of directories Python searches every time it gets asked to import a module, and you can alter it as needed (although I wouldn’t recommend removing any of the standard directories!). Any directories you put in the environment variable PYTHONPATH will be inserted into sys.path when Python starts up.
  • Use site.addsitedir to add a directory to sys.path. The difference between this and just plain appending is that when you use addsitedir, it also looks for .pth files within that directory and uses them to possibly add additional directories to sys.path based on the contents of the files. See the documentation for more detail.

Which one of these you want to use depends on your situation. Remember that when you distribute your project to other users, they typically install it in such a manner that the Python code files will be automatically detected by Python’s importer (i.e. packages are usually installed in the site-packages directory), so if you mess with sys.path in your code, that may be unnecessary and might even have adverse effects when that code runs on another computer. For development, I would venture a guess that setting PYTHONPATH is usually the best way to go.

However, when you’re using something that just runs on your own computer (or when you have nonstandard setups, e.g. sometimes in web app frameworks), it’s not entirely uncommon to do something like

import sys
from os.path import dirname
sys.path.append(dirname(__file__))

回答 1

您还应该在这里阅读有关python软件包的信息:http : //docs.python.org/tutorial/modules.html

从您的示例中,我想您确实在拥有一个软件包~/codez/project__init__.pypython目录中的文件将目录映射到命名空间。如果所有子目录都有一个__init__.py文件,则只需将基本目录添加到中PYTHONPATH。例如:

PYTHONPATH = $ PYTHONPATH:$ HOME / adaifotis / project

正如David解释的那样,除了测试您的PYTHONPATH环境变量外,您还可以像这样在python中对其进行测试:

$ python
>>> import project                      # should work if PYTHONPATH set
>>> import sys
>>> for line in sys.path: print line    # print current python path

You should also read about python packages here: http://docs.python.org/tutorial/modules.html.

From your example, I would guess that you really have a package at ~/codez/project. The file __init__.py in a python directory maps a directory into a namespace. If your subdirectories all have an __init__.py file, then you only need to add the base directory to your PYTHONPATH. For example:

PYTHONPATH=$PYTHONPATH:$HOME/adaifotis/project

In addition to testing your PYTHONPATH environment variable, as David explains, you can test it in python like this:

$ python
>>> import project                      # should work if PYTHONPATH set
>>> import sys
>>> for line in sys.path: print line    # print current python path


回答 2

我知道这个线程有些陈旧,但是花了我一些时间才能真正理解这一点,所以我想分享一下。

在我的项目中,我的主脚本位于父目录中,为了区分模块,我将所有支持的模块放在一个称为“模块”的子文件夹中。在我的主脚本中,我像这样导入这些模块(对于名为report.py的模块):

from modules.report import report, reportError

如果我调用主脚本,则可以正常工作。但是,我想通过main()在每个模块中包含一个并直接调用每个模块来测试每个模块,如下所示:

python modules/report.py

现在,Python抱怨它找不到“称为模块的模块”。这里的关键是,默认情况下,Python在其搜索路径中包括脚本的文件夹,但不是CWD。因此,此错误实际上表示“我找不到模块子文件夹”。这是因为report.py模块所在的目录中没有“ modules”子目录。

我发现最巧妙的解决方案是通过在顶部包括以下内容来将CWD附加到Python搜索路径中:

import sys

sys.path.append(".")

现在,Python搜索CWD(当前目录),找到“模块”子文件夹,一切顺利。

I know this thread is a bit old, but it took me some time to get to the heart of this, so I wanted to share.

In my project, I had the main script in a parent directory, and, to differentiate the modules, I put all the supporting modules in a sub-folder called “modules”. In my main script, I import these modules like this (for a module called report.py):

from modules.report import report, reportError

If I call my main script, this works. HOWEVER, I wanted to test each module by including a main() in each, and calling each directly, as:

python modules/report.py

Now Python complains that it can’t find “a module called modules”. The key here is that, by default, Python includes the folder of the script in its search path, BUT NOT THE CWD. So what this error says, really, is “I can’t find a modules subfolder”. The is because there is no “modules” subdirectory from the directory where the report.py module resides.

I find that the neatest solution to this is to append the CWD in Python search path by including this at the top:

import sys

sys.path.append(".")

Now Python searches the CWD (current directory), finds the “modules” sub-folder, and all is well.


回答 3

我阅读了此问题以寻找答案,但不喜欢其中任何一个。

所以我写了一个快速而肮脏的解决方案。只需将其放在sys.path上的某个位置,它将在folder(来自当前工作目录)或下添加任何目录abspath

#using.py

import sys, os.path

def all_from(folder='', abspath=None):
    """add all dirs under `folder` to sys.path if any .py files are found.
    Use an abspath if you'd rather do it that way.

    Uses the current working directory as the location of using.py. 
    Keep in mind that os.walk goes *all the way* down the directory tree.
    With that, try not to use this on something too close to '/'

    """
    add = set(sys.path)
    if abspath is None:
        cwd = os.path.abspath(os.path.curdir)
        abspath = os.path.join(cwd, folder)
    for root, dirs, files in os.walk(abspath):
        for f in files:
            if f[-3:] in '.py':
                add.add(root)
                break
    for i in add: sys.path.append(i)

>>> import using, sys, pprint
>>> using.all_from('py') #if in ~, /home/user/py/
>>> pprint.pprint(sys.path)
[
#that was easy
]

我之所以喜欢它,是因为我可以为一些随机工具提供一个文件夹,而不必让它们成为软件包或任何东西的一部分,并且仍然可以通过几行代码访问其中的一些(或全部)。

I read this question looking for an answer, and didn’t like any of them.

So I wrote a quick and dirty solution. Just put this somewhere on your sys.path, and it’ll add any directory under folder (from the current working directory), or under abspath:

#using.py

import sys, os.path

def all_from(folder='', abspath=None):
    """add all dirs under `folder` to sys.path if any .py files are found.
    Use an abspath if you'd rather do it that way.

    Uses the current working directory as the location of using.py. 
    Keep in mind that os.walk goes *all the way* down the directory tree.
    With that, try not to use this on something too close to '/'

    """
    add = set(sys.path)
    if abspath is None:
        cwd = os.path.abspath(os.path.curdir)
        abspath = os.path.join(cwd, folder)
    for root, dirs, files in os.walk(abspath):
        for f in files:
            if f[-3:] in '.py':
                add.add(root)
                break
    for i in add: sys.path.append(i)

>>> import using, sys, pprint
>>> using.all_from('py') #if in ~, /home/user/py/
>>> pprint.pprint(sys.path)
[
#that was easy
]

And I like it because I can have a folder for some random tools and not have them be a part of packages or anything, and still get access to some (or all) of them in a couple lines of code.


回答 4

我发现最简单的方法是创建一个文件“ any_name.pth”,并将其放在文件夹“ \ Lib \ site-packages”中。您应该在安装python的任何位置都找到该文件夹​​。

在该文件中,放入要保留要导入模块的目录列表。例如,在该文件中这样一行:

C:\ Users \ example … \ example

您可以通过在python中运行它来告诉它工作:

import sys
for line in sys: print line

您将看到已打印的目录,以及从中可以导入的目录。现在,您可以轻松地导入该目录中的“ mymodule.py”文件:

import mymodule

这将不会导入子文件夹。为此,您可以想象创建一个python脚本来创建一个.pth文件,其中包含您定义的文件夹的所有子文件夹。让它在启动时运行。

The easiest way I find is to create a file “any_name.pth” and put it in your folder “\Lib\site-packages”. You should find that folder wherever python is installed.

In that file, put a list of directories where you want to keep modules for importing. For instance, make a line in that file like this:

C:\Users\example…\example

You will be able to tell it works by running this in python:

import sys
for line in sys: print line

You will see your directory printed out, amongst others from where you can also import. Now you can import a “mymodule.py” file that sits in that directory as easily as:

import mymodule

This will not import subfolders. For that you could imagine creating a python script to create a .pth file containing all sub folders of a folder you define. Have it run at startup perhaps.


回答 5

旧问题的新选择。在Debian上
安装fail2ban软件包,看起来像是硬编码安装/usr/lib/python3/dist-packages/fail2ban在python3以外的路径上sys.path


> python3
Python 3.7.3 (v3.7.3:ef4ec6ed12, Jun 25 2019, 18:51:50)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/lib/python3.7/site-packages']
>>>

因此,我(bash)将库链接到较新的版本,而不仅仅是复制。
将来对原始应用程序的更新也将自动应用于链接版本。

 if [ -d /usr/lib/python3/dist-packages/fail2ban ]
   then
      for d in /usr/lib/python3.*
      do
         [ -d ${d}/fail2ban ] || \
            ln -vs /usr/lib/python3/dist-packages/fail2ban ${d}/
      done
   fi

New option for old question.
Installing fail2ban package on Debian, looks like it’s hardcoded to install on /usr/lib/python3/dist-packages/fail2ban a path not on python3 sys.path.


> python3
Python 3.7.3 (v3.7.3:ef4ec6ed12, Jun 25 2019, 18:51:50)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/lib/python3.7/site-packages']
>>>

so, instead of just copying, I (bash) linked the library to newer versions.
Future updates to the original app, will also be automatically applied to the linked versions.

 if [ -d /usr/lib/python3/dist-packages/fail2ban ]
   then
      for d in /usr/lib/python3.*
      do
         [ -d ${d}/fail2ban ] || \
            ln -vs /usr/lib/python3/dist-packages/fail2ban ${d}/
      done
   fi

如何实现二叉树?

问题:如何实现二叉树?

哪种最佳数据结构可用于在Python中实现二叉树?

Which is the best data structure that can be used to implement a binary tree in Python?


回答 0

这是二进制搜索树的简单递归实现。

#!/usr/bin/python

class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val

class Tree:
    def __init__(self):
        self.root = None

    def getRoot(self):
        return self.root

    def add(self, val):
        if self.root is None:
            self.root = Node(val)
        else:
            self._add(val, self.root)

    def _add(self, val, node):
        if val < node.v:
            if node.l is not None:
                self._add(val, node.l)
            else:
                node.l = Node(val)
        else:
            if node.r is not None:
                self._add(val, node.r)
            else:
                node.r = Node(val)

    def find(self, val):
        if self.root is not None:
            return self._find(val, self.root)
        else:
            return None

    def _find(self, val, node):
        if val == node.v:
            return node
        elif (val < node.v and node.l is not None):
            self._find(val, node.l)
        elif (val > node.v and node.r is not None):
            self._find(val, node.r)

    def deleteTree(self):
        # garbage collector will do this for us. 
        self.root = None

    def printTree(self):
        if self.root is not None:
            self._printTree(self.root)

    def _printTree(self, node):
        if node is not None:
            self._printTree(node.l)
            print(str(node.v) + ' ')
            self._printTree(node.r)

#     3
# 0     4
#   2      8
tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print(tree.find(3).v)
print(tree.find(10))
tree.deleteTree()
tree.printTree()

Here is my simple recursive implementation of binary search tree.

#!/usr/bin/python

class Node:
    def __init__(self, val):
        self.l = None
        self.r = None
        self.v = val

class Tree:
    def __init__(self):
        self.root = None

    def getRoot(self):
        return self.root

    def add(self, val):
        if self.root is None:
            self.root = Node(val)
        else:
            self._add(val, self.root)

    def _add(self, val, node):
        if val < node.v:
            if node.l is not None:
                self._add(val, node.l)
            else:
                node.l = Node(val)
        else:
            if node.r is not None:
                self._add(val, node.r)
            else:
                node.r = Node(val)

    def find(self, val):
        if self.root is not None:
            return self._find(val, self.root)
        else:
            return None

    def _find(self, val, node):
        if val == node.v:
            return node
        elif (val < node.v and node.l is not None):
            self._find(val, node.l)
        elif (val > node.v and node.r is not None):
            self._find(val, node.r)

    def deleteTree(self):
        # garbage collector will do this for us. 
        self.root = None

    def printTree(self):
        if self.root is not None:
            self._printTree(self.root)

    def _printTree(self, node):
        if node is not None:
            self._printTree(node.l)
            print(str(node.v) + ' ')
            self._printTree(node.r)

#     3
# 0     4
#   2      8
tree = Tree()
tree.add(3)
tree.add(4)
tree.add(0)
tree.add(8)
tree.add(2)
tree.printTree()
print(tree.find(3).v)
print(tree.find(10))
tree.deleteTree()
tree.printTree()

回答 1

# simple binary tree
# in this implementation, a node is inserted between an existing node and the root


class BinaryTree():

    def __init__(self,rootid):
      self.left = None
      self.right = None
      self.rootid = rootid

    def getLeftChild(self):
        return self.left
    def getRightChild(self):
        return self.right
    def setNodeValue(self,value):
        self.rootid = value
    def getNodeValue(self):
        return self.rootid

    def insertRight(self,newNode):
        if self.right == None:
            self.right = BinaryTree(newNode)
        else:
            tree = BinaryTree(newNode)
            tree.right = self.right
            self.right = tree

    def insertLeft(self,newNode):
        if self.left == None:
            self.left = BinaryTree(newNode)
        else:
            tree = BinaryTree(newNode)
            tree.left = self.left
            self.left = tree


def printTree(tree):
        if tree != None:
            printTree(tree.getLeftChild())
            print(tree.getNodeValue())
            printTree(tree.getRightChild())



# test tree

def testTree():
    myTree = BinaryTree("Maud")
    myTree.insertLeft("Bob")
    myTree.insertRight("Tony")
    myTree.insertRight("Steven")
    printTree(myTree)

在此处了解更多信息:-这是一个非常简单的实现二进制树。

是一个很好的教程,中间有问题

# simple binary tree
# in this implementation, a node is inserted between an existing node and the root


class BinaryTree():

    def __init__(self,rootid):
      self.left = None
      self.right = None
      self.rootid = rootid

    def getLeftChild(self):
        return self.left
    def getRightChild(self):
        return self.right
    def setNodeValue(self,value):
        self.rootid = value
    def getNodeValue(self):
        return self.rootid

    def insertRight(self,newNode):
        if self.right == None:
            self.right = BinaryTree(newNode)
        else:
            tree = BinaryTree(newNode)
            tree.right = self.right
            self.right = tree

    def insertLeft(self,newNode):
        if self.left == None:
            self.left = BinaryTree(newNode)
        else:
            tree = BinaryTree(newNode)
            tree.left = self.left
            self.left = tree


def printTree(tree):
        if tree != None:
            printTree(tree.getLeftChild())
            print(tree.getNodeValue())
            printTree(tree.getRightChild())



# test tree

def testTree():
    myTree = BinaryTree("Maud")
    myTree.insertLeft("Bob")
    myTree.insertRight("Tony")
    myTree.insertRight("Steven")
    printTree(myTree)

Read more about it Here:-This is a very simple implementation of a binary tree.

This is a nice tutorial with questions in between


回答 2

[采访所需的内容] Node类是足以表示二叉树的数据结构。

(尽管其他答案大多数都是正确的,但对于二叉树而言,它们不是必需的:无需扩展对象类,无需成为BST,无需导入双端队列)。

class Node:

    def __init__(self, value = None):
        self.left  = None
        self.right = None
        self.value = value

这是一棵树的例子:

n1 = Node(1)
n2 = Node(2)
n3 = Node(3)
n1.left  = n2
n1.right = n3

在此示例中,n1是具有n2,n3作为其子级的树的根。

在此处输入图片说明

[What you need for interviews] A Node class is the sufficient data structure to represent a binary tree.

(While other answers are mostly correct, they are not required for a binary tree: no need to extend object class, no need to be a BST, no need to import deque).

class Node:

    def __init__(self, value = None):
        self.left  = None
        self.right = None
        self.value = value

Here is an example of a tree:

n1 = Node(1)
n2 = Node(2)
n3 = Node(3)
n1.left  = n2
n1.right = n3

In this example n1 is the root of the tree having n2, n3 as its children.

enter image description here


回答 3

BST在Python中的简单实现

class TreeNode:
    def __init__(self, value):
        self.left = None
        self.right = None
        self.data = value

class Tree:
    def __init__(self):
        self.root = None

    def addNode(self, node, value):
        if(node==None):
            self.root = TreeNode(value)
        else:
            if(value<node.data):
                if(node.left==None):
                    node.left = TreeNode(value)
                else:
                    self.addNode(node.left, value)
            else:
                if(node.right==None):
                    node.right = TreeNode(value)
                else:
                    self.addNode(node.right, value)

    def printInorder(self, node):
        if(node!=None):
            self.printInorder(node.left)
            print(node.data)
            self.printInorder(node.right)

def main():
    testTree = Tree()
    testTree.addNode(testTree.root, 200)
    testTree.addNode(testTree.root, 300)
    testTree.addNode(testTree.root, 100)
    testTree.addNode(testTree.root, 30)
    testTree.printInorder(testTree.root)

Simple implementation of BST in Python

class TreeNode:
    def __init__(self, value):
        self.left = None
        self.right = None
        self.data = value

class Tree:
    def __init__(self):
        self.root = None

    def addNode(self, node, value):
        if(node==None):
            self.root = TreeNode(value)
        else:
            if(value<node.data):
                if(node.left==None):
                    node.left = TreeNode(value)
                else:
                    self.addNode(node.left, value)
            else:
                if(node.right==None):
                    node.right = TreeNode(value)
                else:
                    self.addNode(node.right, value)

    def printInorder(self, node):
        if(node!=None):
            self.printInorder(node.left)
            print(node.data)
            self.printInorder(node.right)

def main():
    testTree = Tree()
    testTree.addNode(testTree.root, 200)
    testTree.addNode(testTree.root, 300)
    testTree.addNode(testTree.root, 100)
    testTree.addNode(testTree.root, 30)
    testTree.printInorder(testTree.root)

回答 4

使用列表实现二叉树的一种非常快捷的方法。这不是最有效的方法,也不能很好地处理nil值。但这非常透明(至少对我而言):

def _add(node, v):
    new = [v, [], []]
    if node:
        left, right = node[1:]
        if not left:
            left.extend(new)
        elif not right:
            right.extend(new)
        else:
            _add(left, v)
    else:
        node.extend(new)

def binary_tree(s):
    root = []
    for e in s:
        _add(root, e)
    return root

def traverse(n, order):
    if n:
        v = n[0]
        if order == 'pre':
            yield v
        for left in traverse(n[1], order):
            yield left
        if order == 'in':
            yield v
        for right in traverse(n[2], order):
            yield right
        if order == 'post':
            yield v

从可迭代构造树:

 >>> tree = binary_tree('A B C D E'.split())
 >>> print tree
 ['A', ['B', ['D', [], []], ['E', [], []]], ['C', [], []]]

遍历一棵树:

 >>> list(traverse(tree, 'pre')), list(traverse(tree, 'in')), list(traverse(tree, 'post'))
 (['A', 'B', 'D', 'E', 'C'],
  ['D', 'B', 'E', 'A', 'C'],
  ['D', 'E', 'B', 'C', 'A'])

A very quick ‘n dirty way of implementing a binary tree using lists. Not the most efficient, nor does it handle nil values all too well. But it’s very transparent (at least to me):

def _add(node, v):
    new = [v, [], []]
    if node:
        left, right = node[1:]
        if not left:
            left.extend(new)
        elif not right:
            right.extend(new)
        else:
            _add(left, v)
    else:
        node.extend(new)

def binary_tree(s):
    root = []
    for e in s:
        _add(root, e)
    return root

def traverse(n, order):
    if n:
        v = n[0]
        if order == 'pre':
            yield v
        for left in traverse(n[1], order):
            yield left
        if order == 'in':
            yield v
        for right in traverse(n[2], order):
            yield right
        if order == 'post':
            yield v

Constructing a tree from an iterable:

 >>> tree = binary_tree('A B C D E'.split())
 >>> print tree
 ['A', ['B', ['D', [], []], ['E', [], []]], ['C', [], []]]

Traversing a tree:

 >>> list(traverse(tree, 'pre')), list(traverse(tree, 'in')), list(traverse(tree, 'post'))
 (['A', 'B', 'D', 'E', 'C'],
  ['D', 'B', 'E', 'A', 'C'],
  ['D', 'E', 'B', 'C', 'A'])

回答 5

我不禁注意到这里的大多数答案都在实现二进制搜索树。二进制搜索树!=二进制树。

  • 二叉搜索树具有非常特殊的属性:对于任何节点X,X的密钥都大于其左子节点的任何后代的关键字,并且小于其右子节点的任何后代的关键字。

  • 二叉树不施加这样的限制。二叉树只是具有“键”元素和两个孩子的数据结构,分别是“左”和“右”。

  • 树是二进制树的更一般的情况,其中每个节点可以具有任意数量的子代。通常,每个节点都有一个“孩子”元素,其类型为列表/数组。

现在,为了回答OP的问题,我将在Python中包含Binary Tree的完整实现。给定它提供最佳的O(1)查找,存储每个BinaryTreeNode的基础数据结构是一个字典。我还实现了深度优先遍历和深度优先遍历。这些是在树上执行的非常常见的操作。

from collections import deque

class BinaryTreeNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

    def __repr__(self):
        return "%s l: (%s) r: (%s)" % (self.key, self.left, self.right)

    def __eq__(self, other):
        if self.key == other.key and \
            self.right == other.right and \
                self.left == other.left:
            return True
        else:
            return False

class BinaryTree:
    def __init__(self, root_key=None):
        # maps from BinaryTreeNode key to BinaryTreeNode instance.
        # Thus, BinaryTreeNode keys must be unique.
        self.nodes = {}
        if root_key is not None:
            # create a root BinaryTreeNode
            self.root = BinaryTreeNode(root_key)
            self.nodes[root_key] = self.root

    def add(self, key, left_key=None, right_key=None):
        if key not in self.nodes:
            # BinaryTreeNode with given key does not exist, create it
            self.nodes[key] = BinaryTreeNode(key)
        # invariant: self.nodes[key] exists

        # handle left child
        if left_key is None:
            self.nodes[key].left = None
        else:
            if left_key not in self.nodes:
                self.nodes[left_key] = BinaryTreeNode(left_key)
            # invariant: self.nodes[left_key] exists
            self.nodes[key].left = self.nodes[left_key]

        # handle right child
        if right_key == None:
            self.nodes[key].right = None
        else:
            if right_key not in self.nodes:
                self.nodes[right_key] = BinaryTreeNode(right_key)
            # invariant: self.nodes[right_key] exists
            self.nodes[key].right = self.nodes[right_key]

    def remove(self, key):
        if key not in self.nodes:
            raise ValueError('%s not in tree' % key)
        # remove key from the list of nodes
        del self.nodes[key]
        # if node removed is left/right child, update parent node
        for k in self.nodes:
            if self.nodes[k].left and self.nodes[k].left.key == key:
                self.nodes[k].left = None
            if self.nodes[k].right and self.nodes[k].right.key == key:
                self.nodes[k].right = None
        return True

    def _height(self, node):
        if node is None:
            return 0
        else:
            return 1 + max(self._height(node.left), self._height(node.right))

    def height(self):
        return self._height(self.root)

    def size(self):
        return len(self.nodes)

    def __repr__(self):
        return str(self.traverse_inorder(self.root))

    def bfs(self, node):
        if not node or node not in self.nodes:
            return
        reachable = []    
        q = deque()
        # add starting node to queue
        q.append(node)
        while len(q):
            visit = q.popleft()
            # add currently visited BinaryTreeNode to list
            reachable.append(visit)
            # add left/right children as needed
            if visit.left:
                q.append(visit.left)
            if visit.right:
                q.append(visit.right)
        return reachable

    # visit left child, root, then right child
    def traverse_inorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        self.traverse_inorder(node.left, reachable)
        reachable.append(node.key)
        self.traverse_inorder(node.right, reachable)
        return reachable

    # visit left and right children, then root
    # root of tree is always last to be visited
    def traverse_postorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        self.traverse_postorder(node.left, reachable)
        self.traverse_postorder(node.right, reachable)
        reachable.append(node.key)
        return reachable

    # visit root, left, then right children
    # root is always visited first
    def traverse_preorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        reachable.append(node.key)
        self.traverse_preorder(node.left, reachable)
        self.traverse_preorder(node.right, reachable)
        return reachable

I can’t help but notice that most answers here are implementing a Binary Search Tree. Binary Search Tree != Binary Tree.

  • A Binary Search Tree has a very specific property: for any node X, X’s key is larger than the key of any descendent of its left child, and smaller than the key of any descendant of its right child.

  • A Binary Tree imposes no such restriction. A Binary Tree is simply a data structure with a ‘key’ element, and two children, say ‘left’ and ‘right’.

  • A Tree is an even more general case of a Binary Tree where each node can have an arbitrary number of children. Typically, each node has a ‘children’ element which is of type list/array.

Now, to answer the OP’s question, I am including a full implementation of a Binary Tree in Python. The underlying data structure storing each BinaryTreeNode is a dictionary, given it offers optimal O(1) lookups. I’ve also implemented depth-first and breadth-first traversals. These are very common operations performed on trees.

from collections import deque

class BinaryTreeNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

    def __repr__(self):
        return "%s l: (%s) r: (%s)" % (self.key, self.left, self.right)

    def __eq__(self, other):
        if self.key == other.key and \
            self.right == other.right and \
                self.left == other.left:
            return True
        else:
            return False

class BinaryTree:
    def __init__(self, root_key=None):
        # maps from BinaryTreeNode key to BinaryTreeNode instance.
        # Thus, BinaryTreeNode keys must be unique.
        self.nodes = {}
        if root_key is not None:
            # create a root BinaryTreeNode
            self.root = BinaryTreeNode(root_key)
            self.nodes[root_key] = self.root

    def add(self, key, left_key=None, right_key=None):
        if key not in self.nodes:
            # BinaryTreeNode with given key does not exist, create it
            self.nodes[key] = BinaryTreeNode(key)
        # invariant: self.nodes[key] exists

        # handle left child
        if left_key is None:
            self.nodes[key].left = None
        else:
            if left_key not in self.nodes:
                self.nodes[left_key] = BinaryTreeNode(left_key)
            # invariant: self.nodes[left_key] exists
            self.nodes[key].left = self.nodes[left_key]

        # handle right child
        if right_key == None:
            self.nodes[key].right = None
        else:
            if right_key not in self.nodes:
                self.nodes[right_key] = BinaryTreeNode(right_key)
            # invariant: self.nodes[right_key] exists
            self.nodes[key].right = self.nodes[right_key]

    def remove(self, key):
        if key not in self.nodes:
            raise ValueError('%s not in tree' % key)
        # remove key from the list of nodes
        del self.nodes[key]
        # if node removed is left/right child, update parent node
        for k in self.nodes:
            if self.nodes[k].left and self.nodes[k].left.key == key:
                self.nodes[k].left = None
            if self.nodes[k].right and self.nodes[k].right.key == key:
                self.nodes[k].right = None
        return True

    def _height(self, node):
        if node is None:
            return 0
        else:
            return 1 + max(self._height(node.left), self._height(node.right))

    def height(self):
        return self._height(self.root)

    def size(self):
        return len(self.nodes)

    def __repr__(self):
        return str(self.traverse_inorder(self.root))

    def bfs(self, node):
        if not node or node not in self.nodes:
            return
        reachable = []    
        q = deque()
        # add starting node to queue
        q.append(node)
        while len(q):
            visit = q.popleft()
            # add currently visited BinaryTreeNode to list
            reachable.append(visit)
            # add left/right children as needed
            if visit.left:
                q.append(visit.left)
            if visit.right:
                q.append(visit.right)
        return reachable

    # visit left child, root, then right child
    def traverse_inorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        self.traverse_inorder(node.left, reachable)
        reachable.append(node.key)
        self.traverse_inorder(node.right, reachable)
        return reachable

    # visit left and right children, then root
    # root of tree is always last to be visited
    def traverse_postorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        self.traverse_postorder(node.left, reachable)
        self.traverse_postorder(node.right, reachable)
        reachable.append(node.key)
        return reachable

    # visit root, left, then right children
    # root is always visited first
    def traverse_preorder(self, node, reachable=None):
        if not node or node.key not in self.nodes:
            return
        if reachable is None:
            reachable = []
        reachable.append(node.key)
        self.traverse_preorder(node.left, reachable)
        self.traverse_preorder(node.right, reachable)
        return reachable

回答 6

你不需要两节课

class Tree:
    val = None
    left = None
    right = None

    def __init__(self, val):
        self.val = val


    def insert(self, val):
        if self.val is not None:
            if val < self.val:
                if self.left is not None:
                    self.left.insert(val)
                else:
                    self.left = Tree(val)
            elif val > self.val:
                if self.right is not None:
                    self.right.insert(val)
                else:
                    self.right = Tree(val)
            else:
                return
        else:
            self.val = val
            print("new node added")

    def showTree(self):
        if self.left is not None:
            self.left.showTree()
        print(self.val, end = ' ')
        if self.right is not None:
            self.right.showTree()

you don’t need to have two classes

class Tree:
    val = None
    left = None
    right = None

    def __init__(self, val):
        self.val = val


    def insert(self, val):
        if self.val is not None:
            if val < self.val:
                if self.left is not None:
                    self.left.insert(val)
                else:
                    self.left = Tree(val)
            elif val > self.val:
                if self.right is not None:
                    self.right.insert(val)
                else:
                    self.right = Tree(val)
            else:
                return
        else:
            self.val = val
            print("new node added")

    def showTree(self):
        if self.left is not None:
            self.left.showTree()
        print(self.val, end = ' ')
        if self.right is not None:
            self.right.showTree()

回答 7

多一点“ Pythonic”?

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def __repr__(self):
        return str(self.value)



class BST:
    def __init__(self):
        self.root = None

    def __repr__(self):
        self.sorted = []
        self.get_inorder(self.root)
        return str(self.sorted)

    def get_inorder(self, node):
        if node:
            self.get_inorder(node.left)
            self.sorted.append(str(node.value))
            self.get_inorder(node.right)

    def add(self, value):
        if not self.root:
            self.root = Node(value)
        else:
            self._add(self.root, value)

    def _add(self, node, value):
        if value <= node.value:
            if node.left:
                self._add(node.left, value)
            else:
                node.left = Node(value)
        else:
            if node.right:
                self._add(node.right, value)
            else:
                node.right = Node(value)



from random import randint

bst = BST()

for i in range(100):
    bst.add(randint(1, 1000))
print (bst)

A little more “Pythonic” ?

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def __repr__(self):
        return str(self.value)



class BST:
    def __init__(self):
        self.root = None

    def __repr__(self):
        self.sorted = []
        self.get_inorder(self.root)
        return str(self.sorted)

    def get_inorder(self, node):
        if node:
            self.get_inorder(node.left)
            self.sorted.append(str(node.value))
            self.get_inorder(node.right)

    def add(self, value):
        if not self.root:
            self.root = Node(value)
        else:
            self._add(self.root, value)

    def _add(self, node, value):
        if value <= node.value:
            if node.left:
                self._add(node.left, value)
            else:
                node.left = Node(value)
        else:
            if node.right:
                self._add(node.right, value)
            else:
                node.right = Node(value)



from random import randint

bst = BST()

for i in range(100):
    bst.add(randint(1, 1000))
print (bst)

回答 8

#!/usr/bin/python

class BinaryTree:
    def __init__(self, left, right, data):
        self.left = left
        self.right = right
        self.data = data


    def pre_order_traversal(root):
        print(root.data, end=' ')

        if root.left != None:
            pre_order_traversal(root.left)

        if root.right != None:
            pre_order_traversal(root.right)

    def in_order_traversal(root):
        if root.left != None:
            in_order_traversal(root.left)
        print(root.data, end=' ')
        if root.right != None:
            in_order_traversal(root.right)

    def post_order_traversal(root):
        if root.left != None:
            post_order_traversal(root.left)
        if root.right != None:
            post_order_traversal(root.right)
        print(root.data, end=' ')
#!/usr/bin/python

class BinaryTree:
    def __init__(self, left, right, data):
        self.left = left
        self.right = right
        self.data = data


    def pre_order_traversal(root):
        print(root.data, end=' ')

        if root.left != None:
            pre_order_traversal(root.left)

        if root.right != None:
            pre_order_traversal(root.right)

    def in_order_traversal(root):
        if root.left != None:
            in_order_traversal(root.left)
        print(root.data, end=' ')
        if root.right != None:
            in_order_traversal(root.right)

    def post_order_traversal(root):
        if root.left != None:
            post_order_traversal(root.left)
        if root.right != None:
            post_order_traversal(root.right)
        print(root.data, end=' ')

回答 9

一个 Node基类连接的节点的是一个标准的做法。这些可能很难想象。

摘自有关Python模式-实现图形文章,请考虑一个简单的字典:

给定

二叉树

               a
              / \
             b   c
            / \   \
           d   e   f

制作一个唯一节点的字典:

tree = {
   "a": ["b", "c"],
   "b": ["d", "e"],
   "c": [None, "f"],
   "d": [None, None],
   "e": [None, None],
   "f": [None, None],
}

细节

  • 每个键值对都是一个唯一的节点指向其子级。
  • 列表(或元组)包含一对有序的左/右子级。
  • 有命令命令插入字典,假定第一个条目为根。
  • 常用方法可以是使dict变异或遍历的函数(请参阅参考资料find_all_paths())。

基于树的功能通常包括以下常见操作:

  • 遍历:以给定的顺序产生每个节点(通常从左到右)
    • 广度优先搜索(BFS):遍历级别
    • 深度优先搜索(DFS):先进行遍历分支(前/后/后顺序)
  • insert:根据子节点数将节点添加到树中
  • remove:根据子节点数删除节点
  • 更新:将丢失的节点从一棵树合并到另一棵树
  • visit:得出遍历节点的值

尝试实施所有这些操作。在这里,我们演示这些功能之一 -BFS遍历:

import collections as ct


def traverse(tree):
    """Yield nodes from a tree via BFS."""
    q = ct.deque()                                         # 1
    root = next(iter(tree))                                # 2
    q.append(root)

    while q:
        node = q.popleft()
        children = filter(None, tree.get(node))
        for n in children:                                 # 3 
            q.append(n)
        yield node

list(traverse(tree))
# ['a', 'b', 'c', 'd', 'e', 'f']

这是应用于节点和子字典的广度优先搜索(级别顺序)算法

  1. 初始化FIFO队列。我们使用deque,但使用queuelist作品(后者效率低下)。
  2. 获取并排队根节点(假设根是字典中的第一个条目,Python 3.6+)。
  3. 迭代地使一个节点出队,使其子节点入队并产生节点值。

另请参阅此有关树的深入教程


洞察力

一般而言,遍历有很多好处,我们只需将队列替换为堆栈即可轻松地将后者的迭代方法更改为深度优先搜索(DFS)(即LIFO队列))。这仅表示我们从排队的同一侧出队。DFS允许我们搜索每个分支。

怎么样?由于我们使用deque,我们可以通过更改node = q.popleft()node = q.pop()(右)来模拟堆栈。结果是正确的,预购的DFS['a', 'c', 'f', 'b', 'e', 'd']

A Node-based class of connected nodes is a standard approach. These can be hard to visualize.

Motivated from an essay on Python Patterns – Implementing Graphs, consider a simple dictionary:

Given

A binary tree

               a
              / \
             b   c
            / \   \
           d   e   f

Code

Make a dictionary of unique nodes:

tree = {
   "a": ["b", "c"],
   "b": ["d", "e"],
   "c": [None, "f"],
   "d": [None, None],
   "e": [None, None],
   "f": [None, None],
}

Details

  • Each key-value pair is a unique node pointing to its children.
  • A list (or tuple) holds an ordered pair of left/right children.
  • With a dict having ordered insertion, assume the first entry is the root.
  • Common methods can be functions that mutate or traverse the dict (see find_all_paths()).

Tree-based functions often include the following common operations:

  • traverse: yield each node in a given order (usually left-to-right)
    • breadth-first search (BFS): traverse levels
    • depth-first search (DFS): traverse branches first (pre-/in-/post-order)
  • insert: add a node to the tree depending on the number of children
  • remove: remove a node depending on the number of children
  • update: merge missing nodes from one tree to the other
  • visit: yield the value of a traversed node

Try implementing all of these operations. Here we demonstrate one of these functions – a BFS traversal:

Example

import collections as ct


def traverse(tree):
    """Yield nodes from a tree via BFS."""
    q = ct.deque()                                         # 1
    root = next(iter(tree))                                # 2
    q.append(root)

    while q:
        node = q.popleft()
        children = filter(None, tree.get(node))
        for n in children:                                 # 3 
            q.append(n)
        yield node

list(traverse(tree))
# ['a', 'b', 'c', 'd', 'e', 'f']

This is a breadth-first search (level-order) algorithm applied to a dict of nodes and children.

  1. Initialize a FIFO queue. We use a deque, but a queue or a list works (the latter is inefficient).
  2. Get and enqueue the root node (assumes the root is the first entry in the dict, Python 3.6+).
  3. Iteratively dequeue a node, enqueue its children and yield the node value.

See also this in-depth tutorial on trees.


Insight

Something great about traversals in general, we can easily alter the latter iterative approach to depth-first search (DFS) by simply replacing the queue with a stack (a.k.a LIFO Queue). This simply means we dequeue from the same side that we enqueue. DFS allows us to search each branch.

How? Since we are using a deque, we can emulate a stack by changing node = q.popleft() to node = q.pop() (right). The result is a right-favored, pre-ordered DFS: ['a', 'c', 'f', 'b', 'e', 'd'].


回答 10

import random

class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None

class BinaryTree:
    def __init__(self):
        self.root = None

    def length(self):
        return self.size

    def inorder(self, node):
        if node == None:
            return None
        else:
            self.inorder(node.left)
            print node.key,
            self.inorder(node.right)

    def search(self, k):
        node = self.root
        while node != None:
            if node.key == k:
                return node
            if node.key > k:
                node = node.left
            else:
                node = node.right
        return None

    def minimum(self, node):
        x = None
        while node.left != None:
            x = node.left
            node = node.left
        return x

    def maximum(self, node):
        x = None
        while node.right != None:
            x = node.right
            node = node.right
        return x

    def successor(self, node):
        parent = None
        if node.right != None:
            return self.minimum(node.right)
        parent = node.p
        while parent != None and node == parent.right:
            node = parent
            parent = parent.p
        return parent

    def predecessor(self, node):
        parent = None
        if node.left != None:
            return self.maximum(node.left)
        parent = node.p
        while parent != None and node == parent.left:
            node = parent
            parent = parent.p
        return parent

    def insert(self, k):
        t = TreeNode(k)
        parent = None
        node = self.root
        while node != None:
            parent = node
            if node.key > t.key:
                node = node.left
            else:
                node = node.right
        t.p = parent
        if parent == None:
            self.root = t
        elif t.key < parent.key:
            parent.left = t
        else:
            parent.right = t
        return t


    def delete(self, node):
        if node.left == None:
            self.transplant(node, node.right)
        elif node.right == None:
            self.transplant(node, node.left)
        else:
            succ = self.minimum(node.right)
            if succ.p != node:
                self.transplant(succ, succ.right)
                succ.right = node.right
                succ.right.p = succ
            self.transplant(node, succ)
            succ.left = node.left
            succ.left.p = succ

    def transplant(self, node, newnode):
        if node.p == None:
            self.root = newnode
        elif node == node.p.left:
            node.p.left = newnode
        else:
            node.p.right = newnode
        if newnode != None:
            newnode.p = node.p
import random

class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None

class BinaryTree:
    def __init__(self):
        self.root = None

    def length(self):
        return self.size

    def inorder(self, node):
        if node == None:
            return None
        else:
            self.inorder(node.left)
            print node.key,
            self.inorder(node.right)

    def search(self, k):
        node = self.root
        while node != None:
            if node.key == k:
                return node
            if node.key > k:
                node = node.left
            else:
                node = node.right
        return None

    def minimum(self, node):
        x = None
        while node.left != None:
            x = node.left
            node = node.left
        return x

    def maximum(self, node):
        x = None
        while node.right != None:
            x = node.right
            node = node.right
        return x

    def successor(self, node):
        parent = None
        if node.right != None:
            return self.minimum(node.right)
        parent = node.p
        while parent != None and node == parent.right:
            node = parent
            parent = parent.p
        return parent

    def predecessor(self, node):
        parent = None
        if node.left != None:
            return self.maximum(node.left)
        parent = node.p
        while parent != None and node == parent.left:
            node = parent
            parent = parent.p
        return parent

    def insert(self, k):
        t = TreeNode(k)
        parent = None
        node = self.root
        while node != None:
            parent = node
            if node.key > t.key:
                node = node.left
            else:
                node = node.right
        t.p = parent
        if parent == None:
            self.root = t
        elif t.key < parent.key:
            parent.left = t
        else:
            parent.right = t
        return t


    def delete(self, node):
        if node.left == None:
            self.transplant(node, node.right)
        elif node.right == None:
            self.transplant(node, node.left)
        else:
            succ = self.minimum(node.right)
            if succ.p != node:
                self.transplant(succ, succ.right)
                succ.right = node.right
                succ.right.p = succ
            self.transplant(node, succ)
            succ.left = node.left
            succ.left.p = succ

    def transplant(self, node, newnode):
        if node.p == None:
            self.root = newnode
        elif node == node.p.left:
            node.p.left = newnode
        else:
            node.p.right = newnode
        if newnode != None:
            newnode.p = node.p

回答 11

此实现支持插入,查找和删除操作,而不会破坏树的结构。这不是平衡树。

# Class for construct the nodes of the tree. (Subtrees)
class Node:
def __init__(self, key, parent_node = None):
    self.left = None
    self.right = None
    self.key = key
    if parent_node == None:
        self.parent = self
    else:
        self.parent = parent_node

# Class with the  structure of the tree. 
# This Tree is not balanced.
class Tree:
def __init__(self):
    self.root = None

# Insert a single element
def insert(self, x):
    if(self.root == None):
        self.root = Node(x)
    else:
        self._insert(x, self.root)

def _insert(self, x, node):
    if(x < node.key):
        if(node.left == None):
            node.left = Node(x, node)
        else:
            self._insert(x, node.left)
    else:
        if(node.right == None):
            node.right = Node(x, node)
        else:
            self._insert(x, node.right)

# Given a element, return a node in the tree with key x. 
def find(self, x):
    if(self.root == None):
        return None
    else:
        return self._find(x, self.root)
def _find(self, x, node):
    if(x == node.key):
        return node
    elif(x < node.key):
        if(node.left == None):
            return None
        else:
            return self._find(x, node.left)
    elif(x > node.key):
        if(node.right == None):
            return None
        else:
            return self._find(x, node.right)

# Given a node, return the node in the tree with the next largest element.
def next(self, node):
    if node.right != None:
        return self._left_descendant(node.right)
    else:
        return self._right_ancestor(node)

def _left_descendant(self, node):
    if node.left == None:
        return node
    else:
        return self._left_descendant(node.left)

def _right_ancestor(self, node):
    if node.key <= node.parent.key:
        return node.parent
    else:
        return self._right_ancestor(node.parent)

# Delete an element of the tree
def delete(self, x):
    node = self.find(x)
    if node == None:
        print(x, "isn't in the tree")
    else:
        if node.right == None:
            if node.left == None:
                if node.key < node.parent.key:
                    node.parent.left = None
                    del node # Clean garbage
                else:
                    node.parent.right = None
                    del Node # Clean garbage
            else:
                node.key = node.left.key
                node.left = None
        else:
            x = self.next(node)
            node.key = x.key
            x = None


# tests
t = Tree()
t.insert(5)
t.insert(8)
t.insert(3)
t.insert(4)
t.insert(6)
t.insert(2)

t.delete(8)
t.delete(5)

t.insert(9)
t.insert(1)

t.delete(2)
t.delete(100)

# Remember: Find method return the node object. 
# To return a number use t.find(nº).key
# But it will cause an error if the number is not in the tree.
print(t.find(5)) 
print(t.find(8))
print(t.find(4))
print(t.find(6))
print(t.find(9))

This implementation supports insert, find and delete operations without destroy the structure of the tree. This is not a banlanced tree.

# Class for construct the nodes of the tree. (Subtrees)
class Node:
def __init__(self, key, parent_node = None):
    self.left = None
    self.right = None
    self.key = key
    if parent_node == None:
        self.parent = self
    else:
        self.parent = parent_node

# Class with the  structure of the tree. 
# This Tree is not balanced.
class Tree:
def __init__(self):
    self.root = None

# Insert a single element
def insert(self, x):
    if(self.root == None):
        self.root = Node(x)
    else:
        self._insert(x, self.root)

def _insert(self, x, node):
    if(x < node.key):
        if(node.left == None):
            node.left = Node(x, node)
        else:
            self._insert(x, node.left)
    else:
        if(node.right == None):
            node.right = Node(x, node)
        else:
            self._insert(x, node.right)

# Given a element, return a node in the tree with key x. 
def find(self, x):
    if(self.root == None):
        return None
    else:
        return self._find(x, self.root)
def _find(self, x, node):
    if(x == node.key):
        return node
    elif(x < node.key):
        if(node.left == None):
            return None
        else:
            return self._find(x, node.left)
    elif(x > node.key):
        if(node.right == None):
            return None
        else:
            return self._find(x, node.right)

# Given a node, return the node in the tree with the next largest element.
def next(self, node):
    if node.right != None:
        return self._left_descendant(node.right)
    else:
        return self._right_ancestor(node)

def _left_descendant(self, node):
    if node.left == None:
        return node
    else:
        return self._left_descendant(node.left)

def _right_ancestor(self, node):
    if node.key <= node.parent.key:
        return node.parent
    else:
        return self._right_ancestor(node.parent)

# Delete an element of the tree
def delete(self, x):
    node = self.find(x)
    if node == None:
        print(x, "isn't in the tree")
    else:
        if node.right == None:
            if node.left == None:
                if node.key < node.parent.key:
                    node.parent.left = None
                    del node # Clean garbage
                else:
                    node.parent.right = None
                    del Node # Clean garbage
            else:
                node.key = node.left.key
                node.left = None
        else:
            x = self.next(node)
            node.key = x.key
            x = None


# tests
t = Tree()
t.insert(5)
t.insert(8)
t.insert(3)
t.insert(4)
t.insert(6)
t.insert(2)

t.delete(8)
t.delete(5)

t.insert(9)
t.insert(1)

t.delete(2)
t.delete(100)

# Remember: Find method return the node object. 
# To return a number use t.find(nº).key
# But it will cause an error if the number is not in the tree.
print(t.find(5)) 
print(t.find(8))
print(t.find(4))
print(t.find(6))
print(t.find(9))

回答 12

我知道已经发布了许多好的解决方案,但是对于二叉树,我通常采用不同的方法:使用某些Node类并直接实现它更具可读性,但是当您有很多节点时,对于内存可能会变得非常贪婪,所以我建议增加一层复杂性并将节点存储在python列表中,然后仅使用该列表来模拟树的行为。

您仍然可以定义Node类,以在需要时最终表示树中的节点,但是将它们以简单的形式[value,left,right]保留在列表中将使用一半的内存或更少的内存!

这是二进制搜索树类的快速示例,该类将节点存储在数组中。它提供了基本功能,例如添加,删除,查找…

"""
Basic Binary Search Tree class without recursion...
"""

__author__ = "@fbparis"

class Node(object):
    __slots__ = "value", "parent", "left", "right"
    def __init__(self, value, parent=None, left=None, right=None):
        self.value = value
        self.parent = parent
        self.left = left
        self.right = right

    def __repr__(self):
        return "<%s object at %s: parent=%s, left=%s, right=%s, value=%s>" % (self.__class__.__name__, hex(id(self)), self.parent, self.left, self.right, self.value)

class BinarySearchTree(object):
    __slots__ = "_tree"
    def __init__(self, *args):
        self._tree = []
        if args:
            for x in args[0]:
                self.add(x)

    def __len__(self):
        return len(self._tree)

    def __repr__(self):
        return "<%s object at %s with %d nodes>" % (self.__class__.__name__, hex(id(self)), len(self))

    def __str__(self, nodes=None, level=0):
        ret = ""
        if nodes is None:
            if len(self):
                nodes = [0]
            else:
                nodes = []
        for node in nodes:
            if node is None:
                continue
            ret += "-" * level + " %s\n" % self._tree[node][0]
            ret += self.__str__(self._tree[node][2:4], level + 1)
        if level == 0:
            ret = ret.strip()
        return ret

    def __contains__(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    return False
            return True
        return False

    def __eq__(self, other):
        return self._tree == other._tree

    def add(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    b = self._tree[node_index][2]
                    k = 2
                else:
                    b = self._tree[node_index][3]
                    k = 3
                if b is None:
                    self._tree[node_index][k] = len(self)
                    self._tree.append([value, node_index, None, None])
                    break
                node_index = b
        else:
            self._tree.append([value, None, None, None])

    def remove(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    raise KeyError
            if self._tree[node_index][2] is not None:
                b, d = 2, 3
            elif self._tree[node_index][3] is not None:
                b, d = 3, 2
            else:
                i = node_index
                b = None
            if b is not None:
                i = self._tree[node_index][b]
                while self._tree[i][d] is not None:
                    i = self._tree[i][d]
                p = self._tree[i][1]
                b = self._tree[i][b]
                if p == node_index:
                    self._tree[p][5-d] = b
                else:
                    self._tree[p][d] = b
                if b is not None:
                    self._tree[b][1] = p
                self._tree[node_index][0] = self._tree[i][0]
            else:
                p = self._tree[i][1]
                if p is not None:
                    if self._tree[p][2] == i:
                        self._tree[p][2] = None
                    else:
                        self._tree[p][3] = None
            last = self._tree.pop()
            n = len(self)
            if i < n:
                self._tree[i] = last[:]
                if last[2] is not None:
                    self._tree[last[2]][1] = i
                if last[3] is not None:
                    self._tree[last[3]][1] = i
                if self._tree[last[1]][2] == n:
                    self._tree[last[1]][2] = i
                else:
                    self._tree[last[1]][3] = i
        else:
            raise KeyError

    def find(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    return None
            return Node(*self._tree[node_index])
        return None

我添加了一个父属性,以便您可以删除任何节点并维护BST结构。

抱歉,为了便于阅读,尤其是对于“删除”功能。基本上,当一个节点被删除时,我们弹出树数组并用最后一个元素替换它(除非我们想删除最后一个节点)。为了维持BST结构,将删除的节点替换为其左侧子节点的最大值或右侧子节点的最小值,并且必须执行一些操作才能使索引有效,但它必须足够快。

我将这种技术用于更高级的东西,用内部基数trie构建了一些大单词字典,并且我能够将内存消耗除以7-8(您可以在此处看到示例:https : //gist.github.com/fbparis / b3ddd5673b603b42c880974b23db7cda

I know many good solutions have already been posted but I usually have a different approach for binary trees: going with some Node class and implementing it directly is more readable but when you have a lot of nodes it can become very greedy regarding memory, so I suggest adding one layer of complexity and storing the nodes in a python list, and then simulating a tree behavior using only the list.

You can still define a Node class to finally represent the nodes in the tree when needed, but keeping them in a simple form [value, left, right] in a list will use half the memory or less!

Here is a quick example of a Binary Search Tree class storing the nodes in an array. It provides basic fonctions such as add, remove, find…

"""
Basic Binary Search Tree class without recursion...
"""

__author__ = "@fbparis"

class Node(object):
    __slots__ = "value", "parent", "left", "right"
    def __init__(self, value, parent=None, left=None, right=None):
        self.value = value
        self.parent = parent
        self.left = left
        self.right = right

    def __repr__(self):
        return "<%s object at %s: parent=%s, left=%s, right=%s, value=%s>" % (self.__class__.__name__, hex(id(self)), self.parent, self.left, self.right, self.value)

class BinarySearchTree(object):
    __slots__ = "_tree"
    def __init__(self, *args):
        self._tree = []
        if args:
            for x in args[0]:
                self.add(x)

    def __len__(self):
        return len(self._tree)

    def __repr__(self):
        return "<%s object at %s with %d nodes>" % (self.__class__.__name__, hex(id(self)), len(self))

    def __str__(self, nodes=None, level=0):
        ret = ""
        if nodes is None:
            if len(self):
                nodes = [0]
            else:
                nodes = []
        for node in nodes:
            if node is None:
                continue
            ret += "-" * level + " %s\n" % self._tree[node][0]
            ret += self.__str__(self._tree[node][2:4], level + 1)
        if level == 0:
            ret = ret.strip()
        return ret

    def __contains__(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    return False
            return True
        return False

    def __eq__(self, other):
        return self._tree == other._tree

    def add(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    b = self._tree[node_index][2]
                    k = 2
                else:
                    b = self._tree[node_index][3]
                    k = 3
                if b is None:
                    self._tree[node_index][k] = len(self)
                    self._tree.append([value, node_index, None, None])
                    break
                node_index = b
        else:
            self._tree.append([value, None, None, None])

    def remove(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    raise KeyError
            if self._tree[node_index][2] is not None:
                b, d = 2, 3
            elif self._tree[node_index][3] is not None:
                b, d = 3, 2
            else:
                i = node_index
                b = None
            if b is not None:
                i = self._tree[node_index][b]
                while self._tree[i][d] is not None:
                    i = self._tree[i][d]
                p = self._tree[i][1]
                b = self._tree[i][b]
                if p == node_index:
                    self._tree[p][5-d] = b
                else:
                    self._tree[p][d] = b
                if b is not None:
                    self._tree[b][1] = p
                self._tree[node_index][0] = self._tree[i][0]
            else:
                p = self._tree[i][1]
                if p is not None:
                    if self._tree[p][2] == i:
                        self._tree[p][2] = None
                    else:
                        self._tree[p][3] = None
            last = self._tree.pop()
            n = len(self)
            if i < n:
                self._tree[i] = last[:]
                if last[2] is not None:
                    self._tree[last[2]][1] = i
                if last[3] is not None:
                    self._tree[last[3]][1] = i
                if self._tree[last[1]][2] == n:
                    self._tree[last[1]][2] = i
                else:
                    self._tree[last[1]][3] = i
        else:
            raise KeyError

    def find(self, value):
        if len(self):
            node_index = 0
            while self._tree[node_index][0] != value:
                if value < self._tree[node_index][0]:
                    node_index = self._tree[node_index][2]
                else:
                    node_index = self._tree[node_index][3]
                if node_index is None:
                    return None
            return Node(*self._tree[node_index])
        return None

I’ve added a parent attribute so that you can remove any node and maintain the BST structure.

Sorry for the readability, especially for the “remove” function. Basically, when a node is removed, we pop the tree array and replace it with the last element (except if we wanted to remove the last node). To maintain the BST structure, the removed node is replaced with the max of its left children or the min of its right children and some operations have to be done in order to keep the indexes valid but it’s fast enough.

I used this technique for more advanced stuff to build some big words dictionaries with an internal radix trie and I was able to divide memory consumption by 7-8 (you can see an example here: https://gist.github.com/fbparis/b3ddd5673b603b42c880974b23db7cda)


回答 13

二进制搜索树的良好实现,取自此处

'''
A binary search Tree
'''
from __future__ import print_function
class Node:

    def __init__(self, label, parent):
        self.label = label
        self.left = None
        self.right = None
        #Added in order to delete a node easier
        self.parent = parent

    def getLabel(self):
        return self.label

    def setLabel(self, label):
        self.label = label

    def getLeft(self):
        return self.left

    def setLeft(self, left):
        self.left = left

    def getRight(self):
        return self.right

    def setRight(self, right):
        self.right = right

    def getParent(self):
        return self.parent

    def setParent(self, parent):
        self.parent = parent

class BinarySearchTree:

    def __init__(self):
        self.root = None

    def insert(self, label):
        # Create a new Node
        new_node = Node(label, None)
        # If Tree is empty
        if self.empty():
            self.root = new_node
        else:
            #If Tree is not empty
            curr_node = self.root
            #While we don't get to a leaf
            while curr_node is not None:
                #We keep reference of the parent node
                parent_node = curr_node
                #If node label is less than current node
                if new_node.getLabel() < curr_node.getLabel():
                #We go left
                    curr_node = curr_node.getLeft()
                else:
                    #Else we go right
                    curr_node = curr_node.getRight()
            #We insert the new node in a leaf
            if new_node.getLabel() < parent_node.getLabel():
                parent_node.setLeft(new_node)
            else:
                parent_node.setRight(new_node)
            #Set parent to the new node
            new_node.setParent(parent_node)      

    def delete(self, label):
        if (not self.empty()):
            #Look for the node with that label
            node = self.getNode(label)
            #If the node exists
            if(node is not None):
                #If it has no children
                if(node.getLeft() is None and node.getRight() is None):
                    self.__reassignNodes(node, None)
                    node = None
                #Has only right children
                elif(node.getLeft() is None and node.getRight() is not None):
                    self.__reassignNodes(node, node.getRight())
                #Has only left children
                elif(node.getLeft() is not None and node.getRight() is None):
                    self.__reassignNodes(node, node.getLeft())
                #Has two children
                else:
                    #Gets the max value of the left branch
                    tmpNode = self.getMax(node.getLeft())
                    #Deletes the tmpNode
                    self.delete(tmpNode.getLabel())
                    #Assigns the value to the node to delete and keesp tree structure
                    node.setLabel(tmpNode.getLabel())

    def getNode(self, label):
        curr_node = None
        #If the tree is not empty
        if(not self.empty()):
            #Get tree root
            curr_node = self.getRoot()
            #While we don't find the node we look for
            #I am using lazy evaluation here to avoid NoneType Attribute error
            while curr_node is not None and curr_node.getLabel() is not label:
                #If node label is less than current node
                if label < curr_node.getLabel():
                    #We go left
                    curr_node = curr_node.getLeft()
                else:
                    #Else we go right
                    curr_node = curr_node.getRight()
        return curr_node

    def getMax(self, root = None):
        if(root is not None):
            curr_node = root
        else:
            #We go deep on the right branch
            curr_node = self.getRoot()
        if(not self.empty()):
            while(curr_node.getRight() is not None):
                curr_node = curr_node.getRight()
        return curr_node

    def getMin(self, root = None):
        if(root is not None):
            curr_node = root
        else:
            #We go deep on the left branch
            curr_node = self.getRoot()
        if(not self.empty()):
            curr_node = self.getRoot()
            while(curr_node.getLeft() is not None):
                curr_node = curr_node.getLeft()
        return curr_node

    def empty(self):
        if self.root is None:
            return True
        return False

    def __InOrderTraversal(self, curr_node):
        nodeList = []
        if curr_node is not None:
            nodeList.insert(0, curr_node)
            nodeList = nodeList + self.__InOrderTraversal(curr_node.getLeft())
            nodeList = nodeList + self.__InOrderTraversal(curr_node.getRight())
        return nodeList

    def getRoot(self):
        return self.root

    def __isRightChildren(self, node):
        if(node == node.getParent().getRight()):
            return True
        return False

    def __reassignNodes(self, node, newChildren):
        if(newChildren is not None):
            newChildren.setParent(node.getParent())
        if(node.getParent() is not None):
            #If it is the Right Children
            if(self.__isRightChildren(node)):
                node.getParent().setRight(newChildren)
            else:
                #Else it is the left children
                node.getParent().setLeft(newChildren)

    #This function traversal the tree. By default it returns an
    #In order traversal list. You can pass a function to traversal
    #The tree as needed by client code
    def traversalTree(self, traversalFunction = None, root = None):
        if(traversalFunction is None):
            #Returns a list of nodes in preOrder by default
            return self.__InOrderTraversal(self.root)
        else:
            #Returns a list of nodes in the order that the users wants to
            return traversalFunction(self.root)

    #Returns an string of all the nodes labels in the list 
    #In Order Traversal
    def __str__(self):
        list = self.__InOrderTraversal(self.root)
        str = ""
        for x in list:
            str = str + " " + x.getLabel().__str__()
        return str

def InPreOrder(curr_node):
    nodeList = []
    if curr_node is not None:
        nodeList = nodeList + InPreOrder(curr_node.getLeft())
        nodeList.insert(0, curr_node.getLabel())
        nodeList = nodeList + InPreOrder(curr_node.getRight())
    return nodeList

def testBinarySearchTree():
    r'''
    Example
                  8
                 / \
                3   10
               / \    \
              1   6    14
                 / \   /
                4   7 13 
    '''

    r'''
    Example After Deletion
                  7
                 / \
                1   4

    '''
    t = BinarySearchTree()
    t.insert(8)
    t.insert(3)
    t.insert(6)
    t.insert(1)
    t.insert(10)
    t.insert(14)
    t.insert(13)
    t.insert(4)
    t.insert(7)

    #Prints all the elements of the list in order traversal
    print(t.__str__())

    if(t.getNode(6) is not None):
        print("The label 6 exists")
    else:
        print("The label 6 doesn't exist")

    if(t.getNode(-1) is not None):
        print("The label -1 exists")
    else:
        print("The label -1 doesn't exist")

    if(not t.empty()):
        print(("Max Value: ", t.getMax().getLabel()))
        print(("Min Value: ", t.getMin().getLabel()))

    t.delete(13)
    t.delete(10)
    t.delete(8)
    t.delete(3)
    t.delete(6)
    t.delete(14)

    #Gets all the elements of the tree In pre order
    #And it prints them
    list = t.traversalTree(InPreOrder, t.root)
    for x in list:
        print(x)

if __name__ == "__main__":
    testBinarySearchTree()

A good implementation of binary search tree, taken from here:

'''
A binary search Tree
'''
from __future__ import print_function
class Node:

    def __init__(self, label, parent):
        self.label = label
        self.left = None
        self.right = None
        #Added in order to delete a node easier
        self.parent = parent

    def getLabel(self):
        return self.label

    def setLabel(self, label):
        self.label = label

    def getLeft(self):
        return self.left

    def setLeft(self, left):
        self.left = left

    def getRight(self):
        return self.right

    def setRight(self, right):
        self.right = right

    def getParent(self):
        return self.parent

    def setParent(self, parent):
        self.parent = parent

class BinarySearchTree:

    def __init__(self):
        self.root = None

    def insert(self, label):
        # Create a new Node
        new_node = Node(label, None)
        # If Tree is empty
        if self.empty():
            self.root = new_node
        else:
            #If Tree is not empty
            curr_node = self.root
            #While we don't get to a leaf
            while curr_node is not None:
                #We keep reference of the parent node
                parent_node = curr_node
                #If node label is less than current node
                if new_node.getLabel() < curr_node.getLabel():
                #We go left
                    curr_node = curr_node.getLeft()
                else:
                    #Else we go right
                    curr_node = curr_node.getRight()
            #We insert the new node in a leaf
            if new_node.getLabel() < parent_node.getLabel():
                parent_node.setLeft(new_node)
            else:
                parent_node.setRight(new_node)
            #Set parent to the new node
            new_node.setParent(parent_node)      

    def delete(self, label):
        if (not self.empty()):
            #Look for the node with that label
            node = self.getNode(label)
            #If the node exists
            if(node is not None):
                #If it has no children
                if(node.getLeft() is None and node.getRight() is None):
                    self.__reassignNodes(node, None)
                    node = None
                #Has only right children
                elif(node.getLeft() is None and node.getRight() is not None):
                    self.__reassignNodes(node, node.getRight())
                #Has only left children
                elif(node.getLeft() is not None and node.getRight() is None):
                    self.__reassignNodes(node, node.getLeft())
                #Has two children
                else:
                    #Gets the max value of the left branch
                    tmpNode = self.getMax(node.getLeft())
                    #Deletes the tmpNode
                    self.delete(tmpNode.getLabel())
                    #Assigns the value to the node to delete and keesp tree structure
                    node.setLabel(tmpNode.getLabel())

    def getNode(self, label):
        curr_node = None
        #If the tree is not empty
        if(not self.empty()):
            #Get tree root
            curr_node = self.getRoot()
            #While we don't find the node we look for
            #I am using lazy evaluation here to avoid NoneType Attribute error
            while curr_node is not None and curr_node.getLabel() is not label:
                #If node label is less than current node
                if label < curr_node.getLabel():
                    #We go left
                    curr_node = curr_node.getLeft()
                else:
                    #Else we go right
                    curr_node = curr_node.getRight()
        return curr_node

    def getMax(self, root = None):
        if(root is not None):
            curr_node = root
        else:
            #We go deep on the right branch
            curr_node = self.getRoot()
        if(not self.empty()):
            while(curr_node.getRight() is not None):
                curr_node = curr_node.getRight()
        return curr_node

    def getMin(self, root = None):
        if(root is not None):
            curr_node = root
        else:
            #We go deep on the left branch
            curr_node = self.getRoot()
        if(not self.empty()):
            curr_node = self.getRoot()
            while(curr_node.getLeft() is not None):
                curr_node = curr_node.getLeft()
        return curr_node

    def empty(self):
        if self.root is None:
            return True
        return False

    def __InOrderTraversal(self, curr_node):
        nodeList = []
        if curr_node is not None:
            nodeList.insert(0, curr_node)
            nodeList = nodeList + self.__InOrderTraversal(curr_node.getLeft())
            nodeList = nodeList + self.__InOrderTraversal(curr_node.getRight())
        return nodeList

    def getRoot(self):
        return self.root

    def __isRightChildren(self, node):
        if(node == node.getParent().getRight()):
            return True
        return False

    def __reassignNodes(self, node, newChildren):
        if(newChildren is not None):
            newChildren.setParent(node.getParent())
        if(node.getParent() is not None):
            #If it is the Right Children
            if(self.__isRightChildren(node)):
                node.getParent().setRight(newChildren)
            else:
                #Else it is the left children
                node.getParent().setLeft(newChildren)

    #This function traversal the tree. By default it returns an
    #In order traversal list. You can pass a function to traversal
    #The tree as needed by client code
    def traversalTree(self, traversalFunction = None, root = None):
        if(traversalFunction is None):
            #Returns a list of nodes in preOrder by default
            return self.__InOrderTraversal(self.root)
        else:
            #Returns a list of nodes in the order that the users wants to
            return traversalFunction(self.root)

    #Returns an string of all the nodes labels in the list 
    #In Order Traversal
    def __str__(self):
        list = self.__InOrderTraversal(self.root)
        str = ""
        for x in list:
            str = str + " " + x.getLabel().__str__()
        return str

def InPreOrder(curr_node):
    nodeList = []
    if curr_node is not None:
        nodeList = nodeList + InPreOrder(curr_node.getLeft())
        nodeList.insert(0, curr_node.getLabel())
        nodeList = nodeList + InPreOrder(curr_node.getRight())
    return nodeList

def testBinarySearchTree():
    r'''
    Example
                  8
                 / \
                3   10
               / \    \
              1   6    14
                 / \   /
                4   7 13 
    '''

    r'''
    Example After Deletion
                  7
                 / \
                1   4

    '''
    t = BinarySearchTree()
    t.insert(8)
    t.insert(3)
    t.insert(6)
    t.insert(1)
    t.insert(10)
    t.insert(14)
    t.insert(13)
    t.insert(4)
    t.insert(7)

    #Prints all the elements of the list in order traversal
    print(t.__str__())

    if(t.getNode(6) is not None):
        print("The label 6 exists")
    else:
        print("The label 6 doesn't exist")

    if(t.getNode(-1) is not None):
        print("The label -1 exists")
    else:
        print("The label -1 doesn't exist")

    if(not t.empty()):
        print(("Max Value: ", t.getMax().getLabel()))
        print(("Min Value: ", t.getMin().getLabel()))

    t.delete(13)
    t.delete(10)
    t.delete(8)
    t.delete(3)
    t.delete(6)
    t.delete(14)

    #Gets all the elements of the tree In pre order
    #And it prints them
    list = t.traversalTree(InPreOrder, t.root)
    for x in list:
        print(x)

if __name__ == "__main__":
    testBinarySearchTree()

回答 14

我想展示@apadana方法的一种变体,当有大量节点时,它会更有用:

'''
Suppose we have the following tree
      10
    /    \
  11      9
 /  \     / \
7   12  15   8
'''
# Step 1 - Create nodes - Use a list instead of defining each node separately
nlist = [10,11,7,9,15,8,12]; n = []
for i in range(len(nlist)): n.append(Node(nlist[i]))

# Step 2 - Set each node position
n[0].left  = n[1]
n[1].left = n[2]
n[0].right = n[3]
n[3].left = n[4]
n[3].right = n[5]
n[1].right = n[6]

I want to show a variation of @apadana’s method, which is more useful when there is a considerable number of nodes:

'''
Suppose we have the following tree
      10
    /    \
  11      9
 /  \     / \
7   12  15   8
'''
# Step 1 - Create nodes - Use a list instead of defining each node separately
nlist = [10,11,7,9,15,8,12]; n = []
for i in range(len(nlist)): n.append(Node(nlist[i]))

# Step 2 - Set each node position
n[0].left  = n[1]
n[1].left = n[2]
n[0].right = n[3]
n[3].left = n[4]
n[3].right = n[5]
n[1].right = n[6]

回答 15

class Node:
    """
    single Node for tree
    """

    def __init__(self, data):
        self.data = data
        self.right = None
        self.left = None


class binaryTree:
    """
    binary tree implementation
    """

    def __init__(self):
        self.root = None

    def push(self, element, node=None):
        if node is None:
            node = self.root

        if self.root is None:
            self.root = Node(element)

        else:
            if element < node.data:
                if node.left is not None:
                    self.push(element, node.left)
                else:
                    node.left = Node(element)
            else:
                if node.right is not None:
                    self.push(element, node.right)
                else:
                    node.right = Node(element)

    def __str__(self):
        self.printInorder(self.root)
        return "\n"

    def printInorder(self, node):
        """
        print tree in inorder
        """
        if node is not None:
            self.printInorder(node.left)
            print(node.data)
            self.printInorder(node.right)


def main():
    """
    Main code and logic comes here
    """
    tree = binaryTree()
    tree.push(5)
    tree.push(3)
    tree.push(1)
    tree.push(3)
    tree.push(0)
    tree.push(2)
    tree.push(9)
    tree.push(10)
    print(tree)


if __name__ == "__main__":
    main()
class Node:
    """
    single Node for tree
    """

    def __init__(self, data):
        self.data = data
        self.right = None
        self.left = None


class binaryTree:
    """
    binary tree implementation
    """

    def __init__(self):
        self.root = None

    def push(self, element, node=None):
        if node is None:
            node = self.root

        if self.root is None:
            self.root = Node(element)

        else:
            if element < node.data:
                if node.left is not None:
                    self.push(element, node.left)
                else:
                    node.left = Node(element)
            else:
                if node.right is not None:
                    self.push(element, node.right)
                else:
                    node.right = Node(element)

    def __str__(self):
        self.printInorder(self.root)
        return "\n"

    def printInorder(self, node):
        """
        print tree in inorder
        """
        if node is not None:
            self.printInorder(node.left)
            print(node.data)
            self.printInorder(node.right)


def main():
    """
    Main code and logic comes here
    """
    tree = binaryTree()
    tree.push(5)
    tree.push(3)
    tree.push(1)
    tree.push(3)
    tree.push(0)
    tree.push(2)
    tree.push(9)
    tree.push(10)
    print(tree)


if __name__ == "__main__":
    main()

回答 16

Python中的二叉树

 class Tree(object):
    def __init__(self):
        self.data=None
        self.left=None
        self.right=None
    def insert(self, x, root):
        if root==None:
            t=node(x)
            t.data=x
            t.right=None
            t.left=None
            root=t
            return root
        elif x<root.data:
            root.left=self.insert(x, root.left)
        else:
            root.right=self.insert(x, root.right)
        return root

    def printTree(self, t):
        if t==None:
            return

        self.printTree(t.left)
        print t.data
        self.printTree(t.right)

class node(object):
    def __init__(self, x):
        self.x=x

bt=Tree()
root=None
n=int(raw_input())
a=[]
for i in range(n):
    a.append(int(raw_input()))
for i in range(n):
    root=bt.insert(a[i], root)
bt.printTree(root)

Binary Tree in Python

 class Tree(object):
    def __init__(self):
        self.data=None
        self.left=None
        self.right=None
    def insert(self, x, root):
        if root==None:
            t=node(x)
            t.data=x
            t.right=None
            t.left=None
            root=t
            return root
        elif x<root.data:
            root.left=self.insert(x, root.left)
        else:
            root.right=self.insert(x, root.right)
        return root

    def printTree(self, t):
        if t==None:
            return

        self.printTree(t.left)
        print t.data
        self.printTree(t.right)

class node(object):
    def __init__(self, x):
        self.x=x

bt=Tree()
root=None
n=int(raw_input())
a=[]
for i in range(n):
    a.append(int(raw_input()))
for i in range(n):
    root=bt.insert(a[i], root)
bt.printTree(root)

回答 17

这是一个简单的解决方案,可以使用递归方法来构建二叉树,以在下面的代码中使用遍历顺序来显示树。

class Node(object):

    def __init__(self):
        self.left = None
        self.right = None
        self.value = None
    @property
    def get_value(self):
        return self.value

    @property
    def get_left(self):
        return self.left

    @property
    def get_right(self):
        return self.right

    @get_left.setter
    def set_left(self, left_node):
        self.left = left_node
    @get_value.setter
    def set_value(self, value):
        self.value = value
    @get_right.setter
    def set_right(self, right_node):
        self.right = right_node



    def create_tree(self):
        _node = Node() #creating new node.
        _x = input("Enter the node data(-1 for null)")
        if(_x == str(-1)): #for defining no child.
            return None
        _node.set_value = _x #setting the value of the node.
        print("Enter the left child of {}".format(_x))
        _node.set_left = self.create_tree() #setting the left subtree
        print("Enter the right child of {}".format(_x))
        _node.set_right = self.create_tree() #setting the right subtree.

        return _node

    def pre_order(self, root):
        if root is not None:
            print(root.get_value)
            self.pre_order(root.get_left)
            self.pre_order(root.get_right)

if __name__ == '__main__':
    node = Node()
    root_node = node.create_tree()
    node.pre_order(root_node)

代码取自:Python中的二叉树

Here is a simple solution which can be used to build a binary tree using a recursive approach to display the tree in order traversal has been used in the below code.

class Node(object):

    def __init__(self):
        self.left = None
        self.right = None
        self.value = None
    @property
    def get_value(self):
        return self.value

    @property
    def get_left(self):
        return self.left

    @property
    def get_right(self):
        return self.right

    @get_left.setter
    def set_left(self, left_node):
        self.left = left_node
    @get_value.setter
    def set_value(self, value):
        self.value = value
    @get_right.setter
    def set_right(self, right_node):
        self.right = right_node



    def create_tree(self):
        _node = Node() #creating new node.
        _x = input("Enter the node data(-1 for null)")
        if(_x == str(-1)): #for defining no child.
            return None
        _node.set_value = _x #setting the value of the node.
        print("Enter the left child of {}".format(_x))
        _node.set_left = self.create_tree() #setting the left subtree
        print("Enter the right child of {}".format(_x))
        _node.set_right = self.create_tree() #setting the right subtree.

        return _node

    def pre_order(self, root):
        if root is not None:
            print(root.get_value)
            self.pre_order(root.get_left)
            self.pre_order(root.get_right)

if __name__ == '__main__':
    node = Node()
    root_node = node.create_tree()
    node.pre_order(root_node)

Code taken from : Binary Tree in Python


在元组列表中查找元素

问题:在元组列表中查找元素

我有一个清单“ a”

a= [(1,2),(1,4),(3,5),(5,7)]

我需要找到一个特定数字的所有元组。说1

result = [(1,2),(1,4)]

我怎么做?

I have a list ‘a’

a= [(1,2),(1,4),(3,5),(5,7)]

I need to find all the tuples for a particular number. say for 1 it will be

result = [(1,2),(1,4)]

How do I do that?


回答 0

如果只希望第一个数字匹配,则可以这样操作:

[item for item in a if item[0] == 1]

如果您仅搜索其中包含1的元组:

[item for item in a if 1 in item]

If you just want the first number to match you can do it like this:

[item for item in a if item[0] == 1]

If you are just searching for tuples with 1 in them:

[item for item in a if 1 in item]

回答 1

实际上,有一种聪明的方法可以用于任何元组列表,其中每个元组的大小为2:您可以将列表转换成一个字典。

例如,

test = [("hi", 1), ("there", 2)]
test = dict(test)
print test["hi"] # prints 1

There is actually a clever way to do this that is useful for any list of tuples where the size of each tuple is 2: you can convert your list into a single dictionary.

For example,

test = [("hi", 1), ("there", 2)]
test = dict(test)
print test["hi"] # prints 1

回答 2

阅读列表理解

[ (x,y) for x, y in a if x  == 1 ]

还要阅读生成器函数yield语句。

def filter_value( someList, value ):
    for x, y in someList:
        if x == value :
            yield x,y

result= list( filter_value( a, 1 ) )

Read up on List Comprehensions

[ (x,y) for x, y in a if x  == 1 ]

Also read up up generator functions and the yield statement.

def filter_value( someList, value ):
    for x, y in someList:
        if x == value :
            yield x,y

result= list( filter_value( a, 1 ) )

回答 3

[tup for tup in a if tup[0] == 1]
[tup for tup in a if tup[0] == 1]

回答 4

for item in a:
   if 1 in item:
       print item
for item in a:
   if 1 in item:
       print item

回答 5

>>> [i for i in a if 1 in i]

[(1,2),(1,4)]

>>> [i for i in a if 1 in i]

[(1, 2), (1, 4)]


回答 6

filter函数还可以提供一个有趣的解决方案:

result = list(filter(lambda x: x.count(1) > 0, a))

它会在列表中的元组中搜索是否出现1。如果搜索仅限于第一个元素,则可以将解决方案修改为:

result = list(filter(lambda x: x[0] == 1, a))

The filter function can also provide an interesting solution:

result = list(filter(lambda x: x.count(1) > 0, a))

which searches the tuples in the list a for any occurrences of 1. If the search is limited to the first element, the solution can be modified into:

result = list(filter(lambda x: x[0] == 1, a))

回答 7

使用过滤功能:

>>> def get_values(iterables,key_to_find):
返回列表(过滤器(lambda x:x中的key_to_find,可迭代)) >>> a = [(1,2 ,,(1,4),(3,5),(5,7)] >>> get_values(a,1) >>> [(1,2),(1,4)]

Using filter function:

>>> def get_values(iterables, key_to_find):
return list(filter(lambda x:key_to_find in x, iterables)) >>> a = [(1,2),(1,4),(3,5),(5,7)] >>> get_values(a, 1) >>> [(1, 2), (1, 4)]

回答 8

takewhile,(此外,还会显示更多值的示例):

>>> a= [(1,2),(1,4),(3,5),(5,7),(0,2)]
>>> import itertools
>>> list(itertools.takewhile(lambda x: x[0]==1,a))
[(1, 2), (1, 4)]
>>> 

如果未排序,例如:

>>> a= [(1,2),(3,5),(1,4),(5,7)]
>>> import itertools
>>> list(itertools.takewhile(lambda x: x[0]==1,sorted(a,key=lambda x: x[0]==1)))
[(1, 2), (1, 4)]
>>> 

Or takewhile, ( addition to this, example of more values is shown ):

>>> a= [(1,2),(1,4),(3,5),(5,7),(0,2)]
>>> import itertools
>>> list(itertools.takewhile(lambda x: x[0]==1,a))
[(1, 2), (1, 4)]
>>> 

if unsorted, like:

>>> a= [(1,2),(3,5),(1,4),(5,7)]
>>> import itertools
>>> list(itertools.takewhile(lambda x: x[0]==1,sorted(a,key=lambda x: x[0]==1)))
[(1, 2), (1, 4)]
>>> 

回答 9

如果要在元组中搜索元组中存在的任何数字,则可以使用

a= [(1,2),(1,4),(3,5),(5,7)]
i=1
result=[]
for j in a:
    if i in j:
        result.append(j)

print(result)

if i==j[0] or i==j[index]如果要搜索特定索引中的数字,也可以使用

if you want to search tuple for any number which is present in tuple then you can use

a= [(1,2),(1,4),(3,5),(5,7)]
i=1
result=[]
for j in a:
    if i in j:
        result.append(j)

print(result)

You can also use if i==j[0] or i==j[index] if you want to search a number in particular index


如何在Python中找到与正则表达式的所有匹配项?

问题:如何在Python中找到与正则表达式的所有匹配项?

在我编写的程序中,我使用Python re.search()函数在文本块中查找匹配项并打印结果。但是,一旦找到文本块中的第一个匹配项,程序就会退出。

在找到所有匹配项之前程序不停止的情况下,如何重复执行此操作?是否有单独的功能来执行此操作?

In a program I’m writing I have Python use the re.search() function to find matches in a block of text and print the results. However, the program exits once it finds the first match in the block of text.

How do I do this repeatedly where the program doesn’t stop until ALL matches have been found? Is there a separate function to do this?


回答 0

使用re.findallre.finditer代替。

re.findall(pattern, string) 返回匹配字符串的列表。

re.finditer(pattern, string)返回MatchObject对象上的迭代器。

例:

re.findall( r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')
# Output: ['cats', 'dogs']

[x.group() for x in re.finditer( r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')]
# Output: ['all cats are', 'all dogs are']

Use re.findall or re.finditer instead.

re.findall(pattern, string) returns a list of matching strings.

re.finditer(pattern, string) returns an iterator over MatchObject objects.

Example:

re.findall( r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')
# Output: ['cats', 'dogs']

[x.group() for x in re.finditer( r'all (.*?) are', 'all cats are smarter than dogs, all dogs are dumber than cats')]
# Output: ['all cats are', 'all dogs are']

在numpy数组中查找最接近的值

问题:在numpy数组中查找最接近的值

是否有numpy-thonic方法(例如函数)在数组中查找最接近的值

例:

np.find_nearest( array, value )

Is there a numpy-thonic way, e.g. function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )

回答 0

import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

value = 0.5

print(find_nearest(array, value))
# 0.568743859261
import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

value = 0.5

print(find_nearest(array, value))
# 0.568743859261

回答 1

如果您的数组已排序并且非常大,则这是一个更快的解决方案:

def find_nearest(array,value):
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

这可以扩展到非常大的阵列。如果您不能假定数组已经排序,则可以轻松修改上面的内容以对方法进行排序。对于小型阵列而言,这是过大的杀伤力,但是一旦阵列变大,速度就会更快。

IF your array is sorted and is very large, this is a much faster solution:

def find_nearest(array,value):
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

This scales to very large arrays. You can easily modify the above to sort in the method if you can’t assume that the array is already sorted. It’s overkill for small arrays, but once they get large this is much faster.


回答 2

稍作修改,上面的答案就可以用于任意维数(1d,2d,3d等)的数组:

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

或者,写成一行:

a.flat[np.abs(a - a0).argmin()]

With slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, …):

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

Or, written as a single line:

a.flat[np.abs(a - a0).argmin()]

回答 3

答案摘要:如果已排序,array则二等分代码(如下所示)执行最快。大型阵列的速度要快〜100-1000倍,小型阵列的速度要快〜2-100倍。它也不需要numpy。如果您有一个未排序的,array则如果array为大,则应首先考虑使用O(n logn)排序,然后再按等分;如果array为小,则方法2似乎是最快的。

首先,您应该弄清最近值的含义。通常人们想要一个横坐标的间隔,例如array = [0,0.7,2.1],value = 1.95,答案将是idx = 1。我怀疑您是这种情况(否则,一旦找到间隔,可以使用后续条件语句很容易地修改以下内容)。我将注意到,执行此操作的最佳方法是使用二分法(我将首先提供它-请注意,它根本不需要numpy,并且比使用numpy函数要快,因为它们执行冗余操作)。然后,我将与其他用户在此处介绍的其他项目进行时间比较。

二等分:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0# Initialize lower
    ju = n-1# and upper limits.
    while (ju-jl > 1):# If we are not yet done,
        jm=(ju+jl) >> 1# compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl=jm# and replace either the lower limit
        else:
            ju=jm# or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):# edge cases at bottom
        return 0
    elif (value == array[n-1]):# and top
        return n-1
    else:
        return jl

现在,我将从其他答案中定义代码,它们每个都返回一个索引:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

现在,我将对代码进行计时: 注意方法1,2,4,5没有正确给出间隔。方法1,2,4舍入到数组中的最近点(例如> = 1.5-> 2),方法5始终舍入(例如1.45-> 2)。只有方法3和6,当然还有二等分,才能正确给出间隔。

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

对于大型阵列,二等分与次优的180us和最长的1.21ms相比较给出4us(约快100-1000倍)。对于较小的阵列,速度要快2到100倍。

Summary of answer: If one has a sorted array then the bisection code (given below) performs the fastest. ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array then if array is large, one should consider first using an O(n logn) sort and then bisection, and if array is small then method 2 seems the fastest.

First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array=[0,0.7,2.1], value=1.95, answer would be idx=1. This is the case that I suspect you need (otherwise the following can be modified very easily with a followup conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first – note it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the others presented here by other users.

Bisection:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0# Initialize lower
    ju = n-1# and upper limits.
    while (ju-jl > 1):# If we are not yet done,
        jm=(ju+jl) >> 1# compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl=jm# and replace either the lower limit
        else:
            ju=jm# or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):# edge cases at bottom
        return 0
    elif (value == array[n-1]):# and top
        return n-1
    else:
        return jl

Now I’ll define the code from the other answers, they each return an index:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

Now I’ll time the codes: Note methods 1,2,4,5 don’t correctly give the interval. Methods 1,2,4 round to nearest point in array (e.g. >=1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3, and 6, and of course bisection give the interval properly.

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

For a large array bisection gives 4us compared to next best 180us and longest 1.21ms (~100 – 1000 times faster). For smaller arrays it’s ~2-100 times faster.


回答 4

这是在向量数组中查找最近的向量的扩展。

import numpy as np

def find_nearest_vector(array, value):
  idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
  return array[idx]

A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]  
print find_nearest_vector(A,pt)
# array([  6.76626539,  30.50661753])

Here’s an extension to find the nearest vector in an array of vectors.

import numpy as np

def find_nearest_vector(array, value):
  idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
  return array[idx]

A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]  
print find_nearest_vector(A,pt)
# array([  6.76626539,  30.50661753])

回答 5

如果您不想使用numpy,可以这样做:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]

If you don’t want to use numpy this will do it:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]

回答 6

这是将处理非标量“值”数组的版本:

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

如果输入是标量,则返回一个数字类型(例如,int,float)的版本:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]

Here’s a version that will handle a non-scalar “values” array:

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]

回答 7

这是@Ari Onasafari的scipy版本,请回答“ 在向量数组中查找最近的向量

In [1]: from scipy import spatial

In [2]: import numpy as np

In [3]: A = np.random.random((10,2))*100

In [4]: A
Out[4]:
array([[ 68.83402637,  38.07632221],
       [ 76.84704074,  24.9395109 ],
       [ 16.26715795,  98.52763827],
       [ 70.99411985,  67.31740151],
       [ 71.72452181,  24.13516764],
       [ 17.22707611,  20.65425362],
       [ 43.85122458,  21.50624882],
       [ 76.71987125,  44.95031274],
       [ 63.77341073,  78.87417774],
       [  8.45828909,  30.18426696]])

In [5]: pt = [6, 30]  # <-- the point to find

In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([  8.45828909,  30.18426696])

#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)

In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393

In [9]: index # <-- The locations of the neighbors
Out[9]: 9

#then 
In [10]: A[index]
Out[10]: array([  8.45828909,  30.18426696])

Here is a version with scipy for @Ari Onasafari, answer “to find the nearest vector in an array of vectors

In [1]: from scipy import spatial

In [2]: import numpy as np

In [3]: A = np.random.random((10,2))*100

In [4]: A
Out[4]:
array([[ 68.83402637,  38.07632221],
       [ 76.84704074,  24.9395109 ],
       [ 16.26715795,  98.52763827],
       [ 70.99411985,  67.31740151],
       [ 71.72452181,  24.13516764],
       [ 17.22707611,  20.65425362],
       [ 43.85122458,  21.50624882],
       [ 76.71987125,  44.95031274],
       [ 63.77341073,  78.87417774],
       [  8.45828909,  30.18426696]])

In [5]: pt = [6, 30]  # <-- the point to find

In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([  8.45828909,  30.18426696])

#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)

In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393

In [9]: index # <-- The locations of the neighbors
Out[9]: 9

#then 
In [10]: A[index]
Out[10]: array([  8.45828909,  30.18426696])

回答 8

如果您有很多values要搜索的东西,这是@Dimitri解决方案的快速向量化版本(values可以是多维数组):

#`values` should be sorted
def get_closest(array, values):
    #make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")

    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1

    return array[idxs]

基准测试

比使用for@Demitri解决方案的循环快100倍以上

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds

Here is a fast vectorized version of @Dimitri’s solution if you have many values to search for (values can be multi-dimensional array):

#`values` should be sorted
def get_closest(array, values):
    #make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")

    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1

    return array[idxs]

Benchmarks

> 100 times faster than using a for loop with @Demitri’s solution`

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds

回答 9

对于大型数组,@ Demitri给出的(出色)答案远远快于当前标记为最佳的答案。我通过以下两种方式调整了他的确切算法:

  1. 无论输入数组是否已排序,下面的函数均有效。

  2. 下面的函数返回与最接近的值相对应的输入数组的索引,该值更为通用。

请注意,下面的函数还处理特定的边缘情况,这会导致@Demitri编写的原始函数存在错误。否则,我的算法与他的算法相同。

def find_idx_nearest_val(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

For large arrays, the (excellent) answer given by @Demitri is far faster than the answer currently marked as best. I’ve adapted his exact algorithm in the following two ways:

  1. The function below works whether or not the input array is sorted.

  2. The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.

Note that the function below also handles a specific edge case that would lead to a bug in the original function written by @Demitri. Otherwise, my algorithm is identical to his.

def find_idx_nearest_val(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

回答 10

这是unutbu答案的矢量化版本:

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1) 

    indices = np.abs(array - values).argmin(axis=-1)

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles, image)

print(quantiled_image.shape) # should be (nrows, ncols, 3)

This is a vectorized version of unutbu’s answer:

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1) 

    indices = np.abs(array - values).argmin(axis=-1)

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles, image)

print(quantiled_image.shape) # should be (nrows, ncols, 3)

回答 11

我认为最Python化的方式是:

 num = 65 # Input number
 array = n.random.random((10))*100 # Given array 
 nearest_idx = n.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the the given number (num)
 nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

这是基本代码。您可以根据需要将其用作功能

I think the most pythonic way would be:

 num = 65 # Input number
 array = n.random.random((10))*100 # Given array 
 nearest_idx = n.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the the given number (num)
 nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

This is the basic code. You can use it as a function if you want


回答 12

所有答案都有助于收集信息以编写高效的代码。但是,我编写了一个小的Python脚本来针对各种情况进行优化。如果对提供的数组进行了排序,那将是最好的情况。如果搜索指定值最近点的索引,则bisect模块是最省时的。当一个搜索索引对应于一个数组时,numpy searchsorted效率最高。

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

在[63]中:%time bisect.bisect_left(xlist,0.3)CPU时间:用户0 ns,sys:0 ns,总计:0 ns墙壁时间:22.2 µs

np.searchsorted(xar, 0.3, side="left")

在[64]中:%time np.searchsorted(xar,0.3,side =“ left”)CPU时间:用户0 ns,sys:0 ns,总计:0 ns挂墙时间:98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

%time np.searchsorted(xar,randpts,side =“ left”)CPU时间:用户4 ms,sys:0 ns,总计:4 ms挂墙时间:1.2 ms

如果我们遵循乘法规则,那么numpy应该花费〜100毫秒,这意味着〜83X更快。

All the answers are beneficial to gather the information to write efficient code. However, I have written a small Python script to optimize for various cases. It will be the best case if the provided array is sorted. If one searches the index of the nearest point of a specified value, then bisect module is the most time efficient. When one search the indices correspond to an array, the numpy searchsorted is most efficient.

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

In [63]: %time bisect.bisect_left(xlist, 0.3) CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 22.2 µs

np.searchsorted(xar, 0.3, side="left")

In [64]: %time np.searchsorted(xar, 0.3, side=”left”) CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

%time np.searchsorted(xar, randpts, side=”left”) CPU times: user 4 ms, sys: 0 ns, total: 4 ms Wall time: 1.2 ms

If we follow the multiplicative rule, then numpy should take ~100 ms which implies ~83X faster.


回答 13

对于2d数组,确定最近元素的i,j位置:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j

For 2d array, to determine the i, j position of nearest element:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j

回答 14

import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))
import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))

回答 15

可能对ndarrays

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

Maybe helpful for ndarrays:

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

re.search和re.match有什么区别?

问题:re.search和re.match有什么区别?

search()Python 模块中的match()函数和有什么区别?re

我已经阅读了文档当前文档),但是我似乎从未记得它。我一直在查找并重新学习它。我希望有人会用示例清楚地回答它,以便(也许)它会贴在我的头上。或者至少我将有一个更好的地方来回答我的问题,并且重新学习它所花的时间会更少。

What is the difference between the search() and match() functions in the Python re module?

I’ve read the documentation (current documentation), but I never seem to remember it. I keep having to look it up and re-learn it. I’m hoping that someone will answer it clearly with examples so that (perhaps) it will stick in my head. Or at least I’ll have a better place to return with my question and it will take less time to re-learn it.


回答 0

re.match锚定在字符串的开头。这与换行无关,因此它与^在模式中使用的方式不同。

重新匹配文档所述

如果字符串开头的零个或多个字符 与正则表达式模式匹配,则返回相应的MatchObject实例。None如果字符串与模式不匹配,则返回;否则返回false。请注意,这与零长度匹配不同。

注意:如果要在字符串中的任何位置找到匹配项,请search() 改用。

re.search搜索整个字符串,如文档所述

扫描字符串以查找正则表达式模式产生匹配项的位置,然后返回相应的MatchObject实例。None如果字符串中没有位置与模式匹配,则返回;否则返回false。请注意,这与在字符串中的某个点找到零长度匹配不同。

因此,如果您需要匹配字符串的开头,或者匹配整个字符串,请使用match。它更快。否则使用search

该文档中有一个专门针对matchvs.的部分search,还涵盖了多行字符串:

基于正则表达式的Python提供两种不同的基本操作:match检查是否有比赛 才刚刚开始的字符串,同时search检查是否有匹配 的任何地方的字符串(这是Perl并默认情况下)。

请注意, 即使使用以:开头的正则表达式match也可能仅在字符串的开头,或者在模式下也紧接换行符之后才匹配 。仅当模式在字符串开头 无论模式如何)或在可选 参数给定的起始位置匹配(无论换行符是否在其前面)时,“ ”操作才会成功。search'^''^'MULTILINEmatchpos

现在,足够的讨论。现在来看一些示例代码:

# example code:
string_with_newlines = """something
someotherthing"""

import re

print re.match('some', string_with_newlines) # matches
print re.match('someother', 
               string_with_newlines) # won't match
print re.match('^someother', string_with_newlines, 
               re.MULTILINE) # also won't match
print re.search('someother', 
                string_with_newlines) # finds something
print re.search('^someother', string_with_newlines, 
                re.MULTILINE) # also finds something

m = re.compile('thing$', re.MULTILINE)

print m.match(string_with_newlines) # no match
print m.match(string_with_newlines, pos=4) # matches
print m.search(string_with_newlines, 
               re.MULTILINE) # also matches

re.match is anchored at the beginning of the string. That has nothing to do with newlines, so it is not the same as using ^ in the pattern.

As the re.match documentation says:

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.

Note: If you want to locate a match anywhere in string, use search() instead.

re.search searches the entire string, as the documentation says:

Scan through string looking for a location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

So if you need to match at the beginning of the string, or to match the entire string use match. It is faster. Otherwise use search.

The documentation has a specific section for match vs. search that also covers multiline strings:

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).

Note that match may differ from search even when using a regular expression beginning with '^': '^' matches only at the start of the string, or in MULTILINE mode also immediately following a newline. The “match” operation succeeds only if the pattern matches at the start of the string regardless of mode, or at the starting position given by the optional pos argument regardless of whether a newline precedes it.

Now, enough talk. Time to see some example code:

# example code:
string_with_newlines = """something
someotherthing"""

import re

print re.match('some', string_with_newlines) # matches
print re.match('someother', 
               string_with_newlines) # won't match
print re.match('^someother', string_with_newlines, 
               re.MULTILINE) # also won't match
print re.search('someother', 
                string_with_newlines) # finds something
print re.search('^someother', string_with_newlines, 
                re.MULTILINE) # also finds something

m = re.compile('thing$', re.MULTILINE)

print m.match(string_with_newlines) # no match
print m.match(string_with_newlines, pos=4) # matches
print m.search(string_with_newlines, 
               re.MULTILINE) # also matches

回答 1

search ⇒在字符串中的任何地方找到东西,然后返回一个匹配对象。

match⇒ 在字符串的开头找到一些内容,然后返回匹配对象。

search ⇒ find something anywhere in the string and return a match object.

match ⇒ find something at the beginning of the string and return a match object.


回答 2

re.search 搜索 ES该模式在整个字符串,而re.match没有搜索不到的格局; 如果没有,则除了在字符串开头匹配外,别无选择。

re.search searches for the pattern throughout the string, whereas re.match does not search the pattern; if it does not, it has no other choice than to match it at start of the string.


回答 3

match比搜索快得多,因此如果不使用regex.search(“ word”),则可以执行regex.match((。*?)word(。*?)),如果您使用数百万个样品。

@ivan_bilan在上面接受的答案下的评论让我开始思考是否有这样的hack是否真的在加速任何事情,所以让我们找出您将真正获得多少性能。

我准备了以下测试套件:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

我进行了10次测量(1M,2M,…,10M个单词),得出以下图表:

匹配与搜索正则表达式速度测试线图

产生的直线令人惊讶地(实际上并不那么令人惊讶)直线。鉴于这种特定的模式组合,该search功能(略)更快。该测试的实质是避免过度优化代码。

match is much faster than search, so instead of doing regex.search(“word”) you can do regex.match((.*?)word(.*?)) and gain tons of performance if you are working with millions of samples.

This comment from @ivan_bilan under the accepted answer above got me thinking if such hack is actually speeding anything up, so let’s find out how many tons of performance you will really gain.

I prepared the following test suite:

import random
import re
import string
import time

LENGTH = 10
LIST_SIZE = 1000000

def generate_word():
    word = [random.choice(string.ascii_lowercase) for _ in range(LENGTH)]
    word = ''.join(word)
    return word

wordlist = [generate_word() for _ in range(LIST_SIZE)]

start = time.time()
[re.search('python', word) for word in wordlist]
print('search:', time.time() - start)

start = time.time()
[re.match('(.*?)python(.*?)', word) for word in wordlist]
print('match:', time.time() - start)

I made 10 measurements (1M, 2M, …, 10M words) which gave me the following plot:

match vs. search regex speedtest line plot

The resulting lines are surprisingly (actually not that surprisingly) straight. And the search function is (slightly) faster given this specific pattern combination. The moral of this test: Avoid overoptimizing your code.


回答 4

您可以参考以下示例以了解re.matchand.search 的工作原理

a = "123abc"
t = re.match("[a-z]+",a)
t = re.search("[a-z]+",a)

re.match会回来none,但是re.search会回来abc

You can refer the below example to understand the working of re.match and re.search

a = "123abc"
t = re.match("[a-z]+",a)
t = re.search("[a-z]+",a)

re.match will return none, but re.search will return abc.


回答 5

区别在于,re.match()误导习惯于Perlgrepsed正则表达式匹配的任何人,并且re.search()不会。:-)

正如约翰·D·库克(John D. Cook)所说,要更加清醒一些,re.match()“表现得好像每个模式都以^开头。” 换句话说,re.match('pattern')等于re.search('^pattern')。因此,它锚定了图案的左侧。但这也没有固定模式的右侧:仍然需要终止$

坦白地说,我认为re.match()应该弃用。我很想知道应该保留它的原因。

The difference is, re.match() misleads anyone accustomed to Perl, grep, or sed regular expression matching, and re.search() does not. :-)

More soberly, As John D. Cook remarks, re.match() “behaves as if every pattern has ^ prepended.” In other words, re.match('pattern') equals re.search('^pattern'). So it anchors a pattern’s left side. But it also doesn’t anchor a pattern’s right side: that still requires a terminating $.

Frankly given the above, I think re.match() should be deprecated. I would be interested to know reasons it should be retained.


回答 6

re.match尝试匹配字符串开头的模式。re.search尝试在整个字符串中匹配模式直到找到匹配项。

re.match attempts to match a pattern at the beginning of the string. re.search attempts to match the pattern throughout the string until it finds a match.


回答 7

矮得多:

  • search 扫描整个字符串。

  • match 仅扫描字符串的开头。

以下Ex表示:

>>> a = "123abc"
>>> re.match("[a-z]+",a)
None
>>> re.search("[a-z]+",a)
abc

Much shorter:

  • search scans through the whole string.

  • match scans only the beginning of the string.

Following Ex says it:

>>> a = "123abc"
>>> re.match("[a-z]+",a)
None
>>> re.search("[a-z]+",a)
abc

列表是否包含简短的包含功能?

问题:列表是否包含简短的包含功能?

我看到人们正在使用any另一个列表来查看列表中是否存在某项,但是有一种快速的方法吗?

if list.contains(myItem):
    # do something

I see people are using any to gather another list to see if an item exists in a list, but is there a quick way to just do?:

if list.contains(myItem):
    # do something

回答 0

您可以使用以下语法:

if myItem in list:
    # do something

同样,逆运算符:

if myItem not in list:
    # do something

它适用于列表,元组,集合和字典(检查键)。

请注意,这是列表和元组中的O(n)操作,而集合和字典中是O(1)操作。

You can use this syntax:

if myItem in list:
    # do something

Also, inverse operator:

if myItem not in list:
    # do something

It’s work fine for lists, tuples, sets and dicts (check keys).

Note that this is an O(n) operation in lists and tuples, but an O(1) operation in sets and dicts.


回答 1

除了别人说过的话,您可能还想知道什么in是调用list.__contains__方法,您可以在编写的任何类上定义该方法,并且可以非常方便地全面使用python。  

愚蠢的用途可能是:

>>> class ContainsEverything:
    def __init__(self):
        return None
    def __contains__(self, *elem, **k):
        return True


>>> a = ContainsEverything()
>>> 3 in a
True
>>> a in a
True
>>> False in a
True
>>> False not in a
False
>>>         

In addition to what other have said, you may also be interested to know that what in does is to call the list.__contains__ method, that you can define on any class you write and can get extremely handy to use python at his full extent.  

A dumb use may be:

>>> class ContainsEverything:
    def __init__(self):
        return None
    def __contains__(self, *elem, **k):
        return True


>>> a = ContainsEverything()
>>> 3 in a
True
>>> a in a
True
>>> False in a
True
>>> False not in a
False
>>>         

回答 2

我最近想出了这条衬垫,用于获取True列表中是否包含任何数量的项目,或者该列表中不包含任何项目或False根本不包含任何项目。使用next(...)会给它提供默认的返回值(False),这意味着它的运行速度应比运行整个列表理解的速度快得多。

list_does_contain = next((True for item in list_to_test if item == test_item), False)

I came up with this one liner recently for getting True if a list contains any number of occurrences of an item, or False if it contains no occurrences or nothing at all. Using next(...) gives this a default return value (False) and means it should run significantly faster than running the whole list comprehension.

list_does_contain = next((True for item in list_to_test if item == test_item), False)


回答 3

如果该项目不存在,则list方法index将返回,-1如果该项目存在,则将返回该项目在列表中的索引。或者,if您可以在语句中执行以下操作:

if myItem in list:
    #do things

您还可以使用以下if语句检查元素是否不在列表中:

if myItem not in list:
    #do things

The list method index will return -1 if the item is not present, and will return the index of the item in the list if it is present. Alternatively in an if statement you can do the following:

if myItem in list:
    #do things

You can also check if an element is not in a list with the following if statement:

if myItem not in list:
    #do things

字典搜索的Python列表

问题:字典搜索的Python列表

假设我有这个:

[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

并通过搜索“ Pam”作为名称,我想检索相关的字典: {name: "Pam", age: 7}

如何实现呢?

Assume I have this:

[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

and by searching “Pam” as name, I want to retrieve the related dictionary: {name: "Pam", age: 7}

How to achieve this ?


回答 0

您可以使用生成器表达式

>>> dicts = [
...     { "name": "Tom", "age": 10 },
...     { "name": "Mark", "age": 5 },
...     { "name": "Pam", "age": 7 },
...     { "name": "Dick", "age": 12 }
... ]

>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}

如果您需要处理不存在的项目,则可以执行用户Matt 在其注释中建议的操作,并使用略有不同的API提供默认值:

next((item for item in dicts if item["name"] == "Pam"), None)

为了找到项目的索引,而不是项目本身,可以枚举()列表:

next((i for i, item in enumerate(dicts) if item["name"] == "Pam"), None)

You can use a generator expression:

>>> dicts = [
...     { "name": "Tom", "age": 10 },
...     { "name": "Mark", "age": 5 },
...     { "name": "Pam", "age": 7 },
...     { "name": "Dick", "age": 12 }
... ]

>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}

If you need to handle the item not being there, then you can do what user Matt suggested in his comment and provide a default using a slightly different API:

next((item for item in dicts if item["name"] == "Pam"), None)

And to find the index of the item, rather than the item itself, you can enumerate() the list:

next((i for i, item in enumerate(dicts) if item["name"] == "Pam"), None)

回答 1

在我看来,这是最Python的方式:

people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]

filter(lambda person: person['name'] == 'Pam', people)

结果(在Python 2中作为列表返回):

[{'age': 7, 'name': 'Pam'}]

注意:在Python 3中,将返回一个过滤器对象。因此,python3解决方案将是:

list(filter(lambda person: person['name'] == 'Pam', people))

This looks to me the most pythonic way:

people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]

filter(lambda person: person['name'] == 'Pam', people)

result (returned as a list in Python 2):

[{'age': 7, 'name': 'Pam'}]

Note: In Python 3, a filter object is returned. So the python3 solution would be:

list(filter(lambda person: person['name'] == 'Pam', people))

回答 2

@FrédéricHamidi的回答很好。在Python 3.x中,语法.next()略有变化。因此稍作修改:

>>> dicts = [
     { "name": "Tom", "age": 10 },
     { "name": "Mark", "age": 5 },
     { "name": "Pam", "age": 7 },
     { "name": "Dick", "age": 12 }
 ]
>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}

如@Matt的评论中所述,您可以这样添加默认值:

>>> next((item for item in dicts if item["name"] == "Pam"), False)
{'name': 'Pam', 'age': 7}
>>> next((item for item in dicts if item["name"] == "Sam"), False)
False
>>>

@Frédéric Hamidi’s answer is great. In Python 3.x the syntax for .next() changed slightly. Thus a slight modification:

>>> dicts = [
     { "name": "Tom", "age": 10 },
     { "name": "Mark", "age": 5 },
     { "name": "Pam", "age": 7 },
     { "name": "Dick", "age": 12 }
 ]
>>> next(item for item in dicts if item["name"] == "Pam")
{'age': 7, 'name': 'Pam'}

As mentioned in the comments by @Matt, you can add a default value as such:

>>> next((item for item in dicts if item["name"] == "Pam"), False)
{'name': 'Pam', 'age': 7}
>>> next((item for item in dicts if item["name"] == "Sam"), False)
False
>>>

回答 3

您可以使用列表推导

def search(name, people):
    return [element for element in people if element['name'] == name]

You can use a list comprehension:

def search(name, people):
    return [element for element in people if element['name'] == name]

回答 4

people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]

def search(name):
    for p in people:
        if p['name'] == name:
            return p

search("Pam")
people = [
{'name': "Tom", 'age': 10},
{'name': "Mark", 'age': 5},
{'name': "Pam", 'age': 7}
]

def search(name):
    for p in people:
        if p['name'] == name:
            return p

search("Pam")

回答 5

我测试了各种方法来浏览字典列表,然后返回键x具有特定值的字典。

结果:

  • 速度:列表理解>生成器表达式>>普通列表迭代>>>过滤器。
  • 全部缩放与列表中的字典数量成线性关系(10倍列表大小-> 10倍时间)。
  • 对于大量(数千)键,每个词典的键不会显着影响速度。请查看我计算出的以下图表:https : //imgur.com/a/quQzv(方法名称请参见下文)。

所有测试均使用Python 3.6 .4,W7x64完成。

from random import randint
from timeit import timeit


list_dicts = []
for _ in range(1000):     # number of dicts in the list
    dict_tmp = {}
    for i in range(10):   # number of keys for each dict
        dict_tmp[f"key{i}"] = randint(0,50)
    list_dicts.append( dict_tmp )



def a():
    # normal iteration over all elements
    for dict_ in list_dicts:
        if dict_["key3"] == 20:
            pass

def b():
    # use 'generator'
    for dict_ in (x for x in list_dicts if x["key3"] == 20):
        pass

def c():
    # use 'list'
    for dict_ in [x for x in list_dicts if x["key3"] == 20]:
        pass

def d():
    # use 'filter'
    for dict_ in filter(lambda x: x['key3'] == 20, list_dicts):
        pass

结果:

1.7303 # normal list iteration 
1.3849 # generator expression 
1.3158 # list comprehension 
7.7848 # filter

I tested various methods to go through a list of dictionaries and return the dictionaries where key x has a certain value.

Results:

  • Speed: list comprehension > generator expression >> normal list iteration >>> filter.
  • All scale linear with the number of dicts in the list (10x list size -> 10x time).
  • The keys per dictionary does not affect speed significantly for large amounts (thousands) of keys. Please see this graph I calculated: https://imgur.com/a/quQzv (method names see below).

All tests done with Python 3.6.4, W7x64.

from random import randint
from timeit import timeit


list_dicts = []
for _ in range(1000):     # number of dicts in the list
    dict_tmp = {}
    for i in range(10):   # number of keys for each dict
        dict_tmp[f"key{i}"] = randint(0,50)
    list_dicts.append( dict_tmp )



def a():
    # normal iteration over all elements
    for dict_ in list_dicts:
        if dict_["key3"] == 20:
            pass

def b():
    # use 'generator'
    for dict_ in (x for x in list_dicts if x["key3"] == 20):
        pass

def c():
    # use 'list'
    for dict_ in [x for x in list_dicts if x["key3"] == 20]:
        pass

def d():
    # use 'filter'
    for dict_ in filter(lambda x: x['key3'] == 20, list_dicts):
        pass

Results:

1.7303 # normal list iteration 
1.3849 # generator expression 
1.3158 # list comprehension 
7.7848 # filter

回答 6

向@FrédéricHamidi添加一点点。

如果您不确定某个键是否在字典列表中,可以使用以下方法:

next((item for item in dicts if item.get("name") and item["name"] == "Pam"), None)

To add just a tiny bit to @FrédéricHamidi.

In case you are not sure a key is in the the list of dicts, something like this would help:

next((item for item in dicts if item.get("name") and item["name"] == "Pam"), None)

回答 7

您是否尝试过熊猫包装?它非常适合此类搜索任务,并且也进行了优化。

import pandas as pd

listOfDicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

# Create a data frame, keys are used as column headers.
# Dict items with the same key are entered into the same respective column.
df = pd.DataFrame(listOfDicts)

# The pandas dataframe allows you to pick out specific values like so:

df2 = df[ (df['name'] == 'Pam') & (df['age'] == 7) ]

# Alternate syntax, same thing

df2 = df[ (df.name == 'Pam') & (df.age == 7) ]

我在下面添加了一些基准测试,以大范围地(即100k +项)说明熊猫的运行时间:

setup_large = 'dicts = [];\
[dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 })) for _ in range(25000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'

setup_small = 'dicts = [];\
dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 }));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'

method1 = '[item for item in dicts if item["name"] == "Pam"]'
method2 = 'df[df["name"] == "Pam"]'

import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))

t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method Pandas: ' + str(t.timeit(100)))

#Small Method LC: 0.000191926956177
#Small Method Pandas: 0.044392824173
#Large Method LC: 1.98827004433
#Large Method Pandas: 0.324505090714

Have you ever tried out the pandas package? It’s perfect for this kind of search task and optimized too.

import pandas as pd

listOfDicts = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

# Create a data frame, keys are used as column headers.
# Dict items with the same key are entered into the same respective column.
df = pd.DataFrame(listOfDicts)

# The pandas dataframe allows you to pick out specific values like so:

df2 = df[ (df['name'] == 'Pam') & (df['age'] == 7) ]

# Alternate syntax, same thing

df2 = df[ (df.name == 'Pam') & (df.age == 7) ]

I’ve added a little bit of benchmarking below to illustrate pandas’ faster runtimes on a larger scale i.e. 100k+ entries:

setup_large = 'dicts = [];\
[dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 })) for _ in range(25000)];\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'

setup_small = 'dicts = [];\
dicts.extend(({ "name": "Tom", "age": 10 },{ "name": "Mark", "age": 5 },\
{ "name": "Pam", "age": 7 },{ "name": "Dick", "age": 12 }));\
from operator import itemgetter;import pandas as pd;\
df = pd.DataFrame(dicts);'

method1 = '[item for item in dicts if item["name"] == "Pam"]'
method2 = 'df[df["name"] == "Pam"]'

import timeit
t = timeit.Timer(method1, setup_small)
print('Small Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_small)
print('Small Method Pandas: ' + str(t.timeit(100)))

t = timeit.Timer(method1, setup_large)
print('Large Method LC: ' + str(t.timeit(100)))
t = timeit.Timer(method2, setup_large)
print('Large Method Pandas: ' + str(t.timeit(100)))

#Small Method LC: 0.000191926956177
#Small Method Pandas: 0.044392824173
#Large Method LC: 1.98827004433
#Large Method Pandas: 0.324505090714

回答 8

这是在字典列表中搜索值的一般方法:

def search_dictionaries(key, value, list_of_dictionaries):
    return [element for element in list_of_dictionaries if element[key] == value]

This is a general way of searching a value in a list of dictionaries:

def search_dictionaries(key, value, list_of_dictionaries):
    return [element for element in list_of_dictionaries if element[key] == value]

回答 9

names = [{'name':'Tom', 'age': 10}, {'name': 'Mark', 'age': 5}, {'name': 'Pam', 'age': 7}]
resultlist = [d    for d in names     if d.get('name', '') == 'Pam']
first_result = resultlist[0]

这是一种方法

names = [{'name':'Tom', 'age': 10}, {'name': 'Mark', 'age': 5}, {'name': 'Pam', 'age': 7}]
resultlist = [d    for d in names     if d.get('name', '') == 'Pam']
first_result = resultlist[0]

This is one way…


回答 10

只需使用列表推导:

[i for i in dct if i['name'] == 'Pam'][0]

样例代码:

dct = [
    {'name': 'Tom', 'age': 10},
    {'name': 'Mark', 'age': 5},
    {'name': 'Pam', 'age': 7}
]

print([i for i in dct if i['name'] == 'Pam'][0])

> {'age': 7, 'name': 'Pam'}

Simply using list comprehension:

[i for i in dct if i['name'] == 'Pam'][0]

Sample code:

dct = [
    {'name': 'Tom', 'age': 10},
    {'name': 'Mark', 'age': 5},
    {'name': 'Pam', 'age': 7}
]

print([i for i in dct if i['name'] == 'Pam'][0])

> {'age': 7, 'name': 'Pam'}

回答 11

您可以通过在Python中使用filter和next方法来实现。

filter方法过滤给定的序列并返回一个迭代器。 next方法接受迭代器,并返回列表中的下一个元素。

因此,您可以通过以下方式找到元素

my_dict = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7}
]

next(filter(lambda obj: obj.get('name') == 'Pam', my_dict), None)

输出是

{'name': 'Pam', 'age': 7}

注意:None如果找不到我们正在搜索的名称,上述代码将返回以防万一。

You can achieve this with the usage of filter and next methods in Python.

filter method filters the given sequence and returns an iterator. next method accepts an iterator and returns the next element in the list.

So you can find the element by,

my_dict = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5},
    {"name": "Pam", "age": 7}
]

next(filter(lambda obj: obj.get('name') == 'Pam', my_dict), None)

and the output is,

{'name': 'Pam', 'age': 7}

Note: The above code will return None incase if the name we are searching is not found.


回答 12

我的第一个想法是,您可能要考虑创建一个包含这些词典的字典…例如,如果您要搜索的词典次数不止一次。

但是,这可能是过早的优化。有什么问题:

def get_records(key, store=dict()):
    '''Return a list of all records containing name==key from our store
    '''
    assert key is not None
    return [d for d in store if d['name']==key]

My first thought would be that you might want to consider creating a dictionary of these dictionaries … if, for example, you were going to be searching it more a than small number of times.

However that might be a premature optimization. What would be wrong with:

def get_records(key, store=dict()):
    '''Return a list of all records containing name==key from our store
    '''
    assert key is not None
    return [d for d in store if d['name']==key]

回答 13

dicts=[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

from collections import defaultdict
dicts_by_name=defaultdict(list)
for d in dicts:
    dicts_by_name[d['name']]=d

print dicts_by_name['Tom']

#output
#>>>
#{'age': 10, 'name': 'Tom'}
dicts=[
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

from collections import defaultdict
dicts_by_name=defaultdict(list)
for d in dicts:
    dicts_by_name[d['name']]=d

print dicts_by_name['Tom']

#output
#>>>
#{'age': 10, 'name': 'Tom'}

回答 14

使用列表推导的一种简单方法是,如果 l是列表

l = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

然后

[d['age'] for d in l if d['name']=='Tom']

One simple way using list comprehensions is , if l is the list

l = [
{"name": "Tom", "age": 10},
{"name": "Mark", "age": 5},
{"name": "Pam", "age": 7}
]

then

[d['age'] for d in l if d['name']=='Tom']

回答 15

您可以尝试以下方法:

''' lst: list of dictionaries '''
lst = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]

search = raw_input("What name: ") #Input name that needs to be searched (say 'Pam')

print [ lst[i] for i in range(len(lst)) if(lst[i]["name"]==search) ][0] #Output
>>> {'age': 7, 'name': 'Pam'} 

You can try this:

''' lst: list of dictionaries '''
lst = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7}]

search = raw_input("What name: ") #Input name that needs to be searched (say 'Pam')

print [ lst[i] for i in range(len(lst)) if(lst[i]["name"]==search) ][0] #Output
>>> {'age': 7, 'name': 'Pam'} 

回答 16

这是一个使用迭代遍历列表的比较,使用filter + lambda或重构(如果需要或对您的情况有效)的代码将您的代码用于命令,而不是命令列表

import time

# Build list of dicts
list_of_dicts = list()
for i in range(100000):
    list_of_dicts.append({'id': i, 'name': 'Tom'})

# Build dict of dicts
dict_of_dicts = dict()
for i in range(100000):
    dict_of_dicts[i] = {'name': 'Tom'}


# Find the one with ID of 99

# 1. iterate through the list
lod_ts = time.time()
for elem in list_of_dicts:
    if elem['id'] == 99999:
        break
lod_tf = time.time()
lod_td = lod_tf - lod_ts

# 2. Use filter
f_ts = time.time()
x = filter(lambda k: k['id'] == 99999, list_of_dicts)
f_tf = time.time()
f_td = f_tf- f_ts

# 3. find it in dict of dicts
dod_ts = time.time()
x = dict_of_dicts[99999]
dod_tf = time.time()
dod_td = dod_tf - dod_ts


print 'List of Dictionries took: %s' % lod_td
print 'Using filter took: %s' % f_td
print 'Dict of Dicts took: %s' % dod_td

输出是这样的:

List of Dictionries took: 0.0099310874939
Using filter took: 0.0121960639954
Dict of Dicts took: 4.05311584473e-06

结论: 在这些情况下,显然拥有字典词典是最有效的搜索方式,在这种情况下,您知道您将仅通过id进行搜索。有趣的是,使用过滤器是最慢的解决方案。

Here is a comparison using iterating throuhg list, using filter+lambda or refactoring(if needed or valid to your case) your code to dict of dicts rather than list of dicts

import time

# Build list of dicts
list_of_dicts = list()
for i in range(100000):
    list_of_dicts.append({'id': i, 'name': 'Tom'})

# Build dict of dicts
dict_of_dicts = dict()
for i in range(100000):
    dict_of_dicts[i] = {'name': 'Tom'}


# Find the one with ID of 99

# 1. iterate through the list
lod_ts = time.time()
for elem in list_of_dicts:
    if elem['id'] == 99999:
        break
lod_tf = time.time()
lod_td = lod_tf - lod_ts

# 2. Use filter
f_ts = time.time()
x = filter(lambda k: k['id'] == 99999, list_of_dicts)
f_tf = time.time()
f_td = f_tf- f_ts

# 3. find it in dict of dicts
dod_ts = time.time()
x = dict_of_dicts[99999]
dod_tf = time.time()
dod_td = dod_tf - dod_ts


print 'List of Dictionries took: %s' % lod_td
print 'Using filter took: %s' % f_td
print 'Dict of Dicts took: %s' % dod_td

And the output is this:

List of Dictionries took: 0.0099310874939
Using filter took: 0.0121960639954
Dict of Dicts took: 4.05311584473e-06

Conclusion: Clearly having a dictionary of dicts is the most efficient way to be able to search in those cases, where you know say you will be searching by id’s only. interestingly using filter is the slowest solution.


回答 17

您必须遍历列表的所有元素。没有捷径!

除非在其他地方保留了指向列表项的名称字典,否则您必须注意从列表中弹出元素的后果。

You have to go through all elements of the list. There is not a shortcut!

Unless somewhere else you keep a dictionary of the names pointing to the items of the list, but then you have to take care of the consequences of popping an element from your list.


回答 18

我在寻找同一问题的答案时找到了这个线程。虽然我意识到这是一个迟来的答案,但我认为我会做出贡献,以防它对其他人有用:

def find_dict_in_list(dicts, default=None, **kwargs):
    """Find first matching :obj:`dict` in :obj:`list`.

    :param list dicts: List of dictionaries.
    :param dict default: Optional. Default dictionary to return.
        Defaults to `None`.
    :param **kwargs: `key=value` pairs to match in :obj:`dict`.

    :returns: First matching :obj:`dict` from `dicts`.
    :rtype: dict

    """

    rval = default
    for d in dicts:
        is_found = False

        # Search for keys in dict.
        for k, v in kwargs.items():
            if d.get(k, None) == v:
                is_found = True

            else:
                is_found = False
                break

        if is_found:
            rval = d
            break

    return rval


if __name__ == '__main__':
    # Tests
    dicts = []
    keys = 'spam eggs shrubbery knight'.split()

    start = 0
    for _ in range(4):
        dct = {k: v for k, v in zip(keys, range(start, start+4))}
        dicts.append(dct)
        start += 4

    # Find each dict based on 'spam' key only.  
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam) == dicts[x]

    # Find each dict based on 'spam' and 'shrubbery' keys.
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+2) == dicts[x]

    # Search for one correct key, one incorrect key:
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+1) is None

    # Search for non-existent dict.
    for x in range(len(dicts)):
        spam = x+100
        assert find_dict_in_list(dicts, spam=spam) is None

I found this thread when I was searching for an answer to the same question. While I realize that it’s a late answer, I thought I’d contribute it in case it’s useful to anyone else:

def find_dict_in_list(dicts, default=None, **kwargs):
    """Find first matching :obj:`dict` in :obj:`list`.

    :param list dicts: List of dictionaries.
    :param dict default: Optional. Default dictionary to return.
        Defaults to `None`.
    :param **kwargs: `key=value` pairs to match in :obj:`dict`.

    :returns: First matching :obj:`dict` from `dicts`.
    :rtype: dict

    """

    rval = default
    for d in dicts:
        is_found = False

        # Search for keys in dict.
        for k, v in kwargs.items():
            if d.get(k, None) == v:
                is_found = True

            else:
                is_found = False
                break

        if is_found:
            rval = d
            break

    return rval


if __name__ == '__main__':
    # Tests
    dicts = []
    keys = 'spam eggs shrubbery knight'.split()

    start = 0
    for _ in range(4):
        dct = {k: v for k, v in zip(keys, range(start, start+4))}
        dicts.append(dct)
        start += 4

    # Find each dict based on 'spam' key only.  
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam) == dicts[x]

    # Find each dict based on 'spam' and 'shrubbery' keys.
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+2) == dicts[x]

    # Search for one correct key, one incorrect key:
    for x in range(len(dicts)):
        spam = x*4
        assert find_dict_in_list(dicts, spam=spam, shrubbery=spam+1) is None

    # Search for non-existent dict.
    for x in range(len(dicts)):
        spam = x+100
        assert find_dict_in_list(dicts, spam=spam) is None

回答 19

这里提出的大多数(如果不是全部)实现都有两个缺陷:

  • 他们假定只传递一个键来进行搜索,而对于复杂的字典有更多键可能很有趣
  • 他们假定传递给搜索的所有键都存在于字典中,因此当键错误不存在时,它们将无法正确处理。

更新的主张:

def find_first_in_list(objects, **kwargs):
    return next((obj for obj in objects if
                 len(set(obj.keys()).intersection(kwargs.keys())) > 0 and
                 all([obj[k] == v for k, v in kwargs.items() if k in obj.keys()])),
                None)

也许不是最Python的,但至少具有更多的故障保护功能。

用法:

>>> obj1 = find_first_in_list(list_of_dict, name='Pam', age=7)
>>> obj2 = find_first_in_list(list_of_dict, name='Pam', age=27)
>>> obj3 = find_first_in_list(list_of_dict, name='Pam', address='nowhere')
>>> 
>>> print(obj1, obj2, obj3)
{"name": "Pam", "age": 7}, None, {"name": "Pam", "age": 7}

要点

Most (if not all) implementations proposed here have two flaws:

  • They assume only one key to be passed for searching, while it may be interesting to have more for complex dict
  • They assume all keys passed for searching exist in the dicts, hence they don’t deal correctly with KeyError occuring when it is not.

An updated proposition:

def find_first_in_list(objects, **kwargs):
    return next((obj for obj in objects if
                 len(set(obj.keys()).intersection(kwargs.keys())) > 0 and
                 all([obj[k] == v for k, v in kwargs.items() if k in obj.keys()])),
                None)

Maybe not the most pythonic, but at least a bit more failsafe.

Usage:

>>> obj1 = find_first_in_list(list_of_dict, name='Pam', age=7)
>>> obj2 = find_first_in_list(list_of_dict, name='Pam', age=27)
>>> obj3 = find_first_in_list(list_of_dict, name='Pam', address='nowhere')
>>> 
>>> print(obj1, obj2, obj3)
{"name": "Pam", "age": 7}, None, {"name": "Pam", "age": 7}

The gist.


Jina-面向任何类别数据的云原生神经搜索框架

Jina logo: Jina is a cloud-native neural search framework

云-本地神经搜索[?]适用于以下方面的框架任何数据类型

Jina 允许您在短短几分钟内构建以深度学习为动力的搜索即服务

🌌所有数据类型-大规模索引和查询任何类型的非结构化数据:视频、图像、长/短文本、音乐、源代码、PDF等

🌩️FAST和本机云-从第一天开始的分布式架构,可扩展且设计为本地云:享受集装箱化、流式处理、并行、分片、异步调度、HTTP/GRPC/WebSocket协议

⏱️节省时间这个神经搜索系统的设计模式,从零到生产准备就绪的系统只需几分钟

🍱拥有您的堆栈-保持解决方案的端到端堆栈所有权,避免使用零散的、多供应商的通用旧式工具带来的集成陷阱

运行快速演示

安装

  • 通过PyPI:pip install -U "jina[standard]"
  • 通过Docker:docker run jinaai/jina:latest
更多安装选项
x86/64、arm64、v6、v7 Linux/MacOS和Python 3.7/3.8/3.9 Docker用户
最低要求
(不支持HTTP、WebSocket、Docker)
pip install jina docker run jinaai/jina:latest
Daemon pip install "jina[daemon]" docker run --network=host jinaai/jina:latest-daemon
使用附加服务 pip install "jina[devel]" docker run jinaai/jina:latest-devel

版本标识符are explained here吉娜可以继续奔跑Windows Subsystem for Linux我们欢迎社会各界帮助我们native Windows support

开始使用

文档、执行者和流是JINA中的三个基本概念

1个️⃣复制-粘贴下面的最小示例并运行它:

💡预赛:character embeddingpoolingEuclidean distance

The architecture of a simple neural search system powered by Jina

import numpy as np
from jina import Document, DocumentArray, Executor, Flow, requests

class CharEmbed(Executor):  # a simple character embedding with mean-pooling
    offset = 32  # letter `a`
    dim = 127 - offset + 1  # last pos reserved for `UNK`
    char_embd = np.eye(dim) * 1  # one-hot embedding for all chars

    @requests
    def foo(self, docs: DocumentArray, **kwargs):
        for d in docs:
            r_emb = [ord(c) - self.offset if self.offset <= ord(c) <= 127 else (self.dim - 1) for c in d.text]
            d.embedding = self.char_embd[r_emb, :].mean(axis=0)  # average pooling

class Indexer(Executor):
    _docs = DocumentArray()  # for storing all documents in memory

    @requests(on='/index')
    def foo(self, docs: DocumentArray, **kwargs):
        self._docs.extend(docs)  # extend stored `docs`

    @requests(on='/search')
    def bar(self, docs: DocumentArray, **kwargs):
        q = np.stack(docs.get_attributes('embedding'))  # get all embeddings from query docs
        d = np.stack(self._docs.get_attributes('embedding'))  # get all embeddings from stored docs
        euclidean_dist = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # pairwise euclidean distance
        for dist, query in zip(euclidean_dist, docs):  # add & sort match
            query.matches = [Document(self._docs[int(idx)], copy=True, scores={'euclid': d}) for idx, d in enumerate(dist)]
            query.matches.sort(key=lambda m: m.scores['euclid'].value)  # sort matches by their values

f = Flow(port_expose=12345, protocol='http', cors=True).add(uses=CharEmbed, parallel=2).add(uses=Indexer)  # build a Flow, with 2 parallel CharEmbed, tho unnecessary
with f:
    f.post('/index', (Document(text=t.strip()) for t in open(__file__) if t.strip()))  # index all lines of _this_ file
    f.block()  # block for listening request

2个️⃣打开http://localhost:12345/docs(扩展的Swagger UI)在浏览器中,单击/搜索制表符和输入:

{"data": [{"text": "@requests(on=something)"}]}

也就是说,我们希望从上面的代码片段中找到与以下内容最相似的行@request(on=something)现在单击执行巴顿!

Jina Swagger UI extension on visualizing neural search results

3个️⃣不是图形用户界面的人?那就让我们用Python来做吧!保持上述服务器运行,并启动一个简单的客户端:

from jina import Client, Document
from jina.types.request import Response


def print_matches(resp: Response):  # the callback function invoked when task is done
    for idx, d in enumerate(resp.docs[0].matches[:3]):  # print top-3 matches
        print(f'[{idx}]{d.scores["euclid"].value:2f}: "{d.text}"')


c = Client(protocol='http', port_expose=12345)  # connect to localhost:12345
c.post('/search', Document(text='request(on=something)'), on_done=print_matches)

,它打印以下结果:

         Client@1608[S]:connected to the gateway at localhost:12345!
[0]0.168526: "@requests(on='/index')"
[1]0.181676: "@requests(on='/search')"
[2]0.192049: "query.matches = [Document(self._docs[int(idx)], copy=True, score=d) for idx, d in enumerate(dist)]"

😔不管用吗?我们的错!Please report it here.

阅读教程

支持

加入我们吧

吉娜的后盾是Jina AIWe are actively hiring全栈开发人员、解决方案工程师在开源领域构建下一个神经搜索生态系统

贡献

我们欢迎来自开源社区、个人和合作伙伴的各种贡献。我们的成功归功于你的积极参与

All Contributors