标签归档:dictionary

Python-创建具有初始容量的列表

问题:Python-创建具有初始容量的列表

像这样的代码经常发生:

l = []
while foo:
    #baz
    l.append(bar)
    #qux

如果您要将数千个元素添加到列表中,这真的很慢,因为必须不断调整列表的大小以适应新元素。

在Java中,您可以创建具有初始容量的ArrayList。如果您知道列表的大小,这将大大提高效率。

我知道这样的代码通常可以重构为列表理解。但是,如果for / while循环非常复杂,则这是不可行的。我们的Python程序员有什么对等的地方吗?

Code like this often happens:

l = []
while foo:
    #baz
    l.append(bar)
    #qux

This is really slow if you’re about to append thousands of elements to your list, as the list will have to be constantly resized to fit the new elements.

In Java, you can create an ArrayList with an initial capacity. If you have some idea how big your list will be, this will be a lot more efficient.

I understand that code like this can often be re-factored into a list comprehension. If the for/while loop is very complicated, though, this is unfeasible. Is there any equivalent for us Python programmers?


回答 0

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

结果。(评估每个功能144次并平均持续时间)

simple append 0.0102
pre-allocate  0.0098

结论。没关系。

过早的优化是万恶之源。

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

Results. (evaluate each function 144 times and average the duration)

simple append 0.0102
pre-allocate  0.0098

Conclusion. It barely matters.

Premature optimization is the root of all evil.


回答 1

Python列表没有内置的预分配。如果您确实需要列出清单,并且需要避免附加的开销(并且您应该验证这样做),则可以执行以下操作:

l = [None] * 1000 # Make a list of 1000 None's
for i in xrange(1000):
    # baz
    l[i] = bar
    # qux

也许您可以通过使用生成器来避免使用此列表:

def my_things():
    while foo:
        #baz
        yield bar
        #qux

for thing in my_things():
    # do something with thing

这样,列表并不会全部存储在内存中,而只是根据需要生成。

Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:

l = [None] * 1000 # Make a list of 1000 None's
for i in xrange(1000):
    # baz
    l[i] = bar
    # qux

Perhaps you could avoid the list by using a generator instead:

def my_things():
    while foo:
        #baz
        yield bar
        #qux

for thing in my_things():
    # do something with thing

This way, the list isn’t every stored all in memory at all, merely generated as needed.


回答 2

短版:使用

pre_allocated_list = [None] * size

预先分配一个列表(也就是说,能够解决列表的“大小”元素,而不是通过追加逐步形成列表)。即使在大型列表上,此操作也非常快。分配新对象,这些对象以后将分配给列表元素,将花费更长的时间,并且在性能方面将成为程序中的瓶颈。

长版:

我认为应该考虑初始化时间。由于在python中所有内容都是引用,因此将每个元素设置为None还是某个字符串都没关系-两种方式都只是引用。如果要为每个要引用的元素创建新对象,则将花费更长的时间。

对于Python 3.2:

import time
import copy

def print_timing (func):
  def wrapper (*arg):
    t1 = time.time ()
    res = func (*arg)
    t2 = time.time ()
    print ("{} took {} ms".format (func.__name__, (t2 - t1) * 1000.0))
    return res

  return wrapper

@print_timing
def prealloc_array (size, init = None, cp = True, cpmethod=copy.deepcopy, cpargs=(), use_num = False):
  result = [None] * size
  if init is not None:
    if cp:
      for i in range (size):
          result[i] = init
    else:
      if use_num:
        for i in range (size):
            result[i] = cpmethod (i)
      else:
        for i in range (size):
            result[i] = cpmethod (cpargs)
  return result

@print_timing
def prealloc_array_by_appending (size):
  result = []
  for i in range (size):
    result.append (None)
  return result

@print_timing
def prealloc_array_by_extending (size):
  result = []
  none_list = [None]
  for i in range (size):
    result.extend (none_list)
  return result

def main ():
  n = 1000000
  x = prealloc_array_by_appending(n)
  y = prealloc_array_by_extending(n)
  a = prealloc_array(n, None)
  b = prealloc_array(n, "content", True)
  c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)
  d = prealloc_array(n, "content", False, "some object {}".format, None, True)
  e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)
  f = prealloc_array(n, "content", False, copy.deepcopy, (), False)
  g = prealloc_array(n, "content", False, copy.deepcopy, [], False)

  print ("x[5] = {}".format (x[5]))
  print ("y[5] = {}".format (y[5]))
  print ("a[5] = {}".format (a[5]))
  print ("b[5] = {}".format (b[5]))
  print ("c[5] = {}".format (c[5]))
  print ("d[5] = {}".format (d[5]))
  print ("e[5] = {}".format (e[5]))
  print ("f[5] = {}".format (f[5]))
  print ("g[5] = {}".format (g[5]))

if __name__ == '__main__':
  main()

评价:

prealloc_array_by_appending took 118.00003051757812 ms
prealloc_array_by_extending took 102.99992561340332 ms
prealloc_array took 3.000020980834961 ms
prealloc_array took 49.00002479553223 ms
prealloc_array took 316.9999122619629 ms
prealloc_array took 473.00004959106445 ms
prealloc_array took 1677.9999732971191 ms
prealloc_array took 2729.999780654907 ms
prealloc_array took 3001.999855041504 ms
x[5] = None
y[5] = None
a[5] = None
b[5] = content
c[5] = some object blah
d[5] = some object 5
e[5] = a
f[5] = []
g[5] = ()

如您所见,仅列出对同一None对象的大量引用就花费很少的时间。

前置或扩展花费的时间更长(我没有平均任何东西,但是运行几次后,我可以告诉您扩展和附加花费的时间大致相同)。

为每个元素分配新对象-这是最耗时的时间。S.Lott的答案就是这样做-每次都格式化一个新字符串。这不是严格要求的-如果要预分配一些空间,只需列出一个None,然后随意将数据分配给list元素。不管是在创建列表时还是在创建列表之后生成数据,用哪种方式生成数据都比添加/扩展列表花费更多的时间。但是,如果您想要一个人口稀少的列表,那么从“无”列表开始肯定会更快。

Short version: use

pre_allocated_list = [None] * size

to pre-allocate a list (that is, to be able to address ‘size’ elements of the list instead of gradually forming the list by appending). This operation is VERY fast, even on big lists. Allocating new objects that will be later assigned to list elements will take MUCH longer and will be THE bottleneck in your program, performance-wise.

Long version:

I think that initialization time should be taken into account. Since in python everything is a reference, it doesn’t matter whether you set each element into None or some string – either way it’s only a reference. Though it will take longer if you want to create new object for each element to reference.

For Python 3.2:

import time
import copy

def print_timing (func):
  def wrapper (*arg):
    t1 = time.time ()
    res = func (*arg)
    t2 = time.time ()
    print ("{} took {} ms".format (func.__name__, (t2 - t1) * 1000.0))
    return res

  return wrapper

@print_timing
def prealloc_array (size, init = None, cp = True, cpmethod=copy.deepcopy, cpargs=(), use_num = False):
  result = [None] * size
  if init is not None:
    if cp:
      for i in range (size):
          result[i] = init
    else:
      if use_num:
        for i in range (size):
            result[i] = cpmethod (i)
      else:
        for i in range (size):
            result[i] = cpmethod (cpargs)
  return result

@print_timing
def prealloc_array_by_appending (size):
  result = []
  for i in range (size):
    result.append (None)
  return result

@print_timing
def prealloc_array_by_extending (size):
  result = []
  none_list = [None]
  for i in range (size):
    result.extend (none_list)
  return result

def main ():
  n = 1000000
  x = prealloc_array_by_appending(n)
  y = prealloc_array_by_extending(n)
  a = prealloc_array(n, None)
  b = prealloc_array(n, "content", True)
  c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)
  d = prealloc_array(n, "content", False, "some object {}".format, None, True)
  e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)
  f = prealloc_array(n, "content", False, copy.deepcopy, (), False)
  g = prealloc_array(n, "content", False, copy.deepcopy, [], False)

  print ("x[5] = {}".format (x[5]))
  print ("y[5] = {}".format (y[5]))
  print ("a[5] = {}".format (a[5]))
  print ("b[5] = {}".format (b[5]))
  print ("c[5] = {}".format (c[5]))
  print ("d[5] = {}".format (d[5]))
  print ("e[5] = {}".format (e[5]))
  print ("f[5] = {}".format (f[5]))
  print ("g[5] = {}".format (g[5]))

if __name__ == '__main__':
  main()

Evaluation:

prealloc_array_by_appending took 118.00003051757812 ms
prealloc_array_by_extending took 102.99992561340332 ms
prealloc_array took 3.000020980834961 ms
prealloc_array took 49.00002479553223 ms
prealloc_array took 316.9999122619629 ms
prealloc_array took 473.00004959106445 ms
prealloc_array took 1677.9999732971191 ms
prealloc_array took 2729.999780654907 ms
prealloc_array took 3001.999855041504 ms
x[5] = None
y[5] = None
a[5] = None
b[5] = content
c[5] = some object blah
d[5] = some object 5
e[5] = a
f[5] = []
g[5] = ()

As you can see, just making a big list of references to the same None object takes very little time.

Prepending or extending takes longer (i didn’t average anything, but after running this a few times i can tell you that extending and appending take roughly the same time).

Allocating new object for each element – that is what takes the most time. And S.Lott’s answer does that – formats a new string every time. Which is not strictly required – if you want to pre-allocate some space, just make a list of None, then assign data to list elements at will. Either way it takes more time to generate data than to append/extend a list, whether you generate it while creating the list, or after that. But if you want a sparsely-populated list, then starting with a list of None is definitely faster.


回答 3

使用Python的方式是:

x = [None] * numElements

或您希望预先弹出的任何默认值,例如

bottles = [Beer()] * 99
sea = [Fish()] * many
vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche

[编辑:买者自负[Beer()] * 99语法创建一个 Beer,然后填充与99次的引用数组相同的单个实例]

Python的默认方法可能非常有效,尽管随着您增加元素数量,效率会下降。

比较

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    result = []
    i = 0
    while i < Elements:
        result.append(i)
        i += 1

def doAllocate():
    result = [None] * Elements
    i = 0
    while i < Elements:
        result[i] = i
        i += 1

def doGenerator():
    return list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        x = 0
        while x < Iterations:
            fn()
            x += 1


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

#include <vector>
typedef std::vector<unsigned int> Vec;

static const unsigned int Elements = 100000;
static const unsigned int Iterations = 144;

void doAppend()
{
    Vec v;
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doReserve()
{
    Vec v;
    v.reserve(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doAllocate()
{
    Vec v;
    v.resize(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v[i] = i;
    }
}

#include <iostream>
#include <chrono>
using namespace std;

void test(const char* name, void(*fn)(void))
{
    cout << name << ": ";

    auto start = chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < Iterations; ++i) {
        fn();
    }
    auto end = chrono::high_resolution_clock::now();

    auto elapsed = end - start;
    cout << chrono::duration<double, milli>(elapsed).count() << "ms\n";
}

int main()
{
    cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n';

    test("doAppend", doAppend);
    test("doReserve", doReserve);
    test("doAllocate", doAllocate);
}

在我的Windows 7 i7上,64位Python提供

Elements: 100000, Iterations: 144
doAppend: 3587.204933ms
doAllocate: 2701.154947ms
doGenerator: 1721.098185ms

尽管C ++提供(使用MSVC构建,64位,启用了优化)

Elements: 100000, Iterations: 144
doAppend: 74.0042ms
doReserve: 27.0015ms
doAllocate: 5.0003ms

C ++调试版本生成:

Elements: 100000, Iterations: 144
doAppend: 2166.12ms
doReserve: 2082.12ms
doAllocate: 273.016ms

这里的要点是,使用Python可以将性能提高7-8%,并且如果您认为自己正在编写高性能的应用程序(或者正在编写用于Web服务的内容或其他内容),那么这是不容小at的,但是您可能需要重新考虑您选择的语言。

另外,这里的Python代码并不是真正的Python代码。在此切换到真正的Pythonesque代码可提供更好的性能:

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    for x in range(Iterations):
        result = []
        for i in range(Elements):
            result.append(i)

def doAllocate():
    for x in range(Iterations):
        result = [None] * Elements
        for i in range(Elements):
            result[i] = i

def doGenerator():
    for x in range(Iterations):
        result = list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        fn()


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

这使

Elements: 100000, Iterations: 144
doAppend: 2153.122902ms
doAllocate: 1346.076965ms
doGenerator: 1614.092112ms

(在32位中,doGenerator比doAllocate做得更好)。

在这里,doAppend和doAllocate之间的差距明显更大。

显然,这里的差异仅适用于您执行多次以上的操作,或者在负载较重的系统上执行的操作,这些数据将按数量级进行扩展,或者正在处理更大的清单。

这里的重点是:用pythonic方式获得最佳性能。

但是,如果您担心常规的高级性能,则Python是错误的语言。最根本的问题是,由于装饰器等(https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Data_Aggregation#Data_Aggregation)等Python功能,Python函数调用传统上比其他语言慢300倍。

The Pythonic way for this is:

x = [None] * numElements

or whatever default value you wish to prepop with, e.g.

bottles = [Beer()] * 99
sea = [Fish()] * many
vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche

[EDIT: Caveat Emptor The [Beer()] * 99 syntax creates one Beer and then populates an array with 99 references to the same single instance]

Python’s default approach can be pretty efficient, although that efficiency decays as you increase the number of elements.

Compare

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    result = []
    i = 0
    while i < Elements:
        result.append(i)
        i += 1

def doAllocate():
    result = [None] * Elements
    i = 0
    while i < Elements:
        result[i] = i
        i += 1

def doGenerator():
    return list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        x = 0
        while x < Iterations:
            fn()
            x += 1


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

with

#include <vector>
typedef std::vector<unsigned int> Vec;

static const unsigned int Elements = 100000;
static const unsigned int Iterations = 144;

void doAppend()
{
    Vec v;
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doReserve()
{
    Vec v;
    v.reserve(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doAllocate()
{
    Vec v;
    v.resize(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v[i] = i;
    }
}

#include <iostream>
#include <chrono>
using namespace std;

void test(const char* name, void(*fn)(void))
{
    cout << name << ": ";

    auto start = chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < Iterations; ++i) {
        fn();
    }
    auto end = chrono::high_resolution_clock::now();

    auto elapsed = end - start;
    cout << chrono::duration<double, milli>(elapsed).count() << "ms\n";
}

int main()
{
    cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n';

    test("doAppend", doAppend);
    test("doReserve", doReserve);
    test("doAllocate", doAllocate);
}

On my Windows 7 i7, 64-bit Python gives

Elements: 100000, Iterations: 144
doAppend: 3587.204933ms
doAllocate: 2701.154947ms
doGenerator: 1721.098185ms

While C++ gives (built with MSVC, 64-bit, Optimizations enabled)

Elements: 100000, Iterations: 144
doAppend: 74.0042ms
doReserve: 27.0015ms
doAllocate: 5.0003ms

C++ debug build produces:

Elements: 100000, Iterations: 144
doAppend: 2166.12ms
doReserve: 2082.12ms
doAllocate: 273.016ms

The point here is that with Python you can achieve a 7-8% performance improvement, and if you think you’re writing a high-performance app (or if you’re writing something that is used in a web service or something) then that isn’t to be sniffed at, but you may need to rethink your choice of language.

Also, the Python code here isn’t really Python code. Switching to truly Pythonesque code here gives better performance:

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    for x in range(Iterations):
        result = []
        for i in range(Elements):
            result.append(i)

def doAllocate():
    for x in range(Iterations):
        result = [None] * Elements
        for i in range(Elements):
            result[i] = i

def doGenerator():
    for x in range(Iterations):
        result = list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        fn()


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

Which gives

Elements: 100000, Iterations: 144
doAppend: 2153.122902ms
doAllocate: 1346.076965ms
doGenerator: 1614.092112ms

(in 32-bit doGenerator does better than doAllocate).

Here the gap between doAppend and doAllocate is significantly larger.

Obviously, the differences here really only apply if you are doing this more than a handful of times or if you are doing this on a heavily loaded system where those numbers are going to get scaled out by orders of magnitude, or if you are dealing with considerably larger lists.

The point here: Do it the pythonic way for the best performance.

But if you are worrying about general, high-level performance, Python is the wrong language. The most fundamental problem being that Python function calls has traditionally been upto 300x slower than other languages due to Python features like decorators etc (https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Data_Aggregation#Data_Aggregation).


回答 4

正如其他人提到的那样,最简单的方法是预先植入带有NoneType对象的列表。

话虽这么说,您应该先了解Python列表的实际工作方式,然后再决定是否需要这样做。在列表的CPython实现中,始终在开销空间的情况下创建底层数组,并且数组的大小逐渐变大( 4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120, etc),因此几乎不经常调整列表的大小。

由于这种行为,大多数 list.append()功能O(1)对于追加而言都是复杂的,仅当跨越这些边界之一时才具有增加的复杂度,此时复杂度将为O(n)。这种行为导致S. Lott的答案中执行时间的最小增加。

资料来源:http : //www.laurentluce.com/posts/python-list-implementation/

As others have mentioned, the simplest way to pre-seed a list with NoneType objects.

That being said, you should understand the way Python lists actually work before deciding this is necessary. In the CPython implementation of a list, the underlying array is always created with overhead room, in progressively larger sizes ( 4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120, etc), so that resizing the list does not happen nearly so often.

Because of this behavior, most list.append() functions are O(1) complexity for appends, only having increased complexity when crossing one of these boundaries, at which point the complexity will be O(n). This behavior is what leads to the minimal increase in execution time in S. Lott’s answer.

Source: http://www.laurentluce.com/posts/python-list-implementation/


回答 5

我运行@ s.lott的代码,并通过预分配产生了10%的相同性能提升。使用生成器尝试了@jeremy的想法,并且能够比doAllocate更好地看到gen的性能。对于我的项目,10%的改善很重要,因此感谢大家,因为这对我们有很大帮助。

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

def doGen( size=10000 ):
    return list("some unique object %d" % ( i, ) for i in xrange(size))

size=1000
@print_timing
def testAppend():
    for i in xrange(size):
        doAppend()

@print_timing
def testAlloc():
    for i in xrange(size):
        doAllocate()

@print_timing
def testGen():
    for i in xrange(size):
        doGen()


testAppend()
testAlloc()
testGen()

testAppend took 14440.000ms
testAlloc took 13580.000ms
testGen took 13430.000ms

i ran @s.lott’s code and produced the same 10% perf increase by pre-allocating. tried @jeremy’s idea using a generator and was able to see the perf of the gen better than that of the doAllocate. For my proj the 10% improvement matters, so thanks to everyone as this helps a bunch.

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

def doGen( size=10000 ):
    return list("some unique object %d" % ( i, ) for i in xrange(size))

size=1000
@print_timing
def testAppend():
    for i in xrange(size):
        doAppend()

@print_timing
def testAlloc():
    for i in xrange(size):
        doAllocate()

@print_timing
def testGen():
    for i in xrange(size):
        doGen()


testAppend()
testAlloc()
testGen()

testAppend took 14440.000ms
testAlloc took 13580.000ms
testGen took 13430.000ms

回答 6

如果您使用的numpy具有更多类似C的数组,则会引起对Python中预分配的担忧。在这种情况下,预分配问题与数据的形状和默认值有关。

如果要在大量列表上进行数值计算并需要性能,请考虑使用numpy。

Concerns about pre-allocation in Python arise if you’re working with numpy, which has more C-like arrays. In this instance, pre-allocation concerns are about the shape of the data and the default value.

Consider numpy if you’re doing numerical computation on massive lists and want performance.


回答 7

对于某些应用程序,词典可能是您所需要的。例如,在find_totient方法中,我发现使用字典更方便,因为我没有零索引。

def totient(n):
    totient = 0

    if n == 1:
        totient = 1
    else:
        for i in range(1, n):
            if math.gcd(i, n) == 1:
                totient += 1
    return totient

def find_totients(max):
    totients = dict()
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

这个问题也可以通过预先分配的列表来解决:

def find_totients(max):
    totients = None*(max+1)
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

我觉得这不是那么优雅,而且容易出错,因为我存储的None会在我不小心使用错误的情况下引发异常,并且因为我需要考虑地图可以避免的边缘情况。

确实,字典的效率不高,但是正如其他人所评论的那样,速度的微小差异并不总会带来重大的维护风险。

For some applications, a dictionary may be what you are looking for. For example, in the find_totient method, I found it more convenient to use a dictionary since I didn’t have a zero index.

def totient(n):
    totient = 0

    if n == 1:
        totient = 1
    else:
        for i in range(1, n):
            if math.gcd(i, n) == 1:
                totient += 1
    return totient

def find_totients(max):
    totients = dict()
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

This problem could also be solved with a preallocated list:

def find_totients(max):
    totients = None*(max+1)
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

I feel that this is not as elegant and prone to bugs because I’m storing None which could throw an exception if I accidentally use them wrong, and because I need to think about edge cases that the map lets me avoid.

It’s true the dictionary won’t be as efficient, but as others have commented, small differences in speed are not always worth significant maintenance hazards.


回答 8

据我了解,python列表已经与ArrayLists非常相似。但是,如果您想调整这些参数,我会在网上发现这篇帖子可能很有趣(基本上,只需创建自己的ScalableList扩展名即可):

http://mail.python.org/pipermail/python-list/2000-May/035082.html

From what I understand, python lists are already quite similar to ArrayLists. But if you want to tweak those parameters I found this post on the net that may be interesting (basically, just create your own ScalableList extension):

http://mail.python.org/pipermail/python-list/2000-May/035082.html


Python字典是哈希表的示例吗?

问题:Python字典是哈希表的示例吗?

字典是Python中的一种基本数据结构,它允许记录“键”以查找任何类型的“值”。这在内部实现为哈希表吗?如果没有,那是什么?

One of the basic data structures in Python is the dictionary, which allows one to record “keys” for looking up “values” of any type. Is this implemented internally as a hash table? If not, what is it?


回答 0

是的,它是一个哈希映射或哈希表。您可以在此处阅读由蒂姆·彼得斯(Tim Peters)编写的有关python dict的实现的描述。

这就是为什么您不能使用“不可散列”的东西作为字典键(例如列表)的原因:

>>> a = {}
>>> b = ['some', 'list']
>>> hash(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
>>> a[b] = 'some'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable

您可以阅读有关散列表的更多信息,查看它如何在python中实现以及为什么以这种方式实现

Yes, it is a hash mapping or hash table. You can read a description of python’s dict implementation, as written by Tim Peters, here.

That’s why you can’t use something ‘not hashable’ as a dict key, like a list:

>>> a = {}
>>> b = ['some', 'list']
>>> hash(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable
>>> a[b] = 'some'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable

You can read more about hash tables or check how it has been implemented in python and why it is implemented that way.


回答 1

除了在hash()上进行表查找之外,Python词典还必须有更多内容。通过残酷的实验,我发现了这种哈希冲突

>>> hash(1.1)
2040142438
>>> hash(4504.1)
2040142438

但这并没有破坏字典:

>>> d = { 1.1: 'a', 4504.1: 'b' }
>>> d[1.1]
'a'
>>> d[4504.1]
'b'

完整性检查:

>>> for k,v in d.items(): print(hash(k))
2040142438
2040142438

可能除了hash()之外还有另一种查找级别,可以避免字典键之间的冲突。也许dict()使用不同的哈希值。

(顺便说一句,在python 2.7.10中是这样。在Python 3.4.3和3.5.0中是相同的,但在处发生了冲突hash(1.1) == hash(214748749.8)。)

There must be more to a Python dictionary than a table lookup on hash(). By brute experimentation I found this hash collision:

>>> hash(1.1)
2040142438
>>> hash(4504.1)
2040142438

Yet it doesn’t break the dictionary:

>>> d = { 1.1: 'a', 4504.1: 'b' }
>>> d[1.1]
'a'
>>> d[4504.1]
'b'

Sanity check:

>>> for k,v in d.items(): print(hash(k))
2040142438
2040142438

Possibly there’s another lookup level beyond hash() that avoids collisions between dictionary keys. Or maybe dict() uses a different hash.

(By the way, this in Python 2.7.10. Same story in Python 3.4.3 and 3.5.0 with a collision at hash(1.1) == hash(214748749.8).)


回答 2

是。在内部,它被实现为基于Z / 2()上的原始多项式的开放式哈希。

Yes. Internally it is implemented as open hashing based on a primitive polynomial over Z/2 (source).


回答 3

为了扩展nosklo的解释:

a = {}
b = ['some', 'list']
a[b] = 'some' # this won't work
a[tuple(b)] = 'some' # this will, same as a['some', 'list']

To expand upon nosklo’s explanation:

a = {}
b = ['some', 'list']
a[b] = 'some' # this won't work
a[tuple(b)] = 'some' # this will, same as a['some', 'list']

自定义类型的对象作为字典键

问题:自定义类型的对象作为字典键

如何将自定义类型的对象用作Python字典中的键(我不希望“对象id”用作键),例如

class MyThing:
    def __init__(self,name,location,length):
            self.name = name
            self.location = location
            self.length = length

如果名称和位置相同,我想将MyThing用作相同的键。从C#/ Java开始,我习惯于重写并提供一个equals和hashcode方法,并保证不会突变该hashcode依赖的任何内容。

我必须在Python中做什么才能做到这一点?我应该吗?

(在一个简单的例子中,就像这里一样,也许最好将一个(名称,位置)元组放置为键-但考虑到我希望键成为一个对象)

What must I do to use my objects of a custom type as keys in a Python dictionary (where I don’t want the “object id” to act as the key) , e.g.

class MyThing:
    def __init__(self,name,location,length):
            self.name = name
            self.location = location
            self.length = length

I’d want to use MyThing’s as keys that are considered the same if name and location are the same. From C#/Java I’m used to having to override and provide an equals and hashcode method, and promise not to mutate anything the hashcode depends on.

What must I do in Python to accomplish this ? Should I even ?

(In a simple case, like here, perhaps it’d be better to just place a (name,location) tuple as key – but consider I’d want the key to be an object)


回答 0

您需要添加2种方法,注意__hash____eq__

class MyThing:
    def __init__(self,name,location,length):
        self.name = name
        self.location = location
        self.length = length

    def __hash__(self):
        return hash((self.name, self.location))

    def __eq__(self, other):
        return (self.name, self.location) == (other.name, other.location)

    def __ne__(self, other):
        # Not strictly necessary, but to avoid having both x==y and x!=y
        # True at the same time
        return not(self == other)

Python dict文档对关键对象定义了这些要求,即它们必须是可哈希的

You need to add 2 methods, note __hash__ and __eq__:

class MyThing:
    def __init__(self,name,location,length):
        self.name = name
        self.location = location
        self.length = length

    def __hash__(self):
        return hash((self.name, self.location))

    def __eq__(self, other):
        return (self.name, self.location) == (other.name, other.location)

    def __ne__(self, other):
        # Not strictly necessary, but to avoid having both x==y and x!=y
        # True at the same time
        return not(self == other)

The Python dict documentation defines these requirements on key objects, i.e. they must be hashable.


回答 1

使用python 2.6或更高版本的替代collections.namedtuple()方法-它可以节省编写任何特殊方法的时间:

from collections import namedtuple
MyThingBase = namedtuple("MyThingBase", ["name", "location"])
class MyThing(MyThingBase):
    def __new__(cls, name, location, length):
        obj = MyThingBase.__new__(cls, name, location)
        obj.length = length
        return obj

a = MyThing("a", "here", 10)
b = MyThing("a", "here", 20)
c = MyThing("c", "there", 10)
a == b
# True
hash(a) == hash(b)
# True
a == c
# False

An alternative in Python 2.6 or above is to use collections.namedtuple() — it saves you writing any special methods:

from collections import namedtuple
MyThingBase = namedtuple("MyThingBase", ["name", "location"])
class MyThing(MyThingBase):
    def __new__(cls, name, location, length):
        obj = MyThingBase.__new__(cls, name, location)
        obj.length = length
        return obj

a = MyThing("a", "here", 10)
b = MyThing("a", "here", 20)
c = MyThing("c", "there", 10)
a == b
# True
hash(a) == hash(b)
# True
a == c
# False

回答 2

__hash__如果需要特殊的哈希语义,则可以覆盖,__cmp__或者__eq__为了使您的类可用作键。比较相等的对象需要具有相同的哈希值。

Python期望__hash__返回一个整数,Banana()不建议返回:)

如您所述,用户定义的类在__hash__默认情况下会调用id(self)

文档中还有一些其他技巧:

__hash__() 从父类继承方法但更改的含义__cmp__()__eq__() 使得返回的哈希值不再合适的类(例如,通过切换到基于值的相等性概念,而不是基于默认的基于身份的相等性),这些类可以显式地标记为可以通过__hash__ = None 在类定义中进行设置来取消哈希。这样做意味着,在程序尝试检索其哈希值时,该类的实例不仅会引发适当的TypeError,而且在检查时它们也将被正确标识为不可哈希 isinstance(obj, collections.Hashable) (与定义自己__hash__()明确地引发TypeError的类不同 )。

You override __hash__ if you want special hash-semantics, and __cmp__ or __eq__ in order to make your class usable as a key. Objects who compare equal need to have the same hash value.

Python expects __hash__ to return an integer, returning Banana() is not recommended :)

User defined classes have __hash__ by default that calls id(self), as you noted.

There is some extra tips from the documentation.:

Classes which inherit a __hash__() method from a parent class but change the meaning of __cmp__() or __eq__() such that the hash value returned is no longer appropriate (e.g. by switching to a value-based concept of equality instead of the default identity based equality) can explicitly flag themselves as being unhashable by setting __hash__ = None in the class definition. Doing so means that not only will instances of the class raise an appropriate TypeError when a program attempts to retrieve their hash value, but they will also be correctly identified as unhashable when checking isinstance(obj, collections.Hashable) (unlike classes which define their own __hash__() to explicitly raise TypeError).


如何在python中打印字典的键值对

问题:如何在python中打印字典的键值对

我想像这样从python字典输出我的键值对:

key1 \t value1
key2 \t value2

我以为我可以这样做:

for i in d:
    print d.keys(i), d.values(i)

但是很明显,这不是keys()values()不参数的方式。

I want to output my key value pairs from a python dictionary as such:

key1 \t value1
key2 \t value2

I thought I could maybe do it like this:

for i in d:
    print d.keys(i), d.values(i)

but obviously that’s not how it goes as the keys() and values() don’t take an argument.


回答 0

您现有的代码只需进行一些调整。i 关键,因此您只需要使用它:

for i in d:
    print i, d[i]

您还可以获得包含键和值的迭代器。在Python 2中,d.items()返回(键,值)元组的列表,同时d.iteritems()返回提供相同值的迭代器:

for k, v in d.iteritems():
    print k, v

在Python 3中,d.items()返回迭代器;要获得列表,您需要将迭代器传递给list()自己。

for k, v in d.items():
    print(k, v)

Your existing code just needs a little tweak. i is the key, so you would just need to use it:

for i in d:
    print i, d[i]

You can also get an iterator that contains both keys and values. In Python 2, d.items() returns a list of (key, value) tuples, while d.iteritems() returns an iterator that provides the same:

for k, v in d.iteritems():
    print k, v

In Python 3, d.items() returns the iterator; to get a list, you need to pass the iterator to list() yourself.

for k, v in d.items():
    print(k, v)

回答 1

字典简介

d={'a':'apple','b':'ball'}
d.keys()  # displays all keys in list
['a','b']
d.values() # displays your values in list
['apple','ball']
d.items() # displays your pair tuple of key and value
[('a','apple'),('b','ball')

打印键,取值方法一

for x in d.keys():
    print x +" => " + d[x]

另一种方法

for key,value in d.items():
    print key + " => " + value

您可以使用以下方式获取密钥 iter

>>> list(iter(d))
['a', 'b']

您可以使用以下命令获取字典键的值get(key, [value])

d.get('a')
'apple'

如果字典中不存在键,则在给定默认值时将返回值。

d.get('c', 'Cat')
'Cat'

A little intro to dictionary

d={'a':'apple','b':'ball'}
d.keys()  # displays all keys in list
['a','b']
d.values() # displays your values in list
['apple','ball']
d.items() # displays your pair tuple of key and value
[('a','apple'),('b','ball')

Print keys,values method one

for x in d.keys():
    print x +" => " + d[x]

Another method

for key,value in d.items():
    print key + " => " + value

You can get keys using iter

>>> list(iter(d))
['a', 'b']

You can get value of key of dictionary using get(key, [value]):

d.get('a')
'apple'

If key is not present in dictionary,when default value given, will return value.

d.get('c', 'Cat')
'Cat'

回答 2

或者,对于Python 3:

for k,v in dict.items():
    print(k, v)

Or, for Python 3:

for k,v in dict.items():
    print(k, v)

回答 3

词典:

d={'key1':'value1','key2':'value2','key3':'value3'}

另一种解决方案:

print(*d.items(), sep='\n')

输出:

('key1', 'value1')
('key2', 'value2')
('key3', 'value3')

(但是,由于以前没有人提出过这样的建议,所以我认为这不是好习惯)

The dictionary:

d={'key1':'value1','key2':'value2','key3':'value3'}

Another one line solution:

print(*d.items(), sep='\n')

Output:

('key1', 'value1')
('key2', 'value2')
('key3', 'value3')

(but, since no one has suggested something like this before, I suspect it is not good practice)


回答 4

for key, value in d.iteritems():
    print key, '\t', value
for key, value in d.iteritems():
    print key, '\t', value

回答 5

您可以通过调用字典上的items()来访问键和/或值。

for key, value in d.iteritems():
    print(key, value)

You can access your keys and/or values by calling items() on your dictionary.

for key, value in d.iteritems():
    print(key, value)

回答 6

如果要按dict键对输出进行排序,则可以使用收集包。

import collections
for k, v in collections.OrderedDict(sorted(d.items())).items():
    print(k, v)

它适用于python 3

If you want to sort the output by dict key you can use the collection package.

import collections
for k, v in collections.OrderedDict(sorted(d.items())).items():
    print(k, v)

It works on python 3


回答 7

>>> d={'a':1,'b':2,'c':3}
>>> for kv in d.items():
...     print kv[0],'\t',kv[1]
... 
a   1
c   3
b   2
>>> d={'a':1,'b':2,'c':3}
>>> for kv in d.items():
...     print kv[0],'\t',kv[1]
... 
a   1
c   3
b   2

回答 8

除了已经提到的方法之外,还可以使用“ viewitems”,“ viewkeys”,“ viewvalues”

>>> d = {320: 1, 321: 0, 322: 3}
>>> list(d.viewitems())
[(320, 1), (321, 0), (322, 3)]
>>> list(d.viewkeys())
[320, 321, 322]
>>> list(d.viewvalues())
[1, 0, 3]

要么

>>> list(d.iteritems())
[(320, 1), (321, 0), (322, 3)]
>>> list(d.iterkeys())
[320, 321, 322]
>>> list(d.itervalues())
[1, 0, 3]

或使用itemgetter

>>> from operator import itemgetter
>>> map(itemgetter(0), dd.items())     ####  for keys
['323', '332']
>>> map(itemgetter(1), dd.items())     ####  for values
['3323', 232]

In addition to ways already mentioned.. can use ‘viewitems’, ‘viewkeys’, ‘viewvalues’

>>> d = {320: 1, 321: 0, 322: 3}
>>> list(d.viewitems())
[(320, 1), (321, 0), (322, 3)]
>>> list(d.viewkeys())
[320, 321, 322]
>>> list(d.viewvalues())
[1, 0, 3]

Or

>>> list(d.iteritems())
[(320, 1), (321, 0), (322, 3)]
>>> list(d.iterkeys())
[320, 321, 322]
>>> list(d.itervalues())
[1, 0, 3]

or using itemgetter

>>> from operator import itemgetter
>>> map(itemgetter(0), dd.items())     ####  for keys
['323', '332']
>>> map(itemgetter(1), dd.items())     ####  for values
['3323', 232]

回答 9

一个简单的字典:

x = {'X':"yes", 'Y':"no", 'Z':"ok"}

要在Python 3中打印特定的(键,值)对(在此示例中为索引1的对):

for e in range(len(x)):
    print(([x for x in x.keys()][e], [x for x in x.values()][e]))

输出:

('X', 'yes')
('Y', 'no')
('Z', 'ok')

这是一种将所有对打印在一个元组中的一种线性解决方案:

print(tuple(([x for x in x.keys()][i], [x for x in x.values()][i]) for i in range(len(x))))

输出:

(('X', 'yes'), ('Y', 'no'), ('Z', 'ok'))

A simple dictionary:

x = {'X':"yes", 'Y':"no", 'Z':"ok"}

To print a specific (key, value) pair in Python 3 (pair at index 1 in this example):

for e in range(len(x)):
    print(([x for x in x.keys()][e], [x for x in x.values()][e]))

Output:

('X', 'yes')
('Y', 'no')
('Z', 'ok')

Here is a one liner solution to print all pairs in a tuple:

print(tuple(([x for x in x.keys()][i], [x for x in x.values()][i]) for i in range(len(x))))

Output:

(('X', 'yes'), ('Y', 'no'), ('Z', 'ok'))

如何从Python中的字典中提取所有值?

问题:如何从Python中的字典中提取所有值?

我有字典d = {1:-0.3246, 2:-0.9185, 3:-3985, ...}

如何将的所有值提取d到列表中l

I have a dictionary d = {1:-0.3246, 2:-0.9185, 3:-3985, ...}.

How do I extract all of the values of d into a list l?


回答 0

如果你只需要字典的键123使用:your_dict.keys()

如果你只需要在字典中的值-0.3246-0.9185-3985使用:your_dict.values()

如果您想同时使用键和值,请使用:your_dict.items()返回一个元组列表[(key1, value1), (key2, value2), ...]

If you only need the dictionary keys 1, 2, and 3 use: your_dict.keys().

If you only need the dictionary values -0.3246, -0.9185, and -3985 use: your_dict.values().

If you want both keys and values use: your_dict.items() which returns a list of tuples [(key1, value1), (key2, value2), ...].


回答 1

使用 values()

>>> d = {1:-0.3246, 2:-0.9185, 3:-3985}

>>> d.values()
<<< [-0.3246, -0.9185, -3985]

Use values()

>>> d = {1:-0.3246, 2:-0.9185, 3:-3985}

>>> d.values()
<<< [-0.3246, -0.9185, -3985]

回答 2

如果需要所有值,请使用以下命令:

dict_name_goes_here.values()

如果需要所有键,请使用以下命令:

dict_name_goes_here.keys()

如果您想要所有项目(键和值),则可以使用以下命令:

dict_name_goes_here.items()

If you want all of the values, use this:

dict_name_goes_here.values()

If you want all of the keys, use this:

dict_name_goes_here.keys()

IF you want all of the items (both keys and values), I would use this:

dict_name_goes_here.items()

回答 3

values()在dict上调用方法。

Call the values() method on the dict.


回答 4

对于Python 3,您需要:

list_of_dict_values = list(dict_name.values())

For Python 3, you need:

list_of_dict_values = list(dict_name.values())

回答 5

对于嵌套字典,字典列表和列出字典的字典,…您可以使用

def get_all_values(d):
    if isinstance(d, dict):
        for v in d.values():
            yield from get_all_values(v)
    elif isinstance(d, list):
        for v in d:
            yield from get_all_values(v)
    else:
        yield d 

一个例子:

d = {'a': 1, 'b': {'c': 2, 'd': [3, 4]}, 'e': [{'f': 5}, {'g': 6}]}

list(get_all_values(d)) # returns [1, 2, 3, 4, 5, 6]

PS:我爱yield。;-)

For nested dicts, lists of dicts, and dicts of listed dicts, … you can use

def get_all_values(d):
    if isinstance(d, dict):
        for v in d.values():
            yield from get_all_values(v)
    elif isinstance(d, list):
        for v in d:
            yield from get_all_values(v)
    else:
        yield d 

An example:

d = {'a': 1, 'b': {'c': 2, 'd': [3, 4]}, 'e': [{'f': 5}, {'g': 6}]}

list(get_all_values(d)) # returns [1, 2, 3, 4, 5, 6]

PS: I love yield. ;-)


回答 6

如果需要所有值,请使用以下命令:

dict_name_goes_here.values()

If you want all of the values, use this:

dict_name_goes_here.values()

回答 7

d = <dict>
values = d.values()
d = <dict>
values = d.values()

回答 8

要查看按键:

for key in d.keys():
    print(key)

要获取每个键所引用的值:

for key in d.keys():
    print(d[key])

添加到列表:

for key in d.keys():
    mylist.append(d[key])

To see the keys:

for key in d.keys():
    print(key)

To get the values that each key is referencing:

for key in d.keys():
    print(d[key])

Add to a list:

for key in d.keys():
    mylist.append(d[key])

回答 9

Python的鸭式输入原则上应确定对象可以执行的操作,即其属性和方法。通过查看字典对象,可以尝试猜测它是否具有以下至少一项:dict.keys()dict.values()方法。您应该尝试将这种方法用于将来在运行时会进行类型检查的编程语言,尤其是具有鸭式语言的语言。

Pythonic duck-typing should in principle determine what an object can do, i.e., its properties and methods. By looking at a dictionary object one may try to guess it has at least one of the following: dict.keys() or dict.values() methods. You should try to use this approach for future work with programming languages whose type checking occurs at runtime, especially those with the duck-typing nature.


回答 10

dictionary_name={key1:value1,key2:value2,key3:value3}
dictionary_name.values()
dictionary_name={key1:value1,key2:value2,key3:value3}
dictionary_name.values()

Python字典:获取键列表的值列表

问题:Python字典:获取键列表的值列表

是否有内置/快速的方法来使用字典的键列表来获取对应项的列表?

例如,我有:

>>> mydict = {'one': 1, 'two': 2, 'three': 3}
>>> mykeys = ['three', 'one']

如何使用mykeys字典中的相应值作为列表?

>>> mydict.WHAT_GOES_HERE(mykeys)
[3, 1]

Is there a built-in/quick way to use a list of keys to a dictionary to get a list of corresponding items?

For instance I have:

>>> mydict = {'one': 1, 'two': 2, 'three': 3}
>>> mykeys = ['three', 'one']

How can I use mykeys to get the corresponding values in the dictionary as a list?

>>> mydict.WHAT_GOES_HERE(mykeys)
[3, 1]

回答 0

列表理解似乎是执行此操作的好方法:

>>> [mydict[x] for x in mykeys]
[3, 1]

A list comprehension seems to be a good way to do this:

>>> [mydict[x] for x in mykeys]
[3, 1]

回答 1

除list-comp外,还有几种其他方法:

  • 如果找不到密钥,则生成列表并引发异常: map(mydict.__getitem__, mykeys)
  • 带有Noneif键的构建列表:map(mydict.get, mykeys)

另外,使用operator.itemgetter可以返回一个元组:

from operator import itemgetter
myvalues = itemgetter(*mykeys)(mydict)
# use `list(...)` if list is required

注意:在Python3中,map返回迭代器而不是列表。使用list(map(...))了列表。

A couple of other ways than list-comp:

  • Build list and throw exception if key not found: map(mydict.__getitem__, mykeys)
  • Build list with None if key not found: map(mydict.get, mykeys)

Alternatively, using operator.itemgetter can return a tuple:

from operator import itemgetter
myvalues = itemgetter(*mykeys)(mydict)
# use `list(...)` if list is required

Note: in Python3, map returns an iterator rather than a list. Use list(map(...)) for a list.


回答 2

速度比较:

Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Dec  7 2015, 14:10:42) [MSC v.1500 64 bit (AMD64)] on win32
In[1]: l = [0,1,2,3,2,3,1,2,0]
In[2]: m = {0:10, 1:11, 2:12, 3:13}
In[3]: %timeit [m[_] for _ in l]  # list comprehension
1000000 loops, best of 3: 762 ns per loop
In[4]: %timeit map(lambda _: m[_], l)  # using 'map'
1000000 loops, best of 3: 1.66 µs per loop
In[5]: %timeit list(m[_] for _ in l)  # a generator expression passed to a list constructor.
1000000 loops, best of 3: 1.65 µs per loop
In[6]: %timeit map(m.__getitem__, l)
The slowest run took 4.01 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 853 ns per loop
In[7]: %timeit map(m.get, l)
1000000 loops, best of 3: 908 ns per loop
In[33]: from operator import itemgetter
In[34]: %timeit list(itemgetter(*l)(m))
The slowest run took 9.26 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 739 ns per loop

因此,列表理解和itemgetter是最快的方法。

更新:对于大型随机列表和地图,我得到了一些不同的结果:

Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Dec  7 2015, 14:10:42) [MSC v.1500 64 bit (AMD64)] on win32
In[2]: import numpy.random as nprnd
l = nprnd.randint(1000, size=10000)
m = dict([(_, nprnd.rand()) for _ in range(1000)])
from operator import itemgetter
import operator
f = operator.itemgetter(*l)
%timeit f(m)
%timeit list(itemgetter(*l)(m))
%timeit [m[_] for _ in l]  # list comprehension
%timeit map(m.__getitem__, l)
%timeit list(m[_] for _ in l)  # a generator expression passed to a list constructor.
%timeit map(m.get, l)
%timeit map(lambda _: m[_], l)
1000 loops, best of 3: 1.14 ms per loop
1000 loops, best of 3: 1.68 ms per loop
100 loops, best of 3: 2 ms per loop
100 loops, best of 3: 2.05 ms per loop
100 loops, best of 3: 2.19 ms per loop
100 loops, best of 3: 2.53 ms per loop
100 loops, best of 3: 2.9 ms per loop

因此,在这种情况下,明确的获胜者是f = operator.itemgetter(*l); f(m),而明确的局外人:map(lambda _: m[_], l)

适用于Python 3.6.4的更新:

import numpy.random as nprnd
l = nprnd.randint(1000, size=10000)
m = dict([(_, nprnd.rand()) for _ in range(1000)])
from operator import itemgetter
import operator
f = operator.itemgetter(*l)
%timeit f(m)
%timeit list(itemgetter(*l)(m))
%timeit [m[_] for _ in l]  # list comprehension
%timeit list(map(m.__getitem__, l))
%timeit list(m[_] for _ in l)  # a generator expression passed to a list constructor.
%timeit list(map(m.get, l))
%timeit list(map(lambda _: m[_], l)
1.66 ms ± 74.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.1 ms ± 93.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.58 ms ± 88.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.36 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.98 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.7 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.14 ms ± 62.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

因此,Python 3.6.4的结果几乎相同。

A little speed comparison:

Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Dec  7 2015, 14:10:42) [MSC v.1500 64 bit (AMD64)] on win32
In[1]: l = [0,1,2,3,2,3,1,2,0]
In[2]: m = {0:10, 1:11, 2:12, 3:13}
In[3]: %timeit [m[_] for _ in l]  # list comprehension
1000000 loops, best of 3: 762 ns per loop
In[4]: %timeit map(lambda _: m[_], l)  # using 'map'
1000000 loops, best of 3: 1.66 µs per loop
In[5]: %timeit list(m[_] for _ in l)  # a generator expression passed to a list constructor.
1000000 loops, best of 3: 1.65 µs per loop
In[6]: %timeit map(m.__getitem__, l)
The slowest run took 4.01 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 853 ns per loop
In[7]: %timeit map(m.get, l)
1000000 loops, best of 3: 908 ns per loop
In[33]: from operator import itemgetter
In[34]: %timeit list(itemgetter(*l)(m))
The slowest run took 9.26 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 739 ns per loop

So list comprehension and itemgetter are the fastest ways to do this.

UPDATE: For large random lists and maps I had a bit different results:

Python 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Dec  7 2015, 14:10:42) [MSC v.1500 64 bit (AMD64)] on win32
In[2]: import numpy.random as nprnd
l = nprnd.randint(1000, size=10000)
m = dict([(_, nprnd.rand()) for _ in range(1000)])
from operator import itemgetter
import operator
f = operator.itemgetter(*l)
%timeit f(m)
%timeit list(itemgetter(*l)(m))
%timeit [m[_] for _ in l]  # list comprehension
%timeit map(m.__getitem__, l)
%timeit list(m[_] for _ in l)  # a generator expression passed to a list constructor.
%timeit map(m.get, l)
%timeit map(lambda _: m[_], l)
1000 loops, best of 3: 1.14 ms per loop
1000 loops, best of 3: 1.68 ms per loop
100 loops, best of 3: 2 ms per loop
100 loops, best of 3: 2.05 ms per loop
100 loops, best of 3: 2.19 ms per loop
100 loops, best of 3: 2.53 ms per loop
100 loops, best of 3: 2.9 ms per loop

So in this case the clear winner is f = operator.itemgetter(*l); f(m), and clear outsider: map(lambda _: m[_], l).

UPDATE for Python 3.6.4:

import numpy.random as nprnd
l = nprnd.randint(1000, size=10000)
m = dict([(_, nprnd.rand()) for _ in range(1000)])
from operator import itemgetter
import operator
f = operator.itemgetter(*l)
%timeit f(m)
%timeit list(itemgetter(*l)(m))
%timeit [m[_] for _ in l]  # list comprehension
%timeit list(map(m.__getitem__, l))
%timeit list(m[_] for _ in l)  # a generator expression passed to a list constructor.
%timeit list(map(m.get, l))
%timeit list(map(lambda _: m[_], l)
1.66 ms ± 74.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.1 ms ± 93.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.58 ms ± 88.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.36 ms ± 60.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.98 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.7 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.14 ms ± 62.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So, results for Python 3.6.4 is almost the same.


回答 3

这是三种方式。

KeyError未找到密钥时引发:

result = [mapping[k] for k in iterable]

缺少键的默认值。

result = [mapping.get(k, default_value) for k in iterable]

跳过丢失的键。

result = [mapping[k] for k in iterable if k in mapping]

Here are three ways.

Raising KeyError when key is not found:

result = [mapping[k] for k in iterable]

Default values for missing keys.

result = [mapping.get(k, default_value) for k in iterable]

Skipping missing keys.

result = [mapping[k] for k in iterable if k in mapping]

回答 4

试试这个:

mydict = {'one': 1, 'two': 2, 'three': 3}
mykeys = ['three', 'one','ten']
newList=[mydict[k] for k in mykeys if k in mydict]
print newList
[3, 1]

Try This:

mydict = {'one': 1, 'two': 2, 'three': 3}
mykeys = ['three', 'one','ten']
newList=[mydict[k] for k in mykeys if k in mydict]
print newList
[3, 1]

回答 5

试试这个:

mydict = {'one': 1, 'two': 2, 'three': 3}
mykeys = ['three', 'one'] # if there are many keys, use a set

[mydict[k] for k in mykeys]
=> [3, 1]

Try this:

mydict = {'one': 1, 'two': 2, 'three': 3}
mykeys = ['three', 'one'] # if there are many keys, use a set

[mydict[k] for k in mykeys]
=> [3, 1]

回答 6

new_dict = {x: v for x, v in mydict.items() if x in mykeys}
new_dict = {x: v for x, v in mydict.items() if x in mykeys}

回答 7

Pandas非常优雅地做到了这一点,尽管通常对列表的理解在技术上总是Python风格的。我现在没有时间进行速度比较(我稍后会再放入):

import pandas as pd
mydict = {'one': 1, 'two': 2, 'three': 3}
mykeys = ['three', 'one']
temp_df = pd.DataFrame().append(mydict)
# You can export DataFrames to a number of formats, using a list here. 
temp_df[mykeys].values[0]
# Returns: array([ 3.,  1.])

# If you want a dict then use this instead:
# temp_df[mykeys].to_dict(orient='records')[0]
# Returns: {'one': 1.0, 'three': 3.0}

Pandas does this very elegantly, though ofc list comprehensions will always be more technically Pythonic. I don’t have time to put in a speed comparison right now (I’ll come back later and put it in):

import pandas as pd
mydict = {'one': 1, 'two': 2, 'three': 3}
mykeys = ['three', 'one']
temp_df = pd.DataFrame().append(mydict)
# You can export DataFrames to a number of formats, using a list here. 
temp_df[mykeys].values[0]
# Returns: array([ 3.,  1.])

# If you want a dict then use this instead:
# temp_df[mykeys].to_dict(orient='records')[0]
# Returns: {'one': 1.0, 'three': 3.0}

回答 8

或仅仅是mydict.keys()那是对字典的内置方法调用。也探索mydict.values()mydict.items()

//哦,OP帖子让我感到困惑。

Or just mydict.keys() That’s a builtin method call for dictionaries. Also explore mydict.values() and mydict.items().

//Ah, OP post confused me.


回答 9

关闭 Python之后:从给定顺序的字典值创建列表的有效方法

检索密钥而不建立列表:

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import collections


class DictListProxy(collections.Sequence):
    def __init__(self, klist, kdict, *args, **kwargs):
        super(DictListProxy, self).__init__(*args, **kwargs)
        self.klist = klist
        self.kdict = kdict

    def __len__(self):
        return len(self.klist)

    def __getitem__(self, key):
        return self.kdict[self.klist[key]]


myDict = {'age': 'value1', 'size': 'value2', 'weigth': 'value3'}
order_list = ['age', 'weigth', 'size']

dlp = DictListProxy(order_list, myDict)

print(','.join(dlp))
print()
print(dlp[1])

输出:

value1,value3,value2

value3

哪个与列表给出的顺序匹配

Following closure of Python: efficient way to create a list from dict values with a given order

Retrieving the keys without building the list:

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import collections


class DictListProxy(collections.Sequence):
    def __init__(self, klist, kdict, *args, **kwargs):
        super(DictListProxy, self).__init__(*args, **kwargs)
        self.klist = klist
        self.kdict = kdict

    def __len__(self):
        return len(self.klist)

    def __getitem__(self, key):
        return self.kdict[self.klist[key]]


myDict = {'age': 'value1', 'size': 'value2', 'weigth': 'value3'}
order_list = ['age', 'weigth', 'size']

dlp = DictListProxy(order_list, myDict)

print(','.join(dlp))
print()
print(dlp[1])

The output:

value1,value3,value2

value3

Which matches the order given by the list


回答 10

reduce(lambda x,y: mydict.get(y) and x.append(mydict[y]) or x, mykeys,[])

万一有字典中没有的键。

reduce(lambda x,y: mydict.get(y) and x.append(mydict[y]) or x, mykeys,[])

incase there are keys not in dict.


如何在Python中使用列表中的键和空值初始化字典?

问题:如何在Python中使用列表中的键和空值初始化字典?

我想从中得到:

keys = [1,2,3]

对此:

{1: None, 2: None, 3: None}

有pythonic的方法吗?

这是一个丑陋的方法:

>>> keys = [1,2,3]
>>> dict([(1,2)])
{1: 2}
>>> dict(zip(keys, [None]*len(keys)))
{1: None, 2: None, 3: None}

I’d like to get from this:

keys = [1,2,3]

to this:

{1: None, 2: None, 3: None}

Is there a pythonic way of doing it?

This is an ugly way to do it:

>>> keys = [1,2,3]
>>> dict([(1,2)])
{1: 2}
>>> dict(zip(keys, [None]*len(keys)))
{1: None, 2: None, 3: None}

回答 0

dict.fromkeys([1, 2, 3, 4])

这实际上是一个类方法,因此它也适用于dict子类(如collections.defaultdict)。可选的第二个参数指定用于键的值(默认为)None

dict.fromkeys([1, 2, 3, 4])

This is actually a classmethod, so it works for dict-subclasses (like collections.defaultdict) as well. The optional second argument specifies the value to use for the keys (defaults to None.)


回答 1

没有人愿意提供字典理解的解决方案吗?

>>> keys = [1,2,3,5,6,7]
>>> {key: None for key in keys}
{1: None, 2: None, 3: None, 5: None, 6: None, 7: None}

nobody cared to give a dict-comprehension solution ?

>>> keys = [1,2,3,5,6,7]
>>> {key: None for key in keys}
{1: None, 2: None, 3: None, 5: None, 6: None, 7: None}

回答 2

dict.fromkeys(keys, None)
dict.fromkeys(keys, None)

回答 3

>>> keyDict = {"a","b","c","d"}

>>> dict([(key, []) for key in keyDict])

输出:

{'a': [], 'c': [], 'b': [], 'd': []}
>>> keyDict = {"a","b","c","d"}

>>> dict([(key, []) for key in keyDict])

Output:

{'a': [], 'c': [], 'b': [], 'd': []}

回答 4

d = {}
for i in keys:
    d[i] = None
d = {}
for i in keys:
    d[i] = None

回答 5

许多地方要附加任意键默认/初始值的工作流程,你没有需要到哈希每个键单独的时间提前。您可以使用collections.defaultdict。例如:

from collections import defaultdict

d = defaultdict(lambda: None)

print(d[1])  # None
print(d[2])  # None
print(d[3])  # None

这样效率更高,它省去了在实例化时散列所有键的麻烦。此外,defaultdict是的子类dict,因此通常无需转换回常规字典。

对于需要对许可键进行控制的工作流,可以dict.fromkeys按照已接受的答案使用:

d = dict.fromkeys([1, 2, 3, 4])

In many workflows where you want to attach a default / initial value for arbitrary keys, you don’t need to hash each key individually ahead of time. You can use collections.defaultdict. For example:

from collections import defaultdict

d = defaultdict(lambda: None)

print(d[1])  # None
print(d[2])  # None
print(d[3])  # None

This is more efficient, it saves having to hash all your keys at instantiation. Moreover, defaultdict is a subclass of dict, so there’s usually no need to convert back to a regular dictionary.

For workflows where you require controls on permissible keys, you can use dict.fromkeys as per the accepted answer:

d = dict.fromkeys([1, 2, 3, 4])

如何检查字典中是否存在值(Python)

问题:如何检查字典中是否存在值(Python)

我在python中有以下字典:

d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}

我需要一种方法来查找此字典中是否存在诸如“一个”或“两个”的值。

例如,如果我想知道索引“ 1”是否存在,我只需要键入:

"1" in d

然后python会告诉我这是对还是错,但是我需要做同样的事情,只是要找到一个值是否存在。

I have the following dictionary in python:

d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}

I need a way to find if a value such as “one” or “two” exists in this dictionary.

For example, if I wanted to know if the index “1” existed I would simply have to type:

"1" in d

And then python would tell me if that is true or false, however I need to do that same exact thing except to find if a value exists.


回答 0

>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> 'one' in d.values()
True

出于好奇,一些比较时机:

>>> T(lambda : 'one' in d.itervalues()).repeat()
[0.28107285499572754, 0.29107213020324707, 0.27941107749938965]
>>> T(lambda : 'one' in d.values()).repeat()
[0.38303399085998535, 0.37257885932922363, 0.37096405029296875]
>>> T(lambda : 'one' in d.viewvalues()).repeat()
[0.32004380226135254, 0.31716084480285645, 0.3171098232269287]

编辑:并且,如果您想知道为什么…原因是上述每种方法返回的对象类型不同,则该对象可能适合或可能不适合于查找操作:

>>> type(d.viewvalues())
<type 'dict_values'>
>>> type(d.values())
<type 'list'>
>>> type(d.itervalues())
<type 'dictionary-valueiterator'>

EDIT2:根据注释中的请求…

>>> T(lambda : 'four' in d.itervalues()).repeat()
[0.41178202629089355, 0.3959040641784668, 0.3970959186553955]
>>> T(lambda : 'four' in d.values()).repeat()
[0.4631338119506836, 0.43541407585144043, 0.4359898567199707]
>>> T(lambda : 'four' in d.viewvalues()).repeat()
[0.43414998054504395, 0.4213531017303467, 0.41684913635253906]
>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> 'one' in d.values()
True

Out of curiosity, some comparative timing:

>>> T(lambda : 'one' in d.itervalues()).repeat()
[0.28107285499572754, 0.29107213020324707, 0.27941107749938965]
>>> T(lambda : 'one' in d.values()).repeat()
[0.38303399085998535, 0.37257885932922363, 0.37096405029296875]
>>> T(lambda : 'one' in d.viewvalues()).repeat()
[0.32004380226135254, 0.31716084480285645, 0.3171098232269287]

EDIT: And in case you wonder why… the reason is that each of the above returns a different type of object, which may or may not be well suited for lookup operations:

>>> type(d.viewvalues())
<type 'dict_values'>
>>> type(d.values())
<type 'list'>
>>> type(d.itervalues())
<type 'dictionary-valueiterator'>

EDIT2: As per request in comments…

>>> T(lambda : 'four' in d.itervalues()).repeat()
[0.41178202629089355, 0.3959040641784668, 0.3970959186553955]
>>> T(lambda : 'four' in d.values()).repeat()
[0.4631338119506836, 0.43541407585144043, 0.4359898567199707]
>>> T(lambda : 'four' in d.viewvalues()).repeat()
[0.43414998054504395, 0.4213531017303467, 0.41684913635253906]

回答 1

您可以使用

"one" in d.itervalues()

测试是否"one"在字典的值中。

You can use

"one" in d.itervalues()

to test if "one" is among the values of your dictionary.


回答 2

Python字典具有get(key)函数

>>> d.get(key)

例如,

>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> d.get('3')
'three'
>>> d.get('10')
none

如果您的密钥不存在,将返回none值。

foo = d[key] # raise error if key doesn't exist
foo = d.get(key) # return none if key doesn't exist

与小于3.0和大于5.0的版本相关的内容。。

Python dictionary has get(key) funcion

>>> d.get(key)

For Example,

>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> d.get('3')
'three'
>>> d.get('10')
none

If your key does’nt exist, will return none value.

foo = d[key] # raise error if key doesn't exist
foo = d.get(key) # return none if key doesn't exist

Content relevant to versions less than 3.0 and greater than 5.0. .


回答 3

使用字典视图:

if x in d.viewvalues():
    dosomething()..

Use dictionary views:

if x in d.viewvalues():
    dosomething()..

回答 4

不同类型检查值是否存在

d = {"key1":"value1", "key2":"value2"}
"value10" in d.values() 
>> False

如果值列表怎么办

test = {'key1': ['value4', 'value5', 'value6'], 'key2': ['value9'], 'key3': ['value6']}
"value4" in [x for v in test.values() for x in v]
>>True

如果带有字符串值的值列表怎么办

test = {'key1': ['value4', 'value5', 'value6'], 'key2': ['value9'], 'key3': ['value6'], 'key5':'value10'}
values = test.values()
"value10" in [x for v in test.values() for x in v] or 'value10' in values
>>True

Different types to check the values exists

d = {"key1":"value1", "key2":"value2"}
"value10" in d.values() 
>> False

What if list of values

test = {'key1': ['value4', 'value5', 'value6'], 'key2': ['value9'], 'key3': ['value6']}
"value4" in [x for v in test.values() for x in v]
>>True

What if list of values with string values

test = {'key1': ['value4', 'value5', 'value6'], 'key2': ['value9'], 'key3': ['value6'], 'key5':'value10'}
values = test.values()
"value10" in [x for v in test.values() for x in v] or 'value10' in values
>>True

回答 5

在Python 3中,您可以使用values()字典的功能。它返回值的视图对象。这又可以传递给iter返回迭代器对象的函数。可以使用来检查迭代器in,如下所示:

'one' in iter(d.values())

或者您可以直接使用视图对象,因为它类似于列表

'one' in d.values()

In Python 3 you can use the values() function of the dictionary. It returns a view object of the values. This, in turn, can be passed to the iter function which returns an iterator object. The iterator can be checked using in, like this,

'one' in iter(d.values())

Or you can use the view object directly since it is similar to a list

'one' in d.values()

比较两个字典并检查多少对(键,值)相等

问题:比较两个字典并检查多少对(键,值)相等

我有两个字典,为简单起见,我将采用以下两个:

>>> x = dict(a=1, b=2)
>>> y = dict(a=2, b=2)

现在,我想比较中的每一key, value对是否x具有相同的对应值y。所以我这样写:

>>> for x_values, y_values in zip(x.iteritems(), y.iteritems()):
        if x_values == y_values:
            print 'Ok', x_values, y_values
        else:
            print 'Not', x_values, y_values

而且它有效,因为tuple返回了a ,然后比较了相等性。

我的问题:

这样对吗?有更好的方法吗?最好不要提速,我是在讲代码优雅。

更新:我忘了提到我必须检查多少key, value对是相等的。

I have two dictionaries, but for simplification, I will take these two:

>>> x = dict(a=1, b=2)
>>> y = dict(a=2, b=2)

Now, I want to compare whether each key, value pair in x has the same corresponding value in y. So I wrote this:

>>> for x_values, y_values in zip(x.iteritems(), y.iteritems()):
        if x_values == y_values:
            print 'Ok', x_values, y_values
        else:
            print 'Not', x_values, y_values

And it works since a tuple is returned and then compared for equality.

My questions:

Is this correct? Is there a better way to do this? Better not in speed, I am talking about code elegance.

UPDATE: I forgot to mention that I have to check how many key, value pairs are equal.


回答 0

如果您想知道两个字典中有多少个值匹配,您应该说:)

也许是这样的:

shared_items = {k: x[k] for k in x if k in y and x[k] == y[k]}
print len(shared_items)

If you want to know how many values match in both the dictionaries, you should have said that :)

Maybe something like this:

shared_items = {k: x[k] for k in x if k in y and x[k] == y[k]}
print len(shared_items)

回答 1

您想要做的就是 x==y

您所做的并不是一个好主意,因为字典中的项目不应该有任何顺序。您可能正在[('a',1),('b',1)][('b',1), ('a',1)](相同的词典,不同的顺序)进行比较。

例如,请参见:

>>> x = dict(a=2, b=2,c=3, d=4)
>>> x
{'a': 2, 'c': 3, 'b': 2, 'd': 4}
>>> y = dict(b=2,c=3, d=4)
>>> y
{'c': 3, 'b': 2, 'd': 4}
>>> zip(x.iteritems(), y.iteritems())
[(('a', 2), ('c', 3)), (('c', 3), ('b', 2)), (('b', 2), ('d', 4))]

区别只是一个项目,但是您的算法将看到所有项目都是不同的

What you want to do is simply x==y

What you do is not a good idea, because the items in a dictionary are not supposed to have any order. You might be comparing [('a',1),('b',1)] with [('b',1), ('a',1)] (same dictionaries, different order).

For example, see this:

>>> x = dict(a=2, b=2,c=3, d=4)
>>> x
{'a': 2, 'c': 3, 'b': 2, 'd': 4}
>>> y = dict(b=2,c=3, d=4)
>>> y
{'c': 3, 'b': 2, 'd': 4}
>>> zip(x.iteritems(), y.iteritems())
[(('a', 2), ('c', 3)), (('c', 3), ('b', 2)), (('b', 2), ('d', 4))]

The difference is only one item, but your algorithm will see that all items are different


回答 2

def dict_compare(d1, d2):
    d1_keys = set(d1.keys())
    d2_keys = set(d2.keys())
    shared_keys = d1_keys.intersection(d2_keys)
    added = d1_keys - d2_keys
    removed = d2_keys - d1_keys
    modified = {o : (d1[o], d2[o]) for o in shared_keys if d1[o] != d2[o]}
    same = set(o for o in shared_keys if d1[o] == d2[o])
    return added, removed, modified, same

x = dict(a=1, b=2)
y = dict(a=2, b=2)
added, removed, modified, same = dict_compare(x, y)
def dict_compare(d1, d2):
    d1_keys = set(d1.keys())
    d2_keys = set(d2.keys())
    shared_keys = d1_keys.intersection(d2_keys)
    added = d1_keys - d2_keys
    removed = d2_keys - d1_keys
    modified = {o : (d1[o], d2[o]) for o in shared_keys if d1[o] != d2[o]}
    same = set(o for o in shared_keys if d1[o] == d2[o])
    return added, removed, modified, same

x = dict(a=1, b=2)
y = dict(a=2, b=2)
added, removed, modified, same = dict_compare(x, y)

回答 3

dic1 == dic2

python docs

为了说明这一点,下面的例子都将返回一个字典等于{"one": 1, "two": 2, "three": 3}

>>> a = dict(one=1, two=2, three=3)
>>> b = {'one': 1, 'two': 2, 'three': 3}
>>> c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
>>> d = dict([('two', 2), ('one', 1), ('three', 3)])
>>> e = dict({'three': 3, 'one': 1, 'two': 2})
>>> a == b == c == d == e
True

如第一个示例中那样,提供关键字参数仅适用于有效的Python标识符的键。否则,可以使用任何有效的密钥。

py2和有效py3

dic1 == dic2

From python docs:

The following examples all return a dictionary equal to {"one": 1, "two": 2, "three": 3}:

>>> a = dict(one=1, two=2, three=3)
>>> b = {'one': 1, 'two': 2, 'three': 3}
>>> c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
>>> d = dict([('two', 2), ('one', 1), ('three', 3)])
>>> e = dict({'three': 3, 'one': 1, 'two': 2})
>>> a == b == c == d == e
True

Providing keyword arguments as in the first example only works for keys that are valid Python identifiers. Otherwise, any valid keys can be used.

Valid for both py2 and py3.


回答 4

我是python的新手,但最终却做了类似于@mouad的操作

unmatched_item = set(dict_1.items()) ^ set(dict_2.items())
len(unmatched_item) # should be 0

^当两个字典中的所有元素相同时,XOR运算符()应该消除该字典中的所有元素。

I’m new to python but I ended up doing something similar to @mouad

unmatched_item = set(dict_1.items()) ^ set(dict_2.items())
len(unmatched_item) # should be 0

The XOR operator (^) should eliminate all elements of the dict when they are the same in both dicts.


回答 5

由于似乎没有人提及deepdiff,因此出于完整性考虑,我将在此处添加。我发现通常获取(嵌套)对象的差异非常方便:

安装

pip install deepdiff

样例代码

import deepdiff
import json

dict_1 = {
    "a": 1,
    "nested": {
        "b": 1,
    }
}

dict_2 = {
    "a": 2,
    "nested": {
        "b": 2,
    }
}

diff = deepdiff.DeepDiff(dict_1, dict_2)
print(json.dumps(diff, indent=4))

输出量

{
    "values_changed": {
        "root['a']": {
            "new_value": 2,
            "old_value": 1
        },
        "root['nested']['b']": {
            "new_value": 2,
            "old_value": 1
        }
    }
}

关于漂亮地打印结果以供检查的注意事项:如果两个字典具有相同的属性键(与示例中的属性值可能不同),则上述代码有效。但是,如果存在"extra"属性是dict之一,则json.dumps()失败并显示

TypeError: Object of type PrettyOrderedSet is not JSON serializable

解决方案:使用diff.to_json()json.loads()/ json.dumps()漂亮打印:

import deepdiff
import json

dict_1 = {
    "a": 1,
    "nested": {
        "b": 1,
    },
    "extra": 3
}

dict_2 = {
    "a": 2,
    "nested": {
        "b": 2,
    }
}

diff = deepdiff.DeepDiff(dict_1, dict_2)
print(json.dumps(json.loads(diff.to_json()), indent=4))  

输出:

{
    "dictionary_item_removed": [
        "root['extra']"
    ],
    "values_changed": {
        "root['a']": {
            "new_value": 2,
            "old_value": 1
        },
        "root['nested']['b']": {
            "new_value": 2,
            "old_value": 1
        }
    }
}

替代方案:use pprint,导致格式不同:

import pprint

# same code as above

pprint.pprint(diff, indent=4)

输出:

{   'dictionary_item_removed': [root['extra']],
    'values_changed': {   "root['a']": {   'new_value': 2,
                                           'old_value': 1},
                          "root['nested']['b']": {   'new_value': 2,
                                                     'old_value': 1}}}

Since it seems nobody mentioned deepdiff, I will add it here for completeness. I find it very convenient for getting diff of (nested) objects in general:

Installation

pip install deepdiff

Sample code

import deepdiff
import json

dict_1 = {
    "a": 1,
    "nested": {
        "b": 1,
    }
}

dict_2 = {
    "a": 2,
    "nested": {
        "b": 2,
    }
}

diff = deepdiff.DeepDiff(dict_1, dict_2)
print(json.dumps(diff, indent=4))

Output

{
    "values_changed": {
        "root['a']": {
            "new_value": 2,
            "old_value": 1
        },
        "root['nested']['b']": {
            "new_value": 2,
            "old_value": 1
        }
    }
}

Note about pretty-printing the result for inspection: The above code works if both dicts have the same attribute keys (with possibly different attribute values as in the example). However, if an "extra" attribute is present is one of the dicts, json.dumps() fails with

TypeError: Object of type PrettyOrderedSet is not JSON serializable

Solution: use diff.to_json() and json.loads() / json.dumps() to pretty-print:

import deepdiff
import json

dict_1 = {
    "a": 1,
    "nested": {
        "b": 1,
    },
    "extra": 3
}

dict_2 = {
    "a": 2,
    "nested": {
        "b": 2,
    }
}

diff = deepdiff.DeepDiff(dict_1, dict_2)
print(json.dumps(json.loads(diff.to_json()), indent=4))  

Output:

{
    "dictionary_item_removed": [
        "root['extra']"
    ],
    "values_changed": {
        "root['a']": {
            "new_value": 2,
            "old_value": 1
        },
        "root['nested']['b']": {
            "new_value": 2,
            "old_value": 1
        }
    }
}

Alternative: use pprint, results in a different formatting:

import pprint

# same code as above

pprint.pprint(diff, indent=4)

Output:

{   'dictionary_item_removed': [root['extra']],
    'values_changed': {   "root['a']": {   'new_value': 2,
                                           'old_value': 1},
                          "root['nested']['b']": {   'new_value': 2,
                                                     'old_value': 1}}}

回答 6

只需使用:

assert cmp(dict1, dict2) == 0

Just use:

assert cmp(dict1, dict2) == 0

回答 7

如果您假设两个字典都只包含简单值,则@mouad的答案很好。但是,如果您的字典中包含字典,则您将获得异常,因为字典不可哈希。

在我的头顶上,这样的事情可能会起作用:

def compare_dictionaries(dict1, dict2):
     if dict1 is None or dict2 is None:
        print('Nones')
        return False

     if (not isinstance(dict1, dict)) or (not isinstance(dict2, dict)):
        print('Not dict')
        return False

     shared_keys = set(dict1.keys()) & set(dict2.keys())

     if not ( len(shared_keys) == len(dict1.keys()) and len(shared_keys) == len(dict2.keys())):
        print('Not all keys are shared')
        return False


     dicts_are_equal = True
     for key in dict1.keys():
         if isinstance(dict1[key], dict) or isinstance(dict2[key], dict):
             dicts_are_equal = dicts_are_equal and compare_dictionaries(dict1[key], dict2[key])
         else:
             dicts_are_equal = dicts_are_equal and all(atleast_1d(dict1[key] == dict2[key]))

     return dicts_are_equal

@mouad ‘s answer is nice if you assume that both dictionaries contain simple values only. However, if you have dictionaries that contain dictionaries you’ll get an exception as dictionaries are not hashable.

Off the top of my head, something like this might work:

def compare_dictionaries(dict1, dict2):
     if dict1 is None or dict2 is None:
        print('Nones')
        return False

     if (not isinstance(dict1, dict)) or (not isinstance(dict2, dict)):
        print('Not dict')
        return False

     shared_keys = set(dict1.keys()) & set(dict2.keys())

     if not ( len(shared_keys) == len(dict1.keys()) and len(shared_keys) == len(dict2.keys())):
        print('Not all keys are shared')
        return False


     dicts_are_equal = True
     for key in dict1.keys():
         if isinstance(dict1[key], dict) or isinstance(dict2[key], dict):
             dicts_are_equal = dicts_are_equal and compare_dictionaries(dict1[key], dict2[key])
         else:
             dicts_are_equal = dicts_are_equal and all(atleast_1d(dict1[key] == dict2[key]))

     return dicts_are_equal

回答 8

直到OP的最后一个注释,还有另一种可能性是比较以JSON格式转储的字典的哈希值(SHAMD)。构造散列的方式可确保如果它们相等,则源字符串也将相等。这是非常快的,并且在数学上是合理的。

import json
import hashlib

def hash_dict(d):
    return hashlib.sha1(json.dumps(d, sort_keys=True)).hexdigest()

x = dict(a=1, b=2)
y = dict(a=2, b=2)
z = dict(a=1, b=2)

print(hash_dict(x) == hash_dict(y))
print(hash_dict(x) == hash_dict(z))

Yet another possibility, up to the last note of the OP, is to compare the hashes (SHA or MD) of the dicts dumped as JSON. The way hashes are constructed guarantee that if they are equal, the source strings are equal as well. This is very fast and mathematically sound.

import json
import hashlib

def hash_dict(d):
    return hashlib.sha1(json.dumps(d, sort_keys=True)).hexdigest()

x = dict(a=1, b=2)
y = dict(a=2, b=2)
z = dict(a=1, b=2)

print(hash_dict(x) == hash_dict(y))
print(hash_dict(x) == hash_dict(z))

回答 9

IMO的功能很好,清晰直观。但是,为了给您(另一个)答案,这是我的努力:

def compare_dict(dict1, dict2):
    for x1 in dict1.keys():
        z = dict1.get(x1) == dict2.get(x1)
        if not z:
            print('key', x1)
            print('value A', dict1.get(x1), '\nvalue B', dict2.get(x1))
            print('-----\n')

对您或其他任何人都可能有用。

编辑:

我已经创建了上面的一个递归版本。.在其他答案中还没有看到

def compare_dict(a, b):
    # Compared two dictionaries..
    # Posts things that are not equal..
    res_compare = []
    for k in set(list(a.keys()) + list(b.keys())):
        if isinstance(a[k], dict):
            z0 = compare_dict(a[k], b[k])
        else:
            z0 = a[k] == b[k]

        z0_bool = np.all(z0)
        res_compare.append(z0_bool)
        if not z0_bool:
            print(k, a[k], b[k])
    return np.all(res_compare)

The function is fine IMO, clear and intuitive. But just to give you (another) answer, here is my go:

def compare_dict(dict1, dict2):
    for x1 in dict1.keys():
        z = dict1.get(x1) == dict2.get(x1)
        if not z:
            print('key', x1)
            print('value A', dict1.get(x1), '\nvalue B', dict2.get(x1))
            print('-----\n')

Can be useful for you or for anyone else..

EDIT:

I have created a recursive version of the one above.. Have not seen that in the other answers

def compare_dict(a, b):
    # Compared two dictionaries..
    # Posts things that are not equal..
    res_compare = []
    for k in set(list(a.keys()) + list(b.keys())):
        if isinstance(a[k], dict):
            z0 = compare_dict(a[k], b[k])
        else:
            z0 = a[k] == b[k]

        z0_bool = np.all(z0)
        res_compare.append(z0_bool)
        if not z0_bool:
            print(k, a[k], b[k])
    return np.all(res_compare)

回答 10

要测试两个字典的键和值是否相等:

def dicts_equal(d1,d2):
    """ return True if all keys and values are the same """
    return all(k in d2 and d1[k] == d2[k]
               for k in d1) \
        and all(k in d1 and d1[k] == d2[k]
               for k in d2)

如果要返回不同的值,请以不同的方式编写:

def dict1_minus_d2(d1, d2):
    """ return the subset of d1 where the keys don't exist in d2 or
        the values in d2 are different, as a dict """
    return {k,v for k,v in d1.items() if k in d2 and v == d2[k]}

您将不得不调用两次,即

dict1_minus_d2(d1,d2).extend(dict1_minus_d2(d2,d1))

To test if two dicts are equal in keys and values:

def dicts_equal(d1,d2):
    """ return True if all keys and values are the same """
    return all(k in d2 and d1[k] == d2[k]
               for k in d1) \
        and all(k in d1 and d1[k] == d2[k]
               for k in d2)

If you want to return the values which differ, write it differently:

def dict1_minus_d2(d1, d2):
    """ return the subset of d1 where the keys don't exist in d2 or
        the values in d2 are different, as a dict """
    return {k,v for k,v in d1.items() if k in d2 and v == d2[k]}

You would have to call it twice i.e

dict1_minus_d2(d1,d2).extend(dict1_minus_d2(d2,d1))

回答 11

def equal(a, b):
    type_a = type(a)
    type_b = type(b)
    
    if type_a != type_b:
        return False
    
    if isinstance(a, dict):
        if len(a) != len(b):
            return False
        for key in a:
            if key not in b:
                return False
            if not equal(a[key], b[key]):
                return False
        return True

    elif isinstance(a, list):
        if len(a) != len(b):
            return False
        while len(a):
            x = a.pop()
            index = indexof(x, b)
            if index == -1:
                return False
            del b[index]
        return True
        
    else:
        return a == b

def indexof(x, a):
    for i in range(len(a)):
        if equal(x, a[i]):
            return i
    return -1

测试

>>> a = {
    'number': 1,
    'list': ['one', 'two']
}
>>> b = {
    'list': ['two', 'one'],
    'number': 1
}
>>> equal(a, b)
True

Code

def equal(a, b):
    type_a = type(a)
    type_b = type(b)
    
    if type_a != type_b:
        return False
    
    if isinstance(a, dict):
        if len(a) != len(b):
            return False
        for key in a:
            if key not in b:
                return False
            if not equal(a[key], b[key]):
                return False
        return True

    elif isinstance(a, list):
        if len(a) != len(b):
            return False
        while len(a):
            x = a.pop()
            index = indexof(x, b)
            if index == -1:
                return False
            del b[index]
        return True
        
    else:
        return a == b

def indexof(x, a):
    for i in range(len(a)):
        if equal(x, a[i]):
            return i
    return -1

Test

>>> a = {
    'number': 1,
    'list': ['one', 'two']
}
>>> b = {
    'list': ['two', 'one'],
    'number': 1
}
>>> equal(a, b)
True

回答 12

现在,与==进行简单比较就足够了(python 3.8)。即使您以不同的顺序比较相同的字典(最后一个示例)。最好的事情是,您不需要第三方程序包即可完成此操作。

a = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}
b = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}

c = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}
d = {'one': 'dog', 'two': 'cat', 'three': 'mouse', 'four': 'fish'}

e = {'one': 'cat', 'two': 'dog', 'three': 'mouse'}
f = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}

g = {'two': 'cat', 'one': 'dog', 'three': 'mouse'}
h = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}


print(a == b) # True
print(c == d) # False
print(e == f) # False
print(g == h) # True

A simple compare with == should be enough nowadays (python 3.8). Even when you compare the same dicts in a different order (last example). The best thing is, you don’t need a third-party package to accomplish this.

a = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}
b = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}

c = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}
d = {'one': 'dog', 'two': 'cat', 'three': 'mouse', 'four': 'fish'}

e = {'one': 'cat', 'two': 'dog', 'three': 'mouse'}
f = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}

g = {'two': 'cat', 'one': 'dog', 'three': 'mouse'}
h = {'one': 'dog', 'two': 'cat', 'three': 'mouse'}


print(a == b) # True
print(c == d) # False
print(e == f) # False
print(g == h) # True

回答 13

迟到总比没有好!

比较Not_Equal比比较Equal更有效。因此,如果在一个字典中没有找到任何键值,则两个字典不相等。下面的代码考虑到您可能会比较默认字典,因此使用get而不是getitem []。

在get调用中使用一种随机值作为默认值,该值等于要检索的键-以防万一dict在一个dict中具有None值,而在另一个dict中不存在该键。同样,在不使用条件之前检查get!=条件的效率,因为您正在同时检查双方的键和值。

def Dicts_Not_Equal(first,second):
    """ return True if both do not have same length or if any keys and values are not the same """
    if len(first) == len(second): 
        for k in first:
            if first.get(k) != second.get(k,k) or k not in second: return (True)
        for k in second:         
            if first.get(k,k) != second.get(k) or k not in first: return (True)
        return (False)   
    return (True)

Being late in my response is better than never!

Compare Not_Equal is more efficient than comparing Equal. As such two dicts are not equal if any key values in one dict is not found in the other dict. The code below takes into consideration that you maybe comparing default dict and thus uses get instead of getitem [].

Using a kind of random value as default in the get call equal to the key being retrieved – just in case the dicts has a None as value in one dict and that key does not exist in the other. Also the get != condition is checked before the not in condition for efficiency because you are doing the check on the keys and values from both sides at the same time.

def Dicts_Not_Equal(first,second):
    """ return True if both do not have same length or if any keys and values are not the same """
    if len(first) == len(second): 
        for k in first:
            if first.get(k) != second.get(k,k) or k not in second: return (True)
        for k in second:         
            if first.get(k,k) != second.get(k) or k not in first: return (True)
        return (False)   
    return (True)

回答 14

我正在使用此解决方案,该解决方案在Python 3中非常适合我


import logging
log = logging.getLogger(__name__)

...

    def deep_compare(self,left, right, level=0):
        if type(left) != type(right):
            log.info("Exit 1 - Different types")
            return False

        elif type(left) is dict:
            # Dict comparison
            for key in left:
                if key not in right:
                    log.info("Exit 2 - missing {} in right".format(key))
                    return False
                else:
                    if not deep_compare(left[str(key)], right[str(key)], level +1 ):
                        log.info("Exit 3 - different children")
                        return False
            return True
        elif type(left) is list:
            # List comparison
            for key in left:
                if key not in right:
                    log.info("Exit 4 - missing {} in right".format(key))
                    return False
                else:
                    if not deep_compare(left[left.index(key)], right[right.index(key)], level +1 ):
                        log.info("Exit 5 - different children")
                        return False
            return True
        else:
            # Other comparison
            return left == right

        return False

它比较dict,list和任何其他自己实现“ ==”运算符的类型。如果需要比较其他不同的内容,则需要在“ if tree”中添加一个新分支。

希望能有所帮助。

I am using this solution that works perfectly for me in Python 3


import logging
log = logging.getLogger(__name__)

...

    def deep_compare(self,left, right, level=0):
        if type(left) != type(right):
            log.info("Exit 1 - Different types")
            return False

        elif type(left) is dict:
            # Dict comparison
            for key in left:
                if key not in right:
                    log.info("Exit 2 - missing {} in right".format(key))
                    return False
                else:
                    if not deep_compare(left[str(key)], right[str(key)], level +1 ):
                        log.info("Exit 3 - different children")
                        return False
            return True
        elif type(left) is list:
            # List comparison
            for key in left:
                if key not in right:
                    log.info("Exit 4 - missing {} in right".format(key))
                    return False
                else:
                    if not deep_compare(left[left.index(key)], right[right.index(key)], level +1 ):
                        log.info("Exit 5 - different children")
                        return False
            return True
        else:
            # Other comparison
            return left == right

        return False

It compares dict, list and any other types that implements the “==” operator by themselves. If you need to compare something else different, you need to add a new branch in the “if tree”.

Hope that helps.


回答 15

对于python3:

data_set_a = dict_a.items()
data_set_b = dict_b.items()

difference_set = data_set_a ^ data_set_b

for python3:

data_set_a = dict_a.items()
data_set_b = dict_b.items()

difference_set = data_set_a ^ data_set_b

回答 16

>>> hash_1
{'a': 'foo', 'b': 'bar'}
>>> hash_2
{'a': 'foo', 'b': 'bar'}
>>> set_1 = set (hash_1.iteritems())
>>> set_1
set([('a', 'foo'), ('b', 'bar')])
>>> set_2 = set (hash_2.iteritems())
>>> set_2
set([('a', 'foo'), ('b', 'bar')])
>>> len (set_1.difference(set_2))
0
>>> if (len(set_1.difference(set_2)) | len(set_2.difference(set_1))) == False:
...    print "The two hashes match."
...
The two hashes match.
>>> hash_2['c'] = 'baz'
>>> hash_2
{'a': 'foo', 'c': 'baz', 'b': 'bar'}
>>> if (len(set_1.difference(set_2)) | len(set_2.difference(set_1))) == False:
...     print "The two hashes match."
...
>>>
>>> hash_2.pop('c')
'baz'

这是另一个选择:

>>> id(hash_1)
140640738806240
>>> id(hash_2)
140640738994848

因此,如您所见,这两个ID是不同的。但是丰富的比较运算符似乎可以解决问题:

>>> hash_1 == hash_2
True
>>>
>>> hash_2
{'a': 'foo', 'b': 'bar'}
>>> set_2 = set (hash_2.iteritems())
>>> if (len(set_1.difference(set_2)) | len(set_2.difference(set_1))) == False:
...     print "The two hashes match."
...
The two hashes match.
>>>
>>> hash_1
{'a': 'foo', 'b': 'bar'}
>>> hash_2
{'a': 'foo', 'b': 'bar'}
>>> set_1 = set (hash_1.iteritems())
>>> set_1
set([('a', 'foo'), ('b', 'bar')])
>>> set_2 = set (hash_2.iteritems())
>>> set_2
set([('a', 'foo'), ('b', 'bar')])
>>> len (set_1.difference(set_2))
0
>>> if (len(set_1.difference(set_2)) | len(set_2.difference(set_1))) == False:
...    print "The two hashes match."
...
The two hashes match.
>>> hash_2['c'] = 'baz'
>>> hash_2
{'a': 'foo', 'c': 'baz', 'b': 'bar'}
>>> if (len(set_1.difference(set_2)) | len(set_2.difference(set_1))) == False:
...     print "The two hashes match."
...
>>>
>>> hash_2.pop('c')
'baz'

Here’s another option:

>>> id(hash_1)
140640738806240
>>> id(hash_2)
140640738994848

So as you see the two id’s are different. But the rich comparison operators seem to do the trick:

>>> hash_1 == hash_2
True
>>>
>>> hash_2
{'a': 'foo', 'b': 'bar'}
>>> set_2 = set (hash_2.iteritems())
>>> if (len(set_1.difference(set_2)) | len(set_2.difference(set_1))) == False:
...     print "The two hashes match."
...
The two hashes match.
>>>

回答 17

在PyUnit中,有一种方法可以很好地比较字典。我使用以下两个词典对其进行了测试,它确实可以满足您的需求。

d1 = {1: "value1",
      2: [{"subKey1":"subValue1",
           "subKey2":"subValue2"}]}
d2 = {1: "value1",
      2: [{"subKey2":"subValue2",
           "subKey1": "subValue1"}]
      }


def assertDictEqual(self, d1, d2, msg=None):
        self.assertIsInstance(d1, dict, 'First argument is not a dictionary')
        self.assertIsInstance(d2, dict, 'Second argument is not a dictionary')

        if d1 != d2:
            standardMsg = '%s != %s' % (safe_repr(d1, True), safe_repr(d2, True))
            diff = ('\n' + '\n'.join(difflib.ndiff(
                           pprint.pformat(d1).splitlines(),
                           pprint.pformat(d2).splitlines())))
            standardMsg = self._truncateMessage(standardMsg, diff)
            self.fail(self._formatMessage(msg, standardMsg))

我不建议导入unittest您的生产代码。我的想法是可以重新构建PyUnit中的源以在生产中运行。它使用pprint哪个“漂亮打印”的字典。似乎很容易使此代码适应“生产就绪”的要求。

In PyUnit there’s a method which compares dictionaries beautifully. I tested it using the following two dictionaries, and it does exactly what you’re looking for.

d1 = {1: "value1",
      2: [{"subKey1":"subValue1",
           "subKey2":"subValue2"}]}
d2 = {1: "value1",
      2: [{"subKey2":"subValue2",
           "subKey1": "subValue1"}]
      }


def assertDictEqual(self, d1, d2, msg=None):
        self.assertIsInstance(d1, dict, 'First argument is not a dictionary')
        self.assertIsInstance(d2, dict, 'Second argument is not a dictionary')

        if d1 != d2:
            standardMsg = '%s != %s' % (safe_repr(d1, True), safe_repr(d2, True))
            diff = ('\n' + '\n'.join(difflib.ndiff(
                           pprint.pformat(d1).splitlines(),
                           pprint.pformat(d2).splitlines())))
            standardMsg = self._truncateMessage(standardMsg, diff)
            self.fail(self._formatMessage(msg, standardMsg))

I’m not recommending importing unittest into your production code. My thought is the source in PyUnit could be re-tooled to run in production. It uses pprint which “pretty prints” the dictionaries. Seems pretty easy to adapt this code to be “production ready”.


回答 18

查看字典视图对象:https : //docs.python.org/2/library/stdtypes.html#dict

这样,您可以从dictView1中减去dictView2,它将返回一组在dictView2中不同的键/值对:

original = {'one':1,'two':2,'ACTION':'ADD'}
originalView=original.viewitems()
updatedDict = {'one':1,'two':2,'ACTION':'REPLACE'}
updatedDictView=updatedDict.viewitems()
delta=original | updatedDict
print delta
>>set([('ACTION', 'REPLACE')])

您可以相交,合并,差异(如上所示),对称差异这些字典视图对象。
更好?快点?-不确定,但是是标准库的一部分-这使其具有很大的可移植性

see dictionary view objects: https://docs.python.org/2/library/stdtypes.html#dict

This way you can subtract dictView2 from dictView1 and it will return a set of key/value pairs that are different in dictView2:

original = {'one':1,'two':2,'ACTION':'ADD'}
originalView=original.viewitems()
updatedDict = {'one':1,'two':2,'ACTION':'REPLACE'}
updatedDictView=updatedDict.viewitems()
delta=original | updatedDict
print delta
>>set([('ACTION', 'REPLACE')])

You can intersect, union, difference (shown above), symmetric difference these dictionary view objects.
Better? Faster? – not sure, but part of the standard library – which makes it a big plus for portability


回答 19

下面的代码将帮助您比较python中的字典列表

def compate_generic_types(object1, object2):
    if isinstance(object1, str) and isinstance(object2, str):
        return object1 == object2
    elif isinstance(object1, unicode) and isinstance(object2, unicode):
        return object1 == object2
    elif isinstance(object1, bool) and isinstance(object2, bool):
        return object1 == object2
    elif isinstance(object1, int) and isinstance(object2, int):
        return object1 == object2
    elif isinstance(object1, float) and isinstance(object2, float):
        return object1 == object2
    elif isinstance(object1, float) and isinstance(object2, int):
        return object1 == float(object2)
    elif isinstance(object1, int) and isinstance(object2, float):
        return float(object1) == object2

    return True

def deep_list_compare(object1, object2):
    retval = True
    count = len(object1)
    object1 = sorted(object1)
    object2 = sorted(object2)
    for x in range(count):
        if isinstance(object1[x], dict) and isinstance(object2[x], dict):
            retval = deep_dict_compare(object1[x], object2[x])
            if retval is False:
                print "Unable to match [{0}] element in list".format(x)
                return False
        elif isinstance(object1[x], list) and isinstance(object2[x], list):
            retval = deep_list_compare(object1[x], object2[x])
            if retval is False:
                print "Unable to match [{0}] element in list".format(x)
                return False
        else:
            retval = compate_generic_types(object1[x], object2[x])
            if retval is False:
                print "Unable to match [{0}] element in list".format(x)
                return False

    return retval

def deep_dict_compare(object1, object2):
    retval = True

    if len(object1) != len(object2):
        return False

    for k in object1.iterkeys():
        obj1 = object1[k]
        obj2 = object2[k]
        if isinstance(obj1, list) and isinstance(obj2, list):
            retval = deep_list_compare(obj1, obj2)
            if retval is False:
                print "Unable to match [{0}]".format(k)
                return False

        elif isinstance(obj1, dict) and isinstance(obj2, dict):
            retval = deep_dict_compare(obj1, obj2)
            if retval is False:
                print "Unable to match [{0}]".format(k)
                return False
        else:
            retval = compate_generic_types(obj1, obj2)
            if retval is False:
                print "Unable to match [{0}]".format(k)
                return False

    return retval

Below code will help you to compare list of dict in python

def compate_generic_types(object1, object2):
    if isinstance(object1, str) and isinstance(object2, str):
        return object1 == object2
    elif isinstance(object1, unicode) and isinstance(object2, unicode):
        return object1 == object2
    elif isinstance(object1, bool) and isinstance(object2, bool):
        return object1 == object2
    elif isinstance(object1, int) and isinstance(object2, int):
        return object1 == object2
    elif isinstance(object1, float) and isinstance(object2, float):
        return object1 == object2
    elif isinstance(object1, float) and isinstance(object2, int):
        return object1 == float(object2)
    elif isinstance(object1, int) and isinstance(object2, float):
        return float(object1) == object2

    return True

def deep_list_compare(object1, object2):
    retval = True
    count = len(object1)
    object1 = sorted(object1)
    object2 = sorted(object2)
    for x in range(count):
        if isinstance(object1[x], dict) and isinstance(object2[x], dict):
            retval = deep_dict_compare(object1[x], object2[x])
            if retval is False:
                print "Unable to match [{0}] element in list".format(x)
                return False
        elif isinstance(object1[x], list) and isinstance(object2[x], list):
            retval = deep_list_compare(object1[x], object2[x])
            if retval is False:
                print "Unable to match [{0}] element in list".format(x)
                return False
        else:
            retval = compate_generic_types(object1[x], object2[x])
            if retval is False:
                print "Unable to match [{0}] element in list".format(x)
                return False

    return retval

def deep_dict_compare(object1, object2):
    retval = True

    if len(object1) != len(object2):
        return False

    for k in object1.iterkeys():
        obj1 = object1[k]
        obj2 = object2[k]
        if isinstance(obj1, list) and isinstance(obj2, list):
            retval = deep_list_compare(obj1, obj2)
            if retval is False:
                print "Unable to match [{0}]".format(k)
                return False

        elif isinstance(obj1, dict) and isinstance(obj2, dict):
            retval = deep_dict_compare(obj1, obj2)
            if retval is False:
                print "Unable to match [{0}]".format(k)
                return False
        else:
            retval = compate_generic_types(obj1, obj2)
            if retval is False:
                print "Unable to match [{0}]".format(k)
                return False

    return retval

回答 20

>>> x = {'a':1,'b':2,'c':3}
>>> x
{'a': 1, 'b': 2, 'c': 3}

>>> y = {'a':2,'b':4,'c':3}
>>> y
{'a': 2, 'b': 4, 'c': 3}

METHOD 1:

>>> common_item = x.items()&y.items() #using union,x.item() 
>>> common_item
{('c', 3)}

METHOD 2:

 >>> for i in x.items():
        if i in y.items():
           print('true')
        else:
           print('false')


false
false
true
>>> x = {'a':1,'b':2,'c':3}
>>> x
{'a': 1, 'b': 2, 'c': 3}

>>> y = {'a':2,'b':4,'c':3}
>>> y
{'a': 2, 'b': 4, 'c': 3}

METHOD 1:

>>> common_item = x.items()&y.items() #using union,x.item() 
>>> common_item
{('c', 3)}

METHOD 2:

 >>> for i in x.items():
        if i in y.items():
           print('true')
        else:
           print('false')


false
false
true

回答 21

在Python 3.6中,可以通过以下方式完成:

if (len(dict_1)==len(dict_2): 
  for i in dict_1.items():
        ret=bool(i in dict_2.items())

如果dict_2中存在dict_1的所有项,则ret变量将为true

In Python 3.6, It can be done as:-

if (len(dict_1)==len(dict_2): 
  for i in dict_1.items():
        ret=bool(i in dict_2.items())

ret variable will be true if all the items of dict_1 in present in dict_2


回答 22

这是我的答案,使用递归大小方法:

def dict_equals(da, db):
    if not isinstance(da, dict) or not isinstance(db, dict):
        return False
    if len(da) != len(db):
        return False
    for da_key in da:
        if da_key not in db:
            return False
        if not isinstance(db[da_key], type(da[da_key])):
            return False
        if isinstance(da[da_key], dict):
            res = dict_equals(da[da_key], db[da_key])
            if res is False:
                return False
        elif da[da_key] != db[da_key]:
            return False
    return True

a = {1:{2:3, 'name': 'cc', "dd": {3:4, 21:"nm"}}}
b = {1:{2:3, 'name': 'cc', "dd": {3:4, 21:"nm"}}}
print dict_equals(a, b)

希望有帮助!

Here is my answer, use a recursize way:

def dict_equals(da, db):
    if not isinstance(da, dict) or not isinstance(db, dict):
        return False
    if len(da) != len(db):
        return False
    for da_key in da:
        if da_key not in db:
            return False
        if not isinstance(db[da_key], type(da[da_key])):
            return False
        if isinstance(da[da_key], dict):
            res = dict_equals(da[da_key], db[da_key])
            if res is False:
                return False
        elif da[da_key] != db[da_key]:
            return False
    return True

a = {1:{2:3, 'name': 'cc', "dd": {3:4, 21:"nm"}}}
b = {1:{2:3, 'name': 'cc', "dd": {3:4, 21:"nm"}}}
print dict_equals(a, b)

Hope that helps!


回答 23

为什么不仅仅遍历一个字典并检查另一个字典(假设两个字典具有相同的键)呢?

x = dict(a=1, b=2)
y = dict(a=2, b=2)

for key, val in x.items():
    if val == y[key]:
        print ('Ok', val, y[key])
    else:
        print ('Not', val, y[key])

输出:

Not 1 2
Ok 2 2

Why not just iterate through one dictionary and check the other in the process (assuming both dictionaries have the same keys)?

x = dict(a=1, b=2)
y = dict(a=2, b=2)

for key, val in x.items():
    if val == y[key]:
        print ('Ok', val, y[key])
    else:
        print ('Not', val, y[key])

Output:

Not 1 2
Ok 2 2

回答 24

import json

if json.dumps(dict1) == json.dumps(dict2):
    print("Equal")
import json

if json.dumps(dict1) == json.dumps(dict2):
    print("Equal")

映射python字典中的值

问题:映射python字典中的值

给定一个字典,{ k1: v1, k2: v2 ... }我想{ k1: f(v1), k2: f(v2) ... }提供一个函数f

有没有这样的内置功能?还是我必须做

dict([(k, f(v)) for (k, v) in my_dictionary.iteritems()])

理想情况下,我只会写

my_dictionary.map_values(f)

要么

my_dictionary.mutate_values_with(f)

也就是说,对原始词典进行了突变还是创建副本对我来说都没有关系。

Given a dictionary { k1: v1, k2: v2 ... } I want to get { k1: f(v1), k2: f(v2) ... } provided I pass a function f.

Is there any such built in function? Or do I have to do

dict([(k, f(v)) for (k, v) in my_dictionary.iteritems()])

Ideally I would just write

my_dictionary.map_values(f)

or

my_dictionary.mutate_values_with(f)

That is, it doesn’t matter to me if the original dictionary is mutated or a copy is created.


回答 0

没有这样的功能;最简单的方法是使用dict理解:

my_dictionary = {k: f(v) for k, v in my_dictionary.items()}

在python 2.7中,请使用.iteritems()方法而不是.items()节省内存。dict理解语法直到python 2.7才引入。

注意,列表上也没有这种方法。您将不得不使用列表推导或map()函数。

这样,您也可以使用该map()函数来处理字典:

my_dictionary = dict(map(lambda kv: (kv[0], f(kv[1])), my_dictionary.iteritems()))

但这确实不是那么可读。

There is no such function; the easiest way to do this is to use a dict comprehension:

my_dictionary = {k: f(v) for k, v in my_dictionary.items()}

In python 2.7, use the .iteritems() method instead of .items() to save memory. The dict comprehension syntax wasn’t introduced until python 2.7.

Note that there is no such method on lists either; you’d have to use a list comprehension or the map() function.

As such, you could use the map() function for processing your dict as well:

my_dictionary = dict(map(lambda kv: (kv[0], f(kv[1])), my_dictionary.iteritems()))

but that’s not that readable, really.


回答 1

这些工具非常适合这种简单但重复的逻辑。

http://toolz.readthedocs.org/en/latest/api.html#toolz.dicttoolz.valmap

使您正确地放在想要的位置。

import toolz
def f(x):
  return x+1

toolz.valmap(f, my_list)

These toolz are great for this kind of simple yet repetitive logic.

http://toolz.readthedocs.org/en/latest/api.html#toolz.dicttoolz.valmap

Gets you right where you want to be.

import toolz
def f(x):
  return x+1

toolz.valmap(f, my_list)

回答 2

您可以就地执行此操作,而不是创建一个新的字典,这对于大型词典(如果您不需要副本)可能更可取。

def mutate_dict(f,d):
    for k, v in d.iteritems():
        d[k] = f(v)

my_dictionary = {'a':1, 'b':2}
mutate_dict(lambda x: x+1, my_dictionary)

结果my_dictionary包含:

{'a': 2, 'b': 3}

You can do this in-place, rather than create a new dict, which may be preferable for large dictionaries (if you do not need a copy).

def mutate_dict(f,d):
    for k, v in d.iteritems():
        d[k] = f(v)

my_dictionary = {'a':1, 'b':2}
mutate_dict(lambda x: x+1, my_dictionary)

results in my_dictionary containing:

{'a': 2, 'b': 3}

回答 3

由于PEP-0469将iteritems()重命名为items(),而PEP-3113删除了Tuple参数解包,因此在Python 3.x中,您应该这样编写Martijn Pieters♦答案

my_dictionary = dict(map(lambda item: (item[0], f(item[1])), my_dictionary.items()))

Due to PEP-0469 which renamed iteritems() to items() and PEP-3113 which removed Tuple parameter unpacking, in Python 3.x you should write Martijn Pieters♦ answer like this:

my_dictionary = dict(map(lambda item: (item[0], f(item[1])), my_dictionary.items()))

回答 4

虽然我的原始答案没有指出要点(通过尝试使用defaultdict的工厂中的Accessing key解决方案来解决此问题),但我对其进行了重新设计以提出针对当前问题的实际解决方案。

这里是:

class walkableDict(dict):
  def walk(self, callback):
    try:
      for key in self:
        self[key] = callback(self[key])
    except TypeError:
      return False
    return True

用法:

>>> d = walkableDict({ k1: v1, k2: v2 ... })
>>> d.walk(f)

想法是将原始dict子类化以赋予其所需的功能:在所有值上“映射”一个功能。

加号的是,该字典可用于存储原始数据,就好像它是一个dict,同时根据请求通过回调转换任何数据。

当然,可以随意使用所需的名称来命名类和函数(此答案中选择的名称受PHP array_walk()函数的启发)。

注意:tryexcept块和return语句都不是功能必需的,它们可以进一步模仿PHP的行为array_walk

While my original answer missed the point (by trying to solve this problem with the solution to Accessing key in factory of defaultdict), I have reworked it to propose an actual solution to the present question.

Here it is:

class walkableDict(dict):
  def walk(self, callback):
    try:
      for key in self:
        self[key] = callback(self[key])
    except TypeError:
      return False
    return True

Usage:

>>> d = walkableDict({ k1: v1, k2: v2 ... })
>>> d.walk(f)

The idea is to subclass the original dict to give it the desired functionality: “mapping” a function over all the values.

The plus point is that this dictionary can be used to store the original data as if it was a dict, while transforming any data on request with a callback.

Of course, feel free to name the class and the function the way you want (the name chosen in this answer is inspired by PHP’s array_walk() function).

Note: Neither the tryexcept block nor the return statements are mandatory for the functionality, they are there to further mimic the behavior of the PHP’s array_walk.


回答 5

为了避免从lambda内部进行索引,例如:

rval = dict(map(lambda kv : (kv[0], ' '.join(kv[1])), rval.iteritems()))

您也可以这样做:

rval = dict(map(lambda(k,v) : (k, ' '.join(v)), rval.iteritems()))

To avoid doing indexing from inside lambda, like:

rval = dict(map(lambda kv : (kv[0], ' '.join(kv[1])), rval.iteritems()))

You can also do:

rval = dict(map(lambda(k,v) : (k, ' '.join(v)), rval.iteritems()))

回答 6

刚遇到这个用例。我实现了gens的answer,添加了一种递归方法来处理也是dict的值:

def mutate_dict_in_place(f, d):
    for k, v in d.iteritems():
        if isinstance(v, dict):
            mutate_dict_in_place(f, v)
        else:
            d[k] = f(v)

# Exemple handy usage
def utf8_everywhere(d):
    mutate_dict_in_place((
        lambda value:
            value.decode('utf-8')
            if isinstance(value, bytes)
            else value
        ),
        d
    )

my_dict = {'a': b'byte1', 'b': {'c': b'byte2', 'd': b'byte3'}}
utf8_everywhere(my_dict)
print(my_dict)

这在处理在Python 2中将字符串编码为字节的json或yaml文件时非常有用

Just came accross this use case. I implemented gens’s answer, adding a recursive approach for handling values that are also dicts:

def mutate_dict_in_place(f, d):
    for k, v in d.iteritems():
        if isinstance(v, dict):
            mutate_dict_in_place(f, v)
        else:
            d[k] = f(v)

# Exemple handy usage
def utf8_everywhere(d):
    mutate_dict_in_place((
        lambda value:
            value.decode('utf-8')
            if isinstance(value, bytes)
            else value
        ),
        d
    )

my_dict = {'a': b'byte1', 'b': {'c': b'byte2', 'd': b'byte3'}}
utf8_everywhere(my_dict)
print(my_dict)

This can be useful when dealing with json or yaml files that encode strings as bytes in Python 2