分类目录归档:知识问答

在Python中,什么时候使用字典,列表或集合?

问题:在Python中,什么时候使用字典,列表或集合?

我什么时候应该使用字典,列表或集合?

是否存在更适合每种数据类型的方案?

When should I use a dictionary, list or set?

Are there scenarios that are more suited for each data type?


回答 0

一个list保持秩序,dictset不要:当你关心的秩序,因此,您必须使用list(如果你的容器的选择仅限于这三种,当然;-)。

dict与每个键关联一个值,而listset仅包含值:很明显,非常不同的用例。

set要求项目是可哈希的,list不是:如果您有不可哈希的项目,则不能使用,set而必须使用list

set禁止重复,list不禁止:也是至关重要的区别。(可以在以下位置找到“多重集”,该多重集将重复项映射到不止一次存在的项目的不同计数中;如果出于某些奇怪的原因而无法导入,则collections.Counter可以将其构建为,或者在2.7之前的版本中Python作为,使用项目作为键,并将相关值作为计数)。dictcollectionscollections.defaultdict(int)

set(或dict键中)中检查值的隶属关系非常快(花费一个恒定的短时间),而在列表中,它花费的时间与列表的长度成正比(在一般情况下和最坏情况下)。因此,如果您有可散列的项目,则不关心订单或重复项,而希望快速进行成员资格检查set比更好list

A list keeps order, dict and set don’t: when you care about order, therefore, you must use list (if your choice of containers is limited to these three, of course;-).

dict associates with each key a value, while list and set just contain values: very different use cases, obviously.

set requires items to be hashable, list doesn’t: if you have non-hashable items, therefore, you cannot use set and must instead use list.

set forbids duplicates, list does not: also a crucial distinction. (A “multiset”, which maps duplicates into a different count for items present more than once, can be found in collections.Counter — you could build one as a dict, if for some weird reason you couldn’t import collections, or, in pre-2.7 Python as a collections.defaultdict(int), using the items as keys and the associated value as the count).

Checking for membership of a value in a set (or dict, for keys) is blazingly fast (taking about a constant, short time), while in a list it takes time proportional to the list’s length in the average and worst cases. So, if you have hashable items, don’t care either way about order or duplicates, and want speedy membership checking, set is better than list.


回答 1

  • 您是否只需要订购的物品序列?取得清单。
  • 你只需要知道你是否已经一个特定的值,但不排序(你不需要存储复本)?使用一套。
  • 您是否需要将值与键相关联,以便稍后可以有效地(通过键)查找它们?使用字典。
  • Do you just need an ordered sequence of items? Go for a list.
  • Do you just need to know whether or not you’ve already got a particular value, but without ordering (and you don’t need to store duplicates)? Use a set.
  • Do you need to associate values with keys, so you can look them up efficiently (by key) later on? Use a dictionary.

回答 2

如果您想要无序的唯一元素集合,请使用set。(例如,当您要在文档中使用所有单词的集合时)。

当您想要收集元素的不可变的有序列表时,请使用tuple。(例如,当您希望将(名称,phone_number)对用作集合中的元素时,您将需要一个元组而不是一个列表,因为集合要求元素是不可变的。

当您想收集元素的可变的有序列表时,请使用list。(例如,当您要将新的电话号码追加到列表中时:[number1,number2,…])。

当您想要从键到值的映射时,请使用dict。(例如,当您需要将姓名映射到电话号码的电话簿时:){'John Smith' : '555-1212'}。请注意,字典中的键是无序的。(如果您遍历字典(电话簿),则按键(名称)可能以任何顺序显示)。

When you want an unordered collection of unique elements, use a set. (For example, when you want the set of all the words used in a document).

When you want to collect an immutable ordered list of elements, use a tuple. (For example, when you want a (name, phone_number) pair that you wish to use as an element in a set, you would need a tuple rather than a list since sets require elements be immutable).

When you want to collect a mutable ordered list of elements, use a list. (For example, when you want to append new phone numbers to a list: [number1, number2, …]).

When you want a mapping from keys to values, use a dict. (For example, when you want a telephone book which maps names to phone numbers: {'John Smith' : '555-1212'}). Note the keys in a dict are unordered. (If you iterate through a dict (telephone book), the keys (names) may show up in any order).


回答 3

  • 当您有一组映射到值的唯一键时,请使用字典。

  • 如果您有项目的有序集合,请使用列表。

  • 使用一组存储一组无序的项目。

  • Use a dictionary when you have a set of unique keys that map to values.

  • Use a list if you have an ordered collection of items.

  • Use a set to store an unordered set of items.


回答 4

简而言之,使用:

list -如果您需要订购的物品序列。

dict -如果您需要将值与键相关联

set -如果您需要保留唯一元素。

详细说明

清单

列表是可变序列,通常用于存储同类项目的集合。

列表实现了所有常见的序列操作:

  • x in lx not in l
  • l[i]l[i:j]l[i:j:k]
  • len(l)min(l)max(l)
  • l.count(x)
  • l.index(x[, i[, j]])-的第一出现的索引xl(在或之后i和之前j的indeces)

列表还实现了所有可变序列操作:

  • l[i] = x-项目il被替换x
  • l[i:j] = tlito的切片j被iterable的内容替换t
  • del l[i:j] – 如同 l[i:j] = []
  • l[i:j:k] = t-的元素l[i:j:k]已替换为t
  • del l[i:j:k]s[i:j:k]从列表中删除的元素
  • l.append(x)-追加x到序列的末尾
  • l.clear()-从中删除所有项目l(与del相同l[:]
  • l.copy()-创建的浅表副本l(与相同l[:]
  • l.extend(t)l += t-扩展l以下内容t
  • l *= n-更新l其内容重复n
  • l.insert(i, x)-插入xl由下式给出的指数在i
  • l.pop([i])-在处检索项目,i并将其从中删除l
  • l.remove(x)-从等于x的l位置删除第一项l[i]
  • l.reverse()-反转l到位的项目

利用方法append和可以将列表用作堆栈pop

字典

字典将可散列的值映射到任意对象。字典是可变对象。字典的主要操作是使用一些键存储值并提取给定键的值。

在字典中,不能将不可哈希的值(即包含列表,字典或其他可变类型的值)用作键。

集合是不同的可哈希对象的无序集合。集合通常用于进行成员资格测试,从序列中删除重复项以及计算数学运算(例如交集,并集,差和对称差)。

In short, use:

list – if you require an ordered sequence of items.

dict – if you require to relate values with keys

set – if you require to keep unique elements.

Detailed Explanation

List

A list is a mutable sequence, typically used to store collections of homogeneous items.

A list implements all of the common sequence operations:

  • x in l and x not in l
  • l[i], l[i:j], l[i:j:k]
  • len(l), min(l), max(l)
  • l.count(x)
  • l.index(x[, i[, j]]) – index of the 1st occurrence of x in l (at or after i and before j indeces)

A list also implements all of the mutable sequence operations:

  • l[i] = x – item i of l is replaced by x
  • l[i:j] = t – slice of l from i to j is replaced by the contents of the iterable t
  • del l[i:j] – same as l[i:j] = []
  • l[i:j:k] = t – the elements of l[i:j:k] are replaced by those of t
  • del l[i:j:k] – removes the elements of s[i:j:k] from the list
  • l.append(x) – appends x to the end of the sequence
  • l.clear() – removes all items from l (same as del l[:])
  • l.copy() – creates a shallow copy of l (same as l[:])
  • l.extend(t) or l += t – extends l with the contents of t
  • l *= n – updates l with its contents repeated n times
  • l.insert(i, x) – inserts x into l at the index given by i
  • l.pop([i]) – retrieves the item at i and also removes it from l
  • l.remove(x) – remove the first item from l where l[i] is equal to x
  • l.reverse() – reverses the items of l in place

A list could be used as stack by taking advantage of the methods append and pop.

Dictionary

A dictionary maps hashable values to arbitrary objects. A dictionary is a mutable object. The main operations on a dictionary are storing a value with some key and extracting the value given the key.

In a dictionary, you cannot use as keys values that are not hashable, that is, values containing lists, dictionaries or other mutable types.

Set

A set is an unordered collection of distinct hashable objects. A set is commonly used to include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.


回答 5

尽管这并不涵盖sets,但这是对dicts和lists 的很好解释:

列表看起来就是-值列表。它们中的每一个都从零开始编号-第一个从零开始编号,第二个为1,第三个为2,依此类推。您可以从列表中删除值,并在末尾添加新值。例如:您的许多猫的名字。

字典类似于其名称所暗示的内容-字典。在字典中,您有单词的“索引”,并且每个单词都有一个定义。在python中,单词称为“键”,而定义称为“值”。字典中的值未编号-类似于其名称所建议的名称-字典。在字典中,您有单词的“索引”,并且每个单词都有一个定义。字典中的值没有编号-它们也没有任何特定的顺序-键执行相同的操作。您可以添加,删除和修改字典中的值。例如:电话簿。

http://www.sthurlow.com/python/lesson06/

Although this doesn’t cover sets, it is a good explanation of dicts and lists:

Lists are what they seem – a list of values. Each one of them is numbered, starting from zero – the first one is numbered zero, the second 1, the third 2, etc. You can remove values from the list, and add new values to the end. Example: Your many cats’ names.

Dictionaries are similar to what their name suggests – a dictionary. In a dictionary, you have an ‘index’ of words, and for each of them a definition. In python, the word is called a ‘key’, and the definition a ‘value’. The values in a dictionary aren’t numbered – tare similar to what their name suggests – a dictionary. In a dictionary, you have an ‘index’ of words, and for each of them a definition. The values in a dictionary aren’t numbered – they aren’t in any specific order, either – the key does the same thing. You can add, remove, and modify the values in dictionaries. Example: telephone book.

http://www.sthurlow.com/python/lesson06/


回答 6

对于C ++,我始终牢记以下流程图:在哪种情况下,我使用特定的STL容器?,所以我很好奇Python3是否也有类似的东西,但是我没有运气。

对于Python,需要记住的是:没有像C ++一样的Python标准。因此,不同的Python解释器(例如CPython,PyPy)可能会有巨大的差异。以下流程图适用于CPython。

另外,我发现包含以下数据结构到图中,没有什么好办法:bytesbyte arraystuplesnamed_tuplesChainMapCounter,和arrays

  • OrderedDict并且deque可以通过collections模块获得。
  • heapq可从heapq模块中获得
  • LifoQueueQueuePriorityQueue可以通过queue专门用于并发(线程)访问的模块获得。(也有一个multiprocessing.Queue可用的,但我不知道与它之间的区别,queue.Queue但是假设需要从进程进行并发访问时应该使用它。)
  • dictsetfrozen_set,和list被内置当然

对于任何人,如果您可以改善此答案并在各个方面提供更好的图表,我将不胜感激。随时欢迎。 流程图

PS:该图已通过yed制作。graphml文件在这里

For C++ I was always having this flow chart in mind: In which scenario do I use a particular STL container?, so I was curious if something similar is available for Python3 as well, but I had no luck.

What you need to keep in mind for Python is: There is no single Python standard as for C++. Hence there might be huge differences for different Python interpreters (e.g. CPython, PyPy). The following flow chart is for CPython.

Additionally I found no good way to incorporate the following data structures into the diagram: bytes, byte arrays, tuples, named_tuples, ChainMap, Counter, and arrays.

  • OrderedDict and deque are available via collections module.
  • heapq is available from the heapq module
  • LifoQueue, Queue, and PriorityQueue are available via the queue module which is designed for concurrent (threads) access. (There is also a multiprocessing.Queue available but I don’t know the differences to queue.Queue but would assume that it should be used when concurrent access from processes is needed.)
  • dict, set, frozen_set, and list are builtin of course

For anyone I would be grateful if you could improve this answer and provide a better diagram in every aspect. Feel free and welcome. flowchart

PS: the diagram has been made with yed. The graphml file is here


回答 7

结合列表字典集合,还有另一个有趣的python对象OrderedDicts

顺序词典与常规词典一样,但是它们记住项目插入的顺序。在有序字典上进行迭代时,将按照项的键首次添加的顺序返回项。

当您需要保留键的顺序(例如处理文档)时,OrderedDicts可能会很有用:通常需要文档中所有术语的向量表示。因此,使用OrderedDicts,您可以有效地验证术语是否已被阅读过,添加术语,提取术语,以及在所有操作之后可以提取它们的有序矢量表示。

In combination with lists, dicts and sets, there are also another interesting python objects, OrderedDicts.

Ordered dictionaries are just like regular dictionaries but they remember the order that items were inserted. When iterating over an ordered dictionary, the items are returned in the order their keys were first added.

OrderedDicts could be useful when you need to preserve the order of the keys, for example working with documents: It’s common to need the vector representation of all terms in a document. So using OrderedDicts you can efficiently verify if a term has been read before, add terms, extract terms, and after all the manipulations you can extract the ordered vector representation of them.


回答 8

列表就是它们的外观-值列表。它们中的每一个都从零开始编号-第一个从零开始编号,第二个为1,第三个为2,依此类推。您可以从列表中删除值,并在末尾添加新值。例如:您的许多猫的名字。

元组就像列表一样,但是您不能更改它们的值。首先给出的值是程序其余部分所保持的值。同样,每个值都从零开始编号,以方便参考。示例:一年中的月份名称。

字典类似于其名称所暗示的内容-字典。在字典中,您有单词的“索引”,并且每个单词都有一个定义。在python中,单词称为“键”,而定义称为“值”。字典中的值未编号-类似于其名称所建议的名称-字典。在字典中,您有单词的“索引”,并且每个单词都有一个定义。在python中,单词称为“键”,而定义称为“值”。字典中的值没有编号-它们也没有任何特定的顺序-键执行相同的操作。您可以添加,删除和修改字典中的值。例如:电话簿。

Lists are what they seem – a list of values. Each one of them is numbered, starting from zero – the first one is numbered zero, the second 1, the third 2, etc. You can remove values from the list, and add new values to the end. Example: Your many cats’ names.

Tuples are just like lists, but you can’t change their values. The values that you give it first up, are the values that you are stuck with for the rest of the program. Again, each value is numbered starting from zero, for easy reference. Example: the names of the months of the year.

Dictionaries are similar to what their name suggests – a dictionary. In a dictionary, you have an ‘index’ of words, and for each of them a definition. In python, the word is called a ‘key’, and the definition a ‘value’. The values in a dictionary aren’t numbered – tare similar to what their name suggests – a dictionary. In a dictionary, you have an ‘index’ of words, and for each of them a definition. In python, the word is called a ‘key’, and the definition a ‘value’. The values in a dictionary aren’t numbered – they aren’t in any specific order, either – the key does the same thing. You can add, remove, and modify the values in dictionaries. Example: telephone book.


回答 9

在使用它们时,我会详尽列出它们的方法,以供您参考:

class ContainerMethods:
    def __init__(self):
        self.list_methods_11 = {
                    'Add':{'append','extend','insert'},
                    'Subtract':{'pop','remove'},
                    'Sort':{'reverse', 'sort'},
                    'Search':{'count', 'index'},
                    'Entire':{'clear','copy'},
                            }
        self.tuple_methods_2 = {'Search':'count','index'}

        self.dict_methods_11 = {
                    'Views':{'keys', 'values', 'items'},
                    'Add':{'update'},
                    'Subtract':{'pop', 'popitem',},
                    'Extract':{'get','setdefault',},
                    'Entire':{ 'clear', 'copy','fromkeys'},
                            }
        self.set_methods_17 ={
                    'Add':{['add', 'update'],['difference_update','symmetric_difference_update','intersection_update']},
                    'Subtract':{'pop', 'remove','discard'},
                    'Relation':{'isdisjoint', 'issubset', 'issuperset'},
                    'operation':{'union' 'intersection','difference', 'symmetric_difference'}
                    'Entire':{'clear', 'copy'}}

When use them, I make an exhaustive cheatsheet of their methods for your reference:

class ContainerMethods:
    def __init__(self):
        self.list_methods_11 = {
                    'Add':{'append','extend','insert'},
                    'Subtract':{'pop','remove'},
                    'Sort':{'reverse', 'sort'},
                    'Search':{'count', 'index'},
                    'Entire':{'clear','copy'},
                            }
        self.tuple_methods_2 = {'Search':'count','index'}

        self.dict_methods_11 = {
                    'Views':{'keys', 'values', 'items'},
                    'Add':{'update'},
                    'Subtract':{'pop', 'popitem',},
                    'Extract':{'get','setdefault',},
                    'Entire':{ 'clear', 'copy','fromkeys'},
                            }
        self.set_methods_17 ={
                    'Add':{['add', 'update'],['difference_update','symmetric_difference_update','intersection_update']},
                    'Subtract':{'pop', 'remove','discard'},
                    'Relation':{'isdisjoint', 'issubset', 'issuperset'},
                    'operation':{'union' 'intersection','difference', 'symmetric_difference'}
                    'Entire':{'clear', 'copy'}}

回答 10

字典:Python字典的用法类似于哈希表,其键为索引,对象为值。

列表:列表用于将对象保存在数组中,该对象由该对象在数组中的位置索引。

集合:集合是具有函数的集合,这些函数可以判断集合中是否存在对象。

Dictionary: A python dictionary is used like a hash table with key as index and object as value.

List: A list is used for holding objects in an array indexed by position of that object in the array.

Set: A set is a collection with functions that can tell if an object is present or not present in the set.


Python的内置词典如何实现?

问题:Python的内置词典如何实现?

有谁知道python内置字典类型是如何实现的?我的理解是,这是某种哈希表,但我无法找到任何确定的答案。

Does anyone know how the built in dictionary type for python is implemented? My understanding is that it is some sort of hash table, but I haven’t been able to find any sort of definitive answer.


回答 0

这是我能够汇总的有关Python字典的所有内容(可能比任何人都想知道的要多;但是答案很全面)。

  • Python字典实现为哈希表
  • 哈希表必须允许哈希冲突,即,即使两个不同的键具有相同的哈希值,表的实现也必须具有明确插入和检索键和值对的策略。
  • Python dict使用开放式寻址来解决哈希冲突(如下所述)(请参阅dictobject.c:296-297)。
  • Python哈希表只是一个连续的内存块(有点像一个数组,因此您可以O(1)按索引进行查找)。
  • 表中的每个插槽只能存储一个条目。这个很重要。
  • 中的每个条目实际上是三个值的组合:<hash,key,value>。这被实现为C结构(请参见dictobject.h:51-56)。
  • 下图是Python哈希表的逻辑表示。在下图中,0, 1, ..., i, ...左侧是哈希表中插槽的索引(它们仅用于说明目的,与表显然没有一起存储!)。

    # Logical model of Python Hash table
    -+-----------------+
    0| <hash|key|value>|
    -+-----------------+
    1|      ...        |
    -+-----------------+
    .|      ...        |
    -+-----------------+
    i|      ...        |
    -+-----------------+
    .|      ...        |
    -+-----------------+
    n|      ...        |
    -+-----------------+
    
  • 初始化新字典时,它从8 个插槽开始。(见dictobject.h:49

  • 在向表中添加条目时,我们从某个槽开始i,该槽基于键的哈希值。CPython最初使用i = hash(key) & mask(where mask = PyDictMINSIZE - 1,但这并不重要)。请注意,i选中的初始插槽取决于密钥的哈希值
  • 如果该插槽为空,则将条目添加到该插槽(通过输入,我是说<hash|key|value>)。但是,如果那个插槽被占用!?最可能是因为另一个条目具有相同的哈希(哈希冲突!)
  • 如果该插槽被占用,则CPython(甚至PyPy)将插槽中条目的哈希值与键(即==比较而不是is比较)与要插入的当前条目的哈希值和键(dictobject.c)进行比较。 :337,344-345)。如果两者都匹配,则认为该条目已存在,放弃并继续下一个要插入的条目。如果哈希或密钥不匹配,则开始探测
  • 探测仅表示它按插槽搜索插槽以找到一个空插槽。从技术上讲,我们可以一个接一个地i+1, i+2, ...使用,然后使用第一个可用的(线性探测)。但是由于注释中详细解释的原因(请参阅dictobject.c:33-126),CPython使用了随机探测。在随机探测中,以伪随机顺序选择下一个时隙。该条目将添加到第一个空插槽。对于此讨论,用于选择下一个时隙的实际算法并不是很重要(有关探测的算法,请参见dictobject.c:33-126)。重要的是对插槽进行探测,直到找到第一个空插槽为止。
  • 查找也会发生相同的情况,只是从初始插槽i(其中i取决于键的哈希值)开始。如果哈希和密钥都与插槽中的条目不匹配,它将开始探测,直到找到具有匹配项的插槽。如果所有插槽均已耗尽,则报告失败。
  • 顺便说一句,dict如果三分之二满了,它将被调整大小。这样可以避免减慢查找速度。(见dictobject.h:64-65

注意:我对Python Dict的实现进行了研究,以回答我自己的问题:字典中的多个条目如何具有相同的哈希值。我在此处发布了对此回复的略作修改的版本,因为所有的研究也都与此问题相关。

Here is everything about Python dicts that I was able to put together (probably more than anyone would like to know; but the answer is comprehensive).

  • Python dictionaries are implemented as hash tables.

  • Hash tables must allow for hash collisions i.e. even if two distinct keys have the same hash value, the table’s implementation must have a strategy to insert and retrieve the key and value pairs unambiguously.

  • Python dict uses open addressing to resolve hash collisions (explained below) (see dictobject.c:296-297).

  • Python hash table is just a contiguous block of memory (sort of like an array, so you can do an O(1) lookup by index).

  • Each slot in the table can store one and only one entry. This is important.

  • Each entry in the table is actually a combination of the three values: < hash, key, value >. This is implemented as a C struct (see dictobject.h:51-56).

  • The figure below is a logical representation of a Python hash table. In the figure below, 0, 1, ..., i, ... on the left are indices of the slots in the hash table (they are just for illustrative purposes and are not stored along with the table obviously!).

      # Logical model of Python Hash table
      -+-----------------+
      0| <hash|key|value>|
      -+-----------------+
      1|      ...        |
      -+-----------------+
      .|      ...        |
      -+-----------------+
      i|      ...        |
      -+-----------------+
      .|      ...        |
      -+-----------------+
      n|      ...        |
      -+-----------------+
    
  • When a new dict is initialized it starts with 8 slots. (see dictobject.h:49)

  • When adding entries to the table, we start with some slot, i, that is based on the hash of the key. CPython initially uses i = hash(key) & mask (where mask = PyDictMINSIZE - 1, but that’s not really important). Just note that the initial slot, i, that is checked depends on the hash of the key.

  • If that slot is empty, the entry is added to the slot (by entry, I mean, <hash|key|value>). But what if that slot is occupied!? Most likely because another entry has the same hash (hash collision!)

  • If the slot is occupied, CPython (and even PyPy) compares the hash AND the key (by compare I mean == comparison not the is comparison) of the entry in the slot against the hash and key of the current entry to be inserted (dictobject.c:337,344-345) respectively. If both match, then it thinks the entry already exists, gives up and moves on to the next entry to be inserted. If either hash or the key don’t match, it starts probing.

  • Probing just means it searches the slots by slot to find an empty slot. Technically we could just go one by one, i+1, i+2, ... and use the first available one (that’s linear probing). But for reasons explained beautifully in the comments (see dictobject.c:33-126), CPython uses random probing. In random probing, the next slot is picked in a pseudo random order. The entry is added to the first empty slot. For this discussion, the actual algorithm used to pick the next slot is not really important (see dictobject.c:33-126 for the algorithm for probing). What is important is that the slots are probed until first empty slot is found.

  • The same thing happens for lookups, just starts with the initial slot i (where i depends on the hash of the key). If the hash and the key both don’t match the entry in the slot, it starts probing, until it finds a slot with a match. If all slots are exhausted, it reports a fail.

  • BTW, the dict will be resized if it is two-thirds full. This avoids slowing down lookups. (see dictobject.h:64-65)

NOTE: I did the research on Python Dict implementation in response to my own question about how multiple entries in a dict can have same hash values. I posted a slightly edited version of the response here because all the research is very relevant for this question as well.


回答 1

Python的内置词典如何实现?

这是短期类:

  • 它们是哈希表。(有关Python实现的详细信息,请参见下文。)
  • 从Python 3.6开始,新的布局和算法使它们
    • 通过插入密钥排序,以及
    • 占用更少的空间,
    • 几乎不牺牲性能。
  • 当字典共享密钥时(在特殊情况下),另一种优化方法可以节省空间。

从Python 3.6开始,有序方面是非官方的(给其他实现一个跟上的机会),但在Python 3.7中却是正式的

Python的字典是哈希表

长期以来,它完全像这样工作。Python将预分配8个空行,并使用哈希值确定键值对的粘贴位置。例如,如果密钥的哈希值以001结尾,则它将其保留在1(即2nd)索引中(如以下示例所示)。

   <hash>       <key>    <value>
     null        null    null
...010001    ffeb678c    633241c4 # addresses of the keys and values
     null        null    null
      ...         ...    ...

在64位体系结构上,每一行占用24个字节,在32位体系结构上占用12个字节。(请注意,列标题只是出于我们的目的而使用的标签-它们实际上并不存在于内存中。)

如果散列的结尾与预先存在的键的散列相同,则为冲突,然后它将键值对保留在不同的位置。

存储5个键值之后,添加另一个键值对时,哈希冲突的可能性太大,因此字典的大小增加了一倍。在64位处理中,在调整大小之前,我们有72个字节为空,而在此之后,由于有10个空行,我们浪费了240个字节。

这需要很多空间,但是查找时间是相当恒定的。密钥比较算法是计算哈希值,转到预期位置,比较密钥的ID-如果它们是同一对象,则它们相等。如果没有,那么比较的哈希值,如果他们一样,他们是不相等的。否则,我们最终比较键是否相等,如果相等,则返回值。最终的相等比较可能会很慢,但是较早的检查通常会缩短最终的比较,从而使查找非常快。

冲突会减慢速度,并且从理论上讲,攻击者可以使用哈希冲突来执行拒绝服务攻击,因此我们对哈希函数的初始化进行了随机化处理,以便为每个新的Python进程计算不同的哈希。

上述浪费的空间使我们修改了字典的实现,具有令人兴奋的新功能,即现在可以通过插入来对字典进行排序。

新的紧凑型哈希表

相反,我们首先为插入索引预分配一个数组。

由于我们的第一个键值对位于第二个插槽中,因此我们这样进行索引:

[null, 0, null, null, null, null, null, null]

并且我们的表只是按插入顺序填充:

   <hash>       <key>    <value>
...010001    ffeb678c    633241c4 
      ...         ...    ...

因此,当我们查找键时,我们使用哈希检查我们期望的位置(在这种情况下,我们直接转到数组的索引1),然后转到哈希表中的该索引(例如索引0) ),检查键是否相等(使用前面所述的相同算法),如果相等,则返回值。

我们保留了恒定的查找时间,在某些情况下速度损失较小,而在另一些情况下速度有所增加,其好处是,与现有的实现相比,它可以节省大量空间,并且可以保留插入顺序。唯一浪费的空间是索引数组中的空字节。

Raymond Hettinger 于2012年12月在python-dev上引入了此功能。它最终在Python 3.6中进入了CPython 。通过插入排序被认为是3.6的实现细节,以使Python的其他实现有机会赶上。

共用金钥

节省空间的另一种优化方法是共享密钥的实现。因此,我们没有重复使用共享密钥和密钥散列的冗余字典,而不是拥有占据所有空间的冗余字典。您可以这样想:

     hash         key    dict_0    dict_1    dict_2...
...010001    ffeb678c    633241c4  fffad420  ...
      ...         ...    ...       ...       ...

对于64位计算机,每个额外的字典每个键最多可以节省16个字节。

自定义对象和替代项的共享密钥

这些共享密钥字典旨在用于自定义对象__dict__。为了获得这种行为,我相信您需要__dict__在实例化下一个对象之前完成填充(请参阅PEP 412)。这意味着您应该在__init__或中分配所有属性__new__,否则可能无法节省空间。

但是,如果您知道执行时的所有属性,则__init__还可以提供__slots__对象,并保证__dict__根本不会创建该对象(如果在父级中不可用),甚至允许__dict__但保证您可以预见的属性是仍然存储在插槽中。有关更多信息__slots__请在此处查看我的答案

也可以看看:

How are Python’s Built In Dictionaries Implemented?

Here’s the short course:

  • They are hash tables. (See below for the specifics of Python’s implementation.)
  • A new layout and algorithm, as of Python 3.6, makes them
    • ordered by key insertion, and
    • take up less space,
    • at virtually no cost in performance.
  • Another optimization saves space when dicts share keys (in special cases).

The ordered aspect is unofficial as of Python 3.6 (to give other implementations a chance to keep up), but official in Python 3.7.

Python’s Dictionaries are Hash Tables

For a long time, it worked exactly like this. Python would preallocate 8 empty rows and use the hash to determine where to stick the key-value pair. For example, if the hash for the key ended in 001, it would stick it in the 1 (i.e. 2nd) index (like the example below.)

   <hash>       <key>    <value>
     null        null    null
...010001    ffeb678c    633241c4 # addresses of the keys and values
     null        null    null
      ...         ...    ...

Each row takes up 24 bytes on a 64 bit architecture, 12 on a 32 bit. (Note that the column headers are just labels for our purposes here – they don’t actually exist in memory.)

If the hash ended the same as a preexisting key’s hash, this is a collision, and then it would stick the key-value pair in a different location.

After 5 key-values are stored, when adding another key-value pair, the probability of hash collisions is too large, so the dictionary is doubled in size. In a 64 bit process, before the resize, we have 72 bytes empty, and after, we are wasting 240 bytes due to the 10 empty rows.

This takes a lot of space, but the lookup time is fairly constant. The key comparison algorithm is to compute the hash, go to the expected location, compare the key’s id – if they’re the same object, they’re equal. If not then compare the hash values, if they are not the same, they’re not equal. Else, then we finally compare keys for equality, and if they are equal, return the value. The final comparison for equality can be quite slow, but the earlier checks usually shortcut the final comparison, making the lookups very quick.

Collisions slow things down, and an attacker could theoretically use hash collisions to perform a denial of service attack, so we randomized the initialization of the hash function such that it computes different hashes for each new Python process.

The wasted space described above has led us to modify the implementation of dictionaries, with an exciting new feature that dictionaries are now ordered by insertion.

The New Compact Hash Tables

We start, instead, by preallocating an array for the index of the insertion.

Since our first key-value pair goes in the second slot, we index like this:

[null, 0, null, null, null, null, null, null]

And our table just gets populated by insertion order:

   <hash>       <key>    <value>
...010001    ffeb678c    633241c4 
      ...         ...    ...

So when we do a lookup for a key, we use the hash to check the position we expect (in this case, we go straight to index 1 of the array), then go to that index in the hash-table (e.g. index 0), check that the keys are equal (using the same algorithm described earlier), and if so, return the value.

We retain constant lookup time, with minor speed losses in some cases and gains in others, with the upsides that we save quite a lot of space over the pre-existing implementation and we retain insertion order. The only space wasted are the null bytes in the index array.

Raymond Hettinger introduced this on python-dev in December of 2012. It finally got into CPython in Python 3.6. Ordering by insertion was considered an implementation detail for 3.6 to allow other implementations of Python a chance to catch up.

Shared Keys

Another optimization to save space is an implementation that shares keys. Thus, instead of having redundant dictionaries that take up all of that space, we have dictionaries that reuse the shared keys and keys’ hashes. You can think of it like this:

     hash         key    dict_0    dict_1    dict_2...
...010001    ffeb678c    633241c4  fffad420  ...
      ...         ...    ...       ...       ...

For a 64 bit machine, this could save up to 16 bytes per key per extra dictionary.

Shared Keys for Custom Objects & Alternatives

These shared-key dicts are intended to be used for custom objects’ __dict__. To get this behavior, I believe you need to finish populating your __dict__ before you instantiate your next object (see PEP 412). This means you should assign all your attributes in the __init__ or __new__, else you might not get your space savings.

However, if you know all of your attributes at the time your __init__ is executed, you could also provide __slots__ for your object, and guarantee that __dict__ is not created at all (if not available in parents), or even allow __dict__ but guarantee that your foreseen attributes are stored in slots anyways. For more on __slots__, see my answer here.

See also:


回答 2

Python字典使用Open寻址在Beautiful代码中参考

注意! 如Wikipedia所述,开放式寻址(又名封闭式哈希)不应与其相反的开放式哈希混淆

开放式寻址意味着dict使用数组插槽,并且当对象的主要位置位于dict中时,使用“扰动”方案在同一数组中的不同索引处查找对象的位置,其中对象的哈希值发挥了作用。

Python Dictionaries use Open addressing (reference inside Beautiful code)

NB! Open addressing, a.k.a closed hashing should, as noted in Wikipedia, not be confused with its opposite open hashing!

Open addressing means that the dict uses array slots, and when an object’s primary position is taken in the dict, the object’s spot is sought at a different index in the same array, using a “perturbation” scheme, where the object’s hash value plays part.


如何更正TypeError:散列之前必须对Unicode对象进行编码?

问题:如何更正TypeError:散列之前必须对Unicode对象进行编码?

我有这个错误:

Traceback (most recent call last):
  File "python_md5_cracker.py", line 27, in <module>
  m.update(line)
TypeError: Unicode-objects must be encoded before hashing

当我尝试在Python 3.2.2中执行以下代码时:

import hashlib, sys
m = hashlib.md5()
hash = ""
hash_file = input("What is the file name in which the hash resides?  ")
wordlist = input("What is your wordlist?  (Enter the file name)  ")
try:
  hashdocument = open(hash_file, "r")
except IOError:
  print("Invalid file.")
  raw_input()
  sys.exit()
else:
  hash = hashdocument.readline()
  hash = hash.replace("\n", "")

try:
  wordlistfile = open(wordlist, "r")
except IOError:
  print("Invalid file.")
  raw_input()
  sys.exit()
else:
  pass
for line in wordlistfile:
  # Flush the buffer (this caused a massive problem when placed 
  # at the beginning of the script, because the buffer kept getting
  # overwritten, thus comparing incorrect hashes)
  m = hashlib.md5()
  line = line.replace("\n", "")
  m.update(line)
  word_hash = m.hexdigest()
  if word_hash == hash:
    print("Collision! The word corresponding to the given hash is", line)
    input()
    sys.exit()

print("The hash given does not correspond to any supplied word in the wordlist.")
input()
sys.exit()

I have this error:

Traceback (most recent call last):
  File "python_md5_cracker.py", line 27, in <module>
  m.update(line)
TypeError: Unicode-objects must be encoded before hashing

when I try to execute this code in Python 3.2.2:

import hashlib, sys
m = hashlib.md5()
hash = ""
hash_file = input("What is the file name in which the hash resides?  ")
wordlist = input("What is your wordlist?  (Enter the file name)  ")
try:
  hashdocument = open(hash_file, "r")
except IOError:
  print("Invalid file.")
  raw_input()
  sys.exit()
else:
  hash = hashdocument.readline()
  hash = hash.replace("\n", "")

try:
  wordlistfile = open(wordlist, "r")
except IOError:
  print("Invalid file.")
  raw_input()
  sys.exit()
else:
  pass
for line in wordlistfile:
  # Flush the buffer (this caused a massive problem when placed 
  # at the beginning of the script, because the buffer kept getting
  # overwritten, thus comparing incorrect hashes)
  m = hashlib.md5()
  line = line.replace("\n", "")
  m.update(line)
  word_hash = m.hexdigest()
  if word_hash == hash:
    print("Collision! The word corresponding to the given hash is", line)
    input()
    sys.exit()

print("The hash given does not correspond to any supplied word in the wordlist.")
input()
sys.exit()

回答 0

它可能正在寻找来自的字符编码wordlistfile

wordlistfile = open(wordlist,"r",encoding='utf-8')

或者,如果您逐行工作:

line.encode('utf-8')

It is probably looking for a character encoding from wordlistfile.

wordlistfile = open(wordlist,"r",encoding='utf-8')

Or, if you’re working on a line-by-line basis:

line.encode('utf-8')

回答 1

您必须定义encoding formatlike utf-8,尝试这种简单的方法,

本示例使用SHA256算法生成一个随机数:

>>> import hashlib
>>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'

You must have to define encoding format like utf-8, Try this easy way,

This example generates a random number using the SHA256 algorithm:

>>> import hashlib
>>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'

回答 2

要存储密码(PY3):

import hashlib, os
password_salt = os.urandom(32).hex()
password = '12345'

hash = hashlib.sha512()
hash.update(('%s%s' % (password_salt, password)).encode('utf-8'))
password_hash = hash.hexdigest()

To store the password (PY3):

import hashlib, os
password_salt = os.urandom(32).hex()
password = '12345'

hash = hashlib.sha512()
hash.update(('%s%s' % (password_salt, password)).encode('utf-8'))
password_hash = hash.hexdigest()

回答 3

该错误已说明您必须执行的操作。MD5对字节进行操作,因此您必须将Unicode字符串编码为bytes,例如使用line.encode('utf-8')

The error already says what you have to do. MD5 operates on bytes, so you have to encode Unicode string into bytes, e.g. with line.encode('utf-8').


回答 4

请首先查看答案。

现在,该错误信息是明确的:你只能使用字节,而不是Python字符串(曾经被认为是unicode在Python <3),所以你必须与你的首选编码编码字符串:utf-32utf-16utf-8或受限制的甚至是一个8-位编码(有些可能称为代码页)。

从文件中读取时,Python 3将自动将单词列表文件中的字节解码为Unicode。我建议你这样做:

m.update(line.encode(wordlistfile.encoding))

因此,推送到md5算法的编码数据的编码方式与基础文件完全相同。

Please take a look first at that answer.

Now, the error message is clear: you can only use bytes, not Python strings (what used to be unicode in Python < 3), so you have to encode the strings with your preferred encoding: utf-32, utf-16, utf-8 or even one of the restricted 8-bit encodings (what some might call codepages).

The bytes in your wordlist file are being automatically decoded to Unicode by Python 3 as you read from the file. I suggest you do:

m.update(line.encode(wordlistfile.encoding))

so that the encoded data pushed to the md5 algorithm are encoded exactly like the underlying file.


回答 5

import hashlib
string_to_hash = '123'
hash_object = hashlib.sha256(str(string_to_hash).encode('utf-8'))
print('Hash', hash_object.hexdigest())
import hashlib
string_to_hash = '123'
hash_object = hashlib.sha256(str(string_to_hash).encode('utf-8'))
print('Hash', hash_object.hexdigest())

回答 6

您可以以二进制模式打开文件:

import hashlib

with open(hash_file) as file:
    control_hash = file.readline().rstrip("\n")

wordlistfile = open(wordlist, "rb")
# ...
for line in wordlistfile:
    if hashlib.md5(line.rstrip(b'\n\r')).hexdigest() == control_hash:
       # collision

You could open the file in binary mode:

import hashlib

with open(hash_file) as file:
    control_hash = file.readline().rstrip("\n")

wordlistfile = open(wordlist, "rb")
# ...
for line in wordlistfile:
    if hashlib.md5(line.rstrip(b'\n\r')).hexdigest() == control_hash:
       # collision

回答 7

编码这行为我固定。

m.update(line.encode('utf-8'))

encoding this line fixed it for me.

m.update(line.encode('utf-8'))

回答 8

如果是单行字符串。用b或B包裹它。例如:

variable = b"This is a variable"

要么

variable2 = B"This is also a variable"

If it’s a single line string. wrapt it with b or B. e.g:

variable = b"This is a variable"

or

variable2 = B"This is also a variable"

回答 9

该程序是上述MD5破解程序的无错误和增强版本,该MD5破解程序读取包含哈希密码列表的文件,并根据英语词典单词列表中的哈希单词检查该文件。希望对您有所帮助。

我从以下链接https://github.com/dwyl/english-words下载了英语词典

# md5cracker.py
# English Dictionary https://github.com/dwyl/english-words 

import hashlib, sys

hash_file = 'exercise\hashed.txt'
wordlist = 'data_sets\english_dictionary\words.txt'

try:
    hashdocument = open(hash_file,'r')
except IOError:
    print('Invalid file.')
    sys.exit()
else:
    count = 0
    for hash in hashdocument:
        hash = hash.rstrip('\n')
        print(hash)
        i = 0
        with open(wordlist,'r') as wordlistfile:
            for word in wordlistfile:
                m = hashlib.md5()
                word = word.rstrip('\n')            
                m.update(word.encode('utf-8'))
                word_hash = m.hexdigest()
                if word_hash==hash:
                    print('The word, hash combination is ' + word + ',' + hash)
                    count += 1
                    break
                i += 1
        print('Itiration is ' + str(i))
    if count == 0:
        print('The hash given does not correspond to any supplied word in the wordlist.')
    else:
        print('Total passwords identified is: ' + str(count))
sys.exit()

This program is the bug free and enhanced version of the above MD5 cracker that reads the file containing list of hashed passwords and checks it against hashed word from the English dictionary word list. Hope it is helpful.

I downloaded the English dictionary from the following link https://github.com/dwyl/english-words

# md5cracker.py
# English Dictionary https://github.com/dwyl/english-words 

import hashlib, sys

hash_file = 'exercise\hashed.txt'
wordlist = 'data_sets\english_dictionary\words.txt'

try:
    hashdocument = open(hash_file,'r')
except IOError:
    print('Invalid file.')
    sys.exit()
else:
    count = 0
    for hash in hashdocument:
        hash = hash.rstrip('\n')
        print(hash)
        i = 0
        with open(wordlist,'r') as wordlistfile:
            for word in wordlistfile:
                m = hashlib.md5()
                word = word.rstrip('\n')            
                m.update(word.encode('utf-8'))
                word_hash = m.hexdigest()
                if word_hash==hash:
                    print('The word, hash combination is ' + word + ',' + hash)
                    count += 1
                    break
                i += 1
        print('Itiration is ' + str(i))
    if count == 0:
        print('The hash given does not correspond to any supplied word in the wordlist.')
    else:
        print('Total passwords identified is: ' + str(count))
sys.exit()

使用Python’with’语句时捕获异常

问题:使用Python’with’语句时捕获异常

令我感到羞耻的是,我不知道如何处理python’with’语句的异常。如果我有代码:

with open("a.txt") as f:
    print f.readlines()

我真的很想处理“找不到文件异常”以便执行某些操作。但是我不能写

with open("a.txt") as f:
    print f.readlines()
except:
    print 'oops'

而且不能写

with open("a.txt") as f:
    print f.readlines()
else:
    print 'oops'

在try / except语句中包含“ with”,否则将不起作用:不会引发异常。为了以Python方式处理“ with”语句中的失败,我该怎么办?

To my shame, I can’t figure out how to handle exception for python ‘with’ statement. If I have a code:

with open("a.txt") as f:
    print f.readlines()

I really want to handle ‘file not found exception’ in order to do somehing. But I can’t write

with open("a.txt") as f:
    print f.readlines()
except:
    print 'oops'

and can’t write

with open("a.txt") as f:
    print f.readlines()
else:
    print 'oops'

enclosing ‘with’ in a try/except statement doesn’t work else: exception is not raised. What can I do in order to process failure inside ‘with’ statement in a Pythonic way?


回答 0

from __future__ import with_statement

try:
    with open( "a.txt" ) as f :
        print f.readlines()
except EnvironmentError: # parent of IOError, OSError *and* WindowsError where available
    print 'oops'

如果您希望通过公开调用与工作代码进行不同的错误处理,则可以执行以下操作:

try:
    f = open('foo.txt')
except IOError:
    print('error')
else:
    with f:
        print f.readlines()
from __future__ import with_statement

try:
    with open( "a.txt" ) as f :
        print f.readlines()
except EnvironmentError: # parent of IOError, OSError *and* WindowsError where available
    print 'oops'

If you want different handling for errors from the open call vs the working code you could do:

try:
    f = open('foo.txt')
except IOError:
    print('error')
else:
    with f:
        print f.readlines()

回答 1

利用该with语句的最佳“ Pythonic”方法在PEP 343中列为示例6 ,给出了该语句的背景。

@contextmanager
def opened_w_error(filename, mode="r"):
    try:
        f = open(filename, mode)
    except IOError, err:
        yield None, err
    else:
        try:
            yield f, None
        finally:
            f.close()

用法如下:

with opened_w_error("/etc/passwd", "a") as (f, err):
    if err:
        print "IOError:", err
    else:
        f.write("guido::0:0::/:/bin/sh\n")

The best “Pythonic” way to do this, exploiting the with statement, is listed as Example #6 in PEP 343, which gives the background of the statement.

@contextmanager
def opened_w_error(filename, mode="r"):
    try:
        f = open(filename, mode)
    except IOError, err:
        yield None, err
    else:
        try:
            yield f, None
        finally:
            f.close()

Used as follows:

with opened_w_error("/etc/passwd", "a") as (f, err):
    if err:
        print "IOError:", err
    else:
        f.write("guido::0:0::/:/bin/sh\n")

回答 2

使用Python’with’语句时捕获异常

从Python 2.6开始,with语句无需__future__导入即可使用。您可以使用以下命令在Python 2.5之前获得它(但现在是时候进行升级了!):

from __future__ import with_statement

这是更正您所拥有的东西。您快到了,但是with没有except子句:

with open("a.txt") as f: 
    print(f.readlines())
except:                    # <- with doesn't have an except clause.
    print('oops')

上下文管理器的__exit__方法(如果返回)False将在完成时引发错误。如果返回True,它将抑制它。在open内置的__exit__不返回True,那么你只需要嵌套一个try,except块:

try:
    with open("a.txt") as f:
        print(f.readlines())
except Exception as error: 
    print('oops')

和标准样板:不要使用裸露的东西except:,以免发生BaseException任何其他异常和警告。至少要Exception和错误一样具体,也许要抓住IOError。只捕获您准备处理的错误。

因此,在这种情况下,您需要执行以下操作:

>>> try:
...     with open("a.txt") as f:
...         print(f.readlines())
... except IOError as error: 
...     print('oops')
... 
oops

Catching an exception while using a Python ‘with’ statement

The with statement has been available without the __future__ import since Python 2.6. You can get it as early as Python 2.5 (but at this point it’s time to upgrade!) with:

from __future__ import with_statement

Here’s the closest thing to correct that you have. You’re almost there, but with doesn’t have an except clause:

with open("a.txt") as f: 
    print(f.readlines())
except:                    # <- with doesn't have an except clause.
    print('oops')

A context manager’s __exit__ method, if it returns False will reraise the error when it finishes. If it returns True, it will suppress it. The open builtin’s __exit__ doesn’t return True, so you just need to nest it in a try, except block:

try:
    with open("a.txt") as f:
        print(f.readlines())
except Exception as error: 
    print('oops')

And standard boilerplate: don’t use a bare except: which catches BaseException and every other possible exception and warning. Be at least as specific as Exception, and for this error, perhaps catch IOError. Only catch errors you’re prepared to handle.

So in this case, you’d do:

>>> try:
...     with open("a.txt") as f:
...         print(f.readlines())
... except IOError as error: 
...     print('oops')
... 
oops

回答 3

区分复合with语句引发的异常的可能来源

区分with语句中发生的异常非常棘手,因为它们可能起源于不同的地方。可以从以下任一位置(或其中调用的功能)引发异常:

  • ContextManager.__init__
  • ContextManager.__enter__
  • 的身体 with
  • ContextManager.__exit__

有关更多详细信息,请参见有关Context Manager Types的文档。

如果我们想区分这些不同的情况,仅将包裹with到a中try .. except是不够的。考虑以下示例(ValueError用作示例,但当然可以用任何其他异常类型代替):

try:
    with ContextManager():
        BLOCK
except ValueError as err:
    print(err)

在此,except遗嘱将捕获源自四个不同地方的所有exceptions,因此不允许对其进行区分。如果将上下文管理器对象的实例移到之外with,则可以区分__init__BLOCK / __enter__ / __exit__

try:
    mgr = ContextManager()
except ValueError as err:
    print('__init__ raised:', err)
else:
    try:
        with mgr:
            try:
                BLOCK
            except TypeError:  # catching another type (which we want to handle here)
                pass
    except ValueError as err:
        # At this point we still cannot distinguish between exceptions raised from
        # __enter__, BLOCK, __exit__ (also BLOCK since we didn't catch ValueError in the body)
        pass

有效地,这仅对__init__零件有所帮助,但是我们可以添加一个额外的哨兵变量来检查with开始执行的主体(即__enter__与其他主体区别):

try:
    mgr = ContextManager()  # __init__ could raise
except ValueError as err:
    print('__init__ raised:', err)
else:
    try:
        entered_body = False
        with mgr:
            entered_body = True  # __enter__ did not raise at this point
            try:
                BLOCK
            except TypeError:  # catching another type (which we want to handle here)
                pass
    except ValueError as err:
        if not entered_body:
            print('__enter__ raised:', err)
        else:
            # At this point we know the exception came either from BLOCK or from __exit__
            pass

棘手的部分是区分源自BLOCK__exit__的异常,这是因为逃脱了withwill 主体的异常将传递给__exit__它,从而可以决定如何处理它(请参阅docs)。但是__exit__,如果出现异常,则将原来的exceptions替换为新的exceptions。为了处理这些情况,我们可以except在的主体中添加一个通用子句,with以存储可能会被忽略而不会被忽略的任何潜在异常,并将其与except后来在最外层捕获的异常进行比较-如果它们相同,则表示起源是BLOCK否则,它是__exit__(以防万一__exit__,通过在最外层返回true值来抑制异常except 根本不会执行)。

try:
    mgr = ContextManager()  # __init__ could raise
except ValueError as err:
    print('__init__ raised:', err)
else:
    entered_body = exc_escaped_from_body = False
    try:
        with mgr:
            entered_body = True  # __enter__ did not raise at this point
            try:
                BLOCK
            except TypeError:  # catching another type (which we want to handle here)
                pass
            except Exception as err:  # this exception would normally escape without notice
                # we store this exception to check in the outer `except` clause
                # whether it is the same (otherwise it comes from __exit__)
                exc_escaped_from_body = err
                raise  # re-raise since we didn't intend to handle it, just needed to store it
    except ValueError as err:
        if not entered_body:
            print('__enter__ raised:', err)
        elif err is exc_escaped_from_body:
            print('BLOCK raised:', err)
        else:
            print('__exit__ raised:', err)

使用PEP 343中提到的等效形式的替代方法

PEP 343-“ with”语句指定该语句的等效“ non-with”版本with。在这里,我们可以轻松地将各个部分包装在一起try ... except,从而区分出不同的潜在错误源:

import sys

try:
    mgr = ContextManager()
except ValueError as err:
    print('__init__ raised:', err)
else:
    try:
        value = type(mgr).__enter__(mgr)
    except ValueError as err:
        print('__enter__ raised:', err)
    else:
        exit = type(mgr).__exit__
        exc = True
        try:
            try:
                BLOCK
            except TypeError:
                pass
            except:
                exc = False
                try:
                    exit_val = exit(mgr, *sys.exc_info())
                except ValueError as err:
                    print('__exit__ raised:', err)
                else:
                    if not exit_val:
                        raise
        except ValueError as err:
            print('BLOCK raised:', err)
        finally:
            if exc:
                try:
                    exit(mgr, None, None, None)
                except ValueError as err:
                    print('__exit__ raised:', err)

通常,更简单的方法就可以了

这种特殊异常处理的需求应该非常少,通常将整个包装with在一个try ... except块中就足够了。特别是如果各种错误源由不同的(自定义)异常类型指示(需要相应地设计上下文管理器),我们可以很容易地区分它们。例如:

try:
    with ContextManager():
        BLOCK
except InitError:  # raised from __init__
    ...
except AcquireResourceError:  # raised from __enter__
    ...
except ValueError:  # raised from BLOCK
    ...
except ReleaseResourceError:  # raised from __exit__
    ...

Differentiating between the possible origins of exceptions raised from a compound with statement

Differentiating between exceptions that occur in a with statement is tricky because they can originate in different places. Exceptions can be raised from either of the following places (or functions called therein):

  • ContextManager.__init__
  • ContextManager.__enter__
  • the body of the with
  • ContextManager.__exit__

For more details see the documentation about Context Manager Types.

If we want to distinguish between these different cases, just wrapping the with into a try .. except is not sufficient. Consider the following example (using ValueError as an example but of course it could be substituted with any other exception type):

try:
    with ContextManager():
        BLOCK
except ValueError as err:
    print(err)

Here the except will catch exceptions originating in all of the four different places and thus does not allow to distinguish between them. If we move the instantiation of the context manager object outside the with, we can distinguish between __init__ and BLOCK / __enter__ / __exit__:

try:
    mgr = ContextManager()
except ValueError as err:
    print('__init__ raised:', err)
else:
    try:
        with mgr:
            try:
                BLOCK
            except TypeError:  # catching another type (which we want to handle here)
                pass
    except ValueError as err:
        # At this point we still cannot distinguish between exceptions raised from
        # __enter__, BLOCK, __exit__ (also BLOCK since we didn't catch ValueError in the body)
        pass

Effectively this just helped with the __init__ part but we can add an extra sentinel variable to check whether the body of the with started to execute (i.e. differentiating between __enter__ and the others):

try:
    mgr = ContextManager()  # __init__ could raise
except ValueError as err:
    print('__init__ raised:', err)
else:
    try:
        entered_body = False
        with mgr:
            entered_body = True  # __enter__ did not raise at this point
            try:
                BLOCK
            except TypeError:  # catching another type (which we want to handle here)
                pass
    except ValueError as err:
        if not entered_body:
            print('__enter__ raised:', err)
        else:
            # At this point we know the exception came either from BLOCK or from __exit__
            pass

The tricky part is to differentiate between exceptions originating from BLOCK and __exit__ because an exception that escapes the body of the with will be passed to __exit__ which can decide how to handle it (see the docs). If however __exit__ raises itself, the original exception will be replaced by the new one. To deal with these cases we can add a general except clause in the body of the with to store any potential exception that would have otherwise escaped unnoticed and compare it with the one caught in the outermost except later on – if they are the same this means the origin was BLOCK or otherwise it was __exit__ (in case __exit__ suppresses the exception by returning a true value the outermost except will simply not be executed).

try:
    mgr = ContextManager()  # __init__ could raise
except ValueError as err:
    print('__init__ raised:', err)
else:
    entered_body = exc_escaped_from_body = False
    try:
        with mgr:
            entered_body = True  # __enter__ did not raise at this point
            try:
                BLOCK
            except TypeError:  # catching another type (which we want to handle here)
                pass
            except Exception as err:  # this exception would normally escape without notice
                # we store this exception to check in the outer `except` clause
                # whether it is the same (otherwise it comes from __exit__)
                exc_escaped_from_body = err
                raise  # re-raise since we didn't intend to handle it, just needed to store it
    except ValueError as err:
        if not entered_body:
            print('__enter__ raised:', err)
        elif err is exc_escaped_from_body:
            print('BLOCK raised:', err)
        else:
            print('__exit__ raised:', err)

Alternative approach using the equivalent form mentioned in PEP 343

PEP 343 — The “with” Statement specifies an equivalent “non-with” version of the with statement. Here we can readily wrap the various parts with try ... except and thus differentiate between the different potential error sources:

import sys

try:
    mgr = ContextManager()
except ValueError as err:
    print('__init__ raised:', err)
else:
    try:
        value = type(mgr).__enter__(mgr)
    except ValueError as err:
        print('__enter__ raised:', err)
    else:
        exit = type(mgr).__exit__
        exc = True
        try:
            try:
                BLOCK
            except TypeError:
                pass
            except:
                exc = False
                try:
                    exit_val = exit(mgr, *sys.exc_info())
                except ValueError as err:
                    print('__exit__ raised:', err)
                else:
                    if not exit_val:
                        raise
        except ValueError as err:
            print('BLOCK raised:', err)
        finally:
            if exc:
                try:
                    exit(mgr, None, None, None)
                except ValueError as err:
                    print('__exit__ raised:', err)

Usually a simpler approach will do just fine

The need for such special exception handling should be quite rare and normally wrapping the whole with in a try ... except block will be sufficient. Especially if the various error sources are indicated by different (custom) exception types (the context managers need to be designed accordingly) we can readily distinguish between them. For example:

try:
    with ContextManager():
        BLOCK
except InitError:  # raised from __init__
    ...
except AcquireResourceError:  # raised from __enter__
    ...
except ValueError:  # raised from BLOCK
    ...
except ReleaseResourceError:  # raised from __exit__
    ...

如何在Django queryset中执行OR条件?

问题:如何在Django queryset中执行OR条件?

我想编写一个与此SQL查询等效的Django查询:

SELECT * from user where income >= 5000 or income is NULL.

如何构造Django queryset过滤器?

User.objects.filter(income__gte=5000, income=0)

这是行不通的,因为它AND是过滤器。我想要OR过滤器以获取单个查询集的并集。

I want to write a Django query equivalent to this SQL query:

SELECT * from user where income >= 5000 or income is NULL.

How to construct the Django queryset filter?

User.objects.filter(income__gte=5000, income=0)

This doesn’t work, because it ANDs the filters. I want to OR the filters to get union of individual querysets.


回答 0

from django.db.models import Q
User.objects.filter(Q(income__gte=5000) | Q(income__isnull=True))

通过文档

from django.db.models import Q
User.objects.filter(Q(income__gte=5000) | Q(income__isnull=True))

via Documentation


回答 1

由于QuerySet实现了Python __or__运算符(|)或并集,因此它可以正常工作。正如您所期望的,|二进制运算符返回一个QuerySetso order_by(),,.distinct()其他查询集过滤器可以附加到末尾。

combined_queryset = User.objects.filter(income__gte=5000) | User.objects.filter(income__isnull=True)
ordered_queryset = combined_queryset.order_by('-income')

更新2019-06-20:现在在Django 2.1 QuerySet API参考中已完全记录了该内容。更多历史性讨论可以在DjangoProject票证#21333中找到

Because QuerySets implement the Python __or__ operator (|), or union, it just works. As you’d expect, the | binary operator returns a QuerySet so order_by(), .distinct(), and other queryset filters can be tacked on to the end.

combined_queryset = User.objects.filter(income__gte=5000) | User.objects.filter(income__isnull=True)
ordered_queryset = combined_queryset.order_by('-income')

Update 2019-06-20: This is now fully documented in the Django 2.1 QuerySet API reference. More historic discussion can be found in DjangoProject ticket #21333.


回答 2

现有答案中已经提到了这两种选择:

from django.db.models import Q
q1 = User.objects.filter(Q(income__gte=5000) | Q(income__isnull=True))

q2 = User.objects.filter(income__gte=5000) | User.objects.filter(income__isnull=True)

但是,似乎偏爱哪一个。

关键是它们在SQL级别上是相同的,所以请随意选择您喜欢的任何一个!

Django的ORM食谱在这个比较详细谈到,这里是有关部分:


queryset = User.objects.filter(
        first_name__startswith='R'
    ) | User.objects.filter(
    last_name__startswith='D'
)

导致

In [5]: str(queryset.query)
Out[5]: 'SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login",
"auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name",
"auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff",
"auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user"
WHERE ("auth_user"."first_name"::text LIKE R% OR "auth_user"."last_name"::text LIKE D%)'

qs = User.objects.filter(Q(first_name__startswith='R') | Q(last_name__startswith='D'))

导致

In [9]: str(qs.query)
Out[9]: 'SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login",
 "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name",
  "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff",
  "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user"
  WHERE ("auth_user"."first_name"::text LIKE R% OR "auth_user"."last_name"::text LIKE D%)'

资料来源:django-orm-cookbook


Both options are already mentioned in the existing answers:

from django.db.models import Q
q1 = User.objects.filter(Q(income__gte=5000) | Q(income__isnull=True))

and

q2 = User.objects.filter(income__gte=5000) | User.objects.filter(income__isnull=True)

However, there seems to be some confusion regarding which one is to prefer.

The point is that they are identical on the SQL level, so feel free to pick whichever you like!

The Django ORM Cookbook talks in some detail about this, here is the relevant part:


queryset = User.objects.filter(
        first_name__startswith='R'
    ) | User.objects.filter(
    last_name__startswith='D'
)

leads to

In [5]: str(queryset.query)
Out[5]: 'SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login",
"auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name",
"auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff",
"auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user"
WHERE ("auth_user"."first_name"::text LIKE R% OR "auth_user"."last_name"::text LIKE D%)'

and

qs = User.objects.filter(Q(first_name__startswith='R') | Q(last_name__startswith='D'))

leads to

In [9]: str(qs.query)
Out[9]: 'SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login",
 "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name",
  "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff",
  "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user"
  WHERE ("auth_user"."first_name"::text LIKE R% OR "auth_user"."last_name"::text LIKE D%)'

source: django-orm-cookbook



numpy中的flatten和ravel函数有什么区别?

问题:numpy中的flatten和ravel函数有什么区别?

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
OUTPUT:
print(y.flatten())
[1   2   3   4   5   6   7   8   9]
print(y.ravel())
[1   2   3   4   5   6   7   8   9]

这两个函数返回相同的列表。那么需要两个不同的功能来执行相同的工作。

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
OUTPUT:
print(y.flatten())
[1   2   3   4   5   6   7   8   9]
print(y.ravel())
[1   2   3   4   5   6   7   8   9]

Both function return the same list. Then what is the need of two different functions performing same job.


回答 0

当前的API是:

  • flatten 总是返回一个副本。
  • ravel尽可能返回原始数组的视图。这在打印输出中不可见,但是如果您修改ravel返回的数组,则可能会修改原始数组中的条目。如果您修改从flatten返回的数组中的条目,则将永远不会发生。ravel通常会更快,因为没有内存被复制,但是您在修改返回的数组时要格外小心。
  • reshape((-1,)) 只要数组的步幅允许,就可以得到一个视图,即使这意味着您并不总是可以获得连续的数组。

The current API is that:

  • flatten always returns a copy.
  • ravel returns a view of the original array whenever possible. This isn’t visible in the printed output, but if you modify the array returned by ravel, it may modify the entries in the original array. If you modify the entries in an array returned from flatten this will never happen. ravel will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
  • reshape((-1,)) gets a view whenever the strides of the array allow it even if that means you don’t always get a contiguous array.

回答 1

如此所述,关键区别在于:

  • flatten 是ndarray对象的方法,因此只能用于真正的numpy数组。

  • ravel 是库级别的函数,因此可以在任何可以成功解析的对象上调用。

例如,ravel将对ndarray列表起作用,flatten而不适用于该类型的对象。

@IanH还在回答中指出了与内存处理的重要区别。

As explained here a key difference is that:

  • flatten is a method of an ndarray object and hence can only be called for true numpy arrays.

  • ravel is a library-level function and hence can be called on any object that can successfully be parsed.

For example ravel will work on a list of ndarrays, while flatten is not available for that type of object.

@IanH also points out important differences with memory handling in his answer.


回答 2

这是函数的正确命名空间:

这两个函数均返回指向新存储器结构的展平一维数组。

import numpy
a = numpy.array([[1,2],[3,4]])

r = numpy.ravel(a)
f = numpy.ndarray.flatten(a)  

print(id(a))
print(id(r))
print(id(f))

print(r)
print(f)

print("\nbase r:", r.base)
print("\nbase f:", f.base)

---returns---
140541099429760
140541099471056
140541099473216

[1 2 3 4]
[1 2 3 4]

base r: [[1 2]
 [3 4]]

base f: None

在上例中:

  • 结果的存储位置不同,
  • 结果看起来一样
  • 展平将返回副本
  • ravel将返回一个视图。

我们如何检查某物是否是副本?使用的.base属性ndarray。如果是视图,则基础将是原始数组;如果是副本,则基数为None

Here is the correct namespace for the functions:

Both functions return flattened 1D arrays pointing to the new memory structures.

import numpy
a = numpy.array([[1,2],[3,4]])

r = numpy.ravel(a)
f = numpy.ndarray.flatten(a)  

print(id(a))
print(id(r))
print(id(f))

print(r)
print(f)

print("\nbase r:", r.base)
print("\nbase f:", f.base)

---returns---
140541099429760
140541099471056
140541099473216

[1 2 3 4]
[1 2 3 4]

base r: [[1 2]
 [3 4]]

base f: None

In the upper example:

  • the memory locations of the results are different,
  • the results look the same
  • flatten would return a copy
  • ravel would return a view.

How we check if something is a copy? Using the .base attribute of the ndarray. If it’s a view, the base will be the original array; if it is a copy, the base will be None.


如何在NumPy数组中添加额外的列

问题:如何在NumPy数组中添加额外的列

假设我有一个NumPy数组a

a = np.array([
    [1, 2, 3],
    [2, 3, 4]
    ])

我想添加一列零以获取一个数组b

b = np.array([
    [1, 2, 3, 0],
    [2, 3, 4, 0]
    ])

我如何在NumPy中轻松做到这一点?

Let’s say I have a NumPy array, a:

a = np.array([
    [1, 2, 3],
    [2, 3, 4]
    ])

And I would like to add a column of zeros to get an array, b:

b = np.array([
    [1, 2, 3, 0],
    [2, 3, 4, 0]
    ])

How can I do this easily in NumPy?


回答 0

我认为,更简单,更快速的启动方法是执行以下操作:

import numpy as np
N = 10
a = np.random.rand(N,N)
b = np.zeros((N,N+1))
b[:,:-1] = a

和时间:

In [23]: N = 10

In [24]: a = np.random.rand(N,N)

In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
10000 loops, best of 3: 19.6 us per loop

In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 5.62 us per loop

I think a more straightforward solution and faster to boot is to do the following:

import numpy as np
N = 10
a = np.random.rand(N,N)
b = np.zeros((N,N+1))
b[:,:-1] = a

And timings:

In [23]: N = 10

In [24]: a = np.random.rand(N,N)

In [25]: %timeit b = np.hstack((a,np.zeros((a.shape[0],1))))
10000 loops, best of 3: 19.6 us per loop

In [27]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 5.62 us per loop

回答 1

np.r_[ ... ]并且np.c_[ ... ] 是有用的替代品vstackhstack,用方括号[]代替圆()。
几个例子:

: import numpy as np
: N = 3
: A = np.eye(N)

: np.c_[ A, np.ones(N) ]              # add a column
array([[ 1.,  0.,  0.,  1.],
       [ 0.,  1.,  0.,  1.],
       [ 0.,  0.,  1.,  1.]])

: np.c_[ np.ones(N), A, np.ones(N) ]  # or two
array([[ 1.,  1.,  0.,  0.,  1.],
       [ 1.,  0.,  1.,  0.,  1.],
       [ 1.,  0.,  0.,  1.,  1.]])

: np.r_[ A, [A[1]] ]              # add a row
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.]])
: # not np.r_[ A, A[1] ]

: np.r_[ A[0], 1, 2, 3, A[1] ]    # mix vecs and scalars
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], [1, 2, 3], A[1] ]  # lists
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], (1, 2, 3), A[1] ]  # tuples
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], 1:4, A[1] ]        # same, 1:4 == arange(1,4) == 1,2,3
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

(使用方括号[]代替round()的原因是Python扩展了方括号内的比例,例如1:4,这是重载的奇迹。)

np.r_[ ... ] and np.c_[ ... ] are useful alternatives to vstack and hstack, with square brackets [] instead of round ().
A couple of examples:

: import numpy as np
: N = 3
: A = np.eye(N)

: np.c_[ A, np.ones(N) ]              # add a column
array([[ 1.,  0.,  0.,  1.],
       [ 0.,  1.,  0.,  1.],
       [ 0.,  0.,  1.,  1.]])

: np.c_[ np.ones(N), A, np.ones(N) ]  # or two
array([[ 1.,  1.,  0.,  0.,  1.],
       [ 1.,  0.,  1.,  0.,  1.],
       [ 1.,  0.,  0.,  1.,  1.]])

: np.r_[ A, [A[1]] ]              # add a row
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.]])
: # not np.r_[ A, A[1] ]

: np.r_[ A[0], 1, 2, 3, A[1] ]    # mix vecs and scalars
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], [1, 2, 3], A[1] ]  # lists
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], (1, 2, 3), A[1] ]  # tuples
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

: np.r_[ A[0], 1:4, A[1] ]        # same, 1:4 == arange(1,4) == 1,2,3
  array([ 1.,  0.,  0.,  1.,  2.,  3.,  0.,  1.,  0.])

(The reason for square brackets [] instead of round () is that Python expands e.g. 1:4 in square — the wonders of overloading.)


回答 2

用途numpy.append

>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])

>>> z = np.zeros((2,1), dtype=int64)
>>> z
array([[0],
       [0]])

>>> np.append(a, z, axis=1)
array([[1, 2, 3, 0],
       [2, 3, 4, 0]])

Use numpy.append:

>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])

>>> z = np.zeros((2,1), dtype=int64)
>>> z
array([[0],
       [0]])

>>> np.append(a, z, axis=1)
array([[1, 2, 3, 0],
       [2, 3, 4, 0]])

回答 3

使用hstack的一种方法是:

b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))

One way, using hstack, is:

b = np.hstack((a, np.zeros((a.shape[0], 1), dtype=a.dtype)))

回答 4

我发现以下最优雅的东西:

b = np.insert(a, 3, values=0, axis=1) # Insert values before column 3

的优点insert是,它还允许您在数组内的其他位置插入列(或行)。同样,除了插入单个值,您还可以轻松插入整个向量,例如,复制最后一列:

b = np.insert(a, insert_index, values=a[:,2], axis=1)

这导致:

array([[1, 2, 3, 3],
       [2, 3, 4, 4]])

在时间上,insert可能比JoshAdel的解决方案慢:

In [1]: N = 10

In [2]: a = np.random.rand(N,N)

In [3]: %timeit b = np.hstack((a, np.zeros((a.shape[0], 1))))
100000 loops, best of 3: 7.5 µs per loop

In [4]: %timeit b = np.zeros((a.shape[0], a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 2.17 µs per loop

In [5]: %timeit b = np.insert(a, 3, values=0, axis=1)
100000 loops, best of 3: 10.2 µs per loop

I find the following most elegant:

b = np.insert(a, 3, values=0, axis=1) # Insert values before column 3

An advantage of insert is that it also allows you to insert columns (or rows) at other places inside the array. Also instead of inserting a single value you can easily insert a whole vector, for instance duplicate the last column:

b = np.insert(a, insert_index, values=a[:,2], axis=1)

Which leads to:

array([[1, 2, 3, 3],
       [2, 3, 4, 4]])

For the timing, insert might be slower than JoshAdel’s solution:

In [1]: N = 10

In [2]: a = np.random.rand(N,N)

In [3]: %timeit b = np.hstack((a, np.zeros((a.shape[0], 1))))
100000 loops, best of 3: 7.5 µs per loop

In [4]: %timeit b = np.zeros((a.shape[0], a.shape[1]+1)); b[:,:-1] = a
100000 loops, best of 3: 2.17 µs per loop

In [5]: %timeit b = np.insert(a, 3, values=0, axis=1)
100000 loops, best of 3: 10.2 µs per loop

回答 5

我对这个问题也很感兴趣,并比较了

numpy.c_[a, a]
numpy.stack([a, a]).T
numpy.vstack([a, a]).T
numpy.ascontiguousarray(numpy.stack([a, a]).T)               
numpy.ascontiguousarray(numpy.vstack([a, a]).T)
numpy.column_stack([a, a])
numpy.concatenate([a[:,None], a[:,None]], axis=1)
numpy.concatenate([a[None], a[None]], axis=0).T

所有输入向量都做同样的事情a。生长时间a

在此处输入图片说明

请注意,所有非连续变体(特别是 stack/ vstack)最终都比所有连续变体快。column_stack(出于清晰度和速度方面)(如果需要连续性)似乎是一个不错的选择。


复制剧情的代码:

import numpy
import perfplot

perfplot.save(
    "out.png",
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.c_[a, a],
        lambda a: numpy.ascontiguousarray(numpy.stack([a, a]).T),
        lambda a: numpy.ascontiguousarray(numpy.vstack([a, a]).T),
        lambda a: numpy.column_stack([a, a]),
        lambda a: numpy.concatenate([a[:, None], a[:, None]], axis=1),
        lambda a: numpy.ascontiguousarray(
            numpy.concatenate([a[None], a[None]], axis=0).T
        ),
        lambda a: numpy.stack([a, a]).T,
        lambda a: numpy.vstack([a, a]).T,
        lambda a: numpy.concatenate([a[None], a[None]], axis=0).T,
    ],
    labels=[
        "c_",
        "ascont(stack)",
        "ascont(vstack)",
        "column_stack",
        "concat",
        "ascont(concat)",
        "stack (non-cont)",
        "vstack (non-cont)",
        "concat (non-cont)",
    ],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(a)",
    logx=True,
    logy=True,
)

I was also interested in this question and compared the speed of

numpy.c_[a, a]
numpy.stack([a, a]).T
numpy.vstack([a, a]).T
numpy.ascontiguousarray(numpy.stack([a, a]).T)               
numpy.ascontiguousarray(numpy.vstack([a, a]).T)
numpy.column_stack([a, a])
numpy.concatenate([a[:,None], a[:,None]], axis=1)
numpy.concatenate([a[None], a[None]], axis=0).T

which all do the same thing for any input vector a. Timings for growing a:

enter image description here

Note that all non-contiguous variants (in particular stack/vstack) are eventually faster than all contiguous variants. column_stack (for its clarity and speed) appears to be a good option if you require contiguity.


Code to reproduce the plot:

import numpy
import perfplot

perfplot.save(
    "out.png",
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.c_[a, a],
        lambda a: numpy.ascontiguousarray(numpy.stack([a, a]).T),
        lambda a: numpy.ascontiguousarray(numpy.vstack([a, a]).T),
        lambda a: numpy.column_stack([a, a]),
        lambda a: numpy.concatenate([a[:, None], a[:, None]], axis=1),
        lambda a: numpy.ascontiguousarray(
            numpy.concatenate([a[None], a[None]], axis=0).T
        ),
        lambda a: numpy.stack([a, a]).T,
        lambda a: numpy.vstack([a, a]).T,
        lambda a: numpy.concatenate([a[None], a[None]], axis=0).T,
    ],
    labels=[
        "c_",
        "ascont(stack)",
        "ascont(vstack)",
        "column_stack",
        "concat",
        "ascont(concat)",
        "stack (non-cont)",
        "vstack (non-cont)",
        "concat (non-cont)",
    ],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(a)",
    logx=True,
    logy=True,
)

回答 6

我认为:

np.column_stack((a, zeros(shape(a)[0])))

更优雅。

I think:

np.column_stack((a, zeros(shape(a)[0])))

is more elegant.


回答 7

np.concatenate也可以

>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])
>>> z = np.zeros((2,1))
>>> z
array([[ 0.],
       [ 0.]])
>>> np.concatenate((a, z), axis=1)
array([[ 1.,  2.,  3.,  0.],
       [ 2.,  3.,  4.,  0.]])

np.concatenate also works

>>> a = np.array([[1,2,3],[2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])
>>> z = np.zeros((2,1))
>>> z
array([[ 0.],
       [ 0.]])
>>> np.concatenate((a, z), axis=1)
array([[ 1.,  2.,  3.,  0.],
       [ 2.,  3.,  4.,  0.]])

回答 8

假设M一个(100,3)ndarray和y一个(100,)ndarray append可以按以下方式使用:

M=numpy.append(M,y[:,None],1)

诀窍是使用

y[:, None]

这将转换y为(100,1)2D数组。

M.shape

现在给

(100, 4)

Assuming M is a (100,3) ndarray and y is a (100,) ndarray append can be used as follows:

M=numpy.append(M,y[:,None],1)

The trick is to use

y[:, None]

This converts y to a (100, 1) 2D array.

M.shape

now gives

(100, 4)

回答 9

我喜欢JoshAdel的答案,因为它专注于性能。性能上的次要改进是避免仅被覆盖的初始化零的开销。当N较大时,使用空而不是零,并且将零列作为单独的步骤写入时,这具有可测量的差异:

In [1]: import numpy as np

In [2]: N = 10000

In [3]: a = np.ones((N,N))

In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
1 loops, best of 3: 492 ms per loop

In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],))
1 loops, best of 3: 407 ms per loop

I like JoshAdel’s answer because of the focus on performance. A minor performance improvement is to avoid the overhead of initializing with zeros, only to be overwritten. This has a measurable difference when N is large, empty is used instead of zeros, and the column of zeros is written as a separate step:

In [1]: import numpy as np

In [2]: N = 10000

In [3]: a = np.ones((N,N))

In [4]: %timeit b = np.zeros((a.shape[0],a.shape[1]+1)); b[:,:-1] = a
1 loops, best of 3: 492 ms per loop

In [5]: %timeit b = np.empty((a.shape[0],a.shape[1]+1)); b[:,:-1] = a; b[:,-1] = np.zeros((a.shape[0],))
1 loops, best of 3: 407 ms per loop

回答 10

np.insert 也达到目的。

matA = np.array([[1,2,3], 
                 [2,3,4]])
idx = 3
new_col = np.array([0, 0])
np.insert(matA, idx, new_col, axis=1)

array([[1, 2, 3, 0],
       [2, 3, 4, 0]])

它沿一个轴new_col在给定索引之前在此处插入值idx。换句话说,新插入的值将占据该idx列并向后移动原始位置idx

np.insert also serves the purpose.

matA = np.array([[1,2,3], 
                 [2,3,4]])
idx = 3
new_col = np.array([0, 0])
np.insert(matA, idx, new_col, axis=1)

array([[1, 2, 3, 0],
       [2, 3, 4, 0]])

It inserts values, here new_col, before a given index, here idx along one axis. In other words, the newly inserted values will occupy the idx column and move what were originally there at and after idx backward.


回答 11

向numpy数组添加额外的列:

Numpy的np.append方法需要三个参数,前两个是2D numpy数组,第三个是轴参数,指示要沿哪个轴附加:

import numpy as np  
x = np.array([[1,2,3], [4,5,6]]) 
print("Original x:") 
print(x) 

y = np.array([[1], [1]]) 
print("Original y:") 
print(y) 

print("x appended to y on axis of 1:") 
print(np.append(x, y, axis=1)) 

印刷品:

Original x:
[[1 2 3]
 [4 5 6]]
Original y:
[[1]
 [1]]
x appended to y on axis of 1:
[[1 2 3 1]
 [4 5 6 1]]

Add an extra column to a numpy array:

Numpy’s np.append method takes three parameters, the first two are 2D numpy arrays and the 3rd is an axis parameter instructing along which axis to append:

import numpy as np  
x = np.array([[1,2,3], [4,5,6]]) 
print("Original x:") 
print(x) 

y = np.array([[1], [1]]) 
print("Original y:") 
print(y) 

print("x appended to y on axis of 1:") 
print(np.append(x, y, axis=1)) 

Prints:

Original x:
[[1 2 3]
 [4 5 6]]
Original y:
[[1]
 [1]]
x appended to y on axis of 1:
[[1 2 3 1]
 [4 5 6 1]]

回答 12

晚会晚了一点,但是还没有人发布这个答案,因此为了完整起见:您可以使用列表推导在一个简单的Python数组上执行此操作:

source = a.tolist()
result = [row + [0] for row in source]
b = np.array(result)

A bit late to the party, but nobody posted this answer yet, so for the sake of completeness: you can do this with list comprehensions, on a plain Python array:

source = a.tolist()
result = [row + [0] for row in source]
b = np.array(result)

回答 13

就我而言,我必须在NumPy数组中添加一列

X = array([ 6.1101, 5.5277, ... ])
X.shape => (97,)
X = np.concatenate((np.ones((m,1), dtype=np.int), X.reshape(m,1)), axis=1)

在X.shape =>(97,2)之后

array([[ 1. , 6.1101],
       [ 1. , 5.5277],
...

In my case, I had to add a column of ones to a NumPy array

X = array([ 6.1101, 5.5277, ... ])
X.shape => (97,)
X = np.concatenate((np.ones((m,1), dtype=np.int), X.reshape(m,1)), axis=1)

After X.shape => (97, 2)

array([[ 1. , 6.1101],
       [ 1. , 5.5277],
...

回答 14

对我来说,下一种方法看起来非常直观和简单。

zeros = np.zeros((2,1)) #2 is a number of rows in your array.   
b = np.hstack((a, zeros))

For me, the next way looks pretty intuitive and simple.

zeros = np.zeros((2,1)) #2 is a number of rows in your array.   
b = np.hstack((a, zeros))

回答 15

有专门为此功能。它叫做numpy.pad

a = np.array([[1,2,3], [2,3,4]])
b = np.pad(a, ((0, 0), (0, 1)), mode='constant', constant_values=0)
print b
>>> array([[1, 2, 3, 0],
           [2, 3, 4, 0]])

这是它在文档字符串中所说的:

Pads an array.

Parameters
----------
array : array_like of rank N
    Input array
pad_width : {sequence, array_like, int}
    Number of values padded to the edges of each axis.
    ((before_1, after_1), ... (before_N, after_N)) unique pad widths
    for each axis.
    ((before, after),) yields same before and after pad for each axis.
    (pad,) or int is a shortcut for before = after = pad width for all
    axes.
mode : str or function
    One of the following string values or a user supplied function.

    'constant'
        Pads with a constant value.
    'edge'
        Pads with the edge values of array.
    'linear_ramp'
        Pads with the linear ramp between end_value and the
        array edge value.
    'maximum'
        Pads with the maximum value of all or part of the
        vector along each axis.
    'mean'
        Pads with the mean value of all or part of the
        vector along each axis.
    'median'
        Pads with the median value of all or part of the
        vector along each axis.
    'minimum'
        Pads with the minimum value of all or part of the
        vector along each axis.
    'reflect'
        Pads with the reflection of the vector mirrored on
        the first and last values of the vector along each
        axis.
    'symmetric'
        Pads with the reflection of the vector mirrored
        along the edge of the array.
    'wrap'
        Pads with the wrap of the vector along the axis.
        The first values are used to pad the end and the
        end values are used to pad the beginning.
    <function>
        Padding function, see Notes.
stat_length : sequence or int, optional
    Used in 'maximum', 'mean', 'median', and 'minimum'.  Number of
    values at edge of each axis used to calculate the statistic value.

    ((before_1, after_1), ... (before_N, after_N)) unique statistic
    lengths for each axis.

    ((before, after),) yields same before and after statistic lengths
    for each axis.

    (stat_length,) or int is a shortcut for before = after = statistic
    length for all axes.

    Default is ``None``, to use the entire axis.
constant_values : sequence or int, optional
    Used in 'constant'.  The values to set the padded values for each
    axis.

    ((before_1, after_1), ... (before_N, after_N)) unique pad constants
    for each axis.

    ((before, after),) yields same before and after constants for each
    axis.

    (constant,) or int is a shortcut for before = after = constant for
    all axes.

    Default is 0.
end_values : sequence or int, optional
    Used in 'linear_ramp'.  The values used for the ending value of the
    linear_ramp and that will form the edge of the padded array.

    ((before_1, after_1), ... (before_N, after_N)) unique end values
    for each axis.

    ((before, after),) yields same before and after end values for each
    axis.

    (constant,) or int is a shortcut for before = after = end value for
    all axes.

    Default is 0.
reflect_type : {'even', 'odd'}, optional
    Used in 'reflect', and 'symmetric'.  The 'even' style is the
    default with an unaltered reflection around the edge value.  For
    the 'odd' style, the extented part of the array is created by
    subtracting the reflected values from two times the edge value.

Returns
-------
pad : ndarray
    Padded array of rank equal to `array` with shape increased
    according to `pad_width`.

Notes
-----
.. versionadded:: 1.7.0

For an array with rank greater than 1, some of the padding of later
axes is calculated from padding of previous axes.  This is easiest to
think about with a rank 2 array where the corners of the padded array
are calculated by using padded values from the first axis.

The padding function, if used, should return a rank 1 array equal in
length to the vector argument with padded values replaced. It has the
following signature::

    padding_func(vector, iaxis_pad_width, iaxis, kwargs)

where

    vector : ndarray
        A rank 1 array already padded with zeros.  Padded values are
        vector[:pad_tuple[0]] and vector[-pad_tuple[1]:].
    iaxis_pad_width : tuple
        A 2-tuple of ints, iaxis_pad_width[0] represents the number of
        values padded at the beginning of vector where
        iaxis_pad_width[1] represents the number of values padded at
        the end of vector.
    iaxis : int
        The axis currently being calculated.
    kwargs : dict
        Any keyword arguments the function requires.

Examples
--------
>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2,3), 'constant', constant_values=(4, 6))
array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6])

>>> np.pad(a, (2, 3), 'edge')
array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5])

>>> np.pad(a, (2, 3), 'linear_ramp', end_values=(5, -4))
array([ 5,  3,  1,  2,  3,  4,  5,  2, -1, -4])

>>> np.pad(a, (2,), 'maximum')
array([5, 5, 1, 2, 3, 4, 5, 5, 5])

>>> np.pad(a, (2,), 'mean')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])

>>> np.pad(a, (2,), 'median')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])

>>> a = [[1, 2], [3, 4]]
>>> np.pad(a, ((3, 2), (2, 3)), 'minimum')
array([[1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [3, 3, 3, 4, 3, 3, 3],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1]])

>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2, 3), 'reflect')
array([3, 2, 1, 2, 3, 4, 5, 4, 3, 2])

>>> np.pad(a, (2, 3), 'reflect', reflect_type='odd')
array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8])

>>> np.pad(a, (2, 3), 'symmetric')
array([2, 1, 1, 2, 3, 4, 5, 5, 4, 3])

>>> np.pad(a, (2, 3), 'symmetric', reflect_type='odd')
array([0, 1, 1, 2, 3, 4, 5, 5, 6, 7])

>>> np.pad(a, (2, 3), 'wrap')
array([4, 5, 1, 2, 3, 4, 5, 1, 2, 3])

>>> def pad_with(vector, pad_width, iaxis, kwargs):
...     pad_value = kwargs.get('padder', 10)
...     vector[:pad_width[0]] = pad_value
...     vector[-pad_width[1]:] = pad_value
...     return vector
>>> a = np.arange(6)
>>> a = a.reshape((2, 3))
>>> np.pad(a, 2, pad_with)
array([[10, 10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10, 10],
       [10, 10,  0,  1,  2, 10, 10],
       [10, 10,  3,  4,  5, 10, 10],
       [10, 10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10, 10]])
>>> np.pad(a, 2, pad_with, padder=100)
array([[100, 100, 100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100, 100, 100],
       [100, 100,   0,   1,   2, 100, 100],
       [100, 100,   3,   4,   5, 100, 100],
       [100, 100, 100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100, 100, 100]])

There is a function specifically for this. It is called numpy.pad

a = np.array([[1,2,3], [2,3,4]])
b = np.pad(a, ((0, 0), (0, 1)), mode='constant', constant_values=0)
print b
>>> array([[1, 2, 3, 0],
           [2, 3, 4, 0]])

Here is what it says in the docstring:

Pads an array.

Parameters
----------
array : array_like of rank N
    Input array
pad_width : {sequence, array_like, int}
    Number of values padded to the edges of each axis.
    ((before_1, after_1), ... (before_N, after_N)) unique pad widths
    for each axis.
    ((before, after),) yields same before and after pad for each axis.
    (pad,) or int is a shortcut for before = after = pad width for all
    axes.
mode : str or function
    One of the following string values or a user supplied function.

    'constant'
        Pads with a constant value.
    'edge'
        Pads with the edge values of array.
    'linear_ramp'
        Pads with the linear ramp between end_value and the
        array edge value.
    'maximum'
        Pads with the maximum value of all or part of the
        vector along each axis.
    'mean'
        Pads with the mean value of all or part of the
        vector along each axis.
    'median'
        Pads with the median value of all or part of the
        vector along each axis.
    'minimum'
        Pads with the minimum value of all or part of the
        vector along each axis.
    'reflect'
        Pads with the reflection of the vector mirrored on
        the first and last values of the vector along each
        axis.
    'symmetric'
        Pads with the reflection of the vector mirrored
        along the edge of the array.
    'wrap'
        Pads with the wrap of the vector along the axis.
        The first values are used to pad the end and the
        end values are used to pad the beginning.
    <function>
        Padding function, see Notes.
stat_length : sequence or int, optional
    Used in 'maximum', 'mean', 'median', and 'minimum'.  Number of
    values at edge of each axis used to calculate the statistic value.

    ((before_1, after_1), ... (before_N, after_N)) unique statistic
    lengths for each axis.

    ((before, after),) yields same before and after statistic lengths
    for each axis.

    (stat_length,) or int is a shortcut for before = after = statistic
    length for all axes.

    Default is ``None``, to use the entire axis.
constant_values : sequence or int, optional
    Used in 'constant'.  The values to set the padded values for each
    axis.

    ((before_1, after_1), ... (before_N, after_N)) unique pad constants
    for each axis.

    ((before, after),) yields same before and after constants for each
    axis.

    (constant,) or int is a shortcut for before = after = constant for
    all axes.

    Default is 0.
end_values : sequence or int, optional
    Used in 'linear_ramp'.  The values used for the ending value of the
    linear_ramp and that will form the edge of the padded array.

    ((before_1, after_1), ... (before_N, after_N)) unique end values
    for each axis.

    ((before, after),) yields same before and after end values for each
    axis.

    (constant,) or int is a shortcut for before = after = end value for
    all axes.

    Default is 0.
reflect_type : {'even', 'odd'}, optional
    Used in 'reflect', and 'symmetric'.  The 'even' style is the
    default with an unaltered reflection around the edge value.  For
    the 'odd' style, the extented part of the array is created by
    subtracting the reflected values from two times the edge value.

Returns
-------
pad : ndarray
    Padded array of rank equal to `array` with shape increased
    according to `pad_width`.

Notes
-----
.. versionadded:: 1.7.0

For an array with rank greater than 1, some of the padding of later
axes is calculated from padding of previous axes.  This is easiest to
think about with a rank 2 array where the corners of the padded array
are calculated by using padded values from the first axis.

The padding function, if used, should return a rank 1 array equal in
length to the vector argument with padded values replaced. It has the
following signature::

    padding_func(vector, iaxis_pad_width, iaxis, kwargs)

where

    vector : ndarray
        A rank 1 array already padded with zeros.  Padded values are
        vector[:pad_tuple[0]] and vector[-pad_tuple[1]:].
    iaxis_pad_width : tuple
        A 2-tuple of ints, iaxis_pad_width[0] represents the number of
        values padded at the beginning of vector where
        iaxis_pad_width[1] represents the number of values padded at
        the end of vector.
    iaxis : int
        The axis currently being calculated.
    kwargs : dict
        Any keyword arguments the function requires.

Examples
--------
>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2,3), 'constant', constant_values=(4, 6))
array([4, 4, 1, 2, 3, 4, 5, 6, 6, 6])

>>> np.pad(a, (2, 3), 'edge')
array([1, 1, 1, 2, 3, 4, 5, 5, 5, 5])

>>> np.pad(a, (2, 3), 'linear_ramp', end_values=(5, -4))
array([ 5,  3,  1,  2,  3,  4,  5,  2, -1, -4])

>>> np.pad(a, (2,), 'maximum')
array([5, 5, 1, 2, 3, 4, 5, 5, 5])

>>> np.pad(a, (2,), 'mean')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])

>>> np.pad(a, (2,), 'median')
array([3, 3, 1, 2, 3, 4, 5, 3, 3])

>>> a = [[1, 2], [3, 4]]
>>> np.pad(a, ((3, 2), (2, 3)), 'minimum')
array([[1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1],
       [3, 3, 3, 4, 3, 3, 3],
       [1, 1, 1, 2, 1, 1, 1],
       [1, 1, 1, 2, 1, 1, 1]])

>>> a = [1, 2, 3, 4, 5]
>>> np.pad(a, (2, 3), 'reflect')
array([3, 2, 1, 2, 3, 4, 5, 4, 3, 2])

>>> np.pad(a, (2, 3), 'reflect', reflect_type='odd')
array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8])

>>> np.pad(a, (2, 3), 'symmetric')
array([2, 1, 1, 2, 3, 4, 5, 5, 4, 3])

>>> np.pad(a, (2, 3), 'symmetric', reflect_type='odd')
array([0, 1, 1, 2, 3, 4, 5, 5, 6, 7])

>>> np.pad(a, (2, 3), 'wrap')
array([4, 5, 1, 2, 3, 4, 5, 1, 2, 3])

>>> def pad_with(vector, pad_width, iaxis, kwargs):
...     pad_value = kwargs.get('padder', 10)
...     vector[:pad_width[0]] = pad_value
...     vector[-pad_width[1]:] = pad_value
...     return vector
>>> a = np.arange(6)
>>> a = a.reshape((2, 3))
>>> np.pad(a, 2, pad_with)
array([[10, 10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10, 10],
       [10, 10,  0,  1,  2, 10, 10],
       [10, 10,  3,  4,  5, 10, 10],
       [10, 10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10, 10]])
>>> np.pad(a, 2, pad_with, padder=100)
array([[100, 100, 100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100, 100, 100],
       [100, 100,   0,   1,   2, 100, 100],
       [100, 100,   3,   4,   5, 100, 100],
       [100, 100, 100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100, 100, 100]])

在Python中搜索并替换文件中的一行

问题:在Python中搜索并替换文件中的一行

我想遍历文本文件的内容,进行搜索并替换某些行,然后将结果写回到文件中。我可以先将整个文件加载到内存中,然后再写回去,但这可能不是最好的方法。

在以下代码中,执行此操作的最佳方法是什么?

f = open(file)
for line in f:
    if line.contains('foo'):
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file

I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.

What is the best way to do this, within the following code?

f = open(file)
for line in f:
    if line.contains('foo'):
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file

回答 0

我想类似的事情应该做。它基本上将内容写入新文件,并用新文件替换旧文件:

from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove

def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)

I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:

from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove

def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)

回答 1

最短的方法可能是使用fileinput模块。例如,以下代码将行号就地添加到文件中:

import fileinput

for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line), # for Python 2

这里发生的是:

  1. 原始文件已移至备份文件
  2. 标准输出在循环中重定向到原始文件
  3. 因此,所有print语句都会写回到原始文件中

fileinput有更多的钟声和口哨声。例如,它可以用于自动操作中的所有文件sys.args[1:],而无需显式遍历它们。从Python 3.2开始,它还提供了在with语句中使用的便捷上下文管理器。


虽然fileinput对于一次性脚本非常有用,但我会警惕在实际代码中使用它,因为要承认它不是很易读或不熟悉。在实际(生产)代码中,值得花几行代码来使过程明确,从而使代码可读。

有两种选择:

  1. 该文件不是太大,您可以将其全部读取到内存中。然后关闭文件,以写入模式将其重新打开,然后将修改后的内容写回。
  2. 该文件太大,无法存储在内存中。您可以将其移到一个临时文件中并打开它,逐行阅读,然后写回到原始文件中。请注意,这需要两倍的存储空间。

The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:

import fileinput

for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line), # for Python 2

What happens here is:

  1. The original file is moved to a backup file
  2. The standard output is redirected to the original file within the loop
  3. Thus any print statements write back into the original file

fileinput has more bells and whistles. For example, it can be used to automatically operate on all files in sys.args[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.


While fileinput is great for throwaway scripts, I would be wary of using it in real code because admittedly it’s not very readable or familiar. In real (production) code it’s worthwhile to spend just a few more lines of code to make the process explicit and thus make the code readable.

There are two options:

  1. The file is not overly large, and you can just read it wholly to memory. Then close the file, reopen it in writing mode and write the modified contents back.
  2. The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line, writing back into the original file. Note that this requires twice the storage.

回答 2

这是另一个经过测试的示例,它将匹配搜索和替换模式:

import fileinput
import sys

def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)

使用示例:

replaceAll("/fooBar.txt","Hello\sWorld!$","Goodbye\sWorld.")

Here’s another example that was tested, and will match search & replace patterns:

import fileinput
import sys

def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)

Example use:

replaceAll("/fooBar.txt","Hello\sWorld!$","Goodbye\sWorld.")

回答 3

这应该起作用:(就地编辑)

import fileinput

# Does a list of files, and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace = 1): 
      print line.replace("foo", "bar"),

This should work: (inplace editing)

import fileinput

# Does a list of files, and
# redirects STDOUT to the file in question
for line in fileinput.input(files, inplace = 1): 
      print line.replace("foo", "bar"),

回答 4

根据Thomas Watnedal的回答。但是,这不能完全回答原始问题的线对线部分。该功能仍可以逐行替换

此实现无需使用临时文件即可替换文件内容,因此文件权限保持不变。

同样,re.sub代替replace,仅允许正则表达式代替纯文本替换。

以单个字符串而不是逐行读取文件可以进行多行匹配和替换。

import re

def replace(file, pattern, subst):
    # Read contents from file as a single string
    file_handle = open(file, 'r')
    file_string = file_handle.read()
    file_handle.close()

    # Use RE package to allow for replacement (also allowing for (multiline) REGEX)
    file_string = (re.sub(pattern, subst, file_string))

    # Write contents to file.
    # Using mode 'w' truncates the file.
    file_handle = open(file, 'w')
    file_handle.write(file_string)
    file_handle.close()

Based on the answer by Thomas Watnedal. However, this does not answer the line-to-line part of the original question exactly. The function can still replace on a line-to-line basis

This implementation replaces the file contents without using temporary files, as a consequence file permissions remain unchanged.

Also re.sub instead of replace, allows regex replacement instead of plain text replacement only.

Reading the file as a single string instead of line by line allows for multiline match and replacement.

import re

def replace(file, pattern, subst):
    # Read contents from file as a single string
    file_handle = open(file, 'r')
    file_string = file_handle.read()
    file_handle.close()

    # Use RE package to allow for replacement (also allowing for (multiline) REGEX)
    file_string = (re.sub(pattern, subst, file_string))

    # Write contents to file.
    # Using mode 'w' truncates the file.
    file_handle = open(file, 'w')
    file_handle.write(file_string)
    file_handle.close()

回答 5

就像lassevk建议的那样,随时随地写出新文件,这是一些示例代码:

fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
    fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()

As lassevk suggests, write out the new file as you go, here is some example code:

fin = open("a.txt")
fout = open("b.txt", "wt")
for line in fin:
    fout.write( line.replace('foo', 'bar') )
fin.close()
fout.close()

回答 6

如果您想要一个通用函数来将任何文本替换为其他文本,那么这可能是最好的选择,特别是如果您是正则表达式的支持者:

import re
def replace( filePath, text, subs, flags=0 ):
    with open( filePath, "r+" ) as file:
        fileContents = file.read()
        textPattern = re.compile( re.escape( text ), flags )
        fileContents = textPattern.sub( subs, fileContents )
        file.seek( 0 )
        file.truncate()
        file.write( fileContents )

If you’re wanting a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you’re a fan of regex’s:

import re
def replace( filePath, text, subs, flags=0 ):
    with open( filePath, "r+" ) as file:
        fileContents = file.read()
        textPattern = re.compile( re.escape( text ), flags )
        fileContents = textPattern.sub( subs, fileContents )
        file.seek( 0 )
        file.truncate()
        file.write( fileContents )

回答 7

更加Python化的方式是使用上下文管理器,如下面的代码:

from tempfile import mkstemp
from shutil import move
from os import remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with open(target_file_path, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)

您可以在此处找到完整的代码段。

A more pythonic way would be to use context managers like the code below:

from tempfile import mkstemp
from shutil import move
from os import remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    with open(target_file_path, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)

You can find the full snippet here.


回答 8

创建一个新文件,将行从旧复制到新,并在将行写入新文件之前进行替换。

Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.


回答 9

扩展@Kiran的答案(我同意是更简洁和Pythonic的),这增加了编解码器以支持UTF-8的读写:

import codecs 

from tempfile import mkstemp
from shutil import move
from os import remove


def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()

    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)

Expanding on @Kiran’s answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:

import codecs 

from tempfile import mkstemp
from shutil import move
from os import remove


def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()

    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)

回答 10

使用hamishmcn的答案作为模板,我能够在文件中搜索与我的正则表达式匹配的一行并将其替换为空字符串。

import re 

fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
for line in fin:
    p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern
    newline = p.sub('',line) # replace matching strings with empty string
    print newline
    fout.write(newline)
fin.close()
fout.close()

Using hamishmcn’s answer as a template I was able to search for a line in a file that match my regex and replacing it with empty string.

import re 

fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
for line in fin:
    p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern
    newline = p.sub('',line) # replace matching strings with empty string
    print newline
    fout.write(newline)
fin.close()
fout.close()

回答 11

fileinput 如先前的答案所述,它非常简单:

import fileinput

def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as f:
        for line in f:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')

说明:

  • fileinput可以接受多个文件,但是我更喜欢在处理每个文件后立即将其关闭。所以,放置单file_pathwith声明。
  • printinplace=True,语句不会打印任何内容,因为STDOUT将其转发到原始文件。
  • end=''in print语句是消除中间的空白新行。

可以如下使用:

file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')

fileinput is quite straightforward as mentioned on previous answers:

import fileinput

def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as f:
        for line in f:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')

Explanation:

  • fileinput can accept multiple files, but I prefer to close each single file as soon as it is being processed. So placed single file_path in with statement.
  • print statement does not print anything when inplace=True, because STDOUT is being forwarded to the original file.
  • end='' in print statement is to eliminate intermediate blank new lines.

Can be used as follows:

file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')

回答 12

如果您在以下位置删除缩进,它将搜索并替换成多行。参见以下示例。

def replace(file, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    print fh, abs_path
    new_file = open(abs_path,'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

if you remove the indent at the like below, it will search and replace in multiple line. See below for example.

def replace(file, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    print fh, abs_path
    new_file = open(abs_path,'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

如何正确断言pytest中引发了异常?

问题:如何正确断言pytest中引发了异常?

码:

# coding=utf-8
import pytest


def whatever():
    return 9/0

def test_whatever():
    try:
        whatever()
    except ZeroDivisionError as exc:
        pytest.fail(exc, pytrace=True)

输出:

================================ test session starts =================================
platform linux2 -- Python 2.7.3 -- py-1.4.20 -- pytest-2.5.2
plugins: django, cov
collected 1 items 

pytest_test.py F

====================================== FAILURES ======================================
___________________________________ test_whatever ____________________________________

    def test_whatever():
        try:
            whatever()
        except ZeroDivisionError as exc:
>           pytest.fail(exc, pytrace=True)
E           Failed: integer division or modulo by zero

pytest_test.py:12: Failed
============================== 1 failed in 1.16 seconds ==============================

如何使pytest打印回溯,所以我会看到在whatever函数中引发异常的地方?

Code:

# coding=utf-8
import pytest


def whatever():
    return 9/0

def test_whatever():
    try:
        whatever()
    except ZeroDivisionError as exc:
        pytest.fail(exc, pytrace=True)

Output:

================================ test session starts =================================
platform linux2 -- Python 2.7.3 -- py-1.4.20 -- pytest-2.5.2
plugins: django, cov
collected 1 items 

pytest_test.py F

====================================== FAILURES ======================================
___________________________________ test_whatever ____________________________________

    def test_whatever():
        try:
            whatever()
        except ZeroDivisionError as exc:
>           pytest.fail(exc, pytrace=True)
E           Failed: integer division or modulo by zero

pytest_test.py:12: Failed
============================== 1 failed in 1.16 seconds ==============================

How to make pytest print traceback, so I would see where in the whatever function an exception was raised?


回答 0

pytest.raises(Exception) 是您所需要的。

import pytest

def test_passes():
    with pytest.raises(Exception) as e_info:
        x = 1 / 0

def test_passes_without_info():
    with pytest.raises(Exception):
        x = 1 / 0

def test_fails():
    with pytest.raises(Exception) as e_info:
        x = 1 / 1

def test_fails_without_info():
    with pytest.raises(Exception):
        x = 1 / 1

# Don't do this. Assertions are caught as exceptions.
def test_passes_but_should_not():
    try:
        x = 1 / 1
        assert False
    except Exception:
        assert True

# Even if the appropriate exception is caught, it is bad style,
# because the test result is less informative
# than it would be with pytest.raises(e)
# (it just says pass or fail.)

def test_passes_but_bad_style():
    try:
        x = 1 / 0
        assert False
    except ZeroDivisionError:
        assert True

def test_fails_but_bad_style():
    try:
        x = 1 / 1
        assert False
    except ZeroDivisionError:
        assert True

输出量

============================================================================================= test session starts ==============================================================================================
platform linux2 -- Python 2.7.6 -- py-1.4.26 -- pytest-2.6.4
collected 7 items 

test.py ..FF..F

=================================================================================================== FAILURES ===================================================================================================
__________________________________________________________________________________________________ test_fails __________________________________________________________________________________________________

    def test_fails():
        with pytest.raises(Exception) as e_info:
>           x = 1 / 1
E           Failed: DID NOT RAISE

test.py:13: Failed
___________________________________________________________________________________________ test_fails_without_info ____________________________________________________________________________________________

    def test_fails_without_info():
        with pytest.raises(Exception):
>           x = 1 / 1
E           Failed: DID NOT RAISE

test.py:17: Failed
___________________________________________________________________________________________ test_fails_but_bad_style ___________________________________________________________________________________________

    def test_fails_but_bad_style():
        try:
            x = 1 / 1
>           assert False
E           assert False

test.py:43: AssertionError
====================================================================================== 3 failed, 4 passed in 0.02 seconds ======================================================================================

请注意,e_info将保存异常对象,以便您可以从中提取详细信息。例如,如果要检查异常调用堆栈或内部的另一个嵌套异常。

pytest.raises(Exception) is what you need.

Code

import pytest

def test_passes():
    with pytest.raises(Exception) as e_info:
        x = 1 / 0

def test_passes_without_info():
    with pytest.raises(Exception):
        x = 1 / 0

def test_fails():
    with pytest.raises(Exception) as e_info:
        x = 1 / 1

def test_fails_without_info():
    with pytest.raises(Exception):
        x = 1 / 1

# Don't do this. Assertions are caught as exceptions.
def test_passes_but_should_not():
    try:
        x = 1 / 1
        assert False
    except Exception:
        assert True

# Even if the appropriate exception is caught, it is bad style,
# because the test result is less informative
# than it would be with pytest.raises(e)
# (it just says pass or fail.)

def test_passes_but_bad_style():
    try:
        x = 1 / 0
        assert False
    except ZeroDivisionError:
        assert True

def test_fails_but_bad_style():
    try:
        x = 1 / 1
        assert False
    except ZeroDivisionError:
        assert True

Output

============================================================================================= test session starts ==============================================================================================
platform linux2 -- Python 2.7.6 -- py-1.4.26 -- pytest-2.6.4
collected 7 items 

test.py ..FF..F

=================================================================================================== FAILURES ===================================================================================================
__________________________________________________________________________________________________ test_fails __________________________________________________________________________________________________

    def test_fails():
        with pytest.raises(Exception) as e_info:
>           x = 1 / 1
E           Failed: DID NOT RAISE

test.py:13: Failed
___________________________________________________________________________________________ test_fails_without_info ____________________________________________________________________________________________

    def test_fails_without_info():
        with pytest.raises(Exception):
>           x = 1 / 1
E           Failed: DID NOT RAISE

test.py:17: Failed
___________________________________________________________________________________________ test_fails_but_bad_style ___________________________________________________________________________________________

    def test_fails_but_bad_style():
        try:
            x = 1 / 1
>           assert False
E           assert False

test.py:43: AssertionError
====================================================================================== 3 failed, 4 passed in 0.02 seconds ======================================================================================

Note that e_info saves the exception object so you can extract details from it. For example, if you want to check the exception call stack or another nested exception inside.


回答 1

您的意思是这样的吗:

def test_raises():
    with pytest.raises(Exception) as execinfo:   
        raise Exception('some info')
    # these asserts are identical; you can use either one   
    assert execinfo.value.args[0] == 'some info'
    assert str(execinfo.value) == 'some info'

Do you mean something like this:

def test_raises():
    with pytest.raises(Exception) as execinfo:   
        raise Exception('some info')
    # these asserts are identical; you can use either one   
    assert execinfo.value.args[0] == 'some info'
    assert str(execinfo.value) == 'some info'

回答 2

有两种方法可以在pytest中处理此类情况:

  • 使用pytest.raises功能

  • 使用pytest.mark.xfail装饰器

用途pytest.raises

def whatever():
    return 9/0
def test_whatever():
    with pytest.raises(ZeroDivisionError):
        whatever()

用途pytest.mark.xfail

@pytest.mark.xfail(raises=ZeroDivisionError)
def test_whatever():
    whatever()

输出pytest.raises

============================= test session starts ============================
platform linux2 -- Python 2.7.10, pytest-3.2.3, py-1.4.34, pluggy-0.4.0 -- 
/usr/local/python_2.7_10/bin/python
cachedir: .cache
rootdir: /home/user, inifile:
collected 1 item

test_fun.py::test_whatever PASSED


======================== 1 passed in 0.01 seconds =============================

pytest.xfail标记的输出:

============================= test session starts ============================
platform linux2 -- Python 2.7.10, pytest-3.2.3, py-1.4.34, pluggy-0.4.0 -- 
/usr/local/python_2.7_10/bin/python
cachedir: .cache
rootdir: /home/user, inifile:
collected 1 item

test_fun.py::test_whatever xfail

======================== 1 xfailed in 0.03 seconds=============================

文档所述

使用pytest.raises很可能是更好的为你在哪里测试异常你自己的代码被刻意提高的情况下,而使用@pytest.mark.xfail具有校验功能可能是更好的东西,像记录未修正的错误(如测试描述什么是“应该”发生)或错误的依赖。

There are two ways to handle these kind of cases in pytest:

  • Using pytest.raises function

  • Using pytest.mark.xfail decorator

As the documentation says:

Using pytest.raises is likely to be better for cases where you are testing exceptions your own code is deliberately raising, whereas using @pytest.mark.xfail with a check function is probably better for something like documenting unfixed bugs (where the test describes what “should” happen) or bugs in dependencies.

Usage of pytest.raises:

def whatever():
    return 9/0
def test_whatever():
    with pytest.raises(ZeroDivisionError):
        whatever()

Usage of pytest.mark.xfail:

@pytest.mark.xfail(raises=ZeroDivisionError)
def test_whatever():
    whatever()

Output of pytest.raises:

============================= test session starts ============================
platform linux2 -- Python 2.7.10, pytest-3.2.3, py-1.4.34, pluggy-0.4.0 -- 
/usr/local/python_2.7_10/bin/python
cachedir: .cache
rootdir: /home/user, inifile:
collected 1 item

test_fun.py::test_whatever PASSED


======================== 1 passed in 0.01 seconds =============================

Output of pytest.xfail marker:

============================= test session starts ============================
platform linux2 -- Python 2.7.10, pytest-3.2.3, py-1.4.34, pluggy-0.4.0 -- 
/usr/local/python_2.7_10/bin/python
cachedir: .cache
rootdir: /home/user, inifile:
collected 1 item

test_fun.py::test_whatever xfail

======================== 1 xfailed in 0.03 seconds=============================

回答 3

你可以试试

def test_exception():
    with pytest.raises(Exception) as excinfo:   
        function_that_raises_exception()   
    assert str(excinfo.value) == 'some info' 

you can try

def test_exception():
    with pytest.raises(Exception) as excinfo:   
        function_that_raises_exception()   
    assert str(excinfo.value) == 'some info' 

回答 4

pytest不断发展,并且随着最近的一次不错的变化,现在可以同时测试

  • 异常类型(严格测试)
  • 错误消息(使用正则表达式进行严格检查或宽松检查)

文档中的两个示例:

with pytest.raises(ValueError, match='must be 0 or None'):
    raise ValueError('value must be 0 or None')
with pytest.raises(ValueError, match=r'must be \d+$'):
    raise ValueError('value must be 42')

我已经在许多项目中使用了这种方法,并且非常喜欢它。

pytest constantly evolves and with one of the nice changes in the recent past it is now possible to simultaneously test for

  • the exception type (strict test)
  • the error message (strict or loose check using a regular expression)

Two examples from the documentation:

with pytest.raises(ValueError, match='must be 0 or None'):
    raise ValueError('value must be 0 or None')
with pytest.raises(ValueError, match=r'must be \d+$'):
    raise ValueError('value must be 42')

I have been using that approach in a number of projects and like it very much.


回答 5

正确的方法正在使用,pytest.raises但是我在这里的评论中找到了有趣的替代方法,并希望将其保存给以后这个问题的读者:

try:
    thing_that_rasises_typeerror()
    assert False
except TypeError:
    assert True

Right way is using pytest.raises but I found interesting alternative way in comments here and want to save it for future readers of this question:

try:
    thing_that_rasises_typeerror()
    assert False
except TypeError:
    assert True

回答 6

此解决方案是我们正在使用的解决方案:

def test_date_invalidformat():
    """
    Test if input incorrect data will raises ValueError exception
    """
    date = "06/21/2018 00:00:00"
    with pytest.raises(ValueError):
        app.func(date) #my function to be tested

请参阅pytest,https: //docs.pytest.org/en/latest/reference.html#pytest-raises

This solution is what we are using:

def test_date_invalidformat():
    """
    Test if input incorrect data will raises ValueError exception
    """
    date = "06/21/2018 00:00:00"
    with pytest.raises(ValueError):
        app.func(date) #my function to be tested

Please refer to pytest, https://docs.pytest.org/en/latest/reference.html#pytest-raises


回答 7

更好的做法是使用继承unittest.TestCase并运行self.assertRaises的类。

例如:

import unittest


def whatever():
    return 9/0


class TestWhatEver(unittest.TestCase):

    def test_whatever():
        with self.assertRaises(ZeroDivisionError):
            whatever()

然后,您可以通过运行以下命令来执行它:

pytest -vs test_path

Better practice will be using a class that inherit unittest.TestCase and running self.assertRaises.

For example:

import unittest


def whatever():
    return 9/0


class TestWhatEver(unittest.TestCase):

    def test_whatever():
        with self.assertRaises(ZeroDivisionError):
            whatever()

Then you would execute it by running:

pytest -vs test_path

回答 8

您是否尝试删除“ pytrace = True”?

pytest.fail(exc, pytrace=True) # before
pytest.fail(exc) # after

您是否尝试过使用’–fulltrace’吗?

Have you tried to remove “pytrace=True” ?

pytest.fail(exc, pytrace=True) # before
pytest.fail(exc) # after

Have you tried to run with ‘–fulltrace’ ?


将不同类型的项目列表作为字符串加入Python

问题:将不同类型的项目列表作为字符串加入Python

我需要加入项目清单。列表中的许多项目都是从函数返回的整数值;即

myList.append(munfunc()) 

如何将返回的结果转换为字符串,以便将其与列表连接?

我是否需要对每个整数值执行以下操作:

myList.append(str(myfunc()))

有没有更Python化的方法来解决铸造问题?

I need to join a list of items. Many of the items in the list are integer values returned from a function; i.e.,

myList.append(munfunc()) 

How should I convert the returned result to a string in order to join it with the list?

Do I need to do the following for every integer value:

myList.append(str(myfunc()))

Is there a more Pythonic way to solve casting problems?


回答 0

调用str(...)是将内容转换为字符串的Python方式。

您可能需要考虑为什么要使用字符串列表。您可以改为将其保存为整数列表,并且仅在需要显示整数时才将其转换为字符串。例如,如果您有一个整数列表,则可以在for循环中将它们一一转换,然后将它们与结合,

print(','.join(str(x) for x in list_of_ints))

Calling str(...) is the Pythonic way to convert something to a string.

You might want to consider why you want a list of strings. You could instead keep it as a list of integers and only convert the integers to strings when you need to display them. For example, if you have a list of integers then you can convert them one by one in a for-loop and join them with ,:

print(','.join(str(x) for x in list_of_ints))

回答 1

将整数传递给str并没有错。您可能不这样做的一个原因是myList实际上应该是整数列表,例如,将列表中的值相加是合理的。在这种情况下,在将int附加到myList之前,请勿将其传递给str。如果最终在追加之前没有转换为字符串,则可以通过执行以下操作来构造一个大字符串

', '.join(map(str, myList))

There’s nothing wrong with passing integers to str. One reason you might not do this is that myList is really supposed to be a list of integers e.g. it would be reasonable to sum the values in the list. In that case, do not pass your ints to str before appending them to myList. If you end up not converting to strings before appending, you can construct one big string by doing something like

', '.join(map(str, myList))

回答 2

可以使用python中的map函数。它有两个参数。第一个参数是必须用于列表中每个元素的函数。第二个论点是可迭代的

a = [1, 2, 3]   
map(str, a)  
['1', '2', '3']

将列表转换为字符串后,可以使用简单的连接功能将列表组合为单个字符串

a = map(str, a)    
''.join(a)      
'123'

map function in python can be used. It takes two arguments. First argument is the function which has to be used for each element of the list. Second argument is the iterable.

a = [1, 2, 3]   
map(str, a)  
['1', '2', '3']

After converting the list into string you can use simple join function to combine list into a single string

a = map(str, a)    
''.join(a)      
'123'

回答 3

a=[1,2,3]
b=[str(x) for x in a]
print b

上面的方法是将列表转换为字符串的最简单,最通用的方法。另一个简短的方法是-

a=[1,2,3]
b=map(str,a)
print b
a=[1,2,3]
b=[str(x) for x in a]
print b

above method is the easiest and most general way to convert list into string. another short method is-

a=[1,2,3]
b=map(str,a)
print b

回答 4

有三种方法可以做到这一点。

假设您有一个整数列表

my_list = [100,200,300]
  1. "-".join(str(n) for n in my_list)
  2. "-".join([str(n) for n in my_list])
  3. "-".join(map(str, my_list))

但是,如python网站https://docs.python.org/2/library/timeit.html上的timeit示例中所述,使用地图的速度更快。所以我建议您使用"-".join(map(str, my_list))

There are three ways of doing this.

let say you have a list of integers

my_list = [100,200,300]
  1. "-".join(str(n) for n in my_list)
  2. "-".join([str(n) for n in my_list])
  3. "-".join(map(str, my_list))

However as stated in the example of timeit on python website at https://docs.python.org/2/library/timeit.html using a map is faster. So I would recommend you using "-".join(map(str, my_list))


回答 5

您的问题很清楚。也许您正在寻找扩展,以便将另一个列表的所有元素添加到现有列表中:

>>> x = [1,2]
>>> x.extend([3,4,5])
>>> x
[1, 2, 3, 4, 5]

如果要将整数转换为字符串,请使用str()或字符串插值,可能与列表推导结合使用,即

>>> x = ['1', '2']
>>> x.extend([str(i) for i in range(3, 6)])
>>> x
['1', '2', '3', '4', '5']

所有这些都被认为是pythonic的(好吧,生成器表达式更像是pythonic的,但让我们保持话题简单)

Your problem is rather clear. Perhaps you’re looking for extend, to add all elements of another list to an existing list:

>>> x = [1,2]
>>> x.extend([3,4,5])
>>> x
[1, 2, 3, 4, 5]

If you want to convert integers to strings, use str() or string interpolation, possibly combined with a list comprehension, i.e.

>>> x = ['1', '2']
>>> x.extend([str(i) for i in range(3, 6)])
>>> x
['1', '2', '3', '4', '5']

All of this is considered pythonic (ok, a generator expression is even more pythonic but let’s stay simple and on topic)


回答 6

例如:

lst_points = [[313, 262, 470, 482], [551, 254, 697, 449]]

lst_s_points = [" ".join(map(str, lst)) for lst in lst_points]
print lst_s_points
# ['313 262 470 482', '551 254 697 449']

对于我来说,我想str在每个str列表之前添加一个:

# here o means class, other four points means coordinate
print ['0 ' + " ".join(map(str, lst)) for lst in lst_points]
# ['0 313 262 470 482', '0 551 254 697 449']

或单个列表:

lst = [313, 262, 470, 482]
lst_str = [str(i) for i in lst]
print lst_str, ", ".join(lst_str)
# ['313', '262', '470', '482'], 313, 262, 470, 482

lst_str = map(str, lst)
print lst_str, ", ".join(lst_str)
# ['313', '262', '470', '482'], 313, 262, 470, 482

For example:

lst_points = [[313, 262, 470, 482], [551, 254, 697, 449]]

lst_s_points = [" ".join(map(str, lst)) for lst in lst_points]
print lst_s_points
# ['313 262 470 482', '551 254 697 449']

As to me, I want to add a str before each str list:

# here o means class, other four points means coordinate
print ['0 ' + " ".join(map(str, lst)) for lst in lst_points]
# ['0 313 262 470 482', '0 551 254 697 449']

Or single list:

lst = [313, 262, 470, 482]
lst_str = [str(i) for i in lst]
print lst_str, ", ".join(lst_str)
# ['313', '262', '470', '482'], 313, 262, 470, 482

lst_str = map(str, lst)
print lst_str, ", ".join(lst_str)
# ['313', '262', '470', '482'], 313, 262, 470, 482

回答 7

也许您不需要数字作为字符串,只需执行以下操作:

functaulu = [munfunc(arg) for arg in range(loppu)]

稍后,如果您需要将其作为字符串,则可以使用字符串或格式字符串来实现:

print "Vastaus5 = %s" % functaulu[5]

Maybe you do not need numbers as strings, just do:

functaulu = [munfunc(arg) for arg in range(loppu)]

Later if you need it as string you can do it with string or with format string:

print "Vastaus5 = %s" % functaulu[5]


回答 8

没人会喜欢repr吗?
python 3.7.2:

>>> int_list = [1, 2, 3, 4, 5]
>>> print(repr(int_list))
[1, 2, 3, 4, 5]
>>> 

请注意,这是一个明确的表示。一个示例显示:

#Print repr(object) backwards
>>> print(repr(int_list)[::-1])
]5 ,4 ,3 ,2 ,1[
>>> 

pydocs-repr上的更多信息

How come no-one seems to like repr?
python 3.7.2:

>>> int_list = [1, 2, 3, 4, 5]
>>> print(repr(int_list))
[1, 2, 3, 4, 5]
>>> 

Take care though, it’s an explicit representation. An example shows:

#Print repr(object) backwards
>>> print(repr(int_list)[::-1])
]5 ,4 ,3 ,2 ,1[
>>> 

more info at pydocs-repr