Union of 2 sets does not contain all items

Question: Union of 2 sets does not contain all items

How come when I change the order of the two sets in the unions below, I get different results?

set1 = {1, 2, 3}
set2 = {True, False}

print(set1 | set2)
# {False, 1, 2, 3}

print(set2 | set1)
# {False, True, 2, 3}

Answer 0

Why the union() doesn’t contain all items

The values 1 and True compare equal and have the same hash, so a set treats them as duplicates. Likewise, 0 and False are equal:

>>> 1 == True
True
>>> 0 == False
True
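A set decides whether two elements are duplicates by hash and equality, not by type. The short check below is a sketch added for illustration (it is not part of the original answer); it shows that 1 and True pass both tests, and that bool is a subclass of int:

```python
# Sets treat two values as the same element when they hash alike and compare equal.
pairs = [(1, True), (0, False)]
for number, flag in pairs:
    assert number == flag
    assert hash(number) == hash(flag)

# bool is a subclass of int, which is why the comparisons above succeed.
bool_is_int = issubclass(bool, int)
print(bool_is_int)  # True
```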

Which equivalent value is used

When multiple equivalent values are encountered, sets keep the first one seen:

>>> {0, False}
{0}
>>> {False, 0}
{False}

Ways to make the values be distinct

To get them to be treated as distinct, just store them in a (value, type) pair:

>>> set1 = {(1, int), (2, int), (3, int)}
>>> set2 = {(True, bool), (False, bool)}
>>> set1 | set2
{(3, <class 'int'>), (1, <class 'int'>), (2, <class 'int'>),
 (True, <class 'bool'>), (False, <class 'bool'>)}
>>> set1 & set2
set()
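If the sets are built from existing data, the pairing can be done by a small helper. This is a minimal sketch; the name typed is hypothetical and chosen only for this example:

```python
def typed(values):
    """Pair each value with its exact type so that 1 and True stay distinct."""
    return {(value, type(value)) for value in values}


combined = typed([1, 2, 3]) | typed([True, False])
print(len(combined))  # 5

# Recover the plain values when the type tag is no longer needed.
values_only = [value for value, _ in combined]
```

The order of values_only is not guaranteed, since sets are unordered.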

Another way to make the values distinct is to store them as strings:

>>> set1 = {'1', '2', '3'}
>>> set2 = {'True', 'False'}
>>> set1 | set2
{'2', '3', 'False', 'True', '1'}
>>> set1 & set2
set()

Hope this clears up the mystery and shows the way forward :-)


Rescued from the comments:

This is the standard technique for breaking cross-type equivalence (i.e. 0.0 == 0, True == 1, and Decimal(8.5) == 8.5). The technique is used in Python 2.7’s regular expression module to force unicode regexes to be cached distinctly from otherwise equivalent str regexes. The technique is also used in Python 3 for functools.lru_cache() when the typed parameter is true.
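The functools.lru_cache() case can be observed directly. The sketch below assumes a throwaway function named describe (hypothetical, for illustration only) and checks that, with typed=True, the calls for 1 and True are cached as separate entries:

```python
from functools import lru_cache


@lru_cache(maxsize=None, typed=True)
def describe(value):
    # Hypothetical function used only to make the cache behavior visible.
    return f"{type(value).__name__}:{value}"


first = describe(1)      # cached under the int argument
second = describe(True)  # cached separately because typed=True
misses = describe.cache_info().misses
print(first, second, misses)  # int:1 bool:True 2
```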

If the OP needs something other than the default equivalence relation, then some new relation needs to be defined. Depending on the use case, that could be case-insensitivity for strings, normalization for unicode, visual appearance (things that look different are considered different), identity (no two distinct objects are considered equal), a value/type pair, or some other function that defines an equivalence relation. Given the OP's specific example, it would seem that he/she expected either distinction by type or visual distinction.
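One way to express such a custom relation is a key function: two items are considered equivalent when their keys are equal. The helper below is a sketch under that assumption; unique_by is a hypothetical name, not a standard library function:

```python
def unique_by(items, key):
    """Return items in order, keeping the first item seen for each key(item)."""
    seen = set()
    result = []
    for item in items:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            result.append(item)
    return result


# Case-insensitive equivalence for strings.
case_insensitive = unique_by(["Spam", "spam", "Eggs"], key=str.casefold)

# Value/type pairs keep 1 and True apart; the default relation merges them.
by_type = unique_by([1, True, 0, False], key=lambda v: (v, type(v)))
by_value = unique_by([1, True, 0, False], key=lambda v: v)

print(case_insensitive)  # ['Spam', 'Eggs']
print(len(by_type), len(by_value))  # 4 2
```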


Answer 1

In Python, False and 0 are considered equivalent, as are True and 1. Because True and 1 are considered the same value, only one of them can be present in a set at the same time. Which one is kept depends on the order in which they are added to the set. In the first union, set1 is the left operand, so 1 ends up in the resulting set. In the second union, set2 is the left operand, so True is kept instead.
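The effect of operand order can be checked by looking at the type of the element that survives. This sketch reflects CPython's behavior, where the union starts from the left operand and skips elements that are already present; member_equal_to is a hypothetical helper written for this example:

```python
set1 = {1, 2, 3}
set2 = {True, False}


def member_equal_to(s, value):
    """Return the object actually stored in s that compares equal to value."""
    return next(item for item in s if item == value)


left_first = member_equal_to(set1 | set2, 1)   # element came from set1
right_first = member_equal_to(set2 | set1, 1)  # element came from set2
print(type(left_first).__name__, type(right_first).__name__)  # int bool
```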


Answer 2

See section 4.12.10, "Boolean Values", at https://docs.python.org/3/library/stdtypes.html#boolean-values:

Boolean values are the two constant objects False and True. They are used to represent truth values (although other values can also be considered false or true). In numeric contexts (for example when used as the argument to an arithmetic operator), they behave like the integers 0 and 1, respectively.
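A few lines of arithmetic illustrate the "numeric contexts" mentioned in the quote. The variable names are chosen only for this example:

```python
total = True + True                # behaves like 1 + 1
flags = [True, False, True, True]
count_true = sum(flags)            # counts the True entries
scaled = False * 10                # behaves like 0 * 10
print(total, count_true, scaled)   # 2 3 0
```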


Answer 3

The comparison operators (== and !=) are defined for the booleans True and False so that they match 1 and 0.

That's why, in the set union, when it checks whether True is already in the new set, it finds a match:

>>> True in {1}
True
>>> 1 in {True}
True
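Dictionary keys follow the same hash-and-equality rule, so the same behavior can be seen there. A small sketch for illustration: assigning to the key True updates the entry stored under 1, and the original key object is kept.

```python
lookup = {1: "int key"}
lookup[True] = "bool key"  # True == 1, so this overwrites the value stored under 1

keys = list(lookup)
print(keys, lookup[1])  # [1] bool key
```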