Tag archive: uuid

When should I use uuid.uuid1() vs. uuid.uuid4() in Python?

Question: When should I use uuid.uuid1() vs. uuid.uuid4() in Python?

I understand the differences between the two from the docs.

uuid1():
Generate a UUID from a host ID, sequence number, and the current time

uuid4():
Generate a random UUID.

So uuid1 uses machine/sequence/time info to generate a UUID. What are the pros and cons of using each?

I know uuid1() can have privacy concerns, since it’s based on machine information. I wonder if there are any subtler considerations when choosing one or the other. I just use uuid4() right now, since it’s a completely random UUID. But I wonder if I should be using uuid1 to lessen the risk of collisions.

Basically, I’m looking for people’s tips for best-practices on using one vs. the other. Thanks!


Answer 0

uuid1() is guaranteed not to produce collisions (assuming you do not create too many of them at the same time). I wouldn’t use it if it’s important that there’s no connection between the UUID and the computer, as the MAC address is used to make it unique across computers.

You can create duplicates by creating more than 2^14 uuid1 values in less than 100 ns, but this is not a problem for most use cases.

uuid4() generates, as you said, a random UUID. The chance of a collision is really, really, really small. Small enough that you shouldn’t worry about it. The catch is that a bad random-number generator makes collisions more likely.

This excellent answer by Bob Aman sums it up nicely. (I recommend reading the whole answer.)

Frankly, in a single application space without malicious actors, the extinction of all life on earth will occur long before you have a collision, even on a version 4 UUID, even if you’re generating quite a few UUIDs per second.


Answer 1

One case where you may consider uuid1() rather than uuid4() is when UUIDs are produced on separate machines, for example when multiple online transactions are processed on several machines for scaling purposes.

In such a situation, duplicate IDs become more likely, both because of poor choices in how the pseudo-random number generators are initialized and because of the potentially higher number of UUIDs produced.

Another benefit of uuid1() in that case is that the machine where each GUID was initially produced is implicitly recorded (in the “node” part of the UUID). This and the time info may help, if only with debugging.
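
As a rough illustration of that debugging use, the sketch below pulls the node and creation time back out of a version 1 UUID using the standard uuid module. The helper name describe_uuid1 is hypothetical; the offset constant is the number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.

```python
import datetime
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def describe_uuid1(u):
    """Return the node, clock sequence and creation time stored in a v1 UUID."""
    if u.version != 1:
        raise ValueError("expected a version 1 UUID")
    # u.time is the 60-bit timestamp in 100 ns units since the UUID epoch.
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 1e7
    created = datetime.datetime.fromtimestamp(unix_seconds, tz=datetime.timezone.utc)
    return {
        "node": "%012x" % u.node,      # 48-bit node, usually the MAC address
        "clock_seq": u.clock_seq,      # 14-bit clock sequence
        "created_utc": created,
    }

print(describe_uuid1(uuid.uuid1()))
```

The same information is what makes uuid1() a privacy concern, so whether this counts as a feature depends on the application.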


Answer 2

My team just ran into trouble using UUID1 for a database upgrade script where we generated ~120k UUIDs within a couple of minutes. The UUID collision led to violation of a primary key constraint.

We’ve upgraded hundreds of servers, but on our Amazon EC2 instances we ran into this issue a few times. I suspect poor clock resolution; switching to UUID4 solved it for us.


Answer 3

One thing to note when using uuid1: if you use the default call (without giving the clock_seq parameter), you have a chance of running into collisions. You have only 14 bits of randomness (generating 18 entries within 100 ns gives you roughly a 1% chance of a collision; see birthday paradox/attack). The problem will never occur in most use cases, but on a virtual machine with poor clock resolution it will bite you.
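
The 1% figure can be checked with a few lines of arithmetic. This sketch assumes each call draws an independent, uniformly random 14-bit clock_seq and that all the calls share the same timestamp; the function name is illustrative.

```python
def collision_chance(n_ids, random_bits=14):
    """Exact birthday probability that n_ids random values are not all distinct."""
    slots = 2 ** random_bits
    p_unique = 1.0
    for i in range(n_ids):
        # The (i+1)-th value must avoid the i values already drawn.
        p_unique *= (slots - i) / float(slots)
    return 1.0 - p_unique

print("Chance of a collision among 18 IDs: %.4f" % collision_chance(18))
```

For 18 IDs this comes out slightly under 1%, in line with the estimate above.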


Answer 4

Perhaps something that’s not been mentioned is locality.

A MAC address or time-based ordering (UUID1) can afford increased database performance, since it’s less work to sort numbers that are close together than numbers distributed randomly (UUID4).

A second, related point is that UUID1 can be useful in debugging, even if origin data is lost or not explicitly stored (this is obviously in conflict with the privacy issue mentioned by the OP).


Answer 5

In addition to the accepted answer, there’s a third option that can be useful in some cases:

v1 with random MAC (“v1mc”)

You can make a hybrid between v1 and v4 by deliberately generating v1 UUIDs with a random multicast MAC address (this is allowed by the v1 spec). The resulting v1 UUID is time-dependent (like regular v1), but lacks all host-specific information (like v4). It’s also much closer to v4 in its collision resistance: v1mc = 60 bits of time + 61 random bits = 121 unique bits; v4 = 122 random bits.

The first place I encountered this was Postgres’ uuid_generate_v1mc() function. I’ve since used the following Python equivalent:

from os import urandom
from uuid import uuid1
_int_from_bytes = int.from_bytes  # py3 only

def uuid1mc():
    # NOTE: The constant here is required by the UUIDv1 spec...
    return uuid1(_int_from_bytes(urandom(6), "big") | 0x010000000000)

(note: I’ve got a longer + faster version that creates the UUID object directly; can post if anyone wants)


In case of LARGE volumes of calls/second, this has the potential to exhaust system randomness. You could use the stdlib random module instead (it will probably also be faster). But BE WARNED: it only takes a few hundred UUIDs before an attacker can determine the RNG state, and thus partially predict future UUIDs.

import random
from uuid import uuid1

def uuid1mc_insecure():
    return uuid1(random.getrandbits(48) | 0x010000000000)

How to create a GUID/UUID in Python

Question: How to create a GUID/UUID in Python

How do I create a GUID in Python that is platform independent? I hear there is a method using ActivePython on Windows but it’s Windows only because it uses COM. Is there a method using plain Python?


Answer 0

The uuid module, in Python 2.5 and up, provides RFC-compliant UUID generation. See the module docs and the RFC for details.

Example (works on Python 2 and 3):

>>> import uuid
>>> uuid.uuid4()
UUID('bd65600d-8669-4903-8a14-af88203add38')
>>> str(uuid.uuid4())
'f50ec0b7-f960-400d-91f0-c42a6d44e3d0'
>>> uuid.uuid4().hex
'9fe2c4e93f654fdbb24c02b15259716c'

Answer 1

If you’re using Python 2.5 or later, the uuid module is already included with the Python standard distribution.

Ex:

>>> import uuid
>>> uuid.uuid4()
UUID('5361a11b-615c-42bf-9bdb-e2c3790ada14')

Answer 2

Copied from https://docs.python.org/2/library/uuid.html (since the links posted were not active and they keep updating):

>>> import uuid

>>> # make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

>>> # make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')

>>> # make a random UUID
>>> uuid.uuid4()
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')

>>> # make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')

>>> # make a UUID from a string of hex digits (braces and hyphens ignored)
>>> x = uuid.UUID('{00010203-0405-0607-0809-0a0b0c0d0e0f}')

>>> # convert a UUID to a string of hex digits in standard form
>>> str(x)
'00010203-0405-0607-0809-0a0b0c0d0e0f'

>>> # get the raw 16 bytes of the UUID
>>> x.bytes
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'

>>> # make a UUID from a 16-byte string
>>> uuid.UUID(bytes=x.bytes)
UUID('00010203-0405-0607-0809-0a0b0c0d0e0f')

Answer 3

I use GUIDs as random keys for database type operations.

The hexadecimal form, with the dashes and extra characters, seems unnecessarily long to me. But I also like that strings representing hexadecimal numbers are very safe, in that they do not contain characters that can cause problems in some situations, such as ‘+’, ‘=’, etc.

Instead of hexadecimal, I use a URL-safe Base64 string. The following does not conform to any UUID/GUID spec, though (other than having the required amount of randomness).

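import base64
import uuid

# Get a random ID as a URL-safe Base64 string (22 characters, padding removed).
def get_a_uuid():
    # urlsafe_b64encode() returns bytes on Python 3, so decode before stripping.
    r_uuid = base64.urlsafe_b64encode(uuid.uuid4().bytes).decode('ascii')
    return r_uuid.rstrip('=')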
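
Since the padding is stripped, turning such a string back into a UUID needs the padding restored first. A minimal sketch of the round trip follows; encode_uuid and decode_uuid are names made up here, not part of the original answer.

```python
import base64
import uuid

def encode_uuid(u):
    """Encode a UUID as a 22-character URL-safe Base64 string without padding."""
    return base64.urlsafe_b64encode(u.bytes).decode("ascii").rstrip("=")

def decode_uuid(short_id):
    """Restore the stripped '=' padding and rebuild the UUID object."""
    padded = short_id + "=" * (-len(short_id) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))

original = uuid.uuid4()
short_id = encode_uuid(original)
print(short_id, decode_uuid(short_id) == original)
```

Keeping the decoder around means the short form can still be stored or compared as a regular UUID when needed.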

Answer 4

If you need a UUID for a model’s primary key or a unique field, the code below returns a UUID object:

import uuid
uuid.uuid4()

If you need to pass the UUID as a URL parameter, convert it to a string:

import uuid
str(uuid.uuid4())

If you want the hex value of a UUID, use:

import uuid
uuid.uuid4().hex

Answer 5

This function generates a random ID laid out according to a configurable format. Note that the result is not an RFC-compliant UUID: it uses the full alphanumeric alphabet, and uniqueness is only probabilistic.

E.g. the format [8, 4, 4, 4, 12] generates an ID like the following:

LxoYNyXe-7hbQ-caJt-DSdU-PDAht56cMEWi

import random
import string

def generate_uuid(uuid_format=(8, 4, 4, 4, 12)):
    # Alphabet: digits, then lowercase and uppercase ASCII letters.
    alphabet = string.digits + string.ascii_letters
    # Build one random group per length in the format and join them with dashes.
    groups = (''.join(random.choice(alphabet) for _ in range(n)) for n in uuid_format)
    return '-'.join(groups)

Answer 6

2019 Answer (for Windows):

If you want a permanent UUID that identifies a machine uniquely on Windows, you can use this trick: (Copied from my answer at https://stackoverflow.com/a/58416992/8874388).

from typing import Optional
import re
import subprocess
import uuid

def get_windows_uuid() -> Optional[uuid.UUID]:
    try:
        # Ask Windows for the device's permanent UUID. Throws if command missing/fails.
        txt = subprocess.check_output("wmic csproduct get uuid").decode()

        # Attempt to extract the UUID from the command's result.
        match = re.search(r"\bUUID\b[\s\r\n]+([^\s\r\n]+)", txt)
        if match is not None:
            txt = match.group(1)
            if txt is not None:
                # Remove the surrounding whitespace (newlines, space, etc)
                # and useless dashes etc, by only keeping hex (0-9 A-F) chars.
                txt = re.sub(r"[^0-9A-Fa-f]+", "", txt)

                # Ensure we have exactly 32 characters (16 bytes).
                if len(txt) == 32:
                    return uuid.UUID(txt)
    except (OSError, subprocess.CalledProcessError, UnicodeDecodeError):
        pass  # Command missing, failed, or produced undecodable output.

    return None

print(get_windows_uuid())

This asks Windows (through the wmic command-line tool) for the computer’s permanent UUID, then processes the string to ensure it’s a valid UUID, and lastly returns a Python uuid.UUID object (https://docs.python.org/3/library/uuid.html), which gives you convenient ways to use the data (such as a 128-bit integer, a hex string, etc.).

Good luck!

PS: The subprocess call could probably be replaced with ctypes directly calling Windows kernel/DLLs. But for my purposes this function is all I need. It does strong validation and produces correct results.


Answer 7

Check this post; it helped me a lot. In short, the best option for me was:

import random
import string

# Return a random string ID of the given size,
# drawn from the given characters.
def ran_gen(size, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for x in range(size))

# Generate a random string of size 8
# from the characters in "AEIOSUMA23".
print(ran_gen(8, "AEIOSUMA23"))

Because I needed just 4-6 random characters instead of a bulky GUID.