标签归档:random

在其他两个日期之间生成一个随机日期

问题:在其他两个日期之间生成一个随机日期

如何生成必须在其他两个给定日期之间的随机日期?

该函数的签名应如下所示:

random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", 0.34)
                   ^                       ^          ^

            date generated has  date generated has  a random number
            to be after this    to be before this

并返回一个日期,例如: 2/4/2008 7:20 PM

How would I generate a random date that has to be between two other given dates?

The function’s signature should be something like this:

random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", 0.34)
                   ^                       ^          ^

            date generated has  date generated has  a random number
            to be after this    to be before this

and would return a date such as: 2/4/2008 7:20 PM


回答 0

将两个字符串都转换为时间戳(以您选择的分辨率为单位,例如毫秒,秒,小时,天等),从后一个减去前一个,将您的随机数(假设分布在中range [0, 1])乘以该差,然后再次加较早的一个。将时间戳转换回日期字符串,并且您在该范围内有一个随机时间。

Python示例(输出几乎是您指定的格式,而不是0填充-归咎于美国时间格式约定):

import random
import time

def str_time_prop(start, end, format, prop):
    """Get a time at a proportion of a range of two formatted times.

    start and end should be strings specifying times formated in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies how a proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """

    stime = time.mktime(time.strptime(start, format))
    etime = time.mktime(time.strptime(end, format))

    ptime = stime + prop * (etime - stime)

    return time.strftime(format, time.localtime(ptime))


def random_date(start, end, prop):
    return str_time_prop(start, end, '%m/%d/%Y %I:%M %p', prop)

print(random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", random.random()))

Convert both strings to timestamps (in your chosen resolution, e.g. milliseconds, seconds, hours, days, whatever), subtract the earlier from the later, multiply your random number (assuming it is distributed in the range [0, 1]) with that difference, and add again to the earlier one. Convert the timestamp back to date string and you have a random time in that range.

Python example (output is almost in the format you specified, other than 0 padding – blame the American time format conventions):

import random
import time

def str_time_prop(start, end, format, prop):
    """Get a time at a proportion of a range of two formatted times.

    start and end should be strings specifying times formated in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies how a proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """

    stime = time.mktime(time.strptime(start, format))
    etime = time.mktime(time.strptime(end, format))

    ptime = stime + prop * (etime - stime)

    return time.strftime(format, time.localtime(ptime))


def random_date(start, end, prop):
    return str_time_prop(start, end, '%m/%d/%Y %I:%M %p', prop)

print(random_date("1/1/2008 1:30 PM", "1/1/2009 4:50 AM", random.random()))

回答 1

from random import randrange
from datetime import timedelta

def random_date(start, end):
    """
    This function will return a random datetime between two datetime 
    objects.
    """
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)

精度是秒。如果需要,您可以将精度提高到微秒,或降低到半小时。为此,只需更改最后一行的计算即可。

示例运行:

from datetime import datetime

d1 = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
d2 = datetime.strptime('1/1/2009 4:50 AM', '%m/%d/%Y %I:%M %p')

print(random_date(d1, d2))

输出:

2008-12-04 01:50:17
from random import randrange
from datetime import timedelta

def random_date(start, end):
    """
    This function will return a random datetime between two datetime 
    objects.
    """
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)

The precision is seconds. You can increase precision up to microseconds, or decrease to, say, half-hours, if you want. For that just change the last line’s calculation.

example run:

from datetime import datetime

d1 = datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p')
d2 = datetime.strptime('1/1/2009 4:50 AM', '%m/%d/%Y %I:%M %p')

print(random_date(d1, d2))

output:

2008-12-04 01:50:17

回答 2

一个小版本。

import datetime
import random


def random_date(start, end):
    """Generate a random datetime between `start` and `end`"""
    return start + datetime.timedelta(
        # Get a random amount of seconds between `start` and `end`
        seconds=random.randint(0, int((end - start).total_seconds())),
    )

请注意,startend参数都应该是datetime对象。如果您有字符串,则很容易转换。其他答案指出了这样做的一些方法。

A tiny version.

import datetime
import random


def random_date(start, end):
    """Generate a random datetime between `start` and `end`"""
    return start + datetime.timedelta(
        # Get a random amount of seconds between `start` and `end`
        seconds=random.randint(0, int((end - start).total_seconds())),
    )

Note that both start and end arguments should be datetime objects. If you’ve got strings instead, it’s fairly easy to convert. The other answers point to some ways to do so.


回答 3

更新的答案

使用Faker甚至更简单。

安装

pip install faker

用法:

from faker import Faker
fake = Faker()

fake.date_between(start_date='today', end_date='+30y')
# datetime.date(2025, 3, 12)

fake.date_time_between(start_date='-30y', end_date='now')
# datetime.datetime(2007, 2, 28, 11, 28, 16)

# Or if you need a more specific date boundaries, provide the start 
# and end dates explicitly.
import datetime
start_date = datetime.date(year=2015, month=1, day=1)
fake.date_between(start_date=start_date, end_date='+30y')

旧答案

使用雷达非常简单

安装

pip install radar

用法

import datetime

import radar 

# Generate random datetime (parsing dates from str values)
radar.random_datetime(start='2000-05-24', stop='2013-05-24T23:59:59')

# Generate random datetime from datetime.datetime values
radar.random_datetime(
    start = datetime.datetime(year=2000, month=5, day=24),
    stop = datetime.datetime(year=2013, month=5, day=24)
)

# Just render some random datetime. If no range is given, start defaults to 
# 1970-01-01 and stop defaults to datetime.datetime.now()
radar.random_datetime()

Updated answer

It’s even more simple using Faker.

Installation

pip install faker

Usage:

from faker import Faker
fake = Faker()

fake.date_between(start_date='today', end_date='+30y')
# datetime.date(2025, 3, 12)

fake.date_time_between(start_date='-30y', end_date='now')
# datetime.datetime(2007, 2, 28, 11, 28, 16)

# Or if you need a more specific date boundaries, provide the start 
# and end dates explicitly.
import datetime
start_date = datetime.date(year=2015, month=1, day=1)
fake.date_between(start_date=start_date, end_date='+30y')

Old answer

It’s very simple using radar

Installation

pip install radar

Usage

import datetime

import radar 

# Generate random datetime (parsing dates from str values)
radar.random_datetime(start='2000-05-24', stop='2013-05-24T23:59:59')

# Generate random datetime from datetime.datetime values
radar.random_datetime(
    start = datetime.datetime(year=2000, month=5, day=24),
    stop = datetime.datetime(year=2013, month=5, day=24)
)

# Just render some random datetime. If no range is given, start defaults to 
# 1970-01-01 and stop defaults to datetime.datetime.now()
radar.random_datetime()

回答 4

这是另一种方法-这种工作。

from random import randint
import datetime

date=datetime.date(randint(2005,2025), randint(1,12),randint(1,28))

更好的方法

startdate=datetime.date(YYYY,MM,DD)
date=startdate+datetime.timedelta(randint(1,365))

This is a different approach – that sort of works..

from random import randint
import datetime

date=datetime.date(randint(2005,2025), randint(1,12),randint(1,28))

BETTER APPROACH

startdate=datetime.date(YYYY,MM,DD)
date=startdate+datetime.timedelta(randint(1,365))

回答 5

由于Python 3 timedelta支持浮点数乘法,因此现在您可以执行以下操作:

import random
random_date = start + (end - start) * random.random()

鉴于startend是类型的datetime.datetime。例如,要在第二天生成一个随机的日期时间:

import random
from datetime import datetime, timedelta

start = datetime.now()
end = start + timedelta(days=1)
random_date = start + (end - start) * random.random()

Since Python 3 timedelta supports multiplication with floats, so now you can do:

import random
random_date = start + (end - start) * random.random()

given that start and end are of the type datetime.datetime. For example, to generate a random datetime within the next day:

import random
from datetime import datetime, timedelta

start = datetime.now()
end = start + timedelta(days=1)
random_date = start + (end - start) * random.random()

回答 6

要使用基于熊猫的解决方案,我使用:

import pandas as pd
import numpy as np

def random_date(start, end, position=None):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    delta = (end - start).total_seconds()
    if position is None:
        offset = np.random.uniform(0., delta)
    else:
        offset = position * delta
    offset = pd.offsets.Second(offset)
    t = start + offset
    return t

我喜欢它,因为很好 pd.Timestamp出色功能使我可以抛出不同的内容和格式。考虑以下几个示例…

你的签名。

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM", position=0.34)
Timestamp('2008-05-04 21:06:48', tz=None)

随机位置。

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM")
Timestamp('2008-10-21 05:30:10', tz=None)

不同的格式。

>>> random_date('2008-01-01 13:30', '2009-01-01 4:50')
Timestamp('2008-11-18 17:20:19', tz=None)

直接传递熊猫/日期时间对象。

>>> random_date(pd.datetime.now(), pd.datetime.now() + pd.offsets.Hour(3))
Timestamp('2014-03-06 14:51:16.035965', tz=None)

To chip in a pandas-based solution I use:

import pandas as pd
import numpy as np

def random_date(start, end, position=None):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    delta = (end - start).total_seconds()
    if position is None:
        offset = np.random.uniform(0., delta)
    else:
        offset = position * delta
    offset = pd.offsets.Second(offset)
    t = start + offset
    return t

I like it, because of the nice pd.Timestamp features that allow me to throw different stuff and formats at it. Consider the following few examples…

Your signature.

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM", position=0.34)
Timestamp('2008-05-04 21:06:48', tz=None)

Random position.

>>> random_date(start="1/1/2008 1:30 PM", end="1/1/2009 4:50 AM")
Timestamp('2008-10-21 05:30:10', tz=None)

Different format.

>>> random_date('2008-01-01 13:30', '2009-01-01 4:50')
Timestamp('2008-11-18 17:20:19', tz=None)

Passing pandas/datetime objects directly.

>>> random_date(pd.datetime.now(), pd.datetime.now() + pd.offsets.Hour(3))
Timestamp('2014-03-06 14:51:16.035965', tz=None)

回答 7

这是标题标题的字面意思的答案,而不是问题的正文:

import time
import datetime
import random

def date_to_timestamp(d) :
  return int(time.mktime(d.timetuple()))

def randomDate(start, end):
  """Get a random date between two dates"""

  stime = date_to_timestamp(start)
  etime = date_to_timestamp(end)

  ptime = stime + random.random() * (etime - stime)

  return datetime.date.fromtimestamp(ptime)

这段代码大致基于公认的答案。

Here is an answer to the literal meaning of the title rather than the body of this question:

import time
import datetime
import random

def date_to_timestamp(d) :
  return int(time.mktime(d.timetuple()))

def randomDate(start, end):
  """Get a random date between two dates"""

  stime = date_to_timestamp(start)
  etime = date_to_timestamp(end)

  ptime = stime + random.random() * (etime - stime)

  return datetime.date.fromtimestamp(ptime)

This code is based loosely on the accepted answer.


回答 8

您可以使用Mixer

pip install mixer

和,

from mixer import generators as gen
print gen.get_datetime(min_datetime=(1900, 1, 1, 0, 0, 0), max_datetime=(2020, 12, 31, 23, 59, 59))

You can Use Mixer,

pip install mixer

and,

from mixer import generators as gen
print gen.get_datetime(min_datetime=(1900, 1, 1, 0, 0, 0), max_datetime=(2020, 12, 31, 23, 59, 59))

回答 9

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Create random datetime object."""

from datetime import datetime
import random


def create_random_datetime(from_date, to_date, rand_type='uniform'):
    """
    Create random date within timeframe.

    Parameters
    ----------
    from_date : datetime object
    to_date : datetime object
    rand_type : {'uniform'}

    Examples
    --------
    >>> random.seed(28041990)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(1998, 12, 13, 23, 38, 0, 121628)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(2000, 3, 19, 19, 24, 31, 193940)
    """
    delta = to_date - from_date
    if rand_type == 'uniform':
        rand = random.random()
    else:
        raise NotImplementedError('Unknown random mode \'{}\''
                                  .format(rand_type))
    return from_date + rand * delta


if __name__ == '__main__':
    import doctest
    doctest.testmod()
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Create random datetime object."""

from datetime import datetime
import random


def create_random_datetime(from_date, to_date, rand_type='uniform'):
    """
    Create random date within timeframe.

    Parameters
    ----------
    from_date : datetime object
    to_date : datetime object
    rand_type : {'uniform'}

    Examples
    --------
    >>> random.seed(28041990)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(1998, 12, 13, 23, 38, 0, 121628)
    >>> create_random_datetime(datetime(1990, 4, 28), datetime(2000, 12, 31))
    datetime.datetime(2000, 3, 19, 19, 24, 31, 193940)
    """
    delta = to_date - from_date
    if rand_type == 'uniform':
        rand = random.random()
    else:
        raise NotImplementedError('Unknown random mode \'{}\''
                                  .format(rand_type))
    return from_date + rand * delta


if __name__ == '__main__':
    import doctest
    doctest.testmod()

回答 10

将您的日期转换为时间戳并random.randint使用时间戳进行调用,然后将随机生成的时间戳转换回日期:

from datetime import datetime
import random

def random_date(first_date, second_date):
    first_timestamp = int(first_date.timestamp())
    second_timestamp = int(second_date.timestamp())
    random_timestamp = random.randint(first_timestamp, second_timestamp)
    return datetime.fromtimestamp(random_timestamp)

那你可以这样用

from datetime import datetime

d1 = datetime.strptime("1/1/2018 1:30 PM", "%m/%d/%Y %I:%M %p")
d2 = datetime.strptime("1/1/2019 4:50 AM", "%m/%d/%Y %I:%M %p")

random_date(d1, d2)

random_date(d2, d1)  # ValueError because the first date comes after the second date

如果您关心时区,则应该date_time_between_datesFaker库中使用它,因为我已经从中窃取了此代码,因为已经给出了另一个答案。

Convert your dates into timestamps and call random.randint with the timestamps, then convert the randomly generated timestamp back into a date:

from datetime import datetime
import random

def random_date(first_date, second_date):
    first_timestamp = int(first_date.timestamp())
    second_timestamp = int(second_date.timestamp())
    random_timestamp = random.randint(first_timestamp, second_timestamp)
    return datetime.fromtimestamp(random_timestamp)

Then you can use it like this

from datetime import datetime

d1 = datetime.strptime("1/1/2018 1:30 PM", "%m/%d/%Y %I:%M %p")
d2 = datetime.strptime("1/1/2019 4:50 AM", "%m/%d/%Y %I:%M %p")

random_date(d1, d2)

random_date(d2, d1)  # ValueError because the first date comes after the second date

If you care about timezones you should just use date_time_between_dates from the Faker library, where I stole this code from, as a different answer already suggests.


回答 11

  1. 将输入日期转换为数字(整数,浮点数,最适合您的用法)
  2. 在两个日期数字之间选择一个数字。
  3. 将此数字转换回日期。

许多操作系统中已经提供了许多用于将日期与数字进行日期转换的算法。

  1. Convert your input dates to numbers (int, float, whatever is best for your usage)
  2. Choose a number between your two date numbers.
  3. Convert this number back to a date.

Many algorithms for converting date to and from numbers are already available in many operating systems.


回答 12

您需要什么随机数?通常(取决于语言),您可以从日期开始获取到纪元的秒数​​/毫秒数。因此,对于startDate和endDate之间的随机日期,您可以执行以下操作:

  1. 以毫秒为单位计算startDate和endDate之间的时间(endDate.toMilliseconds()-startDate.toMilliseconds())
  2. 生成一个介于0和1之间的数字
  3. 生成一个新的Date,其时间偏移量= startDate.toMilliseconds()+ 2中获得的数字

What do you need the random number for? Usually (depending on the language) you can get the number of seconds/milliseconds from the Epoch from a date. So for a randomd date between startDate and endDate you could do:

  1. compute the time in ms between startDate and endDate (endDate.toMilliseconds() – startDate.toMilliseconds())
  2. generate a number between 0 and the number you obtained in 1
  3. generate a new Date with time offset = startDate.toMilliseconds() + number obtained in 2

回答 13

最简单的方法是将两个数字都转换为时间戳,然后将其设置为随机数生成器的最小和最大界限。

一个快速的PHP示例是:

// Find a randomDate between $start_date and $end_date
function randomDate($start_date, $end_date)
{
    // Convert to timetamps
    $min = strtotime($start_date);
    $max = strtotime($end_date);

    // Generate random number using above bounds
    $val = rand($min, $max);

    // Convert back to desired date format
    return date('Y-m-d H:i:s', $val);
}

此函数strtotime()用于将日期时间描述转换为Unix时间戳,并date()根据已生成的随机时间戳生成有效日期。

The easiest way of doing this is to convert both numbers to timestamps, then set these as the minimum and maximum bounds on a random number generator.

A quick PHP example would be:

// Find a randomDate between $start_date and $end_date
function randomDate($start_date, $end_date)
{
    // Convert to timetamps
    $min = strtotime($start_date);
    $max = strtotime($end_date);

    // Generate random number using above bounds
    $val = rand($min, $max);

    // Convert back to desired date format
    return date('Y-m-d H:i:s', $val);
}

This function makes use of strtotime() to convert a datetime description into a Unix timestamp, and date() to make a valid date out of the random timestamp which has been generated.


回答 14

只是添加另一个:

datestring = datetime.datetime.strftime(datetime.datetime( \
    random.randint(2000, 2015), \
    random.randint(1, 12), \
    random.randint(1, 28), \
    random.randrange(23), \
    random.randrange(59), \
    random.randrange(59), \
    random.randrange(1000000)), '%Y-%m-%d %H:%M:%S')

日常处理需要一些注意事项。28岁时,您就在安全的网站上。

Just to add another one:

datestring = datetime.datetime.strftime(datetime.datetime( \
    random.randint(2000, 2015), \
    random.randint(1, 12), \
    random.randint(1, 28), \
    random.randrange(23), \
    random.randrange(59), \
    random.randrange(59), \
    random.randrange(1000000)), '%Y-%m-%d %H:%M:%S')

The day handling needs some considerations. With 28 you are on the secure site.


回答 15

这是从emyller的方法修改而来的解决方案,该方法以任何分辨率返回随机日期数组

import numpy as np

def random_dates(start, end, size=1, resolution='s'):
    """
    Returns an array of random dates in the interval [start, end]. Valid 
    resolution arguments are numpy date/time units, as documented at: 
        https://docs.scipy.org/doc/numpy-dev/reference/arrays.datetime.html
    """
    start, end = np.datetime64(start), np.datetime64(end)
    delta = (end-start).astype('timedelta64[{}]'.format(resolution))
    delta_mat = np.random.randint(0, delta.astype('int'), size)
    return start + delta_mat.astype('timedelta64[{}]'.format(resolution))

这种方法的部分优点在于,np.datetime64它确实擅长将日期强制转换为日期,因此您可以将开始/结束日期指定为字符串,日期时间,熊猫时间戳记……几乎所有东西都可以使用。

Here’s a solution modified from emyller’s approach which returns an array of random dates at any resolution

import numpy as np

def random_dates(start, end, size=1, resolution='s'):
    """
    Returns an array of random dates in the interval [start, end]. Valid 
    resolution arguments are numpy date/time units, as documented at: 
        https://docs.scipy.org/doc/numpy-dev/reference/arrays.datetime.html
    """
    start, end = np.datetime64(start), np.datetime64(end)
    delta = (end-start).astype('timedelta64[{}]'.format(resolution))
    delta_mat = np.random.randint(0, delta.astype('int'), size)
    return start + delta_mat.astype('timedelta64[{}]'.format(resolution))

Part of what’s nice about this approach is that np.datetime64 is really good at coercing things to dates, so you can specify your start/end dates as strings, datetimes, pandas timestamps… pretty much anything will work.


回答 16

从概念上讲,这很简单。根据您所使用的语言,您将能够将这些日期转换为参考32或64位整数,通常表示自纪元(1970年1月1日)以来的秒数(否则称为“ Unix时间”)或自某个其他任意日期以来的毫秒数。只需在这两个值之间生成一个随机的32或64位整数。这应该是任何语言的统一班轮。

在某些平台上,您可以将时间生成为两倍(日期是整数部分,时间是小数部分是一种实现)。除了要处理单精度或双精度浮点数(在C,Java和其他语言中为“ floats”或“ doubles”)外,该原理均适用。减去差,乘以随机数(0 <= r <= 1),加到开始时间并完成。

Conceptually it’s quite simple. Depending on which language you’re using you will be able to convert those dates into some reference 32 or 64 bit integer, typically representing seconds since epoch (1 January 1970) otherwise known as “Unix time” or milliseconds since some other arbitrary date. Simply generate a random 32 or 64 bit integer between those two values. This should be a one liner in any language.

On some platforms you can generate a time as a double (date is the integer part, time is the fractional part is one implementation). The same principle applies except you’re dealing with single or double precision floating point numbers (“floats” or “doubles” in C, Java and other languages). Subtract the difference, multiply by random number (0 <= r <= 1), add to start time and done.


回答 17

在python中:

>>> from dateutil.rrule import rrule, DAILY
>>> import datetime, random
>>> random.choice(
                 list(
                     rrule(DAILY, 
                           dtstart=datetime.date(2009,8,21), 
                           until=datetime.date(2010,10,12))
                     )
                 )
datetime.datetime(2010, 2, 1, 0, 0)

(需要python dateutil库– pip install python-dateutil

In python:

>>> from dateutil.rrule import rrule, DAILY
>>> import datetime, random
>>> random.choice(
                 list(
                     rrule(DAILY, 
                           dtstart=datetime.date(2009,8,21), 
                           until=datetime.date(2010,10,12))
                     )
                 )
datetime.datetime(2010, 2, 1, 0, 0)

(need python dateutil library – pip install python-dateutil)


回答 18

使用ApacheCommonUtils生成给定范围内的随机长度,然后在该长度范围之外创建Date。

例:

导入org.apache.commons.math.random.RandomData;

导入org.apache.commons.math.random.RandomDataImpl;

公开日期nextDate(最小日期,最大日期){

RandomData randomData = new RandomDataImpl();

return new Date(randomData.nextLong(min.getTime(), max.getTime()));

}

Use ApacheCommonUtils to generate a random long within a given range, and then create Date out of that long.

Example:

import org.apache.commons.math.random.RandomData;

import org.apache.commons.math.random.RandomDataImpl;

public Date nextDate(Date min, Date max) {

RandomData randomData = new RandomDataImpl();

return new Date(randomData.nextLong(min.getTime(), max.getTime()));

}


回答 19

我用随机和时间为另一个项目做了这个。我从一开始就使用通用格式,您可以在此处查看strftime()中第一个参数的文档。第二部分是random.randrange函数。它在参数之间返回一个整数。将其更改为与您想要的字符串匹配的范围。在第二个扩展的元组中,您必须有很好的论据。

import time
import random


def get_random_date():
    return strftime("%Y-%m-%d %H:%M:%S",(random.randrange(2000,2016),random.randrange(1,12),
    random.randrange(1,28),random.randrange(1,24),random.randrange(1,60),random.randrange(1,60),random.randrange(1,7),random.randrange(0,366),1))

I made this for another project using random and time. I used a general format from time you can view the documentation here for the first argument in strftime(). The second part is a random.randrange function. It returns an integer between the arguments. Change it to the ranges that match the strings you would like. You must have nice arguments in the tuple of the second arugment.

import time
import random


def get_random_date():
    return strftime("%Y-%m-%d %H:%M:%S",(random.randrange(2000,2016),random.randrange(1,12),
    random.randrange(1,28),random.randrange(1,24),random.randrange(1,60),random.randrange(1,60),random.randrange(1,7),random.randrange(0,366),1))

回答 20

熊猫+ numpy解决方案

import pandas as pd
import numpy as np

def RandomTimestamp(start, end):
    dts = (end - start).total_seconds()
    return start + pd.Timedelta(np.random.uniform(0, dts), 's')

dts是时间戳之间的时间差(以秒为单位)(浮动)。然后将其用于创建介于0和dts之间的熊猫时间增量,并将其添加到开始时间戳中。

Pandas + numpy solution

import pandas as pd
import numpy as np

def RandomTimestamp(start, end):
    dts = (end - start).total_seconds()
    return start + pd.Timedelta(np.random.uniform(0, dts), 's')

dts is the difference between timestamps in seconds (float). It is then used to create a pandas timedelta between 0 and dts, that is added to the start timestamp.


回答 21

根据mouviciel的回答,这是使用numpy的矢量化解决方案。将开始日期和结束日期转换为整数,在它们之间生成一个随机数数组,然后将整个数组转换回日期。

import time
import datetime
import numpy as np

n_rows = 10

start_time = "01/12/2011"
end_time = "05/08/2017"

date2int = lambda s: time.mktime(datetime.datetime.strptime(s,"%d/%m/%Y").timetuple())
int2date = lambda s: datetime.datetime.fromtimestamp(s).strftime('%Y-%m-%d %H:%M:%S')

start_time = date2int(start_time)
end_time = date2int(end_time)

random_ints = np.random.randint(low=start_time, high=end_time, size=(n_rows,1))
random_dates = np.apply_along_axis(int2date, 1, random_ints).reshape(n_rows,1)

print random_dates

Based on the answer by mouviciel, here is a vectorized solution using numpy. Convert the start and end dates to ints, generate an array of random numbers between them, and convert the whole array back to dates.

import time
import datetime
import numpy as np

n_rows = 10

start_time = "01/12/2011"
end_time = "05/08/2017"

date2int = lambda s: time.mktime(datetime.datetime.strptime(s,"%d/%m/%Y").timetuple())
int2date = lambda s: datetime.datetime.fromtimestamp(s).strftime('%Y-%m-%d %H:%M:%S')

start_time = date2int(start_time)
end_time = date2int(end_time)

random_ints = np.random.randint(low=start_time, high=end_time, size=(n_rows,1))
random_dates = np.apply_along_axis(int2date, 1, random_ints).reshape(n_rows,1)

print random_dates

回答 22

它是@(Tom Alsberg)的修改方法。我将其修改为以毫秒为单位获取日期。

import random
import time
import datetime

def random_date(start_time_string, end_time_string, format_string, random_number):
    """
    Get a time at a proportion of a range of two formatted times.
    start and end should be strings specifying times formated in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies how a proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """
    dt_start = datetime.datetime.strptime(start_time_string, format_string)
    dt_end = datetime.datetime.strptime(end_time_string, format_string)

    start_time = time.mktime(dt_start.timetuple()) + dt_start.microsecond / 1000000.0
    end_time = time.mktime(dt_end.timetuple()) + dt_end.microsecond / 1000000.0

    random_time = start_time + random_number * (end_time - start_time)

    return datetime.datetime.fromtimestamp(random_time).strftime(format_string)

例:

print TestData.TestData.random_date("2000/01/01 00:00:00.000000", "2049/12/31 23:59:59.999999", '%Y/%m/%d %H:%M:%S.%f', random.random())

输出: 2028/07/08 12:34:49.977963

It’s modified method of @(Tom Alsberg). I modified it to get date with milliseconds.

import random
import time
import datetime

def random_date(start_time_string, end_time_string, format_string, random_number):
    """
    Get a time at a proportion of a range of two formatted times.
    start and end should be strings specifying times formated in the
    given format (strftime-style), giving an interval [start, end].
    prop specifies how a proportion of the interval to be taken after
    start.  The returned time will be in the specified format.
    """
    dt_start = datetime.datetime.strptime(start_time_string, format_string)
    dt_end = datetime.datetime.strptime(end_time_string, format_string)

    start_time = time.mktime(dt_start.timetuple()) + dt_start.microsecond / 1000000.0
    end_time = time.mktime(dt_end.timetuple()) + dt_end.microsecond / 1000000.0

    random_time = start_time + random_number * (end_time - start_time)

    return datetime.datetime.fromtimestamp(random_time).strftime(format_string)

Example:

print TestData.TestData.random_date("2000/01/01 00:00:00.000000", "2049/12/31 23:59:59.999999", '%Y/%m/%d %H:%M:%S.%f', random.random())

Output: 2028/07/08 12:34:49.977963


回答 23

start_timestamp = time.mktime(time.strptime('Jun 1 2010  01:33:00', '%b %d %Y %I:%M:%S'))
end_timestamp = time.mktime(time.strptime('Jun 1 2017  12:33:00', '%b %d %Y %I:%M:%S'))
time.strftime('%b %d %Y %I:%M:%S',time.localtime(randrange(start_timestamp,end_timestamp)))

参考

start_timestamp = time.mktime(time.strptime('Jun 1 2010  01:33:00', '%b %d %Y %I:%M:%S'))
end_timestamp = time.mktime(time.strptime('Jun 1 2017  12:33:00', '%b %d %Y %I:%M:%S'))
time.strftime('%b %d %Y %I:%M:%S',time.localtime(randrange(start_timestamp,end_timestamp)))

refer


回答 24

    # needed to create data for 1000 fictitious employees for testing code 
    # code relating to randomly assigning forenames, surnames, and genders
    # has been removed as not germaine to the question asked above but FYI
    # genders were randomly assigned, forenames/surnames were web scrapped,
    # there is no accounting for leap years, and the data stored in mySQL

    import random 
    from datetime import datetime
    from datetime import timedelta

    for employee in range(1000):
        # assign a random date of birth (employees are aged between sixteen and sixty five)
        dlt = random.randint(365*16, 365*65)
        dob = datetime.today() - timedelta(days=dlt)
        # assign a random date of hire sometime between sixteenth birthday and yesterday
        doh = datetime.today() - timedelta(days=random.randint(1, dlt-365*16))
        print("born {} hired {}".format(dob.strftime("%d-%m-%y"), doh.strftime("%d-%m-%y")))
    # needed to create data for 1000 fictitious employees for testing code 
    # code relating to randomly assigning forenames, surnames, and genders
    # has been removed as not germaine to the question asked above but FYI
    # genders were randomly assigned, forenames/surnames were web scrapped,
    # there is no accounting for leap years, and the data stored in mySQL

    import random 
    from datetime import datetime
    from datetime import timedelta

    for employee in range(1000):
        # assign a random date of birth (employees are aged between sixteen and sixty five)
        dlt = random.randint(365*16, 365*65)
        dob = datetime.today() - timedelta(days=dlt)
        # assign a random date of hire sometime between sixteenth birthday and yesterday
        doh = datetime.today() - timedelta(days=random.randint(1, dlt-365*16))
        print("born {} hired {}".format(dob.strftime("%d-%m-%y"), doh.strftime("%d-%m-%y")))

回答 25

另一种方法两个日期之间创建随机日期使用np.random.randint()pd.Timestamp().valuepd.to_datetime()具有for loop

# Import libraries
import pandas as pd

# Initialize
start = '2020-01-01' # Specify start date
end = '2020-03-10' # Specify end date
n = 10 # Specify number of dates needed

# Get random dates
x = np.random.randint(pd.Timestamp(start).value, pd.Timestamp(end).value,n)
random_dates = [pd.to_datetime((i/10**9)/(60*60)/24, unit='D').strftime('%Y-%m-%d')  for i in x]

print(random_dates)

输出量

['2020-01-06',
 '2020-03-08',
 '2020-01-23',
 '2020-02-03',
 '2020-01-30',
 '2020-01-05',
 '2020-02-16',
 '2020-03-08',
 '2020-02-09',
 '2020-01-04']

Alternative way to create random dates between two dates using np.random.randint(), pd.Timestamp().value and pd.to_datetime() with for loop:

# Import libraries
import pandas as pd

# Initialize
start = '2020-01-01' # Specify start date
end = '2020-03-10' # Specify end date
n = 10 # Specify number of dates needed

# Get random dates
x = np.random.randint(pd.Timestamp(start).value, pd.Timestamp(end).value,n)
random_dates = [pd.to_datetime((i/10**9)/(60*60)/24, unit='D').strftime('%Y-%m-%d')  for i in x]

print(random_dates)

Output

['2020-01-06',
 '2020-03-08',
 '2020-01-23',
 '2020-02-03',
 '2020-01-30',
 '2020-01-05',
 '2020-02-16',
 '2020-03-08',
 '2020-02-09',
 '2020-01-04']

生成具有给定(数字)分布的随机数

问题:生成具有给定(数字)分布的随机数

我有一个具有不同值的概率的文件,例如:

1 0.1
2 0.05
3 0.05
4 0.2
5 0.4
6 0.2

我想使用此分布生成随机数。是否存在处理此问题的现有模块?自己编写代码是很简单的(构建累积密度函数,生成随机值[0,1]并选择相应的值),但这似乎是一个常见问题,可能有人为它创建了一个函数/模块它。

我需要这个,因为我想生成一个生日列表(不遵循标准random模块中的任何分布)。

I have a file with some probabilities for different values e.g.:

1 0.1
2 0.05
3 0.05
4 0.2
5 0.4
6 0.2

I would like to generate random numbers using this distribution. Does an existing module that handles this exist? It’s fairly simple to code on your own (build the cumulative density function, generate a random value [0,1] and pick the corresponding value) but it seems like this should be a common problem and probably someone has created a function/module for it.

I need this because I want to generate a list of birthdays (which do not follow any distribution in the standard random module).


回答 0

scipy.stats.rv_discrete可能就是您想要的。您可以通过values参数提供概率。然后,您可以使用rvs()分发对象的方法来生成随机数。

正如Eugene Pakhomov在评论中指出的那样,您还可以将p关键字参数传递给numpy.random.choice(),例如

numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])

如果您使用的是Python 3.6或更高版本,则可以random.choices()在标准库中使用–请参见Mark Dickinson答案

scipy.stats.rv_discrete might be what you want. You can supply your probabilities via the values parameter. You can then use the rvs() method of the distribution object to generate random numbers.

As pointed out by Eugene Pakhomov in the comments, you can also pass a p keyword parameter to numpy.random.choice(), e.g.

numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])

If you are using Python 3.6 or above, you can use random.choices() from the standard library – see the answer by Mark Dickinson.


回答 1

从Python 3.6开始,Python的标准库中提供了一个解决方案random.choices

用法示例:让我们设置与OP中的问题相匹配的总体和权重:

>>> from random import choices
>>> population = [1, 2, 3, 4, 5, 6]
>>> weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

现在choices(population, weights)生成一个样本:

>>> choices(population, weights)
4

可选的仅关键字参数k允许一个参数一次请求多个样本。这很有价值,因为random.choices在生成任何样本之前,每次调用时都要做一些准备工作。通过一次生成许多样本,我们只需要做一次准备工作。在这里,我们生成了一百万个样本,并collections.Counter用来检查我们得到的分布与我们赋予的权重大致匹配。

>>> million_samples = choices(population, weights, k=10**6)
>>> from collections import Counter
>>> Counter(million_samples)
Counter({5: 399616, 6: 200387, 4: 200117, 1: 99636, 3: 50219, 2: 50025})

Since Python 3.6, there’s a solution for this in Python’s standard library, namely random.choices.

Example usage: let’s set up a population and weights matching those in the OP’s question:

>>> from random import choices
>>> population = [1, 2, 3, 4, 5, 6]
>>> weights = [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]

Now choices(population, weights) generates a single sample:

>>> choices(population, weights)
4

The optional keyword-only argument k allows one to request more than one sample at once. This is valuable because there’s some preparatory work that random.choices has to do every time it’s called, prior to generating any samples; by generating many samples at once, we only have to do that preparatory work once. Here we generate a million samples, and use collections.Counter to check that the distribution we get roughly matches the weights we gave.

>>> million_samples = choices(population, weights, k=10**6)
>>> from collections import Counter
>>> Counter(million_samples)
Counter({5: 399616, 6: 200387, 4: 200117, 1: 99636, 3: 50219, 2: 50025})

回答 2

使用CDF生成列表的一个优点是可以使用二进制搜索。当您需要O(n)的时间和空间进行预处理时,您可以在O(k log n)中获得k个数字。由于普通的Python列表效率低下,因此可以使用array模块。

如果您坚持使用恒定的空间,则可以执行以下操作;O(n)时间,O(1)空间。

def random_distr(l):
    r = random.uniform(0, 1)
    s = 0
    for item, prob in l:
        s += prob
        if s >= r:
            return item
    return item  # Might occur because of floating point inaccuracies

An advantage to generating the list using CDF is that you can use binary search. While you need O(n) time and space for preprocessing, you can get k numbers in O(k log n). Since normal Python lists are inefficient, you can use array module.

If you insist on constant space, you can do the following; O(n) time, O(1) space.

def random_distr(l):
    r = random.uniform(0, 1)
    s = 0
    for item, prob in l:
        s += prob
        if s >= r:
            return item
    return item  # Might occur because of floating point inaccuracies

回答 3

也许有点晚了。但是您可以使用numpy.random.choice()传递p参数:

val = numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])

Maybe it is kind of late. But you can use numpy.random.choice(), passing the p parameter:

val = numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2])

回答 4

(好吧,我知道您正在要求收缩包装,但是也许这些自制的解决方案还不够简洁,无法满足您的喜好。:-)

pdf = [(1, 0.1), (2, 0.05), (3, 0.05), (4, 0.2), (5, 0.4), (6, 0.2)]
cdf = [(i, sum(p for j,p in pdf if j < i)) for i,_ in pdf]
R = max(i for r in [random.random()] for i,c in cdf if c <= r)

我通过确认此表达式的输出来伪确认此方法有效:

sorted(max(i for r in [random.random()] for i,c in cdf if c <= r)
       for _ in range(1000))

(OK, I know you are asking for shrink-wrap, but maybe those home-grown solutions just weren’t succinct enough for your liking. :-)

pdf = [(1, 0.1), (2, 0.05), (3, 0.05), (4, 0.2), (5, 0.4), (6, 0.2)]
cdf = [(i, sum(p for j,p in pdf if j < i)) for i,_ in pdf]
R = max(i for r in [random.random()] for i,c in cdf if c <= r)

I pseudo-confirmed that this works by eyeballing the output of this expression:

sorted(max(i for r in [random.random()] for i,c in cdf if c <= r)
       for _ in range(1000))

回答 5

我写了一个从自定义连续分布中抽取随机样本的解决方案。

我需要一个与您的用例类似的用例(即生成具有给定概率分布的随机日期)。

您只需要功能random_custDist和功能samples=random_custDist(x0,x1,custDist=custDist,size=1000)。剩下的就是装饰^^。

import numpy as np

#funtion
def random_custDist(x0,x1,custDist,size=None, nControl=10**6):
    #genearte a list of size random samples, obeying the distribution custDist
    #suggests random samples between x0 and x1 and accepts the suggestion with probability custDist(x)
    #custDist noes not need to be normalized. Add this condition to increase performance. 
    #Best performance for max_{x in [x0,x1]} custDist(x) = 1
    samples=[]
    nLoop=0
    while len(samples)<size and nLoop<nControl:
        x=np.random.uniform(low=x0,high=x1)
        prop=custDist(x)
        assert prop>=0 and prop<=1
        if np.random.uniform(low=0,high=1) <=prop:
            samples += [x]
        nLoop+=1
    return samples

#call
x0=2007
x1=2019
def custDist(x):
    if x<2010:
        return .3
    else:
        return (np.exp(x-2008)-1)/(np.exp(2019-2007)-1)
samples=random_custDist(x0,x1,custDist=custDist,size=1000)
print(samples)

#plot
import matplotlib.pyplot as plt
#hist
bins=np.linspace(x0,x1,int(x1-x0+1))
hist=np.histogram(samples, bins )[0]
hist=hist/np.sum(hist)
plt.bar( (bins[:-1]+bins[1:])/2, hist, width=.96, label='sample distribution')
#dist
grid=np.linspace(x0,x1,100)
discCustDist=np.array([custDist(x) for x in grid]) #distrete version
discCustDist*=1/(grid[1]-grid[0])/np.sum(discCustDist)
plt.plot(grid,discCustDist,label='custom distribustion (custDist)', color='C1', linewidth=4)
#decoration
plt.legend(loc=3,bbox_to_anchor=(1,0))
plt.show()

该解决方案的性能肯定可以提高,但是我更喜欢可读性。

I wrote a solution for drawing random samples from a custom continuous distribution.

I needed this for a similar use-case to yours (i.e. generating random dates with a given probability distribution).

You just need the funtion random_custDist and the line samples=random_custDist(x0,x1,custDist=custDist,size=1000). The rest is decoration ^^.

import numpy as np

#funtion
def random_custDist(x0,x1,custDist,size=None, nControl=10**6):
    #genearte a list of size random samples, obeying the distribution custDist
    #suggests random samples between x0 and x1 and accepts the suggestion with probability custDist(x)
    #custDist noes not need to be normalized. Add this condition to increase performance. 
    #Best performance for max_{x in [x0,x1]} custDist(x) = 1
    samples=[]
    nLoop=0
    while len(samples)<size and nLoop<nControl:
        x=np.random.uniform(low=x0,high=x1)
        prop=custDist(x)
        assert prop>=0 and prop<=1
        if np.random.uniform(low=0,high=1) <=prop:
            samples += [x]
        nLoop+=1
    return samples

#call
x0=2007
x1=2019
def custDist(x):
    if x<2010:
        return .3
    else:
        return (np.exp(x-2008)-1)/(np.exp(2019-2007)-1)
samples=random_custDist(x0,x1,custDist=custDist,size=1000)
print(samples)

#plot
import matplotlib.pyplot as plt
#hist
bins=np.linspace(x0,x1,int(x1-x0+1))
hist=np.histogram(samples, bins )[0]
hist=hist/np.sum(hist)
plt.bar( (bins[:-1]+bins[1:])/2, hist, width=.96, label='sample distribution')
#dist
grid=np.linspace(x0,x1,100)
discCustDist=np.array([custDist(x) for x in grid]) #distrete version
discCustDist*=1/(grid[1]-grid[0])/np.sum(discCustDist)
plt.plot(grid,discCustDist,label='custom distribustion (custDist)', color='C1', linewidth=4)
#decoration
plt.legend(loc=3,bbox_to_anchor=(1,0))
plt.show()

The performance of this solution is improvable for sure, but I prefer readability.


回答 6

根据以下内容列出项目weights

items = [1, 2, 3, 4, 5, 6]
probabilities= [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
# if the list of probs is normalized (sum(probs) == 1), omit this part
prob = sum(probabilities) # find sum of probs, to normalize them
c = (1.0)/prob # a multiplier to make a list of normalized probs
probabilities = map(lambda x: c*x, probabilities)
print probabilities

ml = max(probabilities, key=lambda x: len(str(x)) - str(x).find('.'))
ml = len(str(ml)) - str(ml).find('.') -1
amounts = [ int(x*(10**ml)) for x in probabilities]
itemsList = list()
for i in range(0, len(items)): # iterate through original items
  itemsList += items[i:i+1]*amounts[i]

# choose from itemsList randomly
print itemsList

一种优化可能是通过最大公约数对数量进行归一化,以使目标列表更小。

另外,可能很有趣。

Make a list of items, based on their weights:

items = [1, 2, 3, 4, 5, 6]
probabilities= [0.1, 0.05, 0.05, 0.2, 0.4, 0.2]
# if the list of probs is normalized (sum(probs) == 1), omit this part
prob = sum(probabilities) # find sum of probs, to normalize them
c = (1.0)/prob # a multiplier to make a list of normalized probs
probabilities = map(lambda x: c*x, probabilities)
print probabilities

ml = max(probabilities, key=lambda x: len(str(x)) - str(x).find('.'))
ml = len(str(ml)) - str(ml).find('.') -1
amounts = [ int(x*(10**ml)) for x in probabilities]
itemsList = list()
for i in range(0, len(items)): # iterate through original items
  itemsList += items[i:i+1]*amounts[i]

# choose from itemsList randomly
print itemsList

An optimization may be to normalize amounts by the greatest common divisor, to make the target list smaller.

Also, this might be interesting.


回答 7

另一个答案,可能更快:)

distribution = [(1, 0.2), (2, 0.3), (3, 0.5)]  
# init distribution  
dlist = []  
sumchance = 0  
for value, chance in distribution:  
    sumchance += chance  
    dlist.append((value, sumchance))  
assert sumchance == 1.0 # not good assert because of float equality  

# get random value  
r = random.random()  
# for small distributions use lineair search  
if len(distribution) < 64: # don't know exact speed limit  
    for value, sumchance in dlist:  
        if r < sumchance:  
            return value  
else:  
    # else (not implemented) binary search algorithm  

Another answer, probably faster :)

distribution = [(1, 0.2), (2, 0.3), (3, 0.5)]  
# init distribution  
dlist = []  
sumchance = 0  
for value, chance in distribution:  
    sumchance += chance  
    dlist.append((value, sumchance))  
assert sumchance == 1.0 # not good assert because of float equality  

# get random value  
r = random.random()  
# for small distributions use lineair search  
if len(distribution) < 64: # don't know exact speed limit  
    for value, sumchance in dlist:  
        if r < sumchance:  
            return value  
else:  
    # else (not implemented) binary search algorithm  

回答 8

from __future__ import division
import random
from collections import Counter


def num_gen(num_probs):
    # calculate minimum probability to normalize
    min_prob = min(prob for num, prob in num_probs)
    lst = []
    for num, prob in num_probs:
        # keep appending num to lst, proportional to its probability in the distribution
        for _ in range(int(prob/min_prob)):
            lst.append(num)
    # all elems in lst occur proportional to their distribution probablities
    while True:
        # pick a random index from lst
        ind = random.randint(0, len(lst)-1)
        yield lst[ind]

验证:

gen = num_gen([(1, 0.1),
               (2, 0.05),
               (3, 0.05),
               (4, 0.2),
               (5, 0.4),
               (6, 0.2)])
lst = []
times = 10000
for _ in range(times):
    lst.append(next(gen))
# Verify the created distribution:
for item, count in Counter(lst).iteritems():
    print '%d has %f probability' % (item, count/times)

1 has 0.099737 probability
2 has 0.050022 probability
3 has 0.049996 probability 
4 has 0.200154 probability
5 has 0.399791 probability
6 has 0.200300 probability
from __future__ import division
import random
from collections import Counter


def num_gen(num_probs):
    # calculate minimum probability to normalize
    min_prob = min(prob for num, prob in num_probs)
    lst = []
    for num, prob in num_probs:
        # keep appending num to lst, proportional to its probability in the distribution
        for _ in range(int(prob/min_prob)):
            lst.append(num)
    # all elems in lst occur proportional to their distribution probablities
    while True:
        # pick a random index from lst
        ind = random.randint(0, len(lst)-1)
        yield lst[ind]

Verification:

gen = num_gen([(1, 0.1),
               (2, 0.05),
               (3, 0.05),
               (4, 0.2),
               (5, 0.4),
               (6, 0.2)])
lst = []
times = 10000
for _ in range(times):
    lst.append(next(gen))
# Verify the created distribution:
for item, count in Counter(lst).iteritems():
    print '%d has %f probability' % (item, count/times)

1 has 0.099737 probability
2 has 0.050022 probability
3 has 0.049996 probability 
4 has 0.200154 probability
5 has 0.399791 probability
6 has 0.200300 probability

回答 9

根据其他解决方案,您可以生成累积分布(任意形式为整数或浮点数),然后可以使用二等分来使其快速

这是一个简单的示例(我在这里使用了整数)

l=[(20, 'foo'), (60, 'banana'), (10, 'monkey'), (10, 'monkey2')]
def get_cdf(l):
    ret=[]
    c=0
    for i in l: c+=i[0]; ret.append((c, i[1]))
    return ret

def get_random_item(cdf):
    return cdf[bisect.bisect_left(cdf, (random.randint(0, cdf[-1][0]),))][1]

cdf=get_cdf(l)
for i in range(100): print get_random_item(cdf),

get_cdf函数会将其从20、60、10、10转换为20、20 + 60、20 + 60 + 10、20 + 60 + 10 + 10

现在我们使用选取最大为20 + 60 + 10 + 10的随机数,random.randint然后使用bisect快速获取实际值

based on other solutions, you generate accumulative distribution (as integer or float whatever you like), then you can use bisect to make it fast

this is a simple example (I used integers here)

l=[(20, 'foo'), (60, 'banana'), (10, 'monkey'), (10, 'monkey2')]
def get_cdf(l):
    ret=[]
    c=0
    for i in l: c+=i[0]; ret.append((c, i[1]))
    return ret

def get_random_item(cdf):
    return cdf[bisect.bisect_left(cdf, (random.randint(0, cdf[-1][0]),))][1]

cdf=get_cdf(l)
for i in range(100): print get_random_item(cdf),

the get_cdf function would convert it from 20, 60, 10, 10 into 20, 20+60, 20+60+10, 20+60+10+10

now we pick a random number up to 20+60+10+10 using random.randint then we use bisect to get the actual value in a fast way


回答 10

您可能想看看NumPy 随机抽样分布

you might want to have a look at NumPy Random sampling distributions


回答 11

这些答案都不是特别清楚或简单的。

这是保证可以正常工作的一种清晰,简单的方法。

accumulate_normalize_probabilities采用字典p将符号映射到概率频率。它输出要选择的元组的可用列表。

def accumulate_normalize_values(p):
        pi = p.items() if isinstance(p,dict) else p
        accum_pi = []
        accum = 0
        for i in pi:
                accum_pi.append((i[0],i[1]+accum))
                accum += i[1]
        if accum == 0:
                raise Exception( "You are about to explode the universe. Continue ? Y/N " )
        normed_a = []
        for a in accum_pi:
                normed_a.append((a[0],a[1]*1.0/accum))
        return normed_a

Yield:

>>> accumulate_normalize_values( { 'a': 100, 'b' : 300, 'c' : 400, 'd' : 200  } )
[('a', 0.1), ('c', 0.5), ('b', 0.8), ('d', 1.0)]

为什么运作

所述积累步骤变成每个符号到(在第一符号的情况下或0)本身和先前符号概率或频率之间的间隔。通过简单地逐步遍历列表,直到间隔0.0-> 1.0中的随机数(之前已准备好)小于或等于当前符号的间隔端点,可以使用这些间隔进行选择(从而对提供的分布进行采样)。

规范化释放我们从需求,以确保一切资金以一定的价值。归一化后,概率的“向量”总计为1.0。

下面用于从分布中选择并生成任意长样本的其余代码

def select(symbol_intervals,random):
        print symbol_intervals,random
        i = 0
        while random > symbol_intervals[i][1]:
                i += 1
                if i >= len(symbol_intervals):
                        raise Exception( "What did you DO to that poor list?" )
        return symbol_intervals[i][0]


def gen_random(alphabet,length,probabilities=None):
        from random import random
        from itertools import repeat
        if probabilities is None:
                probabilities = dict(zip(alphabet,repeat(1.0)))
        elif len(probabilities) > 0 and isinstance(probabilities[0],(int,long,float)):
                probabilities = dict(zip(alphabet,probabilities)) #ordered
        usable_probabilities = accumulate_normalize_values(probabilities)
        gen = []
        while len(gen) < length:
                gen.append(select(usable_probabilities,random()))
        return gen

用法:

>>> gen_random (['a','b','c','d'],10,[100,300,400,200])
['d', 'b', 'b', 'a', 'c', 'c', 'b', 'c', 'c', 'c']   #<--- some of the time

None of these answers is particularly clear or simple.

Here is a clear, simple method that is guaranteed to work.

accumulate_normalize_probabilities takes a dictionary p that maps symbols to probabilities OR frequencies. It outputs usable list of tuples from which to do selection.

def accumulate_normalize_values(p):
        pi = p.items() if isinstance(p,dict) else p
        accum_pi = []
        accum = 0
        for i in pi:
                accum_pi.append((i[0],i[1]+accum))
                accum += i[1]
        if accum == 0:
                raise Exception( "You are about to explode the universe. Continue ? Y/N " )
        normed_a = []
        for a in accum_pi:
                normed_a.append((a[0],a[1]*1.0/accum))
        return normed_a

Yields:

>>> accumulate_normalize_values( { 'a': 100, 'b' : 300, 'c' : 400, 'd' : 200  } )
[('a', 0.1), ('c', 0.5), ('b', 0.8), ('d', 1.0)]

Why it works

The accumulation step turns each symbol into an interval between itself and the previous symbols probability or frequency (or 0 in the case of the first symbol). These intervals can be used to select from (and thus sample the provided distribution) by simply stepping through the list until the random number in interval 0.0 -> 1.0 (prepared earlier) is less or equal to the current symbol’s interval end-point.

The normalization releases us from the need to make sure everything sums to some value. After normalization the “vector” of probabilities sums to 1.0.

The rest of the code for selection and generating a arbitrarily long sample from the distribution is below :

def select(symbol_intervals,random):
        print symbol_intervals,random
        i = 0
        while random > symbol_intervals[i][1]:
                i += 1
                if i >= len(symbol_intervals):
                        raise Exception( "What did you DO to that poor list?" )
        return symbol_intervals[i][0]


def gen_random(alphabet,length,probabilities=None):
        from random import random
        from itertools import repeat
        if probabilities is None:
                probabilities = dict(zip(alphabet,repeat(1.0)))
        elif len(probabilities) > 0 and isinstance(probabilities[0],(int,long,float)):
                probabilities = dict(zip(alphabet,probabilities)) #ordered
        usable_probabilities = accumulate_normalize_values(probabilities)
        gen = []
        while len(gen) < length:
                gen.append(select(usable_probabilities,random()))
        return gen

Usage :

>>> gen_random (['a','b','c','d'],10,[100,300,400,200])
['d', 'b', 'b', 'a', 'c', 'c', 'b', 'c', 'c', 'c']   #<--- some of the time

回答 12

这是一种更有效的方法

只需使用您的“权重”数组(假定索引为对应项)和否调用以下函数。需要的样本数。可以轻松修改此功能以处理有序对。

使用各自的概率返回采样/挑选(替换)的索引(或项目):

def resample(weights, n):
    beta = 0

    # Caveat: Assign max weight to max*2 for best results
    max_w = max(weights)*2

    # Pick an item uniformly at random, to start with
    current_item = random.randint(0,n-1)
    result = []

    for i in range(n):
        beta += random.uniform(0,max_w)

        while weights[current_item] < beta:
            beta -= weights[current_item]
            current_item = (current_item + 1) % n   # cyclic
        else:
            result.append(current_item)
    return result

关于while循环中使用的概念的简短说明。我们从累积beta减少当前项目的权重,该累积值是随机统一构造的累积值,并增加当前索引以找到其权重与beta值匹配的项目。

Here is a more effective way of doing this:

Just call the following function with your ‘weights’ array (assuming the indices as the corresponding items) and the no. of samples needed. This function can be easily modified to handle ordered pair.

Returns indexes (or items) sampled/picked (with replacement) using their respective probabilities:

def resample(weights, n):
    beta = 0

    # Caveat: Assign max weight to max*2 for best results
    max_w = max(weights)*2

    # Pick an item uniformly at random, to start with
    current_item = random.randint(0,n-1)
    result = []

    for i in range(n):
        beta += random.uniform(0,max_w)

        while weights[current_item] < beta:
            beta -= weights[current_item]
            current_item = (current_item + 1) % n   # cyclic
        else:
            result.append(current_item)
    return result

A short note on the concept used in the while loop. We reduce the current item’s weight from cumulative beta, which is a cumulative value constructed uniformly at random, and increment current index in order to find the item, the weight of which matches the value of beta.


从列表中随机选择50个项目写入文件

问题:从列表中随机选择50个项目写入文件

到目前为止,我已经弄清楚了如何导入文件,创建新文件以及使列表随机化。

我在从列表中随机选择50个项目以写入文件时遇到麻烦吗?

def randomizer(input,output1='random_1.txt',output2='random_2.txt',output3='random_3.txt',output4='random_total.txt'):

#Input file 
    query=open(input,'r').read().split()
    dir,file=os.path.split(input)

    temp1 = os.path.join(dir,output1)
    temp2 = os.path.join(dir,output2)
    temp3 = os.path.join(dir,output3)
    temp4 = os.path.join(dir,output4)


    out_file4=open(temp4,'w')

    random.shuffle(query)

    for item in query:
        out_file4.write(item+'\n')   

因此,如果总随机文件为

example:

random_total = ['9','2','3','1','5','6','8','7','0','4']

我想要3个文件(out_file1 | 2 | 3),其中第一个随机集为3,第二个随机集为3,第三个随机集为3(对于此示例,但我要创建的文件应该有50个)

random_1 = ['9','2','3']
random_2 = ['1','5','6']
random_3 = ['8','7','0']

因此,不会包含最后一个“ 4”,这很好。

如何从随机选择的列表中选择50?

更好的是,如何从原始列表中随机选择50个?

So far I have figured out how to import the file, create new files, and randomize the list.

I’m having trouble selecting only 50 items from the list randomly to write to a file?

def randomizer(input,output1='random_1.txt',output2='random_2.txt',output3='random_3.txt',output4='random_total.txt'):

#Input file 
    query=open(input,'r').read().split()
    dir,file=os.path.split(input)

    temp1 = os.path.join(dir,output1)
    temp2 = os.path.join(dir,output2)
    temp3 = os.path.join(dir,output3)
    temp4 = os.path.join(dir,output4)


    out_file4=open(temp4,'w')

    random.shuffle(query)

    for item in query:
        out_file4.write(item+'\n')   

So if the total randomization file was

example:

random_total = ['9','2','3','1','5','6','8','7','0','4']

I would want 3 files (out_file1|2|3) with the first random set of 3, second random set of 3, and third random set of 3 (for this example, but the one I want to create should have 50)

random_1 = ['9','2','3']
random_2 = ['1','5','6']
random_3 = ['8','7','0']

So the last ‘4’ will not be included which is fine.

How can I select 50 from the list that I randomized ?

Even better, how could I select 50 at random from the original list ?


回答 0

如果列表按随机顺序排列,则可以只取前50个。

否则,使用

import random
random.sample(the_list, 50)

random.sample 帮助文字:

sample(self, population, k) method of random.Random instance
    Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged.  The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples.  This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique.  If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)

If the list is in random order, you can just take the first 50.

Otherwise, use

import random
random.sample(the_list, 50)

random.sample help text:

sample(self, population, k) method of random.Random instance
    Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged.  The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples.  This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique.  If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)

回答 1

选择随机项目的一种简单方法是先随机播放然后切片。

import random
a = [1,2,3,4,5,6,7,8,9]
random.shuffle(a)
print a[:4] # prints 4 random variables

One easy way to select random items is to shuffle then slice.

import random
a = [1,2,3,4,5,6,7,8,9]
random.shuffle(a)
print a[:4] # prints 4 random variables

回答 2

我认为random.choice()是更好的选择。

import numpy as np

mylist = [13,23,14,52,6,23]

np.random.choice(mylist, 3, replace=False)

该函数从列表中返回3个随机选择的值的数组

I think random.choice() is a better option.

import numpy as np

mylist = [13,23,14,52,6,23]

np.random.choice(mylist, 3, replace=False)

the function returns an array of 3 randomly chosen values from the list


回答 3

假设您的列表包含100个元素,并且您想随机选择50个元素。以下是要遵循的步骤:

  1. 导入库
  2. 为随机数生成器创建种子,我将其放在2
  3. 准备一个数字列表,从中随机抽取
  4. 从数字列表中进行随机选择

码:

from random import seed
from random import choice

seed(2)
numbers = [i for i in range(100)]

print(numbers)

for _ in range(50):
    selection = choice(numbers)
    print(selection)

Say your list has 100 elements and you want to pick 50 of them in a random way. Here are the steps to follow:

  1. Import the libraries
  2. Create the seed for random number generator, I have put it at 2
  3. Prepare a list of numbers from which to pick up in a random way
  4. Make the random choices from the numbers list

Code:

from random import seed
from random import choice

seed(2)
numbers = [i for i in range(100)]

print(numbers)

for _ in range(50):
    selection = choice(numbers)
    print(selection)

创建随机数矩阵的简单方法

问题:创建随机数矩阵的简单方法

我正在尝试创建一个随机数矩阵,但是我的解决方案太长且看起来很丑

random_matrix = [[random.random() for e in range(2)] for e in range(3)]

看起来不错,但是在我的实现中

weights_h = [[random.random() for e in range(len(inputs[0]))] for e in range(hiden_neurons)]

这是非常不可读的,并且不能放在一行上。

I am trying to create a matrix of random numbers, but my solution is too long and looks ugly

random_matrix = [[random.random() for e in range(2)] for e in range(3)]

this looks ok, but in my implementation it is

weights_h = [[random.random() for e in range(len(inputs[0]))] for e in range(hiden_neurons)]

which is extremely unreadable and does not fit on one line.


回答 0

看看numpy.random.rand

文档字符串:rand(d0,d1,…,dn)

给定形状的随机值。

创建给定形状的数组,并使用上均匀分布的随机样本传播它[0, 1)


>>> import numpy as np
>>> np.random.rand(2,3)
array([[ 0.22568268,  0.0053246 ,  0.41282024],
       [ 0.68824936,  0.68086462,  0.6854153 ]])

Take a look at numpy.random.rand:

Docstring: rand(d0, d1, …, dn)

Random values in a given shape.

Create an array of the given shape and propagate it with random samples from a uniform distribution over [0, 1).


>>> import numpy as np
>>> np.random.rand(2,3)
array([[ 0.22568268,  0.0053246 ,  0.41282024],
       [ 0.68824936,  0.68086462,  0.6854153 ]])

回答 1

您可以删除range(len())

weights_h = [[random.random() for e in inputs[0]] for e in range(hiden_neurons)]

但实际上,您可能应该使用numpy。

In [9]: numpy.random.random((3, 3))
Out[9]:
array([[ 0.37052381,  0.03463207,  0.10669077],
       [ 0.05862909,  0.8515325 ,  0.79809676],
       [ 0.43203632,  0.54633635,  0.09076408]])

You can drop the range(len()):

weights_h = [[random.random() for e in inputs[0]] for e in range(hiden_neurons)]

But really, you should probably use numpy.

In [9]: numpy.random.random((3, 3))
Out[9]:
array([[ 0.37052381,  0.03463207,  0.10669077],
       [ 0.05862909,  0.8515325 ,  0.79809676],
       [ 0.43203632,  0.54633635,  0.09076408]])

回答 2

使用np.random.randint()numpy.random.random_integers()是过时

random_matrix = numpy.random.randint(min_val,max_val,(<num_rows>,<num_cols>))

use np.random.randint() as numpy.random.random_integers() is deprecated

random_matrix = numpy.random.randint(min_val,max_val,(<num_rows>,<num_cols>))

回答 3

看起来您正在执行Coursera机器学习神经网络练习的Python实现。这是我为randInitializeWeights(L_in,L_out)做的

#get a random array of floats between 0 and 1 as Pavel mentioned 
W = numpy.random.random((L_out, L_in +1))

#normalize so that it spans a range of twice epsilon
W = W * 2 * epsilon

#shift so that mean is at zero
W = W - epsilon

Looks like you are doing a Python implementation of the Coursera Machine Learning Neural Network exercise. Here’s what I did for randInitializeWeights(L_in, L_out)

#get a random array of floats between 0 and 1 as Pavel mentioned 
W = numpy.random.random((L_out, L_in +1))

#normalize so that it spans a range of twice epsilon
W = W * 2 * epsilon

#shift so that mean is at zero
W = W - epsilon

回答 4

首先,创建numpy数组,然后将其转换为matrix。请参见下面的代码:

import numpy

B = numpy.random.random((3, 4)) #its ndArray
C = numpy.matrix(B)# it is matrix
print(type(B))
print(type(C)) 
print(C)

First, create numpy array then convert it into matrix. See the code below:

import numpy

B = numpy.random.random((3, 4)) #its ndArray
C = numpy.matrix(B)# it is matrix
print(type(B))
print(type(C)) 
print(C)

回答 5

x = np.int_(np.random.rand(10) * 10)

对于10中的随机数,对于20中的随机数,我们必须乘以20。

x = np.int_(np.random.rand(10) * 10)

For random numbers out of 10. For out of 20 we have to multiply by 20.


回答 6

当您说“随机数矩阵”时,您可以使用numpy作为上述的Pavel https://stackoverflow.com/a/15451997/6169225,在这种情况下,我假设与您无关的是,这些分布是什么(伪)坚持随机数。

但是,如果您需要特定的分布(我想您对统一分布很感兴趣),则numpy.random有非常有用的方法为您服务。例如,假设您要一个3×2矩阵,其伪随机均匀分布以[low,high]为边界。您可以这样做:

numpy.random.uniform(low,high,(3,2))

注意,您可以用uniform此库支持的任意数量的发行版代替。

进一步阅读:https : //docs.scipy.org/doc/numpy/reference/routines.random.html

When you say “a matrix of random numbers”, you can use numpy as Pavel https://stackoverflow.com/a/15451997/6169225 mentioned above, in this case I’m assuming to you it is irrelevant what distribution these (pseudo) random numbers adhere to.

However, if you require a particular distribution (I imagine you are interested in the uniform distribution), numpy.random has very useful methods for you. For example, let’s say you want a 3×2 matrix with a pseudo random uniform distribution bounded by [low,high]. You can do this like so:

numpy.random.uniform(low,high,(3,2))

Note, you can replace uniform by any number of distributions supported by this library.

Further reading: https://docs.scipy.org/doc/numpy/reference/routines.random.html


回答 7

创建随机整数数组的一种简单方法是:

matrix = np.random.randint(maxVal, size=(rows, columns))

下面的代码输出从0到10的2到3的随机整数矩阵:

a = np.random.randint(10, size=(2,3))

A simple way of creating an array of random integers is:

matrix = np.random.randint(maxVal, size=(rows, columns))

The following outputs a 2 by 3 matrix of random integers from 0 to 10:

a = np.random.randint(10, size=(2,3))

回答 8

为了创建随机数数组,NumPy使用以下方法创建数组:

  1. 实数

  2. 整数

使用随机实数创建数组 有2个选项

  1. random.rand(用于均匀分布所生成的随机数)
  2. random.randn(用于所生成的随机数的正态分布)

随机兰德

import numpy as np 
arr = np.random.rand(row_size, column_size) 

随机兰德

import numpy as np 
arr = np.random.randn(row_size, column_size) 

使用随机整数创建数组

import numpy as np
numpy.random.randint(low, high=None, size=None, dtype='l')

哪里

  • low =要从分布中得出的最低(有符号)整数
  • 高(可选)=如果提供,则从分布中得出的最大(有符号)整数上方
  • size(可选)=输出形状,即如果给定形状为例如(m,n,k),则绘制m * n * k个样本
  • dtype(可选)=结果的所需dtype。

例如:

给定的示例将生成一个介于0和4之间的随机整数数组,其大小将为5 * 5,并具有25个整数

arr2 = np.random.randint(0,5,size = (5,5))

为了创建5 x 5矩阵,应将其修改为

arr2 = np.random.randint(0,5,size =(5,5)),将乘法符号*更改为逗号,#

[[2 1 1 0 1] [3 2 1 4 3] [2 3 0 3 3] [1 3 1 0 0] [4 1 2 0 1]]

eg2:

给定的示例将生成一个介于0和1之间的随机整数数组,其大小将为1 * 10并具有10个整数

arr3= np.random.randint(2, size = 10)

[0 0 0 0 1 1 0 0 1 1]

For creating an array of random numbers NumPy provides array creation using:

  1. Real numbers

  2. Integers

For creating array using random Real numbers: there are 2 options

  1. random.rand (for uniform distribution of the generated random numbers )
  2. random.randn (for normal distribution of the generated random numbers )

random.rand

import numpy as np 
arr = np.random.rand(row_size, column_size) 

random.randn

import numpy as np 
arr = np.random.randn(row_size, column_size) 

For creating array using random Integers:

import numpy as np
numpy.random.randint(low, high=None, size=None, dtype='l')

where

  • low = Lowest (signed) integer to be drawn from the distribution
  • high(optional)= If provided, one above the largest (signed) integer to be drawn from the distribution
  • size(optional) = Output shape i.e. if the given shape is, e.g., (m, n, k), then m * n * k samples are drawn
  • dtype(optional) = Desired dtype of the result.

eg:

The given example will produce an array of random integers between 0 and 4, its size will be 5*5 and have 25 integers

arr2 = np.random.randint(0,5,size = (5,5))

in order to create 5 by 5 matrix, it should be modified to

arr2 = np.random.randint(0,5,size = (5,5)), change the multiplication symbol* to a comma ,#

[[2 1 1 0 1][3 2 1 4 3][2 3 0 3 3][1 3 1 0 0][4 1 2 0 1]]

eg2:

The given example will produce an array of random integers between 0 and 1, its size will be 1*10 and will have 10 integers

arr3= np.random.randint(2, size = 10)

[0 0 0 0 1 1 0 0 1 1]


回答 9

random_matrix = [[random.random for j in range(collumns)] for i in range(rows)
for i in range(rows):
    print random_matrix[i]
random_matrix = [[random.random for j in range(collumns)] for i in range(rows)
for i in range(rows):
    print random_matrix[i]

回答 10

使用map-reduce的答案:-

map(lambda x: map(lambda y: ran(),range(len(inputs[0]))),range(hiden_neurons))

An answer using map-reduce:-

map(lambda x: map(lambda y: ran(),range(len(inputs[0]))),range(hiden_neurons))

回答 11

#this is a function for a square matrix so on the while loop rows does not have to be less than cols.
#you can make your own condition. But if you want your a square matrix, use this code.

import random

import numpy as np

def random_matrix(R, cols):

        matrix = []

        rows =  0

        while  rows < cols:

            N = random.sample(R, cols)

            matrix.append(N)

            rows = rows + 1

    return np.array(matrix)

print(random_matrix(range(10), 5))
#make sure you understand the function random.sample
#this is a function for a square matrix so on the while loop rows does not have to be less than cols.
#you can make your own condition. But if you want your a square matrix, use this code.

import random

import numpy as np

def random_matrix(R, cols):

        matrix = []

        rows =  0

        while  rows < cols:

            N = random.sample(R, cols)

            matrix.append(N)

            rows = rows + 1

    return np.array(matrix)

print(random_matrix(range(10), 5))
#make sure you understand the function random.sample

回答 12

numpy.random.rand(row,column)根据给定的指定(m,n)参数生成0到1之间的随机数。因此,使用它创建一个(m,n)矩阵,并将该矩阵乘以范围限制,然后将其与上限相加。

分析:如果生成零,则仅将保持下限,但是如果生成零,将仅保持上限。顺便说一句,使用rand numpy生成限制,您可以生成极端所需的数字。

import numpy as np

high = 10
low = 5
m,n = 2,2

a = (high - low)*np.random.rand(m,n) + low

输出:

a = array([[5.91580065, 8.1117106 ],
          [6.30986984, 5.720437  ]])

numpy.random.rand(row, column) generates random numbers between 0 and 1, according to the specified (m,n) parameters given. So use it to create a (m,n) matrix and multiply the matrix for the range limit and sum it with the high limit.

Analyzing: If zero is generated just the low limit will be held, but if one is generated just the high limit will be held. In order words, generating the limits using rand numpy you can generate the extreme desired numbers.

import numpy as np

high = 10
low = 5
m,n = 2,2

a = (high - low)*np.random.rand(m,n) + low

Output:

a = array([[5.91580065, 8.1117106 ],
          [6.30986984, 5.720437  ]])

在Python中生成随机文件名的最佳方法

问题:在Python中生成随机文件名的最佳方法

在Python中,生成一些随机文本以添加到我要保存到服务器的文件(名称)的一种好的方法或最佳方法是,只是确保它不会被覆盖。谢谢!

In Python, what is a good, or the best way to generate some random text to prepend to a file(name) that I’m saving to a server, just to make sure it does not overwrite. Thank you!


回答 0

Python具有生成临时文件名的功能,请参见http://docs.python.org/library/tempfile.html。例如:

In [4]: import tempfile

每次调用都会生成tempfile.NamedTemporaryFile()一个不同的临时文件,并且可以使用.name属性访问其名称,例如:

In [5]: tf = tempfile.NamedTemporaryFile()
In [6]: tf.name
Out[6]: 'c:\\blabla\\locals~1\\temp\\tmptecp3i'

In [7]: tf = tempfile.NamedTemporaryFile()
In [8]: tf.name
Out[8]: 'c:\\blabla\\locals~1\\temp\\tmpr8vvme'

一旦拥有唯一的文件名,它就可以像任何常规文件一样使用。注意:默认情况下,文件在关闭时将被删除。但是,如果delete参数为False,则不会自动删除文件。

完整的参数集:

tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])

也可以指定临时文件的前缀(作为文件创建过程中可以提供的各种参数之一):

In [9]: tf = tempfile.NamedTemporaryFile(prefix="zz")
In [10]: tf.name
Out[10]: 'c:\\blabla\\locals~1\\temp\\zzrc3pzk'

此处可以找到有关使用临时文件的其他示例。

Python has facilities to generate temporary file names, see http://docs.python.org/library/tempfile.html. For instance:

In [4]: import tempfile

Each call to tempfile.NamedTemporaryFile() results in a different temp file, and its name can be accessed with the .name attribute, e.g.:

In [5]: tf = tempfile.NamedTemporaryFile()
In [6]: tf.name
Out[6]: 'c:\\blabla\\locals~1\\temp\\tmptecp3i'

In [7]: tf = tempfile.NamedTemporaryFile()
In [8]: tf.name
Out[8]: 'c:\\blabla\\locals~1\\temp\\tmpr8vvme'

Once you have the unique filename it can be used like any regular file. Note: By default the file will be deleted when it is closed. However, if the delete parameter is False, the file is not automatically deleted.

Full parameter set:

tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])

it is also possible to specify the prefix for the temporary file (as one of the various parameters that can be supplied during the file creation):

In [9]: tf = tempfile.NamedTemporaryFile(prefix="zz")
In [10]: tf.name
Out[10]: 'c:\\blabla\\locals~1\\temp\\zzrc3pzk'

Additional examples for working with temporary files can be found here


回答 1

您可以使用UUID模块生成随机字符串:

import uuid
filename = str(uuid.uuid4())

鉴于UUID生成器极不可能产生重复的标识符(在这种情况下为文件名),因此这是一个有效的选择:

仅在接下来的100年中每秒生成10亿个UUID之后,仅创建一个副本的可能性就约为50%。如果地球上每个人都拥有6亿个UUID,则重复一次的可能性约为50%。

You could use the UUID module for generating a random string:

import uuid
filename = str(uuid.uuid4())

This is a valid choice, given that an UUID generator is extremely unlikely to produce a duplicate identifier (a file name, in this case):

Only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.


回答 2

一种常见的方法是在文件名中添加时间戳作为前缀/后缀,以使其与文件具有某种时间关系。如果您需要更多的唯一性,您仍然可以在其中添加随机字符串。

import datetime
basename = "mylogfile"
suffix = datetime.datetime.now().strftime("%y%m%d_%H%M%S")
filename = "_".join([basename, suffix]) # e.g. 'mylogfile_120508_171442'

a common approach is to add a timestamp as a prefix/suffix to the filename to have some temporal relation to the file. If you need more uniqueness you can still add a random string to this.

import datetime
basename = "mylogfile"
suffix = datetime.datetime.now().strftime("%y%m%d_%H%M%S")
filename = "_".join([basename, suffix]) # e.g. 'mylogfile_120508_171442'

回答 3

OP要求创建随机文件名而不是随机文件。时间和UUID可能会发生冲突。如果您在单台机器(不是共享文件系统)上工作,并且进程/线程不会踩踏自身,请使用os.getpid()获取自己的PID,并将其用作唯一文件名的元素。其他进程显然不会获得相同的PID。如果您是多线程的,请获取线程ID。如果在代码的其他方面,单个线程或进程可能会生成多个不同的临时文件,则可能需要使用另一种技术。滚动索引可以正常工作(如果您没有将它们保持太久或使用太多文件,您将担心滚动)。在这种情况下,将全局散列/索引保留为“活动”文件就足够了。

对于冗长的解释,我们深表歉意,但这确实取决于您的确切用法。

The OP requested to create random filenames not random files. Times and UUIDs can collide. If you are working on a single machine (not a shared filesystem) and your process/thread will not stomp on itselfk, use os.getpid() to get your own PID and use this as an element of a unique filename. Other processes would obviously not get the same PID. If you are multithreaded, get the thread id. If you have other aspects of your code in which a single thread or process could generate multiple different tempfiles, you might need to use another technique. A rolling index can work (if you aren’t keeping them so long or using so many files you would worry about rollover). Keeping a global hash/index to “active” files would suffice in that case.

So sorry for the longwinded explanation, but it does depend on your exact usage.


回答 4

如果您不需要文件路径,而只需要具有预定义长度的随机字符串,则可以使用类似的内容。

>>> import random
>>> import string

>>> file_name = ''.join(random.choice(string.ascii_lowercase) for i in range(16))
>>> file_name
'ytrvmyhkaxlfaugx'

If you need no the file path, but only the random string having predefined length you can use something like this.

>>> import random
>>> import string

>>> file_name = ''.join(random.choice(string.ascii_lowercase) for i in range(16))
>>> file_name
'ytrvmyhkaxlfaugx'

回答 5

如果要将原始文件名保留为新文件名的一部分,则可以使用当前时间的MD5哈希值来生成统一长度的唯一前缀:

from hashlib import md5
from time import localtime

def add_prefix(filename):
    prefix = md5(str(localtime()).encode('utf-8')).hexdigest()
    return f"{prefix}_{filename}"

调用add_prefix(’style.css’)会生成如下序列:

a38ff35794ae366e442a0606e67035ba_style.css
7a5f8289323b0ebfdbc7c840ad3cb67b_style.css

If you want to preserve the original filename as a part of the new filename, unique prefixes of uniform length can be generated by using MD5 hashes of the current time:

from hashlib import md5
from time import localtime

def add_prefix(filename):
    prefix = md5(str(localtime()).encode('utf-8')).hexdigest()
    return f"{prefix}_{filename}"

Calls to the add_prefix(‘style.css’) generates sequence like:

a38ff35794ae366e442a0606e67035ba_style.css
7a5f8289323b0ebfdbc7c840ad3cb67b_style.css

回答 6

在这里加两分钱:

In [19]: tempfile.mkstemp('.png', 'bingo', '/tmp')[1]
Out[19]: '/tmp/bingoy6s3_k.png'

根据tempfile.mkstemp的python文档,它将以最安全的方式创建一个临时文件。请注意,该文件将在调用后存在:

In [20]: os.path.exists(tempfile.mkstemp('.png', 'bingo', '/tmp')[1])
Out[20]: True

Adding my two cents here:

In [19]: tempfile.mkstemp('.png', 'bingo', '/tmp')[1]
Out[19]: '/tmp/bingoy6s3_k.png'

According to the python doc for tempfile.mkstemp, it creates a temporary file in the most secure manner possible. Please note that the file will exist after this call:

In [20]: os.path.exists(tempfile.mkstemp('.png', 'bingo', '/tmp')[1])
Out[20]: True

回答 7

我个人更希望文本不仅是随机的/唯一的,而且也是美丽的,这就是为什么我喜欢hashids lib的原因,它可以从整数生成美观的随机文本。可以通过安装

pip install hashids

片段:

import hashids
hashids = hashids.Hashids(salt="this is my salt", )
print hashids.encode(1, 2, 3)
>>> laHquq

简短的介绍:

Hashids是一个小型的开放源代码库,可从数字生成短的,唯一的,非顺序的ID。

I personally prefer to have my text to not be only random/unique but beautiful as well, that’s why I like the hashids lib, which generates nice looking random text from integers. Can installed through

pip install hashids

Snippet:

import hashids
hashids = hashids.Hashids(salt="this is my salt", )
print hashids.encode(1, 2, 3)
>>> laHquq

Short Description:

Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.


回答 8

>>> import random
>>> import string    
>>> alias = ''.join(random.choice(string.ascii_letters) for _ in range(16))
>>> alias
'WrVkPmjeSOgTmCRG'

您可以将’string.ascii_letters’更改为任何字符串格式,以生成任何其他文本,例如移动NO,ID …

>>> import random
>>> import string    
>>> alias = ''.join(random.choice(string.ascii_letters) for _ in range(16))
>>> alias
'WrVkPmjeSOgTmCRG'

You could change ‘string.ascii_letters’ to any string format as you like to generate any other text, for example mobile NO, ID…


回答 9

import uuid
   imageName = '{}{:-%Y%m%d%H%M%S}.jpeg'.format(str(uuid.uuid4().hex), datetime.now())
import uuid
   imageName = '{}{:-%Y%m%d%H%M%S}.jpeg'.format(str(uuid.uuid4().hex), datetime.now())

回答 10

您可以使用随机包:

import random
file = random.random()

You could use the random package:

import random
file = random.random()

如何创建无重复的随机数列表?

问题:如何创建无重复的随机数列表?

我尝试使用random.randint(0, 100),但一些数字相同。有没有一种方法/模块来创建唯一的随机数列表?

注意:以下代码基于答案,并在发布答案后添加。这不是问题的一部分。这是解决方案。

def getScores():
    # open files to read and write
    f1 = open("page.txt", "r");
    p1 = open("pgRes.txt", "a");

    gScores = [];
    bScores = [];
    yScores = [];

    # run 50 tests of 40 random queries to implement "bootstrapping" method 
    for i in range(50):
        # get 40 random queries from the 50
        lines = random.sample(f1.readlines(), 40);

I tried using random.randint(0, 100), but some numbers were the same. Is there a method/module to create a list unique random numbers?

Note: The following code is based on an answer and has been added after the answer was posted. It is not a part of the question; it is the solution.

def getScores():
    # open files to read and write
    f1 = open("page.txt", "r");
    p1 = open("pgRes.txt", "a");

    gScores = [];
    bScores = [];
    yScores = [];

    # run 50 tests of 40 random queries to implement "bootstrapping" method 
    for i in range(50):
        # get 40 random queries from the 50
        lines = random.sample(f1.readlines(), 40);

回答 0

这将返回从0到99范围内选择的10个数字的列表,没有重复。

import random
random.sample(range(100), 10)

参考您的特定代码示例,您可能希望一次从文件中读取所有行,然后从内存中的已保存列表中选择随机行。例如:

all_lines = f1.readlines()
for i in range(50):
    lines = random.sample(all_lines, 40)

这样,您只需要在循环之前实际从文件中读取一次即可。与返回文件的开头并f1.readlines()为每次循环迭代再次调用相比,这样做的效率要高得多。

This will return a list of 10 numbers selected from the range 0 to 99, without duplicates.

import random
random.sample(range(100), 10)

With reference to your specific code example, you probably want to read all the lines from the file once and then select random lines from the saved list in memory. For example:

all_lines = f1.readlines()
for i in range(50):
    lines = random.sample(all_lines, 40)

This way, you only need to actually read from the file once, before your loop. It’s much more efficient to do this than to seek back to the start of the file and call f1.readlines() again for each loop iteration.


回答 1

您可以像下面这样使用随机模块中的随机播放功能:

import random

my_list = list(xrange(1,100)) # list of integers from 1 to 99
                              # adjust this boundaries to fit your needs
random.shuffle(my_list)
print my_list # <- List of unique random numbers

请注意,在这里shuffle方法不会像预期的那样返回任何列表,它只会对通过引用传递的列表进行随机排序。

You can use the shuffle function from the random module like this:

import random

my_list = list(xrange(1,100)) # list of integers from 1 to 99
                              # adjust this boundaries to fit your needs
random.shuffle(my_list)
print my_list # <- List of unique random numbers

Note here that the shuffle method doesn’t return any list as one may expect, it only shuffle the list passed by reference.


回答 2

您可以先创建一个从a到的数字列表b,其中ab分别是列表中的最小和最大数字,然后使用Fisher-Yates算法或使用Python的random.shuffle方法对其进行混洗。

You can first create a list of numbers from a to b, where a and b are respectively the smallest and greatest numbers in your list, then shuffle it with Fisher-Yates algorithm or using the Python’s random.shuffle method.


回答 3

此答案中给出的解决方案有效,但如果样本量很小但总数很大(例如random.sample(insanelyLargeNumber, 10)),则可能会在内存问题上产生问题。

为了解决这个问题,我可以这样做:

answer = set()
sampleSize = 10
answerSize = 0

while answerSize < sampleSize:
    r = random.randint(0,100)
    if r not in answer:
        answerSize += 1
        answer.add(r)

# answer now contains 10 unique, random integers from 0.. 100

The solution presented in this answer works, but it could become problematic with memory if the sample size is small, but the population is huge (e.g. random.sample(insanelyLargeNumber, 10)).

To fix that, I would go with this:

answer = set()
sampleSize = 10
answerSize = 0

while answerSize < sampleSize:
    r = random.randint(0,100)
    if r not in answer:
        answerSize += 1
        answer.add(r)

# answer now contains 10 unique, random integers from 0.. 100

回答 4

线性同余伪随机数生成器

O(1)记忆

O(k)运算

这个问题可以用一个简单的线性同余发生器来解决。这需要恒定的内存开销(8个整数)和最多2 *(序列长度)的计算。

所有其他解决方案都使用更多的内存和更多的计算资源!如果只需要几个随机序列,则此方法将便宜得多。对于大小范围N,如果要按N唯一k序列或更大的顺序生成,我建议使用内置方法接受的解决方案,random.sample(range(N),k)因为它在python中进行了速度优化

# Return a randomized "range" using a Linear Congruential Generator
# to produce the number sequence. Parameters are the same as for 
# python builtin "range".
#   Memory  -- storage for 8 integers, regardless of parameters.
#   Compute -- at most 2*"maximum" steps required to generate sequence.
#
def random_range(start, stop=None, step=None):
    import random, math
    # Set a default values the same way "range" does.
    if (stop == None): start, stop = 0, start
    if (step == None): step = 1
    # Use a mapping to convert a standard range into the desired range.
    mapping = lambda i: (i*step) + start
    # Compute the number of numbers in this range.
    maximum = (stop - start) // step
    # Seed range with a random integer.
    value = random.randint(0,maximum)
    # 
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    # 
    #   1) "modulus" and "offset" are relatively prime.
    #   2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    #   3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    # 
    offset = random.randint(0,maximum) * 2 + 1      # Pick a random odd-valued offset.
    multiplier = 4*(maximum//4) + 1                 # Pick a multiplier 1 greater than a multiple of 4.
    modulus = int(2**math.ceil(math.log2(maximum))) # Pick a modulus just big enough to generate all numbers (power of 2).
    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value*multiplier + offset) % modulus

用法

此函数“ random_range”的用法与任何生成器(例如“ range”)相同。一个例子:

# Show off random range.
print()
for v in range(3,6):
    v = 2**v
    l = list(random_range(v))
    print("Need",v,"found",len(set(l)),"(min,max)",(min(l),max(l)))
    print("",l)
    print()

样品结果

Required 8 cycles to generate a sequence of 8 values.
Need 8 found 8 (min,max) (0, 7)
 [1, 0, 7, 6, 5, 4, 3, 2]

Required 16 cycles to generate a sequence of 9 values.
Need 9 found 9 (min,max) (0, 8)
 [3, 5, 8, 7, 2, 6, 0, 1, 4]

Required 16 cycles to generate a sequence of 16 values.
Need 16 found 16 (min,max) (0, 15)
 [5, 14, 11, 8, 3, 2, 13, 1, 0, 6, 9, 4, 7, 12, 10, 15]

Required 32 cycles to generate a sequence of 17 values.
Need 17 found 17 (min,max) (0, 16)
 [12, 6, 16, 15, 10, 3, 14, 5, 11, 13, 0, 1, 4, 8, 7, 2, ...]

Required 32 cycles to generate a sequence of 32 values.
Need 32 found 32 (min,max) (0, 31)
 [19, 15, 1, 6, 10, 7, 0, 28, 23, 24, 31, 17, 22, 20, 9, ...]

Required 64 cycles to generate a sequence of 33 values.
Need 33 found 33 (min,max) (0, 32)
 [11, 13, 0, 8, 2, 9, 27, 6, 29, 16, 15, 10, 3, 14, 5, 24, ...]

Linear Congruential Pseudo-random Number Generator

O(1) Memory

O(k) Operations

This problem can be solved with a simple Linear Congruential Generator. This requires constant memory overhead (8 integers) and at most 2*(sequence length) computations.

All other solutions use more memory and more compute! If you only need a few random sequences, this method will be significantly cheaper. For ranges of size N, if you want to generate on the order of N unique k-sequences or more, I recommend the accepted solution using the builtin methods random.sample(range(N),k) as this has been optimized in python for speed.

Code

# Return a randomized "range" using a Linear Congruential Generator
# to produce the number sequence. Parameters are the same as for 
# python builtin "range".
#   Memory  -- storage for 8 integers, regardless of parameters.
#   Compute -- at most 2*"maximum" steps required to generate sequence.
#
def random_range(start, stop=None, step=None):
    import random, math
    # Set a default values the same way "range" does.
    if (stop == None): start, stop = 0, start
    if (step == None): step = 1
    # Use a mapping to convert a standard range into the desired range.
    mapping = lambda i: (i*step) + start
    # Compute the number of numbers in this range.
    maximum = (stop - start) // step
    # Seed range with a random integer.
    value = random.randint(0,maximum)
    # 
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    # 
    #   1) "modulus" and "offset" are relatively prime.
    #   2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    #   3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    # 
    offset = random.randint(0,maximum) * 2 + 1      # Pick a random odd-valued offset.
    multiplier = 4*(maximum//4) + 1                 # Pick a multiplier 1 greater than a multiple of 4.
    modulus = int(2**math.ceil(math.log2(maximum))) # Pick a modulus just big enough to generate all numbers (power of 2).
    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value*multiplier + offset) % modulus

Usage

The usage of this function “random_range” is the same as for any generator (like “range”). An example:

# Show off random range.
print()
for v in range(3,6):
    v = 2**v
    l = list(random_range(v))
    print("Need",v,"found",len(set(l)),"(min,max)",(min(l),max(l)))
    print("",l)
    print()

Sample Results

Required 8 cycles to generate a sequence of 8 values.
Need 8 found 8 (min,max) (0, 7)
 [1, 0, 7, 6, 5, 4, 3, 2]

Required 16 cycles to generate a sequence of 9 values.
Need 9 found 9 (min,max) (0, 8)
 [3, 5, 8, 7, 2, 6, 0, 1, 4]

Required 16 cycles to generate a sequence of 16 values.
Need 16 found 16 (min,max) (0, 15)
 [5, 14, 11, 8, 3, 2, 13, 1, 0, 6, 9, 4, 7, 12, 10, 15]

Required 32 cycles to generate a sequence of 17 values.
Need 17 found 17 (min,max) (0, 16)
 [12, 6, 16, 15, 10, 3, 14, 5, 11, 13, 0, 1, 4, 8, 7, 2, ...]

Required 32 cycles to generate a sequence of 32 values.
Need 32 found 32 (min,max) (0, 31)
 [19, 15, 1, 6, 10, 7, 0, 28, 23, 24, 31, 17, 22, 20, 9, ...]

Required 64 cycles to generate a sequence of 33 values.
Need 33 found 33 (min,max) (0, 32)
 [11, 13, 0, 8, 2, 9, 27, 6, 29, 16, 15, 10, 3, 14, 5, 24, ...]

回答 5

如果从1到N的N个数字的列表是随机生成的,则可以,某些数字可能会重复。

如果要按随机顺序排列从1到N的数字列表,请使用从1到N的整数填充数组,然后使用Fisher-Yates随机播放或Python的random.shuffle()

If the list of N numbers from 1 to N is randomly generated, then yes, there is a possibility that some numbers may be repeated.

If you want a list of numbers from 1 to N in a random order, fill an array with integers from 1 to N, and then use a Fisher-Yates shuffle or Python’s random.shuffle().


回答 6

如果需要采样非常大的数字,则不能使用 range

random.sample(range(10000000000000000000000000000000), 10)

因为它抛出:

OverflowError: Python int too large to convert to C ssize_t

另外,如果random.sample由于范围太小而无法生成所需数量的物品

 random.sample(range(2), 1000)

它抛出:

 ValueError: Sample larger than population

此函数解决了两个问题:

import random

def random_sample(count, start, stop, step=1):
    def gen_random():
        while True:
            yield random.randrange(start, stop, step)

    def gen_n_unique(source, n):
        seen = set()
        seenadd = seen.add
        for i in (i for i in source() if i not in seen and not seenadd(i)):
            yield i
            if len(seen) == n:
                break

    return [i for i in gen_n_unique(gen_random,
                                    min(count, int(abs(stop - start) / abs(step))))]

大量使用:

print('\n'.join(map(str, random_sample(10, 2, 10000000000000000000000000000000))))

样本结果:

7822019936001013053229712669368
6289033704329783896566642145909
2473484300603494430244265004275
5842266362922067540967510912174
6775107889200427514968714189847
9674137095837778645652621150351
9969632214348349234653730196586
1397846105816635294077965449171
3911263633583030536971422042360
9864578596169364050929858013943

范围小于所请求项目数的用法:

print(', '.join(map(str, random_sample(100000, 0, 3))))

样本结果:

2, 0, 1

它还适用于负范围和步骤:

print(', '.join(map(str, random_sample(10, 10, -10, -2))))
print(', '.join(map(str, random_sample(10, 5, -5, -2))))

样本结果:

2, -8, 6, -2, -4, 0, 4, 10, -6, 8
-3, 1, 5, -1, 3

If you need to sample extremely large numbers, you cannot use range

random.sample(range(10000000000000000000000000000000), 10)

because it throws:

OverflowError: Python int too large to convert to C ssize_t

Also, if random.sample cannot produce the number of items you want due to the range being too small

 random.sample(range(2), 1000)

it throws:

 ValueError: Sample larger than population

This function resolves both problems:

import random

def random_sample(count, start, stop, step=1):
    def gen_random():
        while True:
            yield random.randrange(start, stop, step)

    def gen_n_unique(source, n):
        seen = set()
        seenadd = seen.add
        for i in (i for i in source() if i not in seen and not seenadd(i)):
            yield i
            if len(seen) == n:
                break

    return [i for i in gen_n_unique(gen_random,
                                    min(count, int(abs(stop - start) / abs(step))))]

Usage with extremely large numbers:

print('\n'.join(map(str, random_sample(10, 2, 10000000000000000000000000000000))))

Sample result:

7822019936001013053229712669368
6289033704329783896566642145909
2473484300603494430244265004275
5842266362922067540967510912174
6775107889200427514968714189847
9674137095837778645652621150351
9969632214348349234653730196586
1397846105816635294077965449171
3911263633583030536971422042360
9864578596169364050929858013943

Usage where the range is smaller than the number of requested items:

print(', '.join(map(str, random_sample(100000, 0, 3))))

Sample result:

2, 0, 1

It also works with with negative ranges and steps:

print(', '.join(map(str, random_sample(10, 10, -10, -2))))
print(', '.join(map(str, random_sample(10, 5, -5, -2))))

Sample results:

2, -8, 6, -2, -4, 0, 4, 10, -6, 8
-3, 1, 5, -1, 3

回答 7

您可以使用Numpy库进行快速解答,如下所示-

给定的代码段列出了0到5之间的6个唯一数字。您可以根据需要调整参数。

import numpy as np
import random
a = np.linspace( 0, 5, 6 )
random.shuffle(a)
print(a)

输出量

[ 2.  1.  5.  3.  4.  0.]

它不把任何约束,因为我们在random.sample看到称为这里

希望这个对你有帮助。

You can use Numpy library for quick answer as shown below –

Given code snippet lists down 6 unique numbers between the range of 0 to 5. You can adjust the parameters for your comfort.

import numpy as np
import random
a = np.linspace( 0, 5, 6 )
random.shuffle(a)
print(a)

Output

[ 2.  1.  5.  3.  4.  0.]

It doesn’t put any constraints as we see in random.sample as referred here.

Hope this helps a bit.


回答 8

此处提供的答案时间和内存方面都非常有效,但由于它使用了诸如yield的高级python构造,因此有点复杂。在简单的答案行之有效的做法,但与回答的问题是,它可以实际构建所需的集之前产生许多虚假的整数。使用人口大小= 1000,样本大小= 999进行尝试。理论上,它有可能不会终止。

下面的答案解决了这两个问题,因为它是确定性的,虽然目前还不如其他两个效率高,但它还是有一定效率的。

def randomSample(populationSize, sampleSize):
  populationStr = str(populationSize)
  dTree, samples = {}, []
  for i in range(sampleSize):
    val, dTree = getElem(populationStr, dTree, '')
    samples.append(int(val))
  return samples, dTree

函数getElem,percolateUp的定义如下

import random

def getElem(populationStr, dTree, key):
  msd  = int(populationStr[0])
  if not key in dTree.keys():
    dTree[key] = range(msd + 1)
  idx = random.randint(0, len(dTree[key]) - 1)
  key = key +  str(dTree[key][idx])
  if len(populationStr) == 1:
    dTree[key[:-1]].pop(idx)
    return key, (percolateUp(dTree, key[:-1]))
  newPopulation = populationStr[1:]
  if int(key[-1]) != msd:
    newPopulation = str(10**(len(newPopulation)) - 1)
  return getElem(newPopulation, dTree, key)

def percolateUp(dTree, key):
  while (dTree[key] == []):
    dTree[key[:-1]].remove( int(key[-1]) )
    key = key[:-1]
  return dTree

最后,对于较大的n值,平均时间约为15毫秒,如下所示:

In [3]: n = 10000000000000000000000000000000

In [4]: %time l,t = randomSample(n, 5)
Wall time: 15 ms

In [5]: l
Out[5]:
[10000000000000000000000000000000L,
 5731058186417515132221063394952L,
 85813091721736310254927217189L,
 6349042316505875821781301073204L,
 2356846126709988590164624736328L]

The answer provided here works very well with respect to time as well as memory but a bit more complicated as it uses advanced python constructs such as yield. The simpler answer works well in practice but, the issue with that answer is that it may generate many spurious integers before actually constructing the required set. Try it out with populationSize = 1000, sampleSize = 999. In theory, there is a chance that it doesn’t terminate.

The answer below addresses both issues, as it is deterministic and somewhat efficient though currently not as efficient as the other two.

def randomSample(populationSize, sampleSize):
  populationStr = str(populationSize)
  dTree, samples = {}, []
  for i in range(sampleSize):
    val, dTree = getElem(populationStr, dTree, '')
    samples.append(int(val))
  return samples, dTree

where the functions getElem, percolateUp are as defined below

import random

def getElem(populationStr, dTree, key):
  msd  = int(populationStr[0])
  if not key in dTree.keys():
    dTree[key] = range(msd + 1)
  idx = random.randint(0, len(dTree[key]) - 1)
  key = key +  str(dTree[key][idx])
  if len(populationStr) == 1:
    dTree[key[:-1]].pop(idx)
    return key, (percolateUp(dTree, key[:-1]))
  newPopulation = populationStr[1:]
  if int(key[-1]) != msd:
    newPopulation = str(10**(len(newPopulation)) - 1)
  return getElem(newPopulation, dTree, key)

def percolateUp(dTree, key):
  while (dTree[key] == []):
    dTree[key[:-1]].remove( int(key[-1]) )
    key = key[:-1]
  return dTree

Finally, the timing on average was about 15ms for a large value of n as shown below,

In [3]: n = 10000000000000000000000000000000

In [4]: %time l,t = randomSample(n, 5)
Wall time: 15 ms

In [5]: l
Out[5]:
[10000000000000000000000000000000L,
 5731058186417515132221063394952L,
 85813091721736310254927217189L,
 6349042316505875821781301073204L,
 2356846126709988590164624736328L]

回答 9

为了获得确定性,高效且使用基本编程结构构建的,生成无重复值的随机值列表的程序,请考虑以下extractSamples定义的函数,

def extractSamples(populationSize, sampleSize, intervalLst) :
    import random
    if (sampleSize > populationSize) :
        raise ValueError("sampleSize = "+str(sampleSize) +" > populationSize (= " + str(populationSize) + ")")
    samples = []
    while (len(samples) < sampleSize) :
        i = random.randint(0, (len(intervalLst)-1))
        (a,b) = intervalLst[i]
        sample = random.randint(a,b)
        if (a==b) :
            intervalLst.pop(i)
        elif (a == sample) : # shorten beginning of interval                                                                                                                                           
            intervalLst[i] = (sample+1, b)
        elif ( sample == b) : # shorten interval end                                                                                                                                                   
            intervalLst[i] = (a, sample - 1)
        else :
            intervalLst[i] = (a, sample - 1)
            intervalLst.append((sample+1, b))
        samples.append(sample)
    return samples

基本思想是跟踪intervalLst从中选择所需元素的可能值的间隔。从确定的意义上说,这是确定性的,我们可以保证在固定数量的步骤(仅取决于populationSizesampleSize)内生成样本。

要使用上述功能生成我们所需的列表,

In [3]: populationSize, sampleSize = 10**17, 10**5

In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
Wall time: 293 ms

我们还可以将其与较早的解决方案进行比较(以更低的种群数量值)

In [5]: populationSize, sampleSize = 10**8, 10**5

In [6]: %time lst = random.sample(range(populationSize), sampleSize)
CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
Wall time: 2.18 s

In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
Wall time: 442 ms

请注意,我降低了populationSize数值,因为使用random.sample解决方案时,它会为更高的值产生“内存错误” (也在此处此处的先前答案中提到)。对于上述值,我们还可以观察到其extractSamples性能优于该random.sample方法。

PS:尽管核心方法与我之前的回答类似,但是在实现和方法上都进行了实质性修改,同时还提高了清晰度。

In order to obtain a program that generates a list of random values without duplicates that is deterministic, efficient and built with basic programming constructs consider the function extractSamples defined below,

def extractSamples(populationSize, sampleSize, intervalLst) :
    import random
    if (sampleSize > populationSize) :
        raise ValueError("sampleSize = "+str(sampleSize) +" > populationSize (= " + str(populationSize) + ")")
    samples = []
    while (len(samples) < sampleSize) :
        i = random.randint(0, (len(intervalLst)-1))
        (a,b) = intervalLst[i]
        sample = random.randint(a,b)
        if (a==b) :
            intervalLst.pop(i)
        elif (a == sample) : # shorten beginning of interval                                                                                                                                           
            intervalLst[i] = (sample+1, b)
        elif ( sample == b) : # shorten interval end                                                                                                                                                   
            intervalLst[i] = (a, sample - 1)
        else :
            intervalLst[i] = (a, sample - 1)
            intervalLst.append((sample+1, b))
        samples.append(sample)
    return samples

The basic idea is to keep track of intervals intervalLst for possible values from which to select our required elements from. This is deterministic in the sense that we are guaranteed to generate a sample within a fixed number of steps (solely dependent on populationSize and sampleSize).

To use the above function to generate our required list,

In [3]: populationSize, sampleSize = 10**17, 10**5

In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
Wall time: 293 ms

We may also compare with an earlier solution (for a lower value of populationSize)

In [5]: populationSize, sampleSize = 10**8, 10**5

In [6]: %time lst = random.sample(range(populationSize), sampleSize)
CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
Wall time: 2.18 s

In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
Wall time: 442 ms

Note that I reduced populationSize value as it produces Memory Error for higher values when using the random.sample solution (also mentioned in previous answers here and here). For above values, we can also observe that extractSamples outperforms the random.sample approach.

P.S. : Though the core approach is similar to my earlier answer, there are substantial modifications in implementation as well as approach alongwith improvement in clarity.


回答 10

一个非常简单的功能,也可以解决您的问题

from random import randint

data = []

def unique_rand(inicial, limit, total):

        data = []

        i = 0

        while i < total:
            number = randint(inicial, limit)
            if number not in data:
                data.append(number)
                i += 1

        return data


data = unique_rand(1, 60, 6)

print(data)


"""

prints something like 

[34, 45, 2, 36, 25, 32]

"""

A very simple function that also solves your problem

from random import randint

data = []

def unique_rand(inicial, limit, total):

        data = []

        i = 0

        while i < total:
            number = randint(inicial, limit)
            if number not in data:
                data.append(number)
                i += 1

        return data


data = unique_rand(1, 60, 6)

print(data)


"""

prints something like 

[34, 45, 2, 36, 25, 32]

"""

回答 11

基于集合的方法(“如果返回值中有随机值,请重试”)的问题是,由于冲突(需要另一次“重试”迭代),不确定它们的运行时间,尤其是当返回大量随机值时从范围。

以下是不倾向于这种不确定性运行时的替代方法:

import bisect
import random

def fast_sample(low, high, num):
    """ Samples :param num: integer numbers in range of
        [:param low:, :param high:) without replacement
        by maintaining a list of ranges of values that
        are permitted.

        This list of ranges is used to map a random number
        of a contiguous a range (`r_n`) to a permissible
        number `r` (from `ranges`).
    """
    ranges = [high]
    high_ = high - 1
    while len(ranges) - 1 < num:
        # generate a random number from an ever decreasing
        # contiguous range (which we'll map to the true
        # random number).
        # consider an example with low=0, high=10,
        # part way through this loop with:
        #
        # ranges = [0, 2, 3, 7, 9, 10]
        #
        # r_n :-> r
        #   0 :-> 1
        #   1 :-> 4
        #   2 :-> 5
        #   3 :-> 6
        #   4 :-> 8
        r_n = random.randint(low, high_)
        range_index = bisect.bisect_left(ranges, r_n)
        r = r_n + range_index
        for i in xrange(range_index, len(ranges)):
            if ranges[i] <= r:
                # as many "gaps" we iterate over, as much
                # is the true random value (`r`) shifted.
                r = r_n + i + 1
            elif ranges[i] > r_n:
                break
        # mark `r` as another "gap" of the original
        # [low, high) range.
        ranges.insert(i, r)
        # Fewer values possible.
        high_ -= 1
    # `ranges` happens to contain the result.
    return ranges[:-1]

The problem with the set based approaches (“if random value in return values, try again”) is that their runtime is undetermined due to collisions (which require another “try again” iteration), especially when a large amount of random values are returned from the range.

An alternative that isn’t prone to this non-deterministic runtime is the following:

import bisect
import random

def fast_sample(low, high, num):
    """ Samples :param num: integer numbers in range of
        [:param low:, :param high:) without replacement
        by maintaining a list of ranges of values that
        are permitted.

        This list of ranges is used to map a random number
        of a contiguous a range (`r_n`) to a permissible
        number `r` (from `ranges`).
    """
    ranges = [high]
    high_ = high - 1
    while len(ranges) - 1 < num:
        # generate a random number from an ever decreasing
        # contiguous range (which we'll map to the true
        # random number).
        # consider an example with low=0, high=10,
        # part way through this loop with:
        #
        # ranges = [0, 2, 3, 7, 9, 10]
        #
        # r_n :-> r
        #   0 :-> 1
        #   1 :-> 4
        #   2 :-> 5
        #   3 :-> 6
        #   4 :-> 8
        r_n = random.randint(low, high_)
        range_index = bisect.bisect_left(ranges, r_n)
        r = r_n + range_index
        for i in xrange(range_index, len(ranges)):
            if ranges[i] <= r:
                # as many "gaps" we iterate over, as much
                # is the true random value (`r`) shifted.
                r = r_n + i + 1
            elif ranges[i] > r_n:
                break
        # mark `r` as another "gap" of the original
        # [low, high) range.
        ranges.insert(i, r)
        # Fewer values possible.
        high_ -= 1
    # `ranges` happens to contain the result.
    return ranges[:-1]

回答 12

import random

sourcelist=[]
resultlist=[]

for x in range(100):
    sourcelist.append(x)

for y in sourcelist:
    resultlist.insert(random.randint(0,len(resultlist)),y)

print (resultlist)
import random

sourcelist=[]
resultlist=[]

for x in range(100):
    sourcelist.append(x)

for y in sourcelist:
    resultlist.insert(random.randint(0,len(resultlist)),y)

print (resultlist)

回答 13

如果希望确保添加的数字唯一,则可以使用Set对象

(如果使用2.7或更高版本),或者如果未使用,则导入sets模块。

正如其他人提到的那样,这意味着数字并不是真正的随机数。

If you wish to ensure that the numbers being added are unique, you could use a Set object

if using 2.7 or greater, or import the sets module if not.

As others have mentioned, this means the numbers are not truly random.


回答 14

采样整数而不在minval和之间进行替换maxval

import numpy as np

minval, maxval, n_samples = -50, 50, 10
generator = np.random.default_rng(seed=0)
samples = generator.permutation(np.arange(minval, maxval))[:n_samples]

# or, if minval is 0,
samples = generator.permutation(maxval)[:n_samples]

与贾克斯:

import jax

minval, maxval, n_samples = -50, 50, 10
key = jax.random.PRNGKey(seed=0)
samples = jax.random.shuffle(key, jax.numpy.arange(minval, maxval))[:n_samples]

to sample integers without replacement between minval and maxval:

import numpy as np

minval, maxval, n_samples = -50, 50, 10
generator = np.random.default_rng(seed=0)
samples = generator.permutation(np.arange(minval, maxval))[:n_samples]

# or, if minval is 0,
samples = generator.permutation(maxval)[:n_samples]

with jax:

import jax

minval, maxval, n_samples = -50, 50, 10
key = jax.random.PRNGKey(seed=0)
samples = jax.random.shuffle(key, jax.numpy.arange(minval, maxval))[:n_samples]

回答 15

从win xp中的CLI:

python -c "import random; print(sorted(set([random.randint(6,49) for i in range(7)]))[:6])"

在加拿大,我们有6/49乐透。我只是将上面的代码包装在lotto.bat中,然后运行C:\home\lotto.bat或just C:\home\lotto

因为random.randint经常重复一个数字,所以我使用setrange(7)然后将其缩短为6。

有时,如果数字重复超过2倍,则结果列表长度将小于6。

编辑:但是,这random.sample(range(6,49),6)是正确的方法。

From the CLI in win xp:

python -c "import random; print(sorted(set([random.randint(6,49) for i in range(7)]))[:6])"

In Canada we have the 6/49 Lotto. I just wrap the above code in lotto.bat and run C:\home\lotto.bat or just C:\home\lotto.

Because random.randint often repeats a number, I use set with range(7) and then shorten it to a length of 6.

Occasionally if a number repeats more than 2 times the resulting list length will be less than 6.

EDIT: However, random.sample(range(6,49),6) is the correct way to go.


回答 16

import random
result=[]
for i in range(1,50):
    rng=random.randint(1,20)
    result.append(rng)
import random
result=[]
for i in range(1,50):
    rng=random.randint(1,20)
    result.append(rng)

如何将字符串散列为8位数字?

问题:如何将字符串散列为8位数字?

无论如何,我可以将随机字符串散列为8位数字而无需自己实现任何算法吗?

Is there anyway that I can hash a random string into a 8 digit number without implementing any algorithms myself?


回答 0

是的,您可以使用内置的hashlib模块或内置的hash函数。然后,对整数形式的哈希使用模运算或字符串切片运算来切掉最后八位数字:

>>> s = 'she sells sea shells by the sea shore'

>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s).hexdigest(), 16) % (10 ** 8)
58097614L

>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974

Yes, you can use the built-in hashlib modules or the built-in hash function. Then, chop-off the last eight digits using modulo operations or string slicing operations on the integer form of the hash:

>>> s = 'she sells sea shells by the sea shore'

>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s).hexdigest(), 16) % (10 ** 8)
58097614L

>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974

回答 1

Raymond的答案对于python2非常有用(尽管您不需要abs()或10 ** 8附近的括号)。但是,对于python3,有一些重要的警告。首先,您需要确保传递编码的字符串。如今,在大多数情况下,最好避开sha-1并改用sha-256之类的东西。因此,hashlib方法将是:

>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417

如果要改用hash()函数,则需要注意的重要一点是,与Python 2.x不同,在Python 3.x中,hash()的结果将仅在进程内保持一致,而在整个python调用之间均保持一致。看这里:

$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597

$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934

这意味着建议使用基于hash()的解决方案,可以将其简化为:

hash(s) % 10**8

将仅在给定的脚本运行中返回相同的值:

#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543

#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451

因此,取决于这是否在您的应用程序中很重要(它在我的应用程序中确实如此),您可能希望坚持使用基于hashlib的方法。

Raymond’s answer is great for python2 (though, you don’t need the abs() nor the parens around 10 ** 8). However, for python3, there are important caveats. First, you’ll need to make sure you are passing an encoded string. These days, in most circumstances, it’s probably also better to shy away from sha-1 and use something like sha-256, instead. So, the hashlib approach would be:

>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417

If you want to use the hash() function instead, the important caveat is that, unlike in Python 2.x, in Python 3.x, the result of hash() will only be consistent within a process, not across python invocations. See here:

$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597

$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934

This means the hash()-based solution suggested, which can be shortened to just:

hash(s) % 10**8

will only return the same value within a given script run:

#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543

#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451

So, depending on if this matters in your application (it did in mine), you’ll probably want to stick to the hashlib-based approach.


回答 2

只是为了完成JJC答案,在python 3.5.3中,如果您通过以下方式使用hashlib,则行为是正确的:

$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded

$ python3 -V
Python 3.5.3

Just to complete JJC answer, in python 3.5.3 the behavior is correct if you use hashlib this way:

$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded

$ python3 -V
Python 3.5.3

回答 3

我将分享由@Raymond Hettinger实施的解决方案的nodejs实施。

var crypto = require('crypto');
var s = 'she sells sea shells by the sea shore';
console.log(BigInt('0x' + crypto.createHash('sha1').update(s).digest('hex'))%(10n ** 8n));

I am sharing our nodejs implementation of the solution as implemented by @Raymond Hettinger.

var crypto = require('crypto');
var s = 'she sells sea shells by the sea shore';
console.log(BigInt('0x' + crypto.createHash('sha1').update(s).digest('hex'))%(10n ** 8n));

如何在python中生成具有特定长度的随机数

问题:如何在python中生成具有特定长度的随机数

假设我需要一个3位数的数字,因此它类似于:

>>> random(3)
563

or

>>> random(5)
26748
>> random(2)
56

Let say I need a 3 digit number, so it would be something like:

>>> random(3)
563

or

>>> random(5)
26748
>> random(2)
56

回答 0

要获得一个随机的3位数字:

from random import randint
randint(100, 999)  # randint is inclusive at both ends

(假设您实际上是指三位数,而不是“最多三位数”。)

要使用任意数量的数字:

from random import randint

def random_with_N_digits(n):
    range_start = 10**(n-1)
    range_end = (10**n)-1
    return randint(range_start, range_end)

print random_with_N_digits(2)
print random_with_N_digits(3)
print random_with_N_digits(4)

输出:

33
124
5127

To get a random 3-digit number:

from random import randint
randint(100, 999)  # randint is inclusive at both ends

(assuming you really meant three digits, rather than “up to three digits”.)

To use an arbitrary number of digits:

from random import randint

def random_with_N_digits(n):
    range_start = 10**(n-1)
    range_end = (10**n)-1
    return randint(range_start, range_end)

print random_with_N_digits(2)
print random_with_N_digits(3)
print random_with_N_digits(4)

Output:

33
124
5127

回答 1

如果要将其作为字符串(例如10位电话号码),可以使用以下命令:

n = 10
''.join(["{}".format(randint(0, 9)) for num in range(0, n)])

If you want it as a string (for example, a 10-digit phone number) you can use this:

n = 10
''.join(["{}".format(randint(0, 9)) for num in range(0, n)])

回答 2

如果您需要一个3位数的数字并希望001-099是有效数字,则仍应使用randrange / randint,因为它比其他方法更快。转换为字符串时,只需在必要的前面加上零即可。

import random
num = random.randrange(1, 10**3)
# using format
num_with_zeros = '{:03}'.format(num)
# using string's zfill
num_with_zeros = str(num).zfill(3)

或者,如果您不想将随机数另存为int,则可以将其作为oneliner进行:

'{:03}'.format(random.randrange(1, 10**3))

python 3.6+只有oneliner:

f'{random.randrange(1, 10**3):03}'

上面的示例输出是:

  • ‘026’
  • ‘255’
  • ‘512’

实现为功能:

import random

def n_len_rand(len_, floor=1):
    top = 10**len_
    if floor > top:
        raise ValueError(f"Floor {floor} must be less than requested top {top}")
    return f'{random.randrange(floor, top):0{len_}}'

If you need a 3 digit number and want 001-099 to be valid numbers you should still use randrange/randint as it is quicker than alternatives. Just add the neccessary preceding zeros when converting to a string.

import random
num = random.randrange(1, 10**3)
# using format
num_with_zeros = '{:03}'.format(num)
# using string's zfill
num_with_zeros = str(num).zfill(3)

Alternatively if you don’t want to save the random number as an int you can just do it as a oneliner:

'{:03}'.format(random.randrange(1, 10**3))

python 3.6+ only oneliner:

f'{random.randrange(1, 10**3):03}'

Example outputs of the above are:

  • ‘026’
  • ‘255’
  • ‘512’

Implemented as a function:

import random

def n_len_rand(len_, floor=1):
    top = 10**len_
    if floor > top:
        raise ValueError(f"Floor {floor} must be less than requested top {top}")
    return f'{random.randrange(floor, top):0{len_}}'

回答 3

0是否算作可能的第一位数字?如果是这样,那么您需要random.randint(0,10**n-1)。如果没有,random.randint(10**(n-1),10**n-1)。如果永远不允许使用零,那么您将必须明确拒绝其中包含零的n random.randint(1,9)数字,或者绘制数字。

另外:有趣的是,randint(a,b)使用了一些非Python的“索引”来获取随机数a <= n <= b。有人可能希望它像这样工作range,并产生一个随机数a <= n < b。(请注意封闭的上限间隔。)

鉴于评论中的答复randrange,请注意可以用清洁剂代替它们random.randrange(0,10**n)random.randrange(10**(n-1),10**n)random.randrange(1,10)

Does 0 count as a possible first digit? If so, then you need random.randint(0,10**n-1). If not, random.randint(10**(n-1),10**n-1). And if zero is never allowed, then you’ll have to explicitly reject numbers with a zero in them, or draw n random.randint(1,9) numbers.

Aside: it is interesting that randint(a,b) uses somewhat non-pythonic “indexing” to get a random number a <= n <= b. One might have expected it to work like range, and produce a random number a <= n < b. (Note the closed upper interval.)

Given the responses in the comments about randrange, note that these can be replaced with the cleaner random.randrange(0,10**n), random.randrange(10**(n-1),10**n) and random.randrange(1,10).


回答 4

您可以为自己编写一个小函数来执行所需的操作:

import random
def randomDigits(digits):
    lower = 10**(digits-1)
    upper = 10**digits - 1
    return random.randint(lower, upper)

基本上,10**(digits-1)为您提供最小的{digit}位数字,并10**digits - 1为您提供最大的{digit}位数字(恰好是最小的{digit + 1}位数字减去1!)。然后,我们只是从该范围取一个随机整数。

You could write yourself a little function to do what you want:

import random
def randomDigits(digits):
    lower = 10**(digits-1)
    upper = 10**digits - 1
    return random.randint(lower, upper)

Basically, 10**(digits-1) gives you the smallest {digit}-digit number, and 10**digits - 1 gives you the largest {digit}-digit number (which happens to be the smallest {digit+1}-digit number minus 1!). Then we just take a random integer from that range.


回答 5

我真的很喜欢RichieHindle的答案,但是我很喜欢这个问题作为练习。这是使用字符串的蛮力实现:)

import random
first = random.randint(1,9)
first = str(first)
n = 5

nrs = [str(random.randrange(10)) for i in range(n-1)]
for i in range(len(nrs))    :
    first += str(nrs[i])

print str(first)

I really liked the answer of RichieHindle, however I liked the question as an exercise. Here’s a brute force implementation using strings:)

import random
first = random.randint(1,9)
first = str(first)
n = 5

nrs = [str(random.randrange(10)) for i in range(n-1)]
for i in range(len(nrs))    :
    first += str(nrs[i])

print str(first)

回答 6

官方文档来看,sample()方法似乎不适合此目的吗?

import random

def random_digits(n):
    num = range(0, 10)
    lst = random.sample(num, n)
    print str(lst).strip('[]')

输出:

>>>random_digits(5)
2, 5, 1, 0, 4

From the official documentation, does it not seem that the sample() method is appropriate for this purpose?

import random

def random_digits(n):
    num = range(0, 10)
    lst = random.sample(num, n)
    print str(lst).strip('[]')

Output:

>>>random_digits(5)
2, 5, 1, 0, 4

回答 7

您可以创建一个使用int列表,将字符串转换为连接并再次转换为int的函数,如下所示:

import random

def generate_random_number(length):
    return int(''.join([str(random.randint(0,10)) for _ in range(length)]))

You could create a function who consumes an list of int, transforms in string to concatenate and cast do int again, something like this:

import random

def generate_random_number(length):
    return int(''.join([str(random.randint(0,10)) for _ in range(length)]))

Python中numpy.random和random.random之间的区别

问题:Python中numpy.random和random.random之间的区别

我在Python中有一个大脚本。我从其他人的代码中获得启发,因此最终我将该numpy.random模块用于某些方面(例如,创建从二项式分布中获取的随机数数组),而在其他地方则使用该模块random.random

有人可以告诉我两者之间的主要区别吗?在这两个文档的文档网页上,我似乎numpy.random都拥有更多的方法,但是我不清楚随机数的生成方式有何不同。

我问的原因是因为我需要为调试目的播种我的主程序。但是,除非我在要导入的所有模块中使用相同的随机数生成器,否则它将无法正常工作?

另外,我在另一篇文章中阅读了有关不使用的讨论numpy.random.seed(),但是我并不真正理解为什么这是一个糟糕的主意。如果有人向我解释为什么会这样,我将不胜感激。

I have a big script in Python. I inspired myself in other people’s code so I ended up using the numpy.random module for some things (for example for creating an array of random numbers taken from a binomial distribution) and in other places I use the module random.random.

Can someone please tell me the major differences between the two? Looking at the doc webpage for each of the two it seems to me that numpy.random just has more methods, but I am unclear about how the generation of the random numbers is different.

The reason why I am asking is because I need to seed my main program for debugging purposes. But it doesn’t work unless I use the same random number generator in all the modules that I am importing, is this correct?

Also, I read here, in another post, a discussion about NOT using numpy.random.seed(), but I didn’t really understand why this was such a bad idea. I would really appreciate if someone explain me why this is the case.


回答 0

您已经做出了许多正确的观察!

除非您希望为两个随机生成器都作为种子,否则从长远来看选择一个或另一个生成器可能更简单。但是,如果您确实需要同时使用两者,那么是的,您还需要同时对两者进行播种,因为它们彼此独立地生成随机数。

对于numpy.random.seed(),主要的困难是它不是线程安全的-也就是说,如果您有许多不同的执行线程,则使用它是不安全的,因为如果两个不同的线程同时执行该函数,则不能保证它能正常工作。如果您不使用线程,并且可以合理地期望将来不需要以这种方式重写程序,那numpy.random.seed()应该很好。如果有任何理由怀疑您将来可能需要线程,那么从长远来看,按照建议进行操作并创建numpy.random.Random该类的本地实例会更加安全。据我所知,它random.random.seed()是线程安全的(或者至少没有找到相反的证据)。

numpy.random库包含一些科学研究中常用的额外概率分布,以及用于生成随机数据数组的几个便捷函数。该random.random库要轻巧一些,如果您不从事科学研究或其他统计工作,那应该很好。

否则,它们都使用Mersenne扭曲序列生成它们的随机数,并且它们都是完全确定性的-也就是说,如果您知道一些关键信息,则可以绝对确定地预测下一个数字。因此,numpy.random和random.random都不适合任何严重的加密用途。但是因为序列非常长,所以在您不必担心有人试图对数据进行反向工程的情况下,两者都适合生成随机数。这也是必须播种随机值的原因-如果每次都从同一位置开始,那么您将始终获得相同的随机数序列!

附带说明一下,如果您确实需要加密级别的随机性,则应该使用secrets模块,或者如果使用的是Python 3.6之前的Python版本,则应使用Crypto.Random之类的东西。

You have made many correct observations already!

Unless you’d like to seed both of the random generators, it’s probably simpler in the long run to choose one generator or the other. But if you do need to use both, then yes, you’ll also need to seed them both, because they generate random numbers independently of each other.

For numpy.random.seed(), the main difficulty is that it is not thread-safe – that is, it’s not safe to use if you have many different threads of execution, because it’s not guaranteed to work if two different threads are executing the function at the same time. If you’re not using threads, and if you can reasonably expect that you won’t need to rewrite your program this way in the future, numpy.random.seed() should be fine. If there’s any reason to suspect that you may need threads in the future, it’s much safer in the long run to do as suggested, and to make a local instance of the numpy.random.Random class. As far as I can tell, random.random.seed() is thread-safe (or at least, I haven’t found any evidence to the contrary).

The numpy.random library contains a few extra probability distributions commonly used in scientific research, as well as a couple of convenience functions for generating arrays of random data. The random.random library is a little more lightweight, and should be fine if you’re not doing scientific research or other kinds of work in statistics.

Otherwise, they both use the Mersenne twister sequence to generate their random numbers, and they’re both completely deterministic – that is, if you know a few key bits of information, it’s possible to predict with absolute certainty what number will come next. For this reason, neither numpy.random nor random.random is suitable for any serious cryptographic uses. But because the sequence is so very very long, both are fine for generating random numbers in cases where you aren’t worried about people trying to reverse-engineer your data. This is also the reason for the necessity to seed the random value – if you start in the same place each time, you’ll always get the same sequence of random numbers!

As a side note, if you do need cryptographic level randomness, you should use the secrets module, or something like Crypto.Random if you’re using a Python version earlier than Python 3.6.


回答 1

Python的数据分析中,该模块numpy.random对Python random进行了补充,以通过多种概率分布有效地生成样本值的整个数组。

相比之下,Python的内置random模块一次只能采样一个值,而numpy.random可以更快地生成非常大的采样。使用IPython魔术函数,%timeit可以看到哪个模块执行得更快:

In [1]: from random import normalvariate
In [2]: N = 1000000

In [3]: %timeit samples = [normalvariate(0, 1) for _ in xrange(N)]
1 loop, best of 3: 963 ms per loop

In [4]: %timeit np.random.normal(size=N)
10 loops, best of 3: 38.5 ms per loop

From Python for Data Analysis, the module numpy.random supplements the Python random with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

By contrast, Python’s built-in random module only samples one value at a time, while numpy.random can generate very large sample faster. Using IPython magic function %timeit one can see which module performs faster:

In [1]: from random import normalvariate
In [2]: N = 1000000

In [3]: %timeit samples = [normalvariate(0, 1) for _ in xrange(N)]
1 loop, best of 3: 963 ms per loop

In [4]: %timeit np.random.normal(size=N)
10 loops, best of 3: 38.5 ms per loop

回答 2

种子的来源和使用的分发配置文件将影响输出-如果您正在寻找加密随机性,则os.urandom()的种子将从设备颤动(即以太网或磁盘)中获得几乎真实的随机字节(即/ BSD上的dev / random)

这样可以避免您提供种子,从而避免生成确定的随机数。但是,随机调用然后允许您将数字拟合为一个分布(我称之为科学随机性-最终,您想要的只是一个随机数的钟形曲线分布,numpy最擅长于解决这个问题。

所以,是的,坚持使用一个生成器,但要确定您想要的随机数-随机,但是会偏离分布曲线,或者在没有量子设备的情况下尽可能地随机。

The source of the seed and the distribution profile used are going to affect the outputs – if you are looking for cryptgraphic randomness, seeding from os.urandom() will get nearly real random bytes from device chatter (ie ethernet or disk) (ie /dev/random on BSD)

this will avoid you giving a seed and so generating determinisitic random numbers. However the random calls then allow you to fit the numbers to a distribution (what I call scientific random ness – eventually all you want is a bell curve distribution of random numbers, numpy is best at delviering this.

SO yes, stick with one generator, but decide what random you want – random, but defitniely from a distrubtuion curve, or as random as you can get without a quantum device.


如何从python中的字典中获取随机值

问题:如何从python中的字典中获取随机值

如何从中获得随机对dict?我正在制作一款游戏,您需要猜测一个国家的首都,并且需要随机出现的问题。

dict模样{'VENEZUELA':'CARACAS'}

我怎样才能做到这一点?

How can I get a random pair from a dict? I’m making a game where you need to guess a capital of a country and I need questions to appear randomly.

The dict looks like {'VENEZUELA':'CARACAS'}

How can I do this?


回答 0

一种方法是:

import random
d = {'VENEZUELA':'CARACAS', 'CANADA':'OTTAWA'}
random.choice(list(d.values()))

编辑:该问题在原始帖子发布后的几年内已更改,现在要求使用一对,而不是单个物品。现在的最后一行应该是:

country, capital = random.choice(list(d.items()))

One way would be:

import random
d = {'VENEZUELA':'CARACAS', 'CANADA':'OTTAWA'}
random.choice(list(d.values()))

EDIT: The question was changed a couple years after the original post, and now asks for a pair, rather than a single item. The final line should now be:

country, capital = random.choice(list(d.items()))

回答 1

我写这个试图解决同样的问题:

https://github.com/robtandy/randomdict

它具有O(1)对键,值和项的随机访问。

I wrote this trying to solve the same problem:

https://github.com/robtandy/randomdict

It has O(1) random access to keys, values, and items.


回答 2

试试这个:

import random
a = dict(....) # a is some dictionary
random_key = random.sample(a, 1)[0]

这绝对有效。

Try this:

import random
a = dict(....) # a is some dictionary
random_key = random.sample(a, 1)[0]

This definitely works.


回答 3

如果您不想使用该random模块,也可以尝试popitem()

>> d = {'a': 1, 'b': 5, 'c': 7}
>>> d.popitem()
('a', 1)
>>> d
{'c': 7, 'b': 5}
>>> d.popitem()
('c', 7)

由于dict 不保留订单,因此使用popitem可以从中获得任意(但不是严格随机)顺序的项目。

还请记住popitem,如docs中所述,从字典中删除键值对。

popitem()可用于破坏性地迭代字典

If you don’t want to use the random module, you can also try popitem():

>> d = {'a': 1, 'b': 5, 'c': 7}
>>> d.popitem()
('a', 1)
>>> d
{'c': 7, 'b': 5}
>>> d.popitem()
('c', 7)

Since the dict doesn’t preserve order, by using popitem you get items in an arbitrary (but not strictly random) order from it.

Also keep in mind that popitem removes the key-value pair from dictionary, as stated in the docs.

popitem() is useful to destructively iterate over a dictionary


回答 4

>>> import random
>>> d = dict(Venezuela = 1, Spain = 2, USA = 3, Italy = 4)
>>> random.choice(d.keys())
'Venezuela'
>>> random.choice(d.keys())
'USA'

通过在字典(国家/地区)的上调用random.choicekeys

>>> import random
>>> d = dict(Venezuela = 1, Spain = 2, USA = 3, Italy = 4)
>>> random.choice(d.keys())
'Venezuela'
>>> random.choice(d.keys())
'USA'

By calling random.choice on the keys of the dictionary (the countries).


回答 5

这适用于Python 2和Python 3:

随机密钥:

random.choice(list(d.keys()))

随机值

random.choice(list(d.values()))

随机键和值

random.choice(list(d.items()))

This works in Python 2 and Python 3:

A random key:

random.choice(list(d.keys()))

A random value

random.choice(list(d.values()))

A random key and value

random.choice(list(d.items()))

回答 6

如果您不想使用random.choice(),可以尝试以下方式:

>>> list(myDictionary)[i]
'VENEZUELA'
>>> myDictionary = {'VENEZUELA':'CARACAS', 'IRAN' : 'TEHRAN'}
>>> import random
>>> i = random.randint(0, len(myDictionary) - 1)
>>> myDictionary[list(myDictionary)[i]]
'TEHRAN'
>>> list(myDictionary)[i]
'IRAN'

If you don’t want to use random.choice() you can try this way:

>>> list(myDictionary)[i]
'VENEZUELA'
>>> myDictionary = {'VENEZUELA':'CARACAS', 'IRAN' : 'TEHRAN'}
>>> import random
>>> i = random.randint(0, len(myDictionary) - 1)
>>> myDictionary[list(myDictionary)[i]]
'TEHRAN'
>>> list(myDictionary)[i]
'IRAN'

回答 7

由于原始帖子想要这

import random
d = {'VENEZUELA':'CARACAS', 'CANADA':'TORONTO'}
country, capital = random.choice(list(d.items()))

(python 3样式)

Since the original post wanted the pair:

import random
d = {'VENEZUELA':'CARACAS', 'CANADA':'TORONTO'}
country, capital = random.choice(list(d.items()))

(python 3 style)


回答 8

由于这是家庭作业:

找出random.sample()哪个将选择并从列表中返回一个随机元素。您可以使用来获得字典键列表和来获得dict.keys()字典值列表dict.values()

Since this is homework:

Check out random.sample() which will select and return a random element from an list. You can get a list of dictionary keys with dict.keys() and a list of dictionary values with dict.values().


回答 9

我假设您正在做一种测验的应用程序。对于这种应用程序,我编写了一个函数,如下所示:

def shuffle(q):
"""
The input of the function will 
be the dictionary of the question
and answers. The output will
be a random question with answer
"""
selected_keys = []
i = 0
while i < len(q):
    current_selection = random.choice(q.keys())
    if current_selection not in selected_keys:
        selected_keys.append(current_selection)
        i = i+1
        print(current_selection+'? '+str(q[current_selection]))

如果我将给出的输入questions = {'VENEZUELA':'CARACAS', 'CANADA':'TORONTO'}并调用函数shuffle(questions),则输出将如下所示:

委内瑞拉?卡拉卡斯
加拿大?多伦多

您还可以通过改组选项进一步扩展此范围

I am assuming that you are making a quiz kind of application. For this kind of application I have written a function which is as follows:

def shuffle(q):
"""
The input of the function will 
be the dictionary of the question
and answers. The output will
be a random question with answer
"""
selected_keys = []
i = 0
while i < len(q):
    current_selection = random.choice(q.keys())
    if current_selection not in selected_keys:
        selected_keys.append(current_selection)
        i = i+1
        print(current_selection+'? '+str(q[current_selection]))

If I will give the input of questions = {'VENEZUELA':'CARACAS', 'CANADA':'TORONTO'} and call the function shuffle(questions) Then the output will be as follows:

VENEZUELA? CARACAS
CANADA? TORONTO

You can extend this further more by shuffling the options also


回答 10

试试这个(使用来自项目的random.choice)

import random

a={ "str" : "sda" , "number" : 123, 55 : "num"}
random.choice(list(a.items()))
#  ('str', 'sda')
random.choice(list(a.items()))[1] # getting a value
#  'num'

Try this (using random.choice from items)

import random

a={ "str" : "sda" , "number" : 123, 55 : "num"}
random.choice(list(a.items()))
#  ('str', 'sda')
random.choice(list(a.items()))[1] # getting a value
#  'num'

回答 11

与Python(自3)的现代版本,对象的方法返回dict.keys()dict.values()dict.items()在视图对象*。嘿可以迭代,因此直接使用random.choice是不可能的,因为现在它们不是列表或集合。

一种选择是使用列表理解来完成以下工作random.choice

import random

colors = {
    'purple': '#7A4198',
    'turquoise':'#9ACBC9',
    'orange': '#EF5C35',
    'blue': '#19457D',
    'green': '#5AF9B5',
    'red': ' #E04160',
    'yellow': '#F9F985'
}

color=random.choice([hex_color for color_value in colors.values()]

print(f'The new color is: {color}')

参考文献:

With modern versions of Python(since 3), the objects returned by methods dict.keys(), dict.values() and dict.items() are view objects*. And hey can be iterated, so using directly random.choice is not possible as now they are not a list or set.

One option is to use list comprehension to do the job with random.choice:

import random

colors = {
    'purple': '#7A4198',
    'turquoise':'#9ACBC9',
    'orange': '#EF5C35',
    'blue': '#19457D',
    'green': '#5AF9B5',
    'red': ' #E04160',
    'yellow': '#F9F985'
}

color=random.choice([hex_color for color_value in colors.values()]

print(f'The new color is: {color}')

References:


回答 12

b = { 'video':0, 'music':23,"picture":12 } 
random.choice(tuple(b.items())) ('music', 23) 
random.choice(tuple(b.items())) ('music', 23) 
random.choice(tuple(b.items())) ('picture', 12) 
random.choice(tuple(b.items())) ('video', 0) 
b = { 'video':0, 'music':23,"picture":12 } 
random.choice(tuple(b.items())) ('music', 23) 
random.choice(tuple(b.items())) ('music', 23) 
random.choice(tuple(b.items())) ('picture', 12) 
random.choice(tuple(b.items())) ('video', 0) 

回答 13

我通过寻找一个相当可比的解决方案找到了这篇文章。为了从一个字典中挑选多个元素,可以使用:

idx_picks = np.random.choice(len(d), num_of_picks, replace=False) #(Don't pick the same element twice)
result = dict ()
c_keys = [d.keys()] #not so efficient - unfortunately .keys() returns a non-indexable object because dicts are unordered
for i in idx_picks:
    result[c_keys[i]] = d[i]

I found this post by looking for a rather comparable solution. For picking multiple elements out of a dict, this can be used:

idx_picks = np.random.choice(len(d), num_of_picks, replace=False) #(Don't pick the same element twice)
result = dict ()
c_keys = [d.keys()] #not so efficient - unfortunately .keys() returns a non-indexable object because dicts are unordered
for i in idx_picks:
    result[c_keys[i]] = d[i]