分类目录归档:知识问答

如何从字典中获取值列表?

问题:如何从字典中获取值列表?

如何在Python中获取字典中的值列表?

在Java中,将Map的值作为List变得容易list = map.values();。我想知道Python中是否有类似的简单方法可以从字典中获取值列表。

How can I get a list of the values in a dict in Python?

In Java, getting the values of a Map as a List is as easy as doing list = map.values();. I’m wondering if there is a similarly simple way in Python to get a list of values from a dict.


回答 0

是的,这与Python 2完全相同:

d.values()

Python 3中(在其中dict.values返回字典值的视图):

list(d.values())

Yes it’s the exact same thing in Python 2:

d.values()

In Python 3 (where dict.values returns a view of the dictionary’s values instead):

list(d.values())

回答 1

您可以使用*运算符解压缩dict_values:

>>> d = {1: "a", 2: "b"}
>>> [*d.values()]
['a', 'b']

或列出对象

>>> d = {1: "a", 2: "b"}
>>> list(d.values())
['a', 'b']

You can use * operator to unpack dict_values:

>>> d = {1: "a", 2: "b"}
>>> [*d.values()]
['a', 'b']

or list object

>>> d = {1: "a", 2: "b"}
>>> list(d.values())
['a', 'b']

回答 2

应该有一种方法,最好只有一种方法。

因此list(dictionary.values())一种方法

但是,考虑到Python3,更快的方法是什么?

[*L]vs. [].extend(L)vs.list(L)

small_ds = {x: str(x+42) for x in range(10)}
small_df = {x: float(x+42) for x in range(10)}

print('Small Dict(str)')
%timeit [*small_ds.values()]
%timeit [].extend(small_ds.values())
%timeit list(small_ds.values())

print('Small Dict(float)')
%timeit [*small_df.values()]
%timeit [].extend(small_df.values())
%timeit list(small_df.values())

big_ds = {x: str(x+42) for x in range(1000000)}
big_df = {x: float(x+42) for x in range(1000000)}

print('Big Dict(str)')
%timeit [*big_ds.values()]
%timeit [].extend(big_ds.values())
%timeit list(big_ds.values())

print('Big Dict(float)')
%timeit [*big_df.values()]
%timeit [].extend(big_df.values())
%timeit list(big_df.values())
Small Dict(str)
256 ns ± 3.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
338 ns ± 0.807 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
336 ns ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Small Dict(float)
268 ns ± 0.297 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
343 ns ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
336 ns ± 0.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Big Dict(str)
17.5 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.5 ms ± 338 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.2 ms ± 19.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Big Dict(float)
13.2 ms ± 41 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.1 ms ± 919 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
12.8 ms ± 578 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

在Intel®Core™i7-8650U CPU @ 1.90GHz上完成。

# Name                    Version                   Build
ipython                   7.5.0            py37h24bf2e0_0

结果

  1. 对于小词典* operator更快
  2. 对于重要的大字典来说,list()可能会更快

There should be one ‒ and preferably only one ‒ obvious way to do it.

Therefore list(dictionary.values()) is the one way.

Yet, considering Python3, what is quicker?

[*L] vs. [].extend(L) vs. list(L)

small_ds = {x: str(x+42) for x in range(10)}
small_df = {x: float(x+42) for x in range(10)}

print('Small Dict(str)')
%timeit [*small_ds.values()]
%timeit [].extend(small_ds.values())
%timeit list(small_ds.values())

print('Small Dict(float)')
%timeit [*small_df.values()]
%timeit [].extend(small_df.values())
%timeit list(small_df.values())

big_ds = {x: str(x+42) for x in range(1000000)}
big_df = {x: float(x+42) for x in range(1000000)}

print('Big Dict(str)')
%timeit [*big_ds.values()]
%timeit [].extend(big_ds.values())
%timeit list(big_ds.values())

print('Big Dict(float)')
%timeit [*big_df.values()]
%timeit [].extend(big_df.values())
%timeit list(big_df.values())
Small Dict(str)
256 ns ± 3.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
338 ns ± 0.807 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
336 ns ± 1.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Small Dict(float)
268 ns ± 0.297 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
343 ns ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
336 ns ± 0.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Big Dict(str)
17.5 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.5 ms ± 338 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.2 ms ± 19.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Big Dict(float)
13.2 ms ± 41 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.1 ms ± 919 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
12.8 ms ± 578 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Done on Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz.

# Name                    Version                   Build
ipython                   7.5.0            py37h24bf2e0_0

The result

  1. For small dictionaries * operator is quicker
  2. For big dictionaries where it matters list() is maybe slightly quicker

回答 3

请按照以下示例-

songs = [
{"title": "happy birthday", "playcount": 4},
{"title": "AC/DC", "playcount": 2},
{"title": "Billie Jean", "playcount": 6},
{"title": "Human Touch", "playcount": 3}
]

print("====================")
print(f'Songs --> {songs} \n')
title = list(map(lambda x : x['title'], songs))
print(f'Print Title --> {title}')

playcount = list(map(lambda x : x['playcount'], songs))
print(f'Print Playcount --> {playcount}')
print (f'Print Sorted playcount --> {sorted(playcount)}')

# Aliter -
print(sorted(list(map(lambda x: x['playcount'],songs))))

Follow the below example —

songs = [
{"title": "happy birthday", "playcount": 4},
{"title": "AC/DC", "playcount": 2},
{"title": "Billie Jean", "playcount": 6},
{"title": "Human Touch", "playcount": 3}
]

print("====================")
print(f'Songs --> {songs} \n')
title = list(map(lambda x : x['title'], songs))
print(f'Print Title --> {title}')

playcount = list(map(lambda x : x['playcount'], songs))
print(f'Print Playcount --> {playcount}')
print (f'Print Sorted playcount --> {sorted(playcount)}')

# Aliter -
print(sorted(list(map(lambda x: x['playcount'],songs))))

回答 4

out: dict_values([{1:a, 2:b}])

in:  str(dict.values())[14:-3]    
out: 1:a, 2:b

纯粹出于视觉目的。不会产生有用的产品…仅在您希望长字典以段落类型形式打印时才有用。

out: dict_values([{1:a, 2:b}])

in:  str(dict.values())[14:-3]    
out: 1:a, 2:b

Purely for visual purposes. Does not produce a useful product… Only useful if you want a long dictionary to print in a paragraph type form.


为什么Python中没有元组理解?

问题:为什么Python中没有元组理解?

众所周知,列表理解

[i for i in [1, 2, 3, 4]]

并且有字典理解,例如

{i:j for i, j in {1: 'a', 2: 'b'}.items()}

(i for i in (1, 2, 3))

最终将成为生成器,而不是tuple理解力。这是为什么?

我的猜测是a tuple是不可变的,但这似乎并不是答案。

As we all know, there’s list comprehension, like

[i for i in [1, 2, 3, 4]]

and there is dictionary comprehension, like

{i:j for i, j in {1: 'a', 2: 'b'}.items()}

but

(i for i in (1, 2, 3))

will end up in a generator, not a tuple comprehension. Why is that?

My guess is that a tuple is immutable, but this does not seem to be the answer.


回答 0

您可以使用生成器表达式:

tuple(i for i in (1, 2, 3))

但是对于…生成器表达式,已经使用了括号。

You can use a generator expression:

tuple(i for i in (1, 2, 3))

but parentheses were already taken for … generator expressions.


回答 1

Raymond Hettinger(Python核心开发人员之一)在最近的一条推文中曾这样说过元组:

#python提示:通常,列表用于循环;结构的元组。列表是同质的;元组异构。列出可变长度。

(对我来说)支持这样的想法:如果序列中的项目相关性足以由生成器生成,那么它应该是一个列表。尽管元组是可迭代的,并且看起来只是一个不可变的列表,但它实际上与C结构的Python等效:

struct {
    int a;
    char b;
    float c;
} foo;

struct foo x = { 3, 'g', 5.9 };

成为Python

x = (3, 'g', 5.9)

Raymond Hettinger (one of the Python core developers) had this to say about tuples in a recent tweet:

#python tip: Generally, lists are for looping; tuples for structs. Lists are homogeneous; tuples heterogeneous. Lists for variable length.

This (to me) supports the idea that if the items in a sequence are related enough to be generated by a, well, generator, then it should be a list. Although a tuple is iterable and seems like simply a immutable list, it’s really the Python equivalent of a C struct:

struct {
    int a;
    char b;
    float c;
} foo;

struct foo x = { 3, 'g', 5.9 };

becomes in Python

x = (3, 'g', 5.9)

回答 2

从Python 3.5开始,您还可以使用splat *解包语法来解压缩生成器表达式:

*(x for x in range(10)),

Since Python 3.5, you can also use splat * unpacking syntax to unpack a generator expresion:

*(x for x in range(10)),

回答 3

正如另一位发帖人macm所述,从生成器创建元组的最快方法是tuple([generator])


性能比较

  • 清单理解:

    $ python3 -m timeit "a = [i for i in range(1000)]"
    10000 loops, best of 3: 27.4 usec per loop
    
  • 来自列表理解的元组:

    $ python3 -m timeit "a = tuple([i for i in range(1000)])"
    10000 loops, best of 3: 30.2 usec per loop
    
  • 生成器中的元组:

    $ python3 -m timeit "a = tuple(i for i in range(1000))"
    10000 loops, best of 3: 50.4 usec per loop
    
  • 打开包装的元组:

    $ python3 -m timeit "a = *(i for i in range(1000)),"
    10000 loops, best of 3: 52.7 usec per loop
    

我的python版本

$ python3 --version
Python 3.6.3

因此,除非性能不是问题,否则应始终从列表理解中创建一个元组。

As another poster macm mentioned, the fastest way to create a tuple from a generator is tuple([generator]).


Performance Comparison

  • List comprehension:

    $ python3 -m timeit "a = [i for i in range(1000)]"
    10000 loops, best of 3: 27.4 usec per loop
    
  • Tuple from list comprehension:

    $ python3 -m timeit "a = tuple([i for i in range(1000)])"
    10000 loops, best of 3: 30.2 usec per loop
    
  • Tuple from generator:

    $ python3 -m timeit "a = tuple(i for i in range(1000))"
    10000 loops, best of 3: 50.4 usec per loop
    
  • Tuple from unpacking:

    $ python3 -m timeit "a = *(i for i in range(1000)),"
    10000 loops, best of 3: 52.7 usec per loop
    

My version of python:

$ python3 --version
Python 3.6.3

So you should always create a tuple from a list comprehension unless performance is not an issue.


回答 4

理解通过循环或迭代项并将它们分配到容器中来工作,元组无法接收分配。

创建元组后,将无法对其进行追加,扩展或分配。修改元组的唯一方法是是否可以将其对象之一本身分配给它(是非元组容器)。因为元组仅持有对此类对象的引用。

另外-元组有自己的构造函数tuple(),您可以提供任何迭代器。这意味着要创建一个元组,您可以执行以下操作:

tuple(i for i in (1,2,3))

Comprehension works by looping or iterating over items and assigning them into a container, a Tuple is unable to receive assignments.

Once a Tuple is created, it can not be appended to, extended, or assigned to. The only way to modify a Tuple is if one of its objects can itself be assigned to (is a non-tuple container). Because the Tuple is only holding a reference to that kind of object.

Also – a tuple has its own constructor tuple() which you can give any iterator. Which means that to create a tuple, you could do:

tuple(i for i in (1,2,3))

回答 5

我最好的猜测是,他们用完了括号,并认为这对警告添加“丑陋”语法没有足够的帮助…

My best guess is that they ran out of brackets and didn’t think it would be useful enough to warrent adding an “ugly” syntax …


回答 6

元组不能像列表一样有效地附加。

因此,元组理解将需要在内部使用列表,然后转换为元组。

那将与您现在所做的相同:tuple([comprehension])

Tuples cannot efficiently be appended like a list.

So a tuple comprehension would need to use a list internally and then convert to a tuple.

That would be the same as what you do now : tuple( [ comprehension ] )


回答 7

括号不会创建元组。又名一个=(两个)不是元组。唯一的解决方法是一个=(两个)或一个=元组(两个)。因此,解决方案是:

tuple(i for i in myothertupleorlistordict) 

Parentheses do not create a tuple. aka one = (two) is not a tuple. The only way around is either one = (two,) or one = tuple(two). So a solution is:

tuple(i for i in myothertupleorlistordict) 

回答 8

我相信这只是为了清楚起见,我们不想用太多不同的符号来使语言混乱。同样tuple也不需要理解,可以使用列表,而速度差异可以忽略不计,而不是像dict理解而不是list理解。

I believe it’s simply for the sake of clarity, we do not want to clutter the language with too many different symbols. Also a tuple comprehension is never necessary, a list can just be used instead with negligible speed differences, unlike a dict comprehension as opposed to a list comprehension.


回答 9

我们可以从列表理解中生成元组。下一个将两个数字顺序加到一个元组中,并给出一个从0-9的列表。

>>> print k
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> r= [tuple(k[i:i+2]) for i in xrange(10) if not i%2]
>>> print r
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

We can generate tuples from a list comprehension. The following one adds two numbers sequentially into a tuple and gives a list from numbers 0-9.

>>> print k
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> r= [tuple(k[i:i+2]) for i in xrange(10) if not i%2]
>>> print r
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

使用“时间”模块测量经过时间

问题:使用“时间”模块测量经过时间

使用python中的时间模块,可以测量经过的时间吗?如果是这样,我该怎么做?

我需要这样做,以便如果光标在小部件中已存在一定时间,则会发生事件。

With the Time module in python is it possible to measure elapsed time? If so, how do I do that?

I need to do this so that if the cursor has been in a widget for a certain duration an event happens.


回答 0

start_time = time.time()
# your code
elapsed_time = time.time() - start_time

您还可以编写简单的装饰器来简化各种功能的执行时间的度量:

import time
from functools import wraps

PROF_DATA = {}

def profile(fn):
    @wraps(fn)
    def with_profiling(*args, **kwargs):
        start_time = time.time()

        ret = fn(*args, **kwargs)

        elapsed_time = time.time() - start_time

        if fn.__name__ not in PROF_DATA:
            PROF_DATA[fn.__name__] = [0, []]
        PROF_DATA[fn.__name__][0] += 1
        PROF_DATA[fn.__name__][1].append(elapsed_time)

        return ret

    return with_profiling

def print_prof_data():
    for fname, data in PROF_DATA.items():
        max_time = max(data[1])
        avg_time = sum(data[1]) / len(data[1])
        print "Function %s called %d times. " % (fname, data[0]),
        print 'Execution time max: %.3f, average: %.3f' % (max_time, avg_time)

def clear_prof_data():
    global PROF_DATA
    PROF_DATA = {}

用法:

@profile
def your_function(...):
    ...

您可以同时分析多个功能。然后要打印测量值,只需调用print_prof_data():

start_time = time.time()
# your code
elapsed_time = time.time() - start_time

You can also write simple decorator to simplify measurement of execution time of various functions:

import time
from functools import wraps

PROF_DATA = {}

def profile(fn):
    @wraps(fn)
    def with_profiling(*args, **kwargs):
        start_time = time.time()

        ret = fn(*args, **kwargs)

        elapsed_time = time.time() - start_time

        if fn.__name__ not in PROF_DATA:
            PROF_DATA[fn.__name__] = [0, []]
        PROF_DATA[fn.__name__][0] += 1
        PROF_DATA[fn.__name__][1].append(elapsed_time)

        return ret

    return with_profiling

def print_prof_data():
    for fname, data in PROF_DATA.items():
        max_time = max(data[1])
        avg_time = sum(data[1]) / len(data[1])
        print "Function %s called %d times. " % (fname, data[0]),
        print 'Execution time max: %.3f, average: %.3f' % (max_time, avg_time)

def clear_prof_data():
    global PROF_DATA
    PROF_DATA = {}

Usage:

@profile
def your_function(...):
    ...

You can profile more then one function simultaneously. Then to print measurements just call the print_prof_data():


回答 1

time.time() 会做的工作。

import time

start = time.time()
# run your code
end = time.time()

elapsed = end - start

您可能想看一下这个问题,但是我认为没有必要。

time.time() will do the job.

import time

start = time.time()
# run your code
end = time.time()

elapsed = end - start

You may want to look at this question, but I don’t think it will be necessary.


回答 2

对于想要更好格式的用户,

import time
start_time = time.time()
# your script
elapsed_time = time.time() - start_time
time.strftime("%H:%M:%S", time.gmtime(elapsed_time))

将打印出2秒钟:

'00:00:02'

一秒钟持续7分钟:

'00:07:01'

请注意,使用gmtime的最小时间单位是秒。如果需要微秒,请考虑以下事项:

import datetime
start = datetime.datetime.now()
# some code
end = datetime.datetime.now()
elapsed = end - start
print(elapsed)
# or
print(elapsed.seconds,":",elapsed.microseconds) 

strftime 文档

For users that want better formatting,

import time
start_time = time.time()
# your script
elapsed_time = time.time() - start_time
time.strftime("%H:%M:%S", time.gmtime(elapsed_time))

will print out, for 2 seconds:

'00:00:02'

and for 7 minutes one second:

'00:07:01'

note that the minimum time unit with gmtime is seconds. If you need microseconds consider the following:

import datetime
start = datetime.datetime.now()
# some code
end = datetime.datetime.now()
elapsed = end - start
print(elapsed)
# or
print(elapsed.seconds,":",elapsed.microseconds) 

strftime documentation


回答 3

为了获得最佳的经过时间度量(自Python 3.3起),请使用time.perf_counter()

返回性能计数器的值(以小数秒为单位),即具有最高可用分辨率的时钟以测量较短的持续时间。它确实包括整个系统的睡眠时间。返回值的参考点是不确定的,因此仅连续调用结果之间的差有效。

对于小时/天量级的测量,您不必担心亚秒级分辨率,请time.monotonic()改用。

返回单调时钟的值(以小数秒为单位),即不能向后移动的时钟。时钟不受系统时钟更新的影响。返回值的参考点是不确定的,因此仅连续调用结果之间的差有效。

在许多实现中,这些实际上可能是同一件事。

在3.3之前,您会受困于time.clock()

在Unix上,以秒为单位返回当前处理器时间,以浮点数表示。精度,实际上是“处理器时间”含义的确切定义,取决于同名C函数的精度。

在Windows上,此函数将基于Win32函数QueryPerformanceCounter()返回自第一次调用此函数以来经过的时间(以秒为单位)的浮点数。分辨率通常优于一微秒。


Python 3.7更新

PEP 564是Python 3.7中的新功能-添加具有纳秒分辨率的新时间函数。

使用这些可以进一步消除舍入和浮点错误,尤其是在测量周期很短或应用程序(或Windows计算机)正在长时间运行时。

perf_counter()大约100天后,分辨率开始下降。因此,例如,经过一年的正常运行时间后,它可以测量的最短间隔(大于0)将大于开始时的间隔。


Python 3.8更新

time.clock 现在不见了。

For the best measure of elapsed time (since Python 3.3), use time.perf_counter().

Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide. The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid.

For measurements on the order of hours/days, you don’t care about sub-second resolution so use time.monotonic() instead.

Return the value (in fractional seconds) of a monotonic clock, i.e. a clock that cannot go backwards. The clock is not affected by system clock updates. The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid.

In many implementations, these may actually be the same thing.

Before 3.3, you’re stuck with time.clock().

On Unix, return the current processor time as a floating point number expressed in seconds. The precision, and in fact the very definition of the meaning of “processor time”, depends on that of the C function of the same name.

On Windows, this function returns wall-clock seconds elapsed since the first call to this function, as a floating point number, based on the Win32 function QueryPerformanceCounter(). The resolution is typically better than one microsecond.


Update for Python 3.7

New in Python 3.7 is PEP 564 — Add new time functions with nanosecond resolution.

Use of these can further eliminate rounding and floating-point errors, especially if you’re measuring very short periods, or your application (or Windows machine) is long-running.

Resolution starts breaking down on perf_counter() after around 100 days. So for example after a year of uptime, the shortest interval (greater than 0) it can measure will be bigger than when it started.


Update for Python 3.8

time.clock is now gone.


回答 4

更长的时间。

import time
start_time = time.time()
...
e = int(time.time() - start_time)
print('{:02d}:{:02d}:{:02d}'.format(e // 3600, (e % 3600 // 60), e % 60))

会打印

00:03:15

如果超过24小时

25:33:57

这是受到罗格·霍夫斯特(Rutger Hofste)的回答的启发。谢谢罗格!

For a longer period.

import time
start_time = time.time()
...
e = int(time.time() - start_time)
print('{:02d}:{:02d}:{:02d}'.format(e // 3600, (e % 3600 // 60), e % 60))

would print

00:03:15

if more than 24 hours

25:33:57

That is inspired by Rutger Hofste’s answer. Thank you Rutger!


回答 5

您需要导入时间,然后使用time.time()方法知道当前时间。

import time

start_time=time.time() #taking current time as starting time

#here your code

elapsed_time=time.time()-start_time #again taking current time - starting time 

You need to import time and then use time.time() method to know current time.

import time

start_time=time.time() #taking current time as starting time

#here your code

elapsed_time=time.time()-start_time #again taking current time - starting time 

回答 6

安排时间的另一种不错的方法是使用with python结构。

具有结构的对象会自动调用__enter____exit__方法,这正是我们计时所需的时间。

让我们创建一个Timer类。

from time import time

class Timer():
    def __init__(self, message):
        self.message = message
    def __enter__(self):
        self.start = time()
        return None  # could return anything, to be used like this: with Timer("Message") as value:
    def __exit__(self, type, value, traceback):
        elapsed_time = (time() - self.start) * 1000
        print(self.message.format(elapsed_time))

然后,可以使用Timer类,如下所示:

with Timer("Elapsed time to compute some prime numbers: {}ms"):
    primes = []
    for x in range(2, 500):
        if not any(x % p == 0 for p in primes):
            primes.append(x)
    print("Primes: {}".format(primes))

结果如下:

素数:[2、3、5、7、11、13、17、19、23、29、31、37、41、43、47、53、59、61、67、71、73、79、83、89 ,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227 ,229、233、239、241、251、257、263、269、271、277、281、283、293、307、311、313、317、331、337、347、349、353、359、367、373 ,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499]

计算一些质数所需的时间:5.01704216003418ms

Another nice way to time things is to use the with python structure.

with structure is automatically calling __enter__ and __exit__ methods which is exactly what we need to time things.

Let’s create a Timer class.

from time import time

class Timer():
    def __init__(self, message):
        self.message = message
    def __enter__(self):
        self.start = time()
        return None  # could return anything, to be used like this: with Timer("Message") as value:
    def __exit__(self, type, value, traceback):
        elapsed_time = (time() - self.start) * 1000
        print(self.message.format(elapsed_time))

Then, one can use the Timer class like this:

with Timer("Elapsed time to compute some prime numbers: {}ms"):
    primes = []
    for x in range(2, 500):
        if not any(x % p == 0 for p in primes):
            primes.append(x)
    print("Primes: {}".format(primes))

The result is the following:

Primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499]

Elapsed time to compute some prime numbers: 5.01704216003418ms


回答 7

Vadim Shender的反应很棒。您还可以使用以下更简单的装饰器:

import datetime
def calc_timing(original_function):                            
    def new_function(*args,**kwargs):                        
        start = datetime.datetime.now()                     
        x = original_function(*args,**kwargs)                
        elapsed = datetime.datetime.now()                      
        print("Elapsed Time = {0}".format(elapsed-start))     
        return x                                             
    return new_function()  

@calc_timing
def a_func(*variables):
    print("do something big!")

Vadim Shender response is great. You can also use a simpler decorator like below:

import datetime
def calc_timing(original_function):                            
    def new_function(*args,**kwargs):                        
        start = datetime.datetime.now()                     
        x = original_function(*args,**kwargs)                
        elapsed = datetime.datetime.now()                      
        print("Elapsed Time = {0}".format(elapsed-start))     
        return x                                             
    return new_function()  

@calc_timing
def a_func(*variables):
    print("do something big!")

回答 8

在编程中,有两种主要的时间测量方法,结果不同:

>>> print(time.process_time()); time.sleep(10); print(time.process_time())
0.11751394000000001
0.11764988400000001  # took  0 seconds and a bit
>>> print(time.perf_counter()); time.sleep(10); print(time.perf_counter())
3972.465770326
3982.468109075       # took 10 seconds and a bit
  • 处理器时间:这是该特定进程在CPU上主动执行所花费的时间。睡眠,等待Web请求或仅执行其他进程的时间不会对此有所帮助。

    • 采用 time.process_time()
  • 墙上时钟时间:这指的是“挂在墙上的时钟上”经过了多少时间,即不是实时时间。

    • 采用 time.perf_counter()

      • time.time() 还可以测量挂钟时间,但可以重置,因此您可以返回到过去
      • time.monotonic() 无法重置(单调=仅前进),但精度低于 time.perf_counter()

In programming, there are 2 main ways to measure time, with different results:

>>> print(time.process_time()); time.sleep(10); print(time.process_time())
0.11751394000000001
0.11764988400000001  # took  0 seconds and a bit
>>> print(time.perf_counter()); time.sleep(10); print(time.perf_counter())
3972.465770326
3982.468109075       # took 10 seconds and a bit
  • Processor Time: This is how long this specific process spends actively being executed on the CPU. Sleep, waiting for a web request, or time when only other processes are executed will not contribute to this.

    • Use time.process_time()
  • Wall-Clock Time: This refers to how much time has passed “on a clock hanging on the wall”, i.e. outside real time.

    • Use time.perf_counter()

      • time.time() also measures wall-clock time but can be reset, so you could go back in time
      • time.monotonic() cannot be reset (monotonic = only goes forward) but has lower precision than time.perf_counter()

回答 9

这是Vadim Shender的巧妙代码的更新,带有表格输出:

import collections
import time
from functools import wraps

PROF_DATA = collections.defaultdict(list)

def profile(fn):
    @wraps(fn)
    def with_profiling(*args, **kwargs):
        start_time = time.time()
        ret = fn(*args, **kwargs)
        elapsed_time = time.time() - start_time
        PROF_DATA[fn.__name__].append(elapsed_time)
        return ret
    return with_profiling

Metrics = collections.namedtuple("Metrics", "sum_time num_calls min_time max_time avg_time fname")

def print_profile_data():
    results = []
    for fname, elapsed_times in PROF_DATA.items():
        num_calls = len(elapsed_times)
        min_time = min(elapsed_times)
        max_time = max(elapsed_times)
        sum_time = sum(elapsed_times)
        avg_time = sum_time / num_calls
        metrics = Metrics(sum_time, num_calls, min_time, max_time, avg_time, fname)
        results.append(metrics)
    total_time = sum([m.sum_time for m in results])
    print("\t".join(["Percent", "Sum", "Calls", "Min", "Max", "Mean", "Function"]))
    for m in sorted(results, reverse=True):
        print("%.1f\t%.3f\t%d\t%.3f\t%.3f\t%.3f\t%s" % (100 * m.sum_time / total_time, m.sum_time, m.num_calls, m.min_time, m.max_time, m.avg_time, m.fname))
    print("%.3f Total Time" % total_time)

Here is an update to Vadim Shender’s clever code with tabular output:

import collections
import time
from functools import wraps

PROF_DATA = collections.defaultdict(list)

def profile(fn):
    @wraps(fn)
    def with_profiling(*args, **kwargs):
        start_time = time.time()
        ret = fn(*args, **kwargs)
        elapsed_time = time.time() - start_time
        PROF_DATA[fn.__name__].append(elapsed_time)
        return ret
    return with_profiling

Metrics = collections.namedtuple("Metrics", "sum_time num_calls min_time max_time avg_time fname")

def print_profile_data():
    results = []
    for fname, elapsed_times in PROF_DATA.items():
        num_calls = len(elapsed_times)
        min_time = min(elapsed_times)
        max_time = max(elapsed_times)
        sum_time = sum(elapsed_times)
        avg_time = sum_time / num_calls
        metrics = Metrics(sum_time, num_calls, min_time, max_time, avg_time, fname)
        results.append(metrics)
    total_time = sum([m.sum_time for m in results])
    print("\t".join(["Percent", "Sum", "Calls", "Min", "Max", "Mean", "Function"]))
    for m in sorted(results, reverse=True):
        print("%.1f\t%.3f\t%d\t%.3f\t%.3f\t%.3f\t%s" % (100 * m.sum_time / total_time, m.sum_time, m.num_calls, m.min_time, m.max_time, m.avg_time, m.fname))
    print("%.3f Total Time" % total_time)

从Python的字符串中剥离字母数字字符以外的所有内容

问题:从Python的字符串中剥离字母数字字符以外的所有内容

使用Python从字符串中剥离所有非字母数字字符的最佳方法是什么?

这个问题PHP变体中提供的解决方案可能会进行一些小的调整,但对我来说似乎并不是很“ pythonic”。

作为记录,我不仅要删除句点和逗号(和其他标点符号),而且还要删除引号,方括号等。

What is the best way to strip all non alphanumeric characters from a string, using Python?

The solutions presented in the PHP variant of this question will probably work with some minor adjustments, but don’t seem very ‘pythonic’ to me.

For the record, I don’t just want to strip periods and commas (and other punctuation), but also quotes, brackets, etc.


回答 0

我只是出于好奇而对某些功能进行了计时。在这些测试中,我从字符串string.printable(内置string模块的一部分)中删除了非字母数字字符。发现使用已编译'[\W_]+'pattern.sub('', str)最快。

$ python -m timeit -s \
     "import string" \
     "''.join(ch for ch in string.printable if ch.isalnum())" 
10000 loops, best of 3: 57.6 usec per loop

$ python -m timeit -s \
    "import string" \
    "filter(str.isalnum, string.printable)"                 
10000 loops, best of 3: 37.9 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]', '', string.printable)"
10000 loops, best of 3: 27.5 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]+', '', string.printable)"                
100000 loops, best of 3: 15 usec per loop

$ python -m timeit -s \
    "import re, string; pattern = re.compile('[\W_]+')" \
    "pattern.sub('', string.printable)" 
100000 loops, best of 3: 11.2 usec per loop

I just timed some functions out of curiosity. In these tests I’m removing non-alphanumeric characters from the string string.printable (part of the built-in string module). The use of compiled '[\W_]+' and pattern.sub('', str) was found to be fastest.

$ python -m timeit -s \
     "import string" \
     "''.join(ch for ch in string.printable if ch.isalnum())" 
10000 loops, best of 3: 57.6 usec per loop

$ python -m timeit -s \
    "import string" \
    "filter(str.isalnum, string.printable)"                 
10000 loops, best of 3: 37.9 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]', '', string.printable)"
10000 loops, best of 3: 27.5 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]+', '', string.printable)"                
100000 loops, best of 3: 15 usec per loop

$ python -m timeit -s \
    "import re, string; pattern = re.compile('[\W_]+')" \
    "pattern.sub('', string.printable)" 
100000 loops, best of 3: 11.2 usec per loop

回答 1

救援的正则表达式:

import re
re.sub(r'\W+', '', your_string)

通过Python的定义'\W== [^a-zA-Z0-9_],其中不包括所有的numbersletters_

Regular expressions to the rescue:

import re
re.sub(r'\W+', '', your_string)

By Python definition '\W == [^a-zA-Z0-9_], which excludes all numbers, letters and _


回答 2

使用str.translate()方法。

假设您经常这样做:

(1)创建一个包含您要删除的所有字符的字符串:

delchars = ''.join(c for c in map(chr, range(256)) if not c.isalnum())

(2)每当要收缩字符串时:

scrunched = s.translate(None, delchars)

设置成本可能比re.compile好。边际成本要低得多:

C:\junk>\python26\python -mtimeit -s"import string;d=''.join(c for c in map(chr,range(256)) if not c.isalnum());s=string.printable" "s.translate(None,d)"
100000 loops, best of 3: 2.04 usec per loop

C:\junk>\python26\python -mtimeit -s"import re,string;s=string.printable;r=re.compile(r'[\W_]+')" "r.sub('',s)"
100000 loops, best of 3: 7.34 usec per loop

注意:使用string.printable作为基准数据会给'[\ W _] +’模式带来不公平的优势;所有非字母数字字符都是一堆的…在典型数据中,将有不止一个替换操作:

C:\junk>\python26\python -c "import string; s = string.printable; print len(s),repr(s)"
100 '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

如果您给re.sub做更多的工作,将会发生以下情况:

C:\junk>\python26\python -mtimeit -s"d=''.join(c for c in map(chr,range(256)) if not c.isalnum());s='foo-'*25" "s.translate(None,d)"
1000000 loops, best of 3: 1.97 usec per loop

C:\junk>\python26\python -mtimeit -s"import re;s='foo-'*25;r=re.compile(r'[\W_]+')" "r.sub('',s)"
10000 loops, best of 3: 26.4 usec per loop

Use the str.translate() method.

Presuming you will be doing this often:

(1) Once, create a string containing all the characters you wish to delete:

delchars = ''.join(c for c in map(chr, range(256)) if not c.isalnum())

(2) Whenever you want to scrunch a string:

scrunched = s.translate(None, delchars)

The setup cost probably compares favourably with re.compile; the marginal cost is way lower:

C:\junk>\python26\python -mtimeit -s"import string;d=''.join(c for c in map(chr,range(256)) if not c.isalnum());s=string.printable" "s.translate(None,d)"
100000 loops, best of 3: 2.04 usec per loop

C:\junk>\python26\python -mtimeit -s"import re,string;s=string.printable;r=re.compile(r'[\W_]+')" "r.sub('',s)"
100000 loops, best of 3: 7.34 usec per loop

Note: Using string.printable as benchmark data gives the pattern ‘[\W_]+’ an unfair advantage; all the non-alphanumeric characters are in one bunch … in typical data there would be more than one substitution to do:

C:\junk>\python26\python -c "import string; s = string.printable; print len(s),repr(s)"
100 '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

Here’s what happens if you give re.sub a bit more work to do:

C:\junk>\python26\python -mtimeit -s"d=''.join(c for c in map(chr,range(256)) if not c.isalnum());s='foo-'*25" "s.translate(None,d)"
1000000 loops, best of 3: 1.97 usec per loop

C:\junk>\python26\python -mtimeit -s"import re;s='foo-'*25;r=re.compile(r'[\W_]+')" "r.sub('',s)"
10000 loops, best of 3: 26.4 usec per loop

回答 3

您可以尝试:

print ''.join(ch for ch in some_string if ch.isalnum())

You could try:

print ''.join(ch for ch in some_string if ch.isalnum())

回答 4

>>> import re
>>> string = "Kl13@£$%[};'\""
>>> pattern = re.compile('\W')
>>> string = re.sub(pattern, '', string)
>>> print string
Kl13
>>> import re
>>> string = "Kl13@£$%[};'\""
>>> pattern = re.compile('\W')
>>> string = re.sub(pattern, '', string)
>>> print string
Kl13

回答 5

怎么样:

def ExtractAlphanumeric(InputString):
    from string import ascii_letters, digits
    return "".join([ch for ch in InputString if ch in (ascii_letters + digits)])

这是通过使用列表中理解产生字符的列表中InputString,如果它们存在于合并ascii_lettersdigits字符串。然后,它将列表连接到一个字符串中。

How about:

def ExtractAlphanumeric(InputString):
    from string import ascii_letters, digits
    return "".join([ch for ch in InputString if ch in (ascii_letters + digits)])

This works by using list comprehension to produce a list of the characters in InputString if they are present in the combined ascii_letters and digits strings. It then joins the list together into a string.


回答 6

作为此处其他答案的补充,我提供了一种非常简单而灵活的方法来定义一组您希望将字符串内容限制为的字符。在这种情况下,我允许使用字母数字加破折号和下划线。只需PERMITTED_CHARS根据自己的用例从我添加或删除字符。

PERMITTED_CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-" 
someString = "".join(c for c in someString if c in PERMITTED_CHARS)

As a spin off from some other answers here, I offer a really simple and flexible way to define a set of characters that you want to limit a string’s content to. In this case, I’m allowing alphanumerics PLUS dash and underscore. Just add or remove characters from my PERMITTED_CHARS as suits your use case.

PERMITTED_CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-" 
someString = "".join(c for c in someString if c in PERMITTED_CHARS)

回答 7

sent = "".join(e for e in sent if e.isalpha())
sent = "".join(e for e in sent if e.isalpha())

回答 8

for char in my_string:
    if not char.isalnum():
        my_string = my_string.replace(char,"")
for char in my_string:
    if not char.isalnum():
        my_string = my_string.replace(char,"")

回答 9

使用ASCII可打印字符的随机字符串进行计时:

from inspect import getsource
from random import sample
import re
from string import printable
from timeit import timeit

pattern_single = re.compile(r'[\W]')
pattern_repeat = re.compile(r'[\W]+')
translation_tb = str.maketrans('', '', ''.join(c for c in map(chr, range(256)) if not c.isalnum()))


def generate_test_string(length):
    return ''.join(sample(printable, length))


def main():
    for i in range(0, 60, 10):
        for test in [
            lambda: ''.join(c for c in generate_test_string(i) if c.isalnum()),
            lambda: ''.join(filter(str.isalnum, generate_test_string(i))),
            lambda: re.sub(r'[\W]', '', generate_test_string(i)),
            lambda: re.sub(r'[\W]+', '', generate_test_string(i)),
            lambda: pattern_single.sub('', generate_test_string(i)),
            lambda: pattern_repeat.sub('', generate_test_string(i)),
            lambda: generate_test_string(i).translate(translation_tb),

        ]:
            print(timeit(test), i, getsource(test).lstrip('            lambda: ').rstrip(',\n'), sep='\t')


if __name__ == '__main__':
    main()

结果(Python 3.7):

       Time       Length                           Code                           
6.3716264850008880  00  ''.join(c for c in generate_test_string(i) if c.isalnum())
5.7285426190064750  00  ''.join(filter(str.isalnum, generate_test_string(i)))
8.1875841680011940  00  re.sub(r'[\W]', '', generate_test_string(i))
8.0002205439959650  00  re.sub(r'[\W]+', '', generate_test_string(i))
5.5290945199958510  00  pattern_single.sub('', generate_test_string(i))
5.4417179649972240  00  pattern_repeat.sub('', generate_test_string(i))
4.6772285089973590  00  generate_test_string(i).translate(translation_tb)
23.574712151996210  10  ''.join(c for c in generate_test_string(i) if c.isalnum())
22.829975890002970  10  ''.join(filter(str.isalnum, generate_test_string(i)))
27.210196289997840  10  re.sub(r'[\W]', '', generate_test_string(i))
27.203713296003116  10  re.sub(r'[\W]+', '', generate_test_string(i))
24.008979928999906  10  pattern_single.sub('', generate_test_string(i))
23.945240008994006  10  pattern_repeat.sub('', generate_test_string(i))
21.830899796994345  10  generate_test_string(i).translate(translation_tb)
38.731336012999236  20  ''.join(c for c in generate_test_string(i) if c.isalnum())
37.942474347000825  20  ''.join(filter(str.isalnum, generate_test_string(i)))
42.169366310001350  20  re.sub(r'[\W]', '', generate_test_string(i))
41.933375883003464  20  re.sub(r'[\W]+', '', generate_test_string(i))
38.899814646996674  20  pattern_single.sub('', generate_test_string(i))
38.636144253003295  20  pattern_repeat.sub('', generate_test_string(i))
36.201238164998360  20  generate_test_string(i).translate(translation_tb)
49.377356811004574  30  ''.join(c for c in generate_test_string(i) if c.isalnum())
48.408927293996385  30  ''.join(filter(str.isalnum, generate_test_string(i)))
53.901889764994850  30  re.sub(r'[\W]', '', generate_test_string(i))
52.130339455994545  30  re.sub(r'[\W]+', '', generate_test_string(i))
50.061149017004940  30  pattern_single.sub('', generate_test_string(i))
49.366573111998150  30  pattern_repeat.sub('', generate_test_string(i))
46.649754120997386  30  generate_test_string(i).translate(translation_tb)
63.107938601999194  40  ''.join(c for c in generate_test_string(i) if c.isalnum())
65.116287978999030  40  ''.join(filter(str.isalnum, generate_test_string(i)))
71.477421126997800  40  re.sub(r'[\W]', '', generate_test_string(i))
66.027950693998720  40  re.sub(r'[\W]+', '', generate_test_string(i))
63.315361931003280  40  pattern_single.sub('', generate_test_string(i))
62.342320287003530  40  pattern_repeat.sub('', generate_test_string(i))
58.249303059004890  40  generate_test_string(i).translate(translation_tb)
73.810345625002810  50  ''.join(c for c in generate_test_string(i) if c.isalnum())
72.593953348005020  50  ''.join(filter(str.isalnum, generate_test_string(i)))
76.048324580995540  50  re.sub(r'[\W]', '', generate_test_string(i))
75.106637657001560  50  re.sub(r'[\W]+', '', generate_test_string(i))
74.681338128997600  50  pattern_single.sub('', generate_test_string(i))
72.430461594005460  50  pattern_repeat.sub('', generate_test_string(i))
69.394243567003290  50  generate_test_string(i).translate(translation_tb)

str.maketransstr.translate最快,但包含所有非ASCII字符。 re.compilepattern.sub较慢,但比''.join&更快filter

Timing with random strings of ASCII printables:

from inspect import getsource
from random import sample
import re
from string import printable
from timeit import timeit

pattern_single = re.compile(r'[\W]')
pattern_repeat = re.compile(r'[\W]+')
translation_tb = str.maketrans('', '', ''.join(c for c in map(chr, range(256)) if not c.isalnum()))


def generate_test_string(length):
    return ''.join(sample(printable, length))


def main():
    for i in range(0, 60, 10):
        for test in [
            lambda: ''.join(c for c in generate_test_string(i) if c.isalnum()),
            lambda: ''.join(filter(str.isalnum, generate_test_string(i))),
            lambda: re.sub(r'[\W]', '', generate_test_string(i)),
            lambda: re.sub(r'[\W]+', '', generate_test_string(i)),
            lambda: pattern_single.sub('', generate_test_string(i)),
            lambda: pattern_repeat.sub('', generate_test_string(i)),
            lambda: generate_test_string(i).translate(translation_tb),

        ]:
            print(timeit(test), i, getsource(test).lstrip('            lambda: ').rstrip(',\n'), sep='\t')


if __name__ == '__main__':
    main()

Result (Python 3.7):

       Time       Length                           Code                           
6.3716264850008880  00  ''.join(c for c in generate_test_string(i) if c.isalnum())
5.7285426190064750  00  ''.join(filter(str.isalnum, generate_test_string(i)))
8.1875841680011940  00  re.sub(r'[\W]', '', generate_test_string(i))
8.0002205439959650  00  re.sub(r'[\W]+', '', generate_test_string(i))
5.5290945199958510  00  pattern_single.sub('', generate_test_string(i))
5.4417179649972240  00  pattern_repeat.sub('', generate_test_string(i))
4.6772285089973590  00  generate_test_string(i).translate(translation_tb)
23.574712151996210  10  ''.join(c for c in generate_test_string(i) if c.isalnum())
22.829975890002970  10  ''.join(filter(str.isalnum, generate_test_string(i)))
27.210196289997840  10  re.sub(r'[\W]', '', generate_test_string(i))
27.203713296003116  10  re.sub(r'[\W]+', '', generate_test_string(i))
24.008979928999906  10  pattern_single.sub('', generate_test_string(i))
23.945240008994006  10  pattern_repeat.sub('', generate_test_string(i))
21.830899796994345  10  generate_test_string(i).translate(translation_tb)
38.731336012999236  20  ''.join(c for c in generate_test_string(i) if c.isalnum())
37.942474347000825  20  ''.join(filter(str.isalnum, generate_test_string(i)))
42.169366310001350  20  re.sub(r'[\W]', '', generate_test_string(i))
41.933375883003464  20  re.sub(r'[\W]+', '', generate_test_string(i))
38.899814646996674  20  pattern_single.sub('', generate_test_string(i))
38.636144253003295  20  pattern_repeat.sub('', generate_test_string(i))
36.201238164998360  20  generate_test_string(i).translate(translation_tb)
49.377356811004574  30  ''.join(c for c in generate_test_string(i) if c.isalnum())
48.408927293996385  30  ''.join(filter(str.isalnum, generate_test_string(i)))
53.901889764994850  30  re.sub(r'[\W]', '', generate_test_string(i))
52.130339455994545  30  re.sub(r'[\W]+', '', generate_test_string(i))
50.061149017004940  30  pattern_single.sub('', generate_test_string(i))
49.366573111998150  30  pattern_repeat.sub('', generate_test_string(i))
46.649754120997386  30  generate_test_string(i).translate(translation_tb)
63.107938601999194  40  ''.join(c for c in generate_test_string(i) if c.isalnum())
65.116287978999030  40  ''.join(filter(str.isalnum, generate_test_string(i)))
71.477421126997800  40  re.sub(r'[\W]', '', generate_test_string(i))
66.027950693998720  40  re.sub(r'[\W]+', '', generate_test_string(i))
63.315361931003280  40  pattern_single.sub('', generate_test_string(i))
62.342320287003530  40  pattern_repeat.sub('', generate_test_string(i))
58.249303059004890  40  generate_test_string(i).translate(translation_tb)
73.810345625002810  50  ''.join(c for c in generate_test_string(i) if c.isalnum())
72.593953348005020  50  ''.join(filter(str.isalnum, generate_test_string(i)))
76.048324580995540  50  re.sub(r'[\W]', '', generate_test_string(i))
75.106637657001560  50  re.sub(r'[\W]+', '', generate_test_string(i))
74.681338128997600  50  pattern_single.sub('', generate_test_string(i))
72.430461594005460  50  pattern_repeat.sub('', generate_test_string(i))
69.394243567003290  50  generate_test_string(i).translate(translation_tb)

str.maketrans & str.translate is fastest, but includes all non-ASCII characters. re.compile & pattern.sub is slower, but is somehow faster than ''.join & filter.


回答 10

如果我正确理解,最简单的方法是使用正则表达式,因为它为您提供了很大的灵活性,但是另一种简单的方法是用于循环跟踪的是带有示例的代码,我也计算了单词的出现次数并存储在字典中。

s = """An... essay is, generally, a piece of writing that gives the author's own 
argument — but the definition is vague, 
overlapping with those of a paper, an article, a pamphlet, and a short story. Essays 
have traditionally been 
sub-classified as formal and informal. Formal essays are characterized by "serious 
purpose, dignity, logical 
organization, length," whereas the informal essay is characterized by "the personal 
element (self-revelation, 
individual tastes and experiences, confidential manner), humor, graceful style, 
rambling structure, unconventionality 
or novelty of theme," etc.[1]"""

d = {}      # creating empty dic      
words = s.split() # spliting string and stroing in list
for word in words:
    new_word = ''
    for c in word:
        if c.isalnum(): # checking if indiviual chr is alphanumeric or not
            new_word = new_word + c
    print(new_word, end=' ')
    # if new_word not in d:
    #     d[new_word] = 1
    # else:
    #     d[new_word] = d[new_word] +1
print(d)

如果此答案有用,请评价此!

If i understood correctly the easiest way is to use regular expression as it provides you lots of flexibility but the other simple method is to use for loop following is the code with example I also counted the occurrence of word and stored in dictionary..

s = """An... essay is, generally, a piece of writing that gives the author's own 
argument — but the definition is vague, 
overlapping with those of a paper, an article, a pamphlet, and a short story. Essays 
have traditionally been 
sub-classified as formal and informal. Formal essays are characterized by "serious 
purpose, dignity, logical 
organization, length," whereas the informal essay is characterized by "the personal 
element (self-revelation, 
individual tastes and experiences, confidential manner), humor, graceful style, 
rambling structure, unconventionality 
or novelty of theme," etc.[1]"""

d = {}      # creating empty dic      
words = s.split() # spliting string and stroing in list
for word in words:
    new_word = ''
    for c in word:
        if c.isalnum(): # checking if indiviual chr is alphanumeric or not
            new_word = new_word + c
    print(new_word, end=' ')
    # if new_word not in d:
    #     d[new_word] = 1
    # else:
    #     d[new_word] = d[new_word] +1
print(d)

please rate this if this answer is useful!


在numpy数组上映射函数的最有效方法

问题:在numpy数组上映射函数的最有效方法

在numpy数组上映射函数的最有效方法是什么?我在当前项目中一直采用的方式如下:

import numpy as np 

x = np.array([1, 2, 3, 4, 5])

# Obtain array of square of each element in x
squarer = lambda t: t ** 2
squares = np.array([squarer(xi) for xi in x])

但是,这似乎效率很低,因为我正在使用列表推导将新数组构造为Python列表,然后再将其转换回numpy数组。

我们可以做得更好吗?

What is the most efficient way to map a function over a numpy array? The way I’ve been doing it in my current project is as follows:

import numpy as np 

x = np.array([1, 2, 3, 4, 5])

# Obtain array of square of each element in x
squarer = lambda t: t ** 2
squares = np.array([squarer(xi) for xi in x])

However, this seems like it is probably very inefficient, since I am using a list comprehension to construct the new array as a Python list before converting it back to a numpy array.

Can we do better?


回答 0

我测试过的所有建议的方法,加上np.array(map(f, x))perfplot(我的一个小项目)。

消息1:如果可以使用numpy的本机函数,请执行此操作。

如果你想已经矢量化功能矢量(如x**2在原岗位的例子),使用的是比什么都更快(注意对数标度):

在此处输入图片说明

如果您确实需要向量化,那么使用哪种变体并不重要。

在此处输入图片说明


复制图的代码:

import numpy as np
import perfplot
import math


def f(x):
    # return math.sqrt(x)
    return np.sqrt(x)


vf = np.vectorize(f)


def array_for(x):
    return np.array([f(xi) for xi in x])


def array_map(x):
    return np.array(list(map(f, x)))


def fromiter(x):
    return np.fromiter((f(xi) for xi in x), x.dtype)


def vectorize(x):
    return np.vectorize(f)(x)


def vectorize_without_init(x):
    return vf(x)


perfplot.show(
    setup=lambda n: np.random.rand(n),
    n_range=[2 ** k for k in range(20)],
    kernels=[f, array_for, array_map, fromiter, vectorize, vectorize_without_init],
    xlabel="len(x)",
)

I’ve tested all suggested methods plus np.array(map(f, x)) with perfplot (a small project of mine).

Message #1: If you can use numpy’s native functions, do that.

If the function you’re trying to vectorize already is vectorized (like the x**2 example in the original post), using that is much faster than anything else (note the log scale):

enter image description here

If you actually need vectorization, it doesn’t really matter much which variant you use.

enter image description here


Code to reproduce the plots:

import numpy as np
import perfplot
import math


def f(x):
    # return math.sqrt(x)
    return np.sqrt(x)


vf = np.vectorize(f)


def array_for(x):
    return np.array([f(xi) for xi in x])


def array_map(x):
    return np.array(list(map(f, x)))


def fromiter(x):
    return np.fromiter((f(xi) for xi in x), x.dtype)


def vectorize(x):
    return np.vectorize(f)(x)


def vectorize_without_init(x):
    return vf(x)


perfplot.show(
    setup=lambda n: np.random.rand(n),
    n_range=[2 ** k for k in range(20)],
    kernels=[f, array_for, array_map, fromiter, vectorize, vectorize_without_init],
    xlabel="len(x)",
)

回答 1

如何使用numpy.vectorize

import numpy as np
x = np.array([1, 2, 3, 4, 5])
squarer = lambda t: t ** 2
vfunc = np.vectorize(squarer)
vfunc(x)
# Output : array([ 1,  4,  9, 16, 25])

How about using numpy.vectorize.

import numpy as np
x = np.array([1, 2, 3, 4, 5])
squarer = lambda t: t ** 2
vfunc = np.vectorize(squarer)
vfunc(x)
# Output : array([ 1,  4,  9, 16, 25])

回答 2

TL; DR

@ user2357112所述,应用函数的“直接”方法始终是在Numpy数组上映射函数的最快,最简单的方法:

import numpy as np
x = np.array([1, 2, 3, 4, 5])
f = lambda x: x ** 2
squares = f(x)

通常应避免使用np.vectorize它,因为它运行不佳,并且有(或遇到)许多问题。如果要处理其他数据类型,则可能需要研究以下所示的其他方法。

方法比较

以下是一些简单的测试,用于比较三种映射函数的方法,本示例在Python 3.6和NumPy 1.15.4中使用。首先,用于测试的设置功能:

import timeit
import numpy as np

f = lambda x: x ** 2
vf = np.vectorize(f)

def test_array(x, n):
    t = timeit.timeit(
        'np.array([f(xi) for xi in x])',
        'from __main__ import np, x, f', number=n)
    print('array: {0:.3f}'.format(t))

def test_fromiter(x, n):
    t = timeit.timeit(
        'np.fromiter((f(xi) for xi in x), x.dtype, count=len(x))',
        'from __main__ import np, x, f', number=n)
    print('fromiter: {0:.3f}'.format(t))

def test_direct(x, n):
    t = timeit.timeit(
        'f(x)',
        'from __main__ import x, f', number=n)
    print('direct: {0:.3f}'.format(t))

def test_vectorized(x, n):
    t = timeit.timeit(
        'vf(x)',
        'from __main__ import x, vf', number=n)
    print('vectorized: {0:.3f}'.format(t))

用五个元素(从最快到最慢排序)进行测试:

x = np.array([1, 2, 3, 4, 5])
n = 100000
test_direct(x, n)      # 0.265
test_fromiter(x, n)    # 0.479
test_array(x, n)       # 0.865
test_vectorized(x, n)  # 2.906

具有100多个元素:

x = np.arange(100)
n = 10000
test_direct(x, n)      # 0.030
test_array(x, n)       # 0.501
test_vectorized(x, n)  # 0.670
test_fromiter(x, n)    # 0.883

并且具有1000或更多的数组元素:

x = np.arange(1000)
n = 1000
test_direct(x, n)      # 0.007
test_fromiter(x, n)    # 0.479
test_array(x, n)       # 0.516
test_vectorized(x, n)  # 0.945

不同版本的Python / NumPy和编译器优化将产生不同的结果,因此请针对您的环境进行类似的测试。

TL;DR

As noted by @user2357112, a “direct” method of applying the function is always the fastest and simplest way to map a function over Numpy arrays:

import numpy as np
x = np.array([1, 2, 3, 4, 5])
f = lambda x: x ** 2
squares = f(x)

Generally avoid np.vectorize, as it does not perform well, and has (or had) a number of issues. If you are handling other data types, you may want to investigate the other methods shown below.

Comparison of methods

Here are some simple tests to compare three methods to map a function, this example using with Python 3.6 and NumPy 1.15.4. First, the set-up functions for testing:

import timeit
import numpy as np

f = lambda x: x ** 2
vf = np.vectorize(f)

def test_array(x, n):
    t = timeit.timeit(
        'np.array([f(xi) for xi in x])',
        'from __main__ import np, x, f', number=n)
    print('array: {0:.3f}'.format(t))

def test_fromiter(x, n):
    t = timeit.timeit(
        'np.fromiter((f(xi) for xi in x), x.dtype, count=len(x))',
        'from __main__ import np, x, f', number=n)
    print('fromiter: {0:.3f}'.format(t))

def test_direct(x, n):
    t = timeit.timeit(
        'f(x)',
        'from __main__ import x, f', number=n)
    print('direct: {0:.3f}'.format(t))

def test_vectorized(x, n):
    t = timeit.timeit(
        'vf(x)',
        'from __main__ import x, vf', number=n)
    print('vectorized: {0:.3f}'.format(t))

Testing with five elements (sorted from fastest to slowest):

x = np.array([1, 2, 3, 4, 5])
n = 100000
test_direct(x, n)      # 0.265
test_fromiter(x, n)    # 0.479
test_array(x, n)       # 0.865
test_vectorized(x, n)  # 2.906

With 100s of elements:

x = np.arange(100)
n = 10000
test_direct(x, n)      # 0.030
test_array(x, n)       # 0.501
test_vectorized(x, n)  # 0.670
test_fromiter(x, n)    # 0.883

And with 1000s of array elements or more:

x = np.arange(1000)
n = 1000
test_direct(x, n)      # 0.007
test_fromiter(x, n)    # 0.479
test_array(x, n)       # 0.516
test_vectorized(x, n)  # 0.945

Different versions of Python/NumPy and compiler optimization will have different results, so do a similar test for your environment.


回答 3

numexprnumbacython周围,此答案的目的是考虑这些可能性。

但是首先让我们说明一个显而易见的事实:无论您如何将Python函数映射到numpy数组,它都会保留为Python函数,这意味着每次评估:

  • numpy-array元素必须转换为Python对象(例如, Float)。
  • 所有的计算都是使用Python对象完成的,这意味着要占用解释器,动态分配和不可变对象的开销。

因此,由于上面提到的开销,实际上用于循环遍历数组的机制不会发挥很大的作用-它比使用numpy的内置功能要慢得多。

让我们看下面的例子:

# numpy-functionality
def f(x):
    return x+2*x*x+4*x*x*x

# python-function as ufunc
import numpy as np
vf=np.vectorize(f)
vf.__name__="vf"

np.vectorize被选为方法的纯Python函数类的代表。使用perfplot(请参阅此答案的附录中的代码),我们得到以下运行时间:

在此处输入图片说明

我们可以看到,numpy方法比纯python版本快10到100倍。对于更大的数组大小,性能下降可能是因为数据不再适合高速缓存。

值得一提的是,vectorize它还占用大量内存,因此内存使用常常是瓶颈(请参阅相关的SO问题)。还要注意,numpy的文档np.vectorize指出“主要是为了方便而不是性能而提供”。

需要性能时,应使用其他工具,除了从头开始编写C扩展名外,还有以下可能性:


人们经常听到,numpy性能是最好的,因为它是纯C语言。但是还有很多改进的空间!

向量化的numpy版本使用大量额外的内存和内存访问。Numexp库尝试对numpy数组进行平铺,从而获得更好的缓存利用率:

# less cache misses than numpy-functionality
import numexpr as ne
def ne_f(x):
    return ne.evaluate("x+2*x*x+4*x*x*x")

导致以下比较:

在此处输入图片说明

我无法解释上面图表中的所有内容:一开始我们会看到numexpr-library的开销更大,但是因为它更好地利用了缓存,所以对于较大的数组,它的速度要快大约10倍!


另一种方法是通过jit编译功能,从而获得真正的纯C UFunc。这是numba的方法:

# runtime generated C-function as ufunc
import numba as nb
@nb.vectorize(target="cpu")
def nb_vf(x):
    return x+2*x*x+4*x*x*x

它比原始的numpy方法快10倍:

在此处输入图片说明


但是,该任务可尴尬地可并行化,因此我们也可以使用prange它来并行计算循环:

@nb.njit(parallel=True)
def nb_par_jitf(x):
    y=np.empty(x.shape)
    for i in nb.prange(len(x)):
        y[i]=x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y

不出所料,并行功能对于较小的输入而言较慢,但对于较大的输入则较快(几乎为2倍):

在此处输入图片说明


虽然numba专门研究使用numpy数组优化操作,但Cython是更通用的工具。提取与numba相同的性能更加复杂-相对于本地编译器(gcc / MSVC),通常归结为llvm(numba):

%%cython -c=/openmp -a
import numpy as np
import cython

#single core:
@cython.boundscheck(False) 
@cython.wraparound(False) 
def cy_f(double[::1] x):
    y_out=np.empty(len(x))
    cdef Py_ssize_t i
    cdef double[::1] y=y_out
    for i in range(len(x)):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y_out

#parallel:
from cython.parallel import prange
@cython.boundscheck(False) 
@cython.wraparound(False)  
def cy_par_f(double[::1] x):
    y_out=np.empty(len(x))
    cdef double[::1] y=y_out
    cdef Py_ssize_t i
    cdef Py_ssize_t n = len(x)
    for i in prange(n, nogil=True):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y_out

Cython导致功能变慢:

在此处输入图片说明


结论

显然,仅测试一个功能并不能证明任何事情。还要记住的是,对于所选的功能示例,内存的带宽是大于10 ^ 5个元素的瓶颈-因此,在该区域中,numba,numexpr和cython的性能相同。

最后,最终答案取决于函数的类型,硬件,Python分布和其他因素。例如,Anaconda-distribution使用Intel的VML来实现numpy的功能,从而在超越性功能(如,和类似功能)方面的性能要优于numba(除非它使用SVML,请参见此SO-post),例如exp,请参见以下SO-postsincos

但是从这次调查和到目前为止的经验来看,只要不涉及先验功能,numba似乎是性能最佳的最简单工具。


使用perfplot -package绘制运行时间:

import perfplot
perfplot.show(
    setup=lambda n: np.random.rand(n),
    n_range=[2**k for k in range(0,24)],
    kernels=[
        f, 
        vf,
        ne_f, 
        nb_vf, nb_par_jitf,
        cy_f, cy_par_f,
        ],
    logx=True,
    logy=True,
    xlabel='len(x)'
    )

There are numexpr, numba and cython around, the goal of this answer is to take these possibilities into consideration.

But first let’s state the obvious: no matter how you map a Python-function onto a numpy-array, it stays a Python function, that means for every evaluation:

  • numpy-array element must be converted to a Python-object (e.g. a Float).
  • all calculations are done with Python-objects, which means to have the overhead of interpreter, dynamic dispatch and immutable objects.

So which machinery is used to actually loop through the array doesn’t play a big role because of the overhead mentioned above – it stays much slower than using numpy’s built-in functionality.

Let’s take a look at the following example:

# numpy-functionality
def f(x):
    return x+2*x*x+4*x*x*x

# python-function as ufunc
import numpy as np
vf=np.vectorize(f)
vf.__name__="vf"

np.vectorize is picked as a representative of the pure-python function class of approaches. Using perfplot (see code in the appendix of this answer) we get the following running times:

enter image description here

We can see, that the numpy-approach is 10x-100x faster than the pure python version. The decrease of performance for bigger array-sizes is probably because data no longer fits the cache.

It is worth also mentioning, that vectorize also uses a lot of memory, so often memory-usage is the bottle-neck (see related SO-question). Also note, that numpy’s documentation on np.vectorize states that it is “provided primarily for convenience, not for performance”.

Other tools should be used, when performance is desired, beside writing a C-extension from the scratch, there are following possibilities:


One often hears, that the numpy-performance is as good as it gets, because it is pure C under the hood. Yet there is a lot room for improvement!

The vectorized numpy-version uses a lot of additional memory and memory-accesses. Numexp-library tries to tile the numpy-arrays and thus get a better cache utilization:

# less cache misses than numpy-functionality
import numexpr as ne
def ne_f(x):
    return ne.evaluate("x+2*x*x+4*x*x*x")

Leads to the following comparison:

enter image description here

I cannot explain everything in the plot above: we can see bigger overhead for numexpr-library at the beginning, but because it utilize the cache better it is about 10 time faster for bigger arrays!


Another approach is to jit-compile the function and thus getting a real pure-C UFunc. This is numba’s approach:

# runtime generated C-function as ufunc
import numba as nb
@nb.vectorize(target="cpu")
def nb_vf(x):
    return x+2*x*x+4*x*x*x

It is 10 times faster than the original numpy-approach:

enter image description here


However, the task is embarrassingly parallelizable, thus we also could use prange in order to calculate the loop in parallel:

@nb.njit(parallel=True)
def nb_par_jitf(x):
    y=np.empty(x.shape)
    for i in nb.prange(len(x)):
        y[i]=x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y

As expected, the parallel function is slower for smaller inputs, but faster (almost factor 2) for larger sizes:

enter image description here


While numba specializes on optimizing operations with numpy-arrays, Cython is a more general tool. It is more complicated to extract the same performance as with numba – often it is down to llvm (numba) vs local compiler (gcc/MSVC):

%%cython -c=/openmp -a
import numpy as np
import cython

#single core:
@cython.boundscheck(False) 
@cython.wraparound(False) 
def cy_f(double[::1] x):
    y_out=np.empty(len(x))
    cdef Py_ssize_t i
    cdef double[::1] y=y_out
    for i in range(len(x)):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y_out

#parallel:
from cython.parallel import prange
@cython.boundscheck(False) 
@cython.wraparound(False)  
def cy_par_f(double[::1] x):
    y_out=np.empty(len(x))
    cdef double[::1] y=y_out
    cdef Py_ssize_t i
    cdef Py_ssize_t n = len(x)
    for i in prange(n, nogil=True):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y_out

Cython results in somewhat slower functions:

enter image description here


Conclusion

Obviously, testing only for one function doesn’t prove anything. Also one should keep in mind, that for the choosen function-example, the bandwidth of the memory was the bottle neck for sizes larger than 10^5 elements – thus we had the same performance for numba, numexpr and cython in this region.

In the end, the ultimative answer depends on the type of function, hardware, Python-distribution and other factors. For example Anaconda-distribution uses Intel’s VML for numpy’s functions and thus outperforms numba (unless it uses SVML, see this SO-post) easily for transcendental functions like exp, sin, cos and similar – see e.g. the following SO-post.

Yet from this investigation and from my experience so far, I would state, that numba seems to be the easiest tool with best performance as long as no transcendental functions are involved.


Plotting running times with perfplot-package:

import perfplot
perfplot.show(
    setup=lambda n: np.random.rand(n),
    n_range=[2**k for k in range(0,24)],
    kernels=[
        f, 
        vf,
        ne_f, 
        nb_vf, nb_par_jitf,
        cy_f, cy_par_f,
        ],
    logx=True,
    logy=True,
    xlabel='len(x)'
    )

回答 4

squares = squarer(x)

数组上的算术运算会自动按元素进行应用,并使用高效的C级循环,避免了所有适用于Python级循环或理解的解释器开销。

您想将所有元素应用于NumPy数组的大多数功能都可以使用,尽管有些功能可能需要更改。例如,if不能逐个元素地工作。您想要将其转换为使用类似numpy.where以下的构造:

def using_if(x):
    if x < 5:
        return x
    else:
        return x**2

变成

def using_where(x):
    return numpy.where(x < 5, x, x**2)
squares = squarer(x)

Arithmetic operations on arrays are automatically applied elementwise, with efficient C-level loops that avoid all the interpreter overhead that would apply to a Python-level loop or comprehension.

Most of the functions you’d want to apply to a NumPy array elementwise will just work, though some may need changes. For example, if doesn’t work elementwise. You’d want to convert those to use constructs like numpy.where:

def using_if(x):
    if x < 5:
        return x
    else:
        return x**2

becomes

def using_where(x):
    return numpy.where(x < 5, x, x**2)

回答 5

我相信在numpy的较新版本(我使用1.13)中,您可以通过将numpy数组传递给您为标量类型编写的函数来调用函数,它将自动将函数调用应用于numpy数组上的每个元素并返回另一个numpy数组

>>> import numpy as np
>>> squarer = lambda t: t ** 2
>>> x = np.array([1, 2, 3, 4, 5])
>>> squarer(x)
array([ 1,  4,  9, 16, 25])

I believe in newer version( I use 1.13) of numpy you can simply call the function by passing the numpy array to the fuction that you wrote for scalar type, it will automatically apply the function call to each element over the numpy array and return you another numpy array

>>> import numpy as np
>>> squarer = lambda t: t ** 2
>>> x = np.array([1, 2, 3, 4, 5])
>>> squarer(x)
array([ 1,  4,  9, 16, 25])

回答 6

在许多情况下,numpy.apply_along_axis是最佳选择。与其他方法相比,它的性能提高了约100倍-不仅对于微不足道的测试功能,而且对于numpy和scipy的更复杂的功能组成。

当我添加方法时:

def along_axis(x):
    return np.apply_along_axis(f, 0, x)

到perfplot代码,我得到以下结果: 在此处输入图片说明

In many cases, numpy.apply_along_axis will be the best choice. It increases the performance by about 100x compared to the other approaches – and not only for trivial test functions, but also for more complex function compositions from numpy and scipy.

When I add the method:

def along_axis(x):
    return np.apply_along_axis(f, 0, x)

to the perfplot code, I get the following results: enter image description here


回答 7

似乎没有人提到过内置的工厂生产ufuncnumpy软件包的方法:np.frompyfunc我再次进行了测试np.vectorize,其性能要比其高出20%到30%。当然,它可以按规定的C代码甚至numba(我还没有测试过的)性能很好,但是比起更好的选择np.vectorize

f = lambda x, y: x * y
f_arr = np.frompyfunc(f, 2, 1)
vf = np.vectorize(f)
arr = np.linspace(0, 1, 10000)

%timeit f_arr(arr, arr) # 307ms
%timeit vf(arr, arr) # 450ms

我还测试了较大的样本,并且改进成比例。另请参阅文档在这里

It seems no one has mentioned a built-in factory method of producing ufunc in numpy package: np.frompyfunc which I have tested again np.vectorize and have outperformed it by about 20~30%. Of course it will perform well as prescribed C code or even numba(which I have not tested), but it can a better alternative than np.vectorize

f = lambda x, y: x * y
f_arr = np.frompyfunc(f, 2, 1)
vf = np.vectorize(f)
arr = np.linspace(0, 1, 10000)

%timeit f_arr(arr, arr) # 307ms
%timeit vf(arr, arr) # 450ms

I have also tested larger samples, and the improvement is proportional. See the documentation also here


回答 8

正如提到的这篇文章,只是使用生成器表达式如下所示:

numpy.fromiter((<some_func>(x) for x in <something>),<dtype>,<size of something>)

As mentioned in this post, just use generator expressions like so:

numpy.fromiter((<some_func>(x) for x in <something>),<dtype>,<size of something>)

回答 9

以上所有答案比较都不错,但是如果您需要使用自定义函数进行映射,并且 numpy.ndarray,则需要保留数组的形状。

我只比较了两个,但它将保留的形状ndarray。我已经将数组与100万个条目进行比较。在这里,我使用平方函数,该函数也是内置在numpy中的,并且具有很大的性能提升,因为有需要,您可以使用自己选择的函数。

import numpy, time
def timeit():
    y = numpy.arange(1000000)
    now = time.time()
    numpy.array([x * x for x in y.reshape(-1)]).reshape(y.shape)        
    print(time.time() - now)
    now = time.time()
    numpy.fromiter((x * x for x in y.reshape(-1)), y.dtype).reshape(y.shape)
    print(time.time() - now)
    now = time.time()
    numpy.square(y)  
    print(time.time() - now)

输出量

>>> timeit()
1.162431240081787    # list comprehension and then building numpy array
1.0775556564331055   # from numpy.fromiter
0.002948284149169922 # using inbuilt function

在这里,您可以清楚地看到numpy.fromiter采用简单方法的效果很好,如果内置功能可用,请使用它。

All above answers compares well, but if you need to use custom function for mapping, and you have numpy.ndarray, and you need to retain the shape of array.

I have compare just two, but it will retain the shape of ndarray. I have used the array with 1 million entries for comparison. Here I use square function, which is also inbuilt in numpy and has great performance boost, since there as was need of something, you can use function of your choice.

import numpy, time
def timeit():
    y = numpy.arange(1000000)
    now = time.time()
    numpy.array([x * x for x in y.reshape(-1)]).reshape(y.shape)        
    print(time.time() - now)
    now = time.time()
    numpy.fromiter((x * x for x in y.reshape(-1)), y.dtype).reshape(y.shape)
    print(time.time() - now)
    now = time.time()
    numpy.square(y)  
    print(time.time() - now)

Output

>>> timeit()
1.162431240081787    # list comprehension and then building numpy array
1.0775556564331055   # from numpy.fromiter
0.002948284149169922 # using inbuilt function

here you can clearly see numpy.fromiter works great considering to simple approach, and if inbuilt function is available please use that.


回答 10

采用 numpy.fromfunction(function, shape, **kwargs)

参见“ https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfunction.html


Python装饰器有哪些常见用途?[关闭]

问题:Python装饰器有哪些常见用途?[关闭]

尽管我喜欢将自己视为一个相当称职的Python编码器,但我从来没有想到过的语言的一方面是装饰器。

我知道它们是什么(表面上),我已经阅读了有关堆栈溢出的教程,示例和问题,而且我了解语法,可以编写自己的语法,偶尔使用@classmethod和@staticmethod,但是使用装饰器来解决我自己的Python代码中的问题。我从来没有遇到过这样的问题,我想:“嗯……这看起来像是装饰工的工作!”

因此,我想知道你们是否可以提供一些示例,说明您在自己的程序中使用装饰器的位置,并希望我会收到“ A-ha!”字样。片刻,他们。

While I like to think of myself as a reasonably competent Python coder, one aspect of the language I’ve never been able to grok is decorators.

I know what they are (superficially), I’ve read tutorials, examples, questions on Stack Overflow, and I understand the syntax, can write my own, occasionally use @classmethod and @staticmethod, but it never occurs to me to use a decorator to solve a problem in my own Python code. I never encounter a problem where I think, “Hmm…this looks like a job for a decorator!”

So, I’m wondering if you guys might offer some examples of where you’ve used decorators in your own programs, and hopefully I’ll have an “A-ha!” moment and get them.


回答 0

我将装饰器主要用于计时目的

def time_dec(func):

  def wrapper(*arg):
      t = time.clock()
      res = func(*arg)
      print func.func_name, time.clock()-t
      return res

  return wrapper


@time_dec
def myFunction(n):
    ...

I use decorators mainly for timing purposes

def time_dec(func):

  def wrapper(*arg):
      t = time.clock()
      res = func(*arg)
      print func.func_name, time.clock()-t
      return res

  return wrapper


@time_dec
def myFunction(n):
    ...

回答 1

我已使用它们进行同步。

import functools

def synchronized(lock):
    """ Synchronization decorator """
    def wrap(f):
        @functools.wraps(f)
        def newFunction(*args, **kw):
            lock.acquire()
            try:
                return f(*args, **kw)
            finally:
                lock.release()
        return newFunction
    return wrap

正如评论中指出的那样,从Python 2.5开始,您可以将with语句与threading.Lock(或multiprocessing.Lock从2.6版本开始)对象结合使用, 以将装饰器的实现简化为:

import functools

def synchronized(lock):
    """ Synchronization decorator """
    def wrap(f):
        @functools.wraps(f)
        def newFunction(*args, **kw):
            with lock:
                return f(*args, **kw)
        return newFunction
    return wrap

无论如何,您都可以这样使用它:

import threading
lock = threading.Lock()

@synchronized(lock)
def do_something():
  # etc

@synchronzied(lock)
def do_something_else():
  # etc

基本上,它只是将lock.acquire()/ lock.release()放在函数调用的任一侧。

I’ve used them for synchronization.

import functools

def synchronized(lock):
    """ Synchronization decorator """
    def wrap(f):
        @functools.wraps(f)
        def newFunction(*args, **kw):
            lock.acquire()
            try:
                return f(*args, **kw)
            finally:
                lock.release()
        return newFunction
    return wrap

As pointed out in the comments, since Python 2.5 you can use a with statement in conjunction with a threading.Lock (or multiprocessing.Lock since version 2.6) object to simplify the decorator’s implementation to just:

import functools

def synchronized(lock):
    """ Synchronization decorator """
    def wrap(f):
        @functools.wraps(f)
        def newFunction(*args, **kw):
            with lock:
                return f(*args, **kw)
        return newFunction
    return wrap

Regardless, you then use it like this:

import threading
lock = threading.Lock()

@synchronized(lock)
def do_something():
  # etc

@synchronzied(lock)
def do_something_else():
  # etc

Basically it just puts lock.acquire() / lock.release() on either side of the function call.


回答 2

我使用装饰器进行类型检查参数,这些参数通过一些RMI传递给我的Python方法。因此,与其重复相同的参数计数,不如一次又一次地引发异常。

例如,代替:

def myMethod(ID, name):
    if not (myIsType(ID, 'uint') and myIsType(name, 'utf8string')):
        raise BlaBlaException() ...

我只声明:

@accepts(uint, utf8string)
def myMethod(ID, name):
    ...

accepts()为我完成所有工作。

I use decorators for type checking parameters which are passed to my Python methods via some RMI. So instead of repeating the same parameter counting, exception-raising mumbo-jumbo again and again.

For example, instead of:

def myMethod(ID, name):
    if not (myIsType(ID, 'uint') and myIsType(name, 'utf8string')):
        raise BlaBlaException() ...

I just declare:

@accepts(uint, utf8string)
def myMethod(ID, name):
    ...

and accepts() does all the work for me.


回答 3

装饰器用于您要透明“包装”其他功能的任何物品。

Django使用它们将“需要登录”功能包装在视图功能上,以及注册过滤器功能

您可以使用类装饰器将命名日志添加到类

您可以“掌握”现有类或功能行为的任何足够通用的功能,对于装饰来说都是公平的。

PEP 318指向的Python-Dev新闻组上讨论了用例-函数和方法的装饰器

Decorators are used for anything that you want to transparently “wrap” with additional functionality.

Django uses them for wrapping “login required” functionality on view functions, as well as for registering filter functions.

You can use class decorators for adding named logs to classes.

Any sufficiently generic functionality that you can “tack on” to an existing class or function’s behavior is fair game for decoration.

There’s also a discussion of use cases on the Python-Dev newsgroup pointed to by PEP 318 — Decorators for Functions and Methods.


回答 4

对于鼻子测试,您可以编写一个装饰器,该装饰器为单元测试功能或方法提供几组参数:

@parameters(
   (2, 4, 6),
   (5, 6, 11),
)
def test_add(a, b, expected):
    assert a + b == expected

For nosetests, you can write a decorator that supplies a unit test function or method with several sets of parameters:

@parameters(
   (2, 4, 6),
   (5, 6, 11),
)
def test_add(a, b, expected):
    assert a + b == expected

回答 5

Twisted库结合使用装饰器和生成器,给人一种异步函数是同步的错觉。例如:

@inlineCallbacks
def asyncf():
    doStuff()
    yield someAsynchronousCall()
    doStuff()
    yield someAsynchronousCall()
    doStuff()

使用此功能,原本可以分解成许多小回调函数的代码可以很自然地作为一个单独的块编写,从而使理解和维护变得更加容易。

The Twisted library uses decorators combined with generators to give the illusion that an asynchronous function is synchronous. For example:

@inlineCallbacks
def asyncf():
    doStuff()
    yield someAsynchronousCall()
    doStuff()
    yield someAsynchronousCall()
    doStuff()

Using this, code that would have been broken up into a ton of little callback functions can be written quite naturally as a single block, making it a lot easier to understand and maintain.


回答 6

当然,一种明显的用途是用于日志记录:

import functools

def log(logger, level='info'):
    def log_decorator(fn):
        @functools.wraps(fn)
        def wrapper(*a, **kwa):
            getattr(logger, level)(fn.__name__)
            return fn(*a, **kwa)
        return wrapper
    return log_decorator

# later that day ...
@log(logging.getLogger('main'), level='warning')
def potentially_dangerous_function(times):
    for _ in xrange(times): rockets.get_rocket(NUCLEAR=True).fire()

One obvious use is for logging, of course:

import functools

def log(logger, level='info'):
    def log_decorator(fn):
        @functools.wraps(fn)
        def wrapper(*a, **kwa):
            getattr(logger, level)(fn.__name__)
            return fn(*a, **kwa)
        return wrapper
    return log_decorator

# later that day ...
@log(logging.getLogger('main'), level='warning')
def potentially_dangerous_function(times):
    for _ in xrange(times): rockets.get_rocket(NUCLEAR=True).fire()

回答 7

我主要将它们用于调试(包装打印其参数和结果的函数)和验证(例如,检查参数的类型是否正确,或者对于Web应用程序,如果用户具有足够的特权来调用特定参数方法)。

I use them mainly for debugging (wrapper around a function that prints its arguments and result) and verification (e.g. to check if an argument is of correct type or, in the case of web application, if the user has sufficient privileges to call a particular method).


回答 8

我正在使用以下装饰器来使函数成为线程安全的。它使代码更具可读性。它几乎与John Fouhy提出的类似,但是区别在于,一个函数可以处理单个函数,因此无需显式创建锁对象。

def threadsafe_function(fn):
    """decorator making sure that the decorated function is thread safe"""
    lock = threading.Lock()
    def new(*args, **kwargs):
        lock.acquire()
        try:
            r = fn(*args, **kwargs)
        except Exception as e:
            raise e
        finally:
            lock.release()
        return r
    return new

class X:
    var = 0

    @threadsafe_function     
    def inc_var(self):
        X.var += 1    
        return X.var

I am using the following decorator for making a function threadsafe. It makes the code more readable. It is almost similar to the one proposed by John Fouhy but the difference is that one work on a single function and that there is no need to create a lock object explicitely.

def threadsafe_function(fn):
    """decorator making sure that the decorated function is thread safe"""
    lock = threading.Lock()
    def new(*args, **kwargs):
        lock.acquire()
        try:
            r = fn(*args, **kwargs)
        except Exception as e:
            raise e
        finally:
            lock.release()
        return r
    return new

class X:
    var = 0

    @threadsafe_function     
    def inc_var(self):
        X.var += 1    
        return X.var

回答 9

装饰器用于定义函数的属性或用作更改函数的样板。对于他们来说,返回完全不同的功能是可能的,但违反直觉。在这里查看其他响应,似乎最常见的用途之一是限制其他过程的范围-日志记录,性能分析,安全检查等。

CherryPy使用对象分派将URL与对象以及方法进行匹配。这些方法上的装饰器会发出信号,表明是否甚至允许 CherryPy 使用这些方法。例如,根据本教程改编而成:

class HelloWorld:

    ...

    def secret(self):
        return "You shouldn't be here."

    @cherrypy.expose
    def index(self):
        return "Hello world!"

cherrypy.quickstart(HelloWorld())

Decorators are used either to define a function’s properties or as boilerplate that alters it; it’s possible but counter-intuitive for them to return completely different functions. Looking at the other responses here, it seems like one of the most common uses is to limit the scope of some other process – be it logging, profiling, security checks, etc.

CherryPy uses object-dispatching to match URLs to objects and, eventually, methods. Decorators on those methods signal whether or not CherryPy is even allowed to use those methods. For example, adapted from the tutorial:

class HelloWorld:

    ...

    def secret(self):
        return "You shouldn't be here."

    @cherrypy.expose
    def index(self):
        return "Hello world!"

cherrypy.quickstart(HelloWorld())

回答 10

我最近在社交网络Web应用程序上使用它们。对于社区/小组,我应该授予成员资格以创建新的讨论并回复您必须是该特定小组成员的消息。因此,我写了一个装饰器,@membership_required并把它放在我认为需要的地方。

I used them recently, while working on social networking web application. For Community/Groups, I was supposed to give membership authorization to create new discussion and reply to a message you have to be the member of that particular group. So, I wrote a decorator @membership_required and put that where I required in my view.


回答 11

我用这个装饰器来修复参数

def fill_it(arg):
    if isinstance(arg, int):
        return "wan" + str(arg)
    else:
        try:
            # number present as string
            if str(int(arg)) == arg:
                return "wan" + arg
            else:
                # This should never happened
                raise Exception("I dont know this " + arg)
                print "What arg?"
        except ValueError, e:
            return arg

def fill_wanname(func):
    def wrapper(arg):
        filled = fill_it(arg)
        return func(filled)
    return wrapper

@fill_wanname
def get_iface_of(wanname):
    global __iface_config__
    return __iface_config__[wanname]['iface']

当我重构某些函数需要传递参数“ wanN”时编写的此代码,但是在我的旧代码中,我仅传递了N或’N’

I use this decorator to fix parameter

def fill_it(arg):
    if isinstance(arg, int):
        return "wan" + str(arg)
    else:
        try:
            # number present as string
            if str(int(arg)) == arg:
                return "wan" + arg
            else:
                # This should never happened
                raise Exception("I dont know this " + arg)
                print "What arg?"
        except ValueError, e:
            return arg

def fill_wanname(func):
    def wrapper(arg):
        filled = fill_it(arg)
        return func(filled)
    return wrapper

@fill_wanname
def get_iface_of(wanname):
    global __iface_config__
    return __iface_config__[wanname]['iface']

this written when I refactor some functions need to passed argument “wanN” but in my old codes, I passed N or ‘N’ only


回答 12

装饰器可用于轻松创建函数方法变量。

def static_var(varname, value):
    '''
    Decorator to create a static variable for the specified function
    @param varname: static variable name
    @param value: initial value for the variable
    '''
    def decorate(func):
        setattr(func, varname, value)
        return func
    return decorate

@static_var("count", 0)
def mainCallCount():
    mainCallCount.count += 1

Decorator can be used to easily create function method variables.

def static_var(varname, value):
    '''
    Decorator to create a static variable for the specified function
    @param varname: static variable name
    @param value: initial value for the variable
    '''
    def decorate(func):
        setattr(func, varname, value)
        return func
    return decorate

@static_var("count", 0)
def mainCallCount():
    mainCallCount.count += 1

按列对NumPy中的数组排序

问题:按列对NumPy中的数组排序

如何按第n列对NumPy中的数组排序?

例如,

a = array([[9, 2, 3],
           [4, 5, 6],
           [7, 0, 5]])

我想按第二列对行进行排序,以便返回:

array([[7, 0, 5],
       [9, 2, 3],
       [4, 5, 6]])

How can I sort an array in NumPy by the nth column?

For example,

a = array([[9, 2, 3],
           [4, 5, 6],
           [7, 0, 5]])

I’d like to sort rows by the second column, such that I get back:

array([[7, 0, 5],
       [9, 2, 3],
       [4, 5, 6]])

回答 0

@steve答案实际上是最优雅的方法。

对于“正确”的方式,请参见numpy.ndarray.sort的order关键字参数。

但是,您需要将数组视为具有字段的数组(结构化数组)。

如果您最初没有使用字段定义数组,那么“正确”的方法就很难看了。

作为一个简单的示例,对其进行排序并返回副本:

In [1]: import numpy as np

In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])

In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int)
Out[3]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

对其进行原位排序:

In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None

In [7]: a
Out[7]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

据我所知,@ Steve确实是最优雅的方式…

此方法的唯一优点是,“ order”参数是用来对搜索进行排序的字段列表。例如,您可以通过提供order = [‘f1’,’f2’,’f0’]来对第二列,第三列,第一列进行排序。

@steve‘s is actually the most elegant way of doing it.

For the “correct” way see the order keyword argument of numpy.ndarray.sort

However, you’ll need to view your array as an array with fields (a structured array).

The “correct” way is quite ugly if you didn’t initially define your array with fields…

As a quick example, to sort it and return a copy:

In [1]: import numpy as np

In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])

In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int)
Out[3]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

To sort it in-place:

In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None

In [7]: a
Out[7]: 
array([[0, 0, 1],
       [1, 2, 3],
       [4, 5, 6]])

@Steve’s really is the most elegant way to do it, as far as I know…

The only advantage to this method is that the “order” argument is a list of the fields to order the search by. For example, you can sort by the second column, then the third column, then the first column by supplying order=[‘f1′,’f2′,’f0’].


回答 1

我想这可行: a[a[:,1].argsort()]

这表示的第二列,a并据此对其进行排序。

I suppose this works: a[a[:,1].argsort()]

This indicates the second column of a and sort it based on it accordingly.


回答 2

您可以按照Steve Tjoa的方法对多个列进行排序,方法是使用诸如mergesort之类的稳定排序并对索引从最低有效列到最高有效列进行排序:

a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]

排序方式为:第0列,然后是1,然后是2。

You can sort on multiple columns as per Steve Tjoa’s method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:

a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]

This sorts by column 0, then 1, then 2.


回答 3

我认为您可以从Python文档Wiki中进行以下操作:

a = ([[1, 2, 3], [4, 5, 6], [0, 0, 1]]); 
a = sorted(a, key=lambda a_entry: a_entry[1]) 
print a

输出为:

[[[0, 0, 1], [1, 2, 3], [4, 5, 6]]]

From the Python documentation wiki, I think you can do:

a = ([[1, 2, 3], [4, 5, 6], [0, 0, 1]]); 
a = sorted(a, key=lambda a_entry: a_entry[1]) 
print a

The output is:

[[[0, 0, 1], [1, 2, 3], [4, 5, 6]]]

回答 4

如果有人想在他们程序的关键部分使用排序,下面是对不同提案的性能比较:

import numpy as np
table = np.random.rand(5000, 10)

%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop

%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop

import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

因此,似乎使用argsort进行索引是迄今为止最快的方法…

In case someone wants to make use of sorting at a critical part of their programs here’s a performance comparison for the different proposals:

import numpy as np
table = np.random.rand(5000, 10)

%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop

%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop

import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop

So, it looks like indexing with argsort is the quickest method so far…


回答 5

该NumPy的邮件列表,这里是另一种解决方案:

>>> a
array([[1, 2],
       [0, 0],
       [1, 0],
       [0, 2],
       [2, 1],
       [1, 0],
       [1, 0],
       [0, 0],
       [1, 0],
      [2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
       [0, 0],
       [0, 2],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 2],
       [2, 1],
       [2, 2]])

From the NumPy mailing list, here’s another solution:

>>> a
array([[1, 2],
       [0, 0],
       [1, 0],
       [0, 2],
       [2, 1],
       [1, 0],
       [1, 0],
       [0, 0],
       [1, 0],
      [2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
       [0, 0],
       [0, 2],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 2],
       [2, 1],
       [2, 2]])

回答 6

我有一个类似的问题。

我的问题:

我想计算SVD,需要按降序对我的特征值进行排序。但是我想保留特征值和特征向量之间的映射。我的特征值在第一行中,而对应的特征向量在同一列中。

因此,我想按降序按第一行在列中对二维数组进行排序。

我的解决方案

a = a[::, a[0,].argsort()[::-1]]

那么这是如何工作的呢?

a[0,] 只是我要排序的第一行。

现在,我使用argsort来获取索引的顺序。

我用 [::-1]是因为我需要降序排列。

最后,我使用a[::, ...]正确的顺序查看各列。

I had a similar problem.

My Problem:

I want to calculate an SVD and need to sort my eigenvalues in descending order. But I want to keep the mapping between eigenvalues and eigenvectors. My eigenvalues were in the first row and the corresponding eigenvector below it in the same column.

So I want to sort a two-dimensional array column-wise by the first row in descending order.

My Solution

a = a[::, a[0,].argsort()[::-1]]

So how does this work?

a[0,] is just the first row I want to sort by.

Now I use argsort to get the order of indices.

I use [::-1] because I need descending order.

Lastly I use a[::, ...] to get a view with the columns in the right order.


回答 7

稍微复杂一点的lexsort例子-在第一列下降,在第二列上升。的窍门lexsort是,它对行进行排序(因此.T),并优先考虑最后一行。

In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])
In [121]: b
Out[121]: 
array([[1, 2, 1],
       [3, 1, 2],
       [1, 1, 3],
       [2, 3, 4],
       [3, 2, 5],
       [2, 1, 6]])
In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]: 
array([[3, 1, 2],
       [3, 2, 5],
       [2, 1, 6],
       [2, 3, 4],
       [1, 1, 3],
       [1, 2, 1]])

A little more complicated lexsort example – descending on the 1st column, secondarily ascending on the 2nd. The tricks with lexsort are that it sorts on rows (hence the .T), and gives priority to the last.

In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])
In [121]: b
Out[121]: 
array([[1, 2, 1],
       [3, 1, 2],
       [1, 1, 3],
       [2, 3, 4],
       [3, 2, 5],
       [2, 1, 6]])
In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]: 
array([[3, 1, 2],
       [3, 2, 5],
       [2, 1, 6],
       [2, 3, 4],
       [1, 1, 3],
       [1, 2, 1]])

回答 8

这是考虑所有列的另一种解决方案(JJ的答案的更紧凑方式);

ar=np.array([[0, 0, 0, 1],
             [1, 0, 1, 0],
             [0, 1, 0, 0],
             [1, 0, 0, 1],
             [0, 0, 1, 0],
             [1, 1, 0, 0]])

用lexsort排序,

ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]

输出:

array([[0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0]])

Here is another solution considering all columns (more compact way of J.J‘s answer);

ar=np.array([[0, 0, 0, 1],
             [1, 0, 1, 0],
             [0, 1, 0, 0],
             [1, 0, 0, 1],
             [0, 0, 1, 0],
             [1, 1, 0, 0]])

Sort with lexsort,

ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]

Output:

array([[0, 0, 0, 1],
       [0, 0, 1, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0]])

回答 9

只需使用排序,即可使用要排序的列号。

a = np.array([1,1], [1,-1], [-1,1], [-1,-1]])
print (a)
a=a.tolist() 
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))
print (a)

Simply using sort, use coloumn number based on which you want to sort.

a = np.array([1,1], [1,-1], [-1,1], [-1,-1]])
print (a)
a=a.tolist() 
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))
print (a)

回答 10

这是一个古老的问题,但是如果您需要将其推广到2维以上的数组,则可以采用以下解决方案:

np.einsum('ij->ij', a[a[:,1].argsort(),:])

这对于两个维度来说是一个过大的杀伤力,并且a[a[:,1].argsort()]每个@steve的答案就足够了,但是不能将该答案推广到更高的维度。您可以在此问题中找到3D阵列的示例。

输出:

[[7 0 5]
 [9 2 3]
 [4 5 6]]

It is an old question but if you need to generalize this to a higher than 2 dimension arrays, here is the solution than can be easily generalized:

np.einsum('ij->ij', a[a[:,1].argsort(),:])

This is an overkill for two dimensions and a[a[:,1].argsort()] would be enough per @steve’s answer, however that answer cannot be generalized to higher dimensions. You can find an example of 3D array in this question.

Output:

[[7 0 5]
 [9 2 3]
 [4 5 6]]

Python setup.py开发与安装

问题:Python setup.py开发与安装

在setup.py两个选项develop,并install混淆了我。根据此站点,使用develop创建到site-packages目录的特殊链接。

人们建议我使用python setup.py install全新安装,并且python setup.py develop对安装文件进行任何更改之后。

谁能阐明这些命令的用法?

Two options in setup.py develop and install are confusing me. According to this site, using develop creates a special link to site-packages directory.

People have suggested that I use python setup.py install for a fresh installation and python setup.py develop after any changes have been made to the setup file.

Can anyone shed some light on the usage of these commands?


回答 0

python setup.py install 用于安装(通常是第三方)您不会自行开发/修改/调试的软件包。

对于您自己的东西,您想要首先安装您的软件包,然后能够频繁编辑代码不必每次都重新安装该软件包-正是python setup.py develop这样:安装软件包(通常只是一个源文件夹)以某种方式可以让您在将代码安装到(虚拟)环境后方便地编辑代码,并使更改立即生效。

请注意,强烈建议使用pip install .(安装)和pip install -e .(开发人员安装)来安装软件包,因为setup.py直接调用将对许多依赖项(例如pull prereleases和不兼容的软件包版本)做错事,或者使软件包难以使用卸载pip

python setup.py install is used to install (typically third party) packages that you’re not going to develop/modify/debug yourself.

For your own stuff, you want to first install your package and then be able to frequently edit the code without having to re-install the package every time — and that is exactly what python setup.py develop does: it installs the package (typically just a source folder) in a way that allows you to conveniently edit your code after it’s installed to the (virtual) environment, and have the changes take effect immediately.

Note that it is highly recommended to use pip install . (install) and pip install -e . (developer install) to install packages, as invoking setup.py directly will do the wrong things for many dependencies, such as pull prereleases and incompatible package versions, or make the package hard to uninstall with pip.


回答 1

文档中。该develop不会安装软件包,但它会创建一个.egg-link部署目录回项目源代码目录。

因此,这就像安装,而不是复制到其中,site-packages而是添加了一个符号链接(.egg-link充当多平台符号链接)。

这样,您可以编辑源代码并直接查看更改,无需在每次进行少量更改时都重新安装。如果您是该项目的开发者,因此很有用,名称为develop。如果您只是在安装别人的软件包,则应使用install

From the documentation. The develop will not install the package but it will create a .egg-link in the deployment directory back to the project source code directory.

So it’s like installing but instead of copying to the site-packages it adds a symbolic link (the .egg-link acts as a multiplatform symbolic link).

That way you can edit the source code and see the changes directly without having to reinstall every time that you make a little change. This is useful when you are the developer of that project hence the name develop. If you are just installing someone else’s package you should use install


回答 2

人们在使用该develop方法时可能会发现有用的另一件事是可以--user选择不使用sudo进行安装。例如:

python setup.py develop --user

代替

sudo python setup.py develop

Another thing that people may find useful when using the develop method is the --user option to install without sudo. Ex:

python setup.py develop --user

instead of

sudo python setup.py develop

在numpy数组中查找最接近的值

问题:在numpy数组中查找最接近的值

是否有numpy-thonic方法(例如函数)在数组中查找最接近的值

例:

np.find_nearest( array, value )

Is there a numpy-thonic way, e.g. function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )

回答 0

import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

value = 0.5

print(find_nearest(array, value))
# 0.568743859261
import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]

value = 0.5

print(find_nearest(array, value))
# 0.568743859261

回答 1

如果您的数组已排序并且非常大,则这是一个更快的解决方案:

def find_nearest(array,value):
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

这可以扩展到非常大的阵列。如果您不能假定数组已经排序,则可以轻松修改上面的内容以对方法进行排序。对于小型阵列而言,这是过大的杀伤力,但是一旦阵列变大,速度就会更快。

IF your array is sorted and is very large, this is a much faster solution:

def find_nearest(array,value):
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

This scales to very large arrays. You can easily modify the above to sort in the method if you can’t assume that the array is already sorted. It’s overkill for small arrays, but once they get large this is much faster.


回答 2

稍作修改,上面的答案就可以用于任意维数(1d,2d,3d等)的数组:

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

或者,写成一行:

a.flat[np.abs(a - a0).argmin()]

With slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, …):

def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    idx = np.abs(a - a0).argmin()
    return a.flat[idx]

Or, written as a single line:

a.flat[np.abs(a - a0).argmin()]

回答 3

答案摘要:如果已排序,array则二等分代码(如下所示)执行最快。大型阵列的速度要快〜100-1000倍,小型阵列的速度要快〜2-100倍。它也不需要numpy。如果您有一个未排序的,array则如果array为大,则应首先考虑使用O(n logn)排序,然后再按等分;如果array为小,则方法2似乎是最快的。

首先,您应该弄清最近值的含义。通常人们想要一个横坐标的间隔,例如array = [0,0.7,2.1],value = 1.95,答案将是idx = 1。我怀疑您是这种情况(否则,一旦找到间隔,可以使用后续条件语句很容易地修改以下内容)。我将注意到,执行此操作的最佳方法是使用二分法(我将首先提供它-请注意,它根本不需要numpy,并且比使用numpy函数要快,因为它们执行冗余操作)。然后,我将与其他用户在此处介绍的其他项目进行时间比较。

二等分:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0# Initialize lower
    ju = n-1# and upper limits.
    while (ju-jl > 1):# If we are not yet done,
        jm=(ju+jl) >> 1# compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl=jm# and replace either the lower limit
        else:
            ju=jm# or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):# edge cases at bottom
        return 0
    elif (value == array[n-1]):# and top
        return n-1
    else:
        return jl

现在,我将从其他答案中定义代码,它们每个都返回一个索引:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

现在,我将对代码进行计时: 注意方法1,2,4,5没有正确给出间隔。方法1,2,4舍入到数组中的最近点(例如> = 1.5-> 2),方法5始终舍入(例如1.45-> 2)。只有方法3和6,当然还有二等分,才能正确给出间隔。

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

对于大型阵列,二等分与次优的180us和最长的1.21ms相比较给出4us(约快100-1000倍)。对于较小的阵列,速度要快2到100倍。

Summary of answer: If one has a sorted array then the bisection code (given below) performs the fastest. ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either. If you have an unsorted array then if array is large, one should consider first using an O(n logn) sort and then bisection, and if array is small then method 2 seems the fastest.

First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array=[0,0.7,2.1], value=1.95, answer would be idx=1. This is the case that I suspect you need (otherwise the following can be modified very easily with a followup conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first – note it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the others presented here by other users.

Bisection:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0# Initialize lower
    ju = n-1# and upper limits.
    while (ju-jl > 1):# If we are not yet done,
        jm=(ju+jl) >> 1# compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl=jm# and replace either the lower limit
        else:
            ju=jm# or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):# edge cases at bottom
        return 0
    elif (value == array[n-1]):# and top
        return n-1
    else:
        return jl

Now I’ll define the code from the other answers, they each return an index:

import math
import numpy as np

def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx

def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices

def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices

def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx


def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

Now I’ll time the codes: Note methods 1,2,4,5 don’t correctly give the interval. Methods 1,2,4 round to nearest point in array (e.g. >=1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3, and 6, and of course bisection give the interval properly.

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

For a large array bisection gives 4us compared to next best 180us and longest 1.21ms (~100 – 1000 times faster). For smaller arrays it’s ~2-100 times faster.


回答 4

这是在向量数组中查找最近的向量的扩展。

import numpy as np

def find_nearest_vector(array, value):
  idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
  return array[idx]

A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]  
print find_nearest_vector(A,pt)
# array([  6.76626539,  30.50661753])

Here’s an extension to find the nearest vector in an array of vectors.

import numpy as np

def find_nearest_vector(array, value):
  idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
  return array[idx]

A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]  
print find_nearest_vector(A,pt)
# array([  6.76626539,  30.50661753])

回答 5

如果您不想使用numpy,可以这样做:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]

If you don’t want to use numpy this will do it:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]

回答 6

这是将处理非标量“值”数组的版本:

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

如果输入是标量,则返回一个数字类型(例如,int,float)的版本:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]

Here’s a version that will handle a non-scalar “values” array:

import numpy as np

def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]

回答 7

这是@Ari Onasafari的scipy版本,请回答“ 在向量数组中查找最近的向量

In [1]: from scipy import spatial

In [2]: import numpy as np

In [3]: A = np.random.random((10,2))*100

In [4]: A
Out[4]:
array([[ 68.83402637,  38.07632221],
       [ 76.84704074,  24.9395109 ],
       [ 16.26715795,  98.52763827],
       [ 70.99411985,  67.31740151],
       [ 71.72452181,  24.13516764],
       [ 17.22707611,  20.65425362],
       [ 43.85122458,  21.50624882],
       [ 76.71987125,  44.95031274],
       [ 63.77341073,  78.87417774],
       [  8.45828909,  30.18426696]])

In [5]: pt = [6, 30]  # <-- the point to find

In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([  8.45828909,  30.18426696])

#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)

In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393

In [9]: index # <-- The locations of the neighbors
Out[9]: 9

#then 
In [10]: A[index]
Out[10]: array([  8.45828909,  30.18426696])

Here is a version with scipy for @Ari Onasafari, answer “to find the nearest vector in an array of vectors

In [1]: from scipy import spatial

In [2]: import numpy as np

In [3]: A = np.random.random((10,2))*100

In [4]: A
Out[4]:
array([[ 68.83402637,  38.07632221],
       [ 76.84704074,  24.9395109 ],
       [ 16.26715795,  98.52763827],
       [ 70.99411985,  67.31740151],
       [ 71.72452181,  24.13516764],
       [ 17.22707611,  20.65425362],
       [ 43.85122458,  21.50624882],
       [ 76.71987125,  44.95031274],
       [ 63.77341073,  78.87417774],
       [  8.45828909,  30.18426696]])

In [5]: pt = [6, 30]  # <-- the point to find

In [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point 
Out[6]: array([  8.45828909,  30.18426696])

#how it works!
In [7]: distance,index = spatial.KDTree(A).query(pt)

In [8]: distance # <-- The distances to the nearest neighbors
Out[8]: 2.4651855048258393

In [9]: index # <-- The locations of the neighbors
Out[9]: 9

#then 
In [10]: A[index]
Out[10]: array([  8.45828909,  30.18426696])

回答 8

如果您有很多values要搜索的东西,这是@Dimitri解决方案的快速向量化版本(values可以是多维数组):

#`values` should be sorted
def get_closest(array, values):
    #make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")

    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1

    return array[idxs]

基准测试

比使用for@Demitri解决方案的循环快100倍以上

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds

Here is a fast vectorized version of @Dimitri’s solution if you have many values to search for (values can be multi-dimensional array):

#`values` should be sorted
def get_closest(array, values):
    #make sure array is a numpy array
    array = np.array(array)

    # get insert positions
    idxs = np.searchsorted(array, values, side="left")

    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1

    return array[idxs]

Benchmarks

> 100 times faster than using a for loop with @Demitri’s solution`

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds

回答 9

对于大型数组,@ Demitri给出的(出色)答案远远快于当前标记为最佳的答案。我通过以下两种方式调整了他的确切算法:

  1. 无论输入数组是否已排序,下面的函数均有效。

  2. 下面的函数返回与最接近的值相对应的输入数组的索引,该值更为通用。

请注意,下面的函数还处理特定的边缘情况,这会导致@Demitri编写的原始函数存在错误。否则,我的算法与他的算法相同。

def find_idx_nearest_val(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

For large arrays, the (excellent) answer given by @Demitri is far faster than the answer currently marked as best. I’ve adapted his exact algorithm in the following two ways:

  1. The function below works whether or not the input array is sorted.

  2. The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.

Note that the function below also handles a specific edge case that would lead to a bug in the original function written by @Demitri. Otherwise, my algorithm is identical to his.

def find_idx_nearest_val(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest

回答 10

这是unutbu答案的矢量化版本:

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1) 

    indices = np.abs(array - values).argmin(axis=-1)

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles, image)

print(quantiled_image.shape) # should be (nrows, ncols, 3)

This is a vectorized version of unutbu’s answer:

def find_nearest(array, values):
    array = np.asarray(array)

    # the last dim must be 1 to broadcast in (array - values) below.
    values = np.expand_dims(values, axis=-1) 

    indices = np.abs(array - values).argmin(axis=-1)

    return array[indices]


image = plt.imread('example_3_band_image.jpg')

print(image.shape) # should be (nrows, ncols, 3)

quantiles = np.linspace(0, 255, num=2 ** 2, dtype=np.uint8)

quantiled_image = find_nearest(quantiles, image)

print(quantiled_image.shape) # should be (nrows, ncols, 3)

回答 11

我认为最Python化的方式是:

 num = 65 # Input number
 array = n.random.random((10))*100 # Given array 
 nearest_idx = n.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the the given number (num)
 nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

这是基本代码。您可以根据需要将其用作功能

I think the most pythonic way would be:

 num = 65 # Input number
 array = n.random.random((10))*100 # Given array 
 nearest_idx = n.where(abs(array-num)==abs(array-num).min())[0] # If you want the index of the element of array (array) nearest to the the given number (num)
 nearest_val = array[abs(array-num)==abs(array-num).min()] # If you directly want the element of array (array) nearest to the given number (num)

This is the basic code. You can use it as a function if you want


回答 12

所有答案都有助于收集信息以编写高效的代码。但是,我编写了一个小的Python脚本来针对各种情况进行优化。如果对提供的数组进行了排序,那将是最好的情况。如果搜索指定值最近点的索引,则bisect模块是最省时的。当一个搜索索引对应于一个数组时,numpy searchsorted效率最高。

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

在[63]中:%time bisect.bisect_left(xlist,0.3)CPU时间:用户0 ns,sys:0 ns,总计:0 ns墙壁时间:22.2 µs

np.searchsorted(xar, 0.3, side="left")

在[64]中:%time np.searchsorted(xar,0.3,side =“ left”)CPU时间:用户0 ns,sys:0 ns,总计:0 ns挂墙时间:98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

%time np.searchsorted(xar,randpts,side =“ left”)CPU时间:用户4 ms,sys:0 ns,总计:4 ms挂墙时间:1.2 ms

如果我们遵循乘法规则,那么numpy应该花费〜100毫秒,这意味着〜83X更快。

All the answers are beneficial to gather the information to write efficient code. However, I have written a small Python script to optimize for various cases. It will be the best case if the provided array is sorted. If one searches the index of the nearest point of a specified value, then bisect module is the most time efficient. When one search the indices correspond to an array, the numpy searchsorted is most efficient.

import numpy as np
import bisect
xarr = np.random.rand(int(1e7))

srt_ind = xarr.argsort()
xar = xarr.copy()[srt_ind]
xlist = xar.tolist()
bisect.bisect_left(xlist, 0.3)

In [63]: %time bisect.bisect_left(xlist, 0.3) CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 22.2 µs

np.searchsorted(xar, 0.3, side="left")

In [64]: %time np.searchsorted(xar, 0.3, side=”left”) CPU times: user 0 ns, sys: 0 ns, total: 0 ns Wall time: 98.9 µs

randpts = np.random.rand(1000)
np.searchsorted(xar, randpts, side="left")

%time np.searchsorted(xar, randpts, side=”left”) CPU times: user 4 ms, sys: 0 ns, total: 4 ms Wall time: 1.2 ms

If we follow the multiplicative rule, then numpy should take ~100 ms which implies ~83X faster.


回答 13

对于2d数组,确定最近元素的i,j位置:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j

For 2d array, to determine the i, j position of nearest element:

import numpy as np
def find_nearest(a, a0):
    idx = (np.abs(a - a0)).argmin()
    w = a.shape[1]
    i = idx // w
    j = idx - i * w
    return a[i,j], i, j

回答 14

import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))
import numpy as np
def find_nearest(array, value):
    array = np.array(array)
    z=np.abs(array-value)
    y= np.where(z == z.min())
    m=np.array(y)
    x=m[0,0]
    y=m[1,0]
    near_value=array[x,y]

    return near_value

array =np.array([[60,200,30],[3,30,50],[20,1,-50],[20,-500,11]])
print(array)
value = 0
print(find_nearest(array, value))

回答 15

可能对ndarrays

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

Maybe helpful for ndarrays:

def find_nearest(X, value):
    return X[np.unravel_index(np.argmin(np.abs(X - value)), X.shape)]

‘pip’不被识别为内部或外部命令

问题:’pip’不被识别为内部或外部命令

尝试在计算机上安装Django时遇到奇怪的错误。

这是我在命令行中键入的序列:

C:\Python34>python get-pip.py
Requirement already up-to-date: pip in c:\python34\lib\site-packages
Cleaning up...

C:\Python34>pip install Django
'pip' is not recognized as an internal or external command,
operable program or batch file.

C:\Python34>lib\site-packages\pip install Django
'lib\site-packages\pip' is not recognized as an internal or external command,
operable program or batch file. 

是什么原因造成的?

编辑_______________________

根据要求,这是我输入echo%PATH%时得到的结果

C:\Python34>echo %PATH%
C:\Program Files\ImageMagick-6.8.8-Q16;C:\Program Files (x86)\Intel\iCLS Client\
;C:\Program Files\Intel\iCLS Client\;C:\Windows\system32;C:\Windows;C:\Windows\S
ystem32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\
Windows Live\Shared;C:\Program Files (x86)\Intel\OpenCL SDK\2.0\bin\x86;C:\Progr
am Files (x86)\Intel\OpenCL SDK\2.0\bin\x64;C:\Program Files\Intel\Intel(R) Mana
gement Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine C
omponents\IPT;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components
\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\IPT;C:\P
rogram Files (x86)\nodejs\;C:\Program Files (x86)\Heroku\bin;C:\Program Files (x
86)\git\cmd;C:\RailsInstaller\Ruby2.0.0\bin;C:\RailsInstaller\Git\cmd;C:\RailsIn
staller\Ruby1.9.3\bin;C:\Users\Javi\AppData\Roaming\npm

I’m running into a weird error when trying to install Django on my computer.

This is the sequence that I typed into my command line:

C:\Python34>python get-pip.py
Requirement already up-to-date: pip in c:\python34\lib\site-packages
Cleaning up...

C:\Python34>pip install Django
'pip' is not recognized as an internal or external command,
operable program or batch file.

C:\Python34>lib\site-packages\pip install Django
'lib\site-packages\pip' is not recognized as an internal or external command,
operable program or batch file. 

What could be causing this?

EDIT _______________________

As requested this is what I get when I type in echo %PATH%

C:\Python34>echo %PATH%
C:\Program Files\ImageMagick-6.8.8-Q16;C:\Program Files (x86)\Intel\iCLS Client\
;C:\Program Files\Intel\iCLS Client\;C:\Windows\system32;C:\Windows;C:\Windows\S
ystem32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\
Windows Live\Shared;C:\Program Files (x86)\Intel\OpenCL SDK\2.0\bin\x86;C:\Progr
am Files (x86)\Intel\OpenCL SDK\2.0\bin\x64;C:\Program Files\Intel\Intel(R) Mana
gement Engine Components\DAL;C:\Program Files\Intel\Intel(R) Management Engine C
omponents\IPT;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components
\DAL;C:\Program Files (x86)\Intel\Intel(R) Management Engine Components\IPT;C:\P
rogram Files (x86)\nodejs\;C:\Program Files (x86)\Heroku\bin;C:\Program Files (x
86)\git\cmd;C:\RailsInstaller\Ruby2.0.0\bin;C:\RailsInstaller\Git\cmd;C:\RailsIn
staller\Ruby1.9.3\bin;C:\Users\Javi\AppData\Roaming\npm

回答 0

您需要将pip安装的路径添加到PATH系统变量中。默认情况下,pip已安装C:\Python34\Scripts\pip(pip现在与python的新版本捆绑在一起),因此需要将路径“ C:\ Python34 \ Scripts”添加到PATH变量中。

要检查它是否已经在PATH变量中,请echo %PATH%在CMD提示符下键入

要将pip安装的路径添加到PATH变量中,可以使用“控制面板”或setx命令。例如:

setx PATH "%PATH%;C:\Python34\Scripts"

注意:根据官方文档,“使用setx变量设置的[v]变量仅在以后的命令窗口中可用,而在当前命令窗口中不可用”。特别是,输入上述命令后,您将需要启动一个新的cmd.exe实例,以利用新的环境变量。

感谢Scott Bartell指出这一点。

You need to add the path of your pip installation to your PATH system variable. By default, pip is installed to C:\Python34\Scripts\pip (pip now comes bundled with new versions of python), so the path “C:\Python34\Scripts” needs to be added to your PATH variable.

To check if it is already in your PATH variable, type echo %PATH% at the CMD prompt

To add the path of your pip installation to your PATH variable, you can use the Control Panel or the setx command. For example:

setx PATH "%PATH%;C:\Python34\Scripts"

Note: According to the official documentation, “[v]ariables set with setx variables are available in future command windows only, not in the current command window”. In particular, you will need to start a new cmd.exe instance after entering the above command in order to utilize the new environment variable.

Thanks to Scott Bartell for pointing this out.


回答 1

对于Windows,在安装软件包时,键入:

python -m pip install [packagename]

For windows when you install a package you type:

python -m pip install [packagename]

回答 2

对我而言:

set PATH=%PATH%;C:\Python34\Scripts

立即工作(尝试echo%PATH%之后,您将看到路径值为C:\ Python34 \ Scripts)。

感谢:https : //stackoverflow.com/a/9546345/1766166

For me command:

set PATH=%PATH%;C:\Python34\Scripts

worked immediately (try after echo %PATH% and you will see that your path has the value C:\Python34\Scripts).

Thanks to: https://stackoverflow.com/a/9546345/1766166


回答 3

到目前为止,在3.7.3版中,获取正确的系统变量存在一点问题。

尝试这个:

  1. 在cmd中输入“ start%appdata%”。

  2. 之后,该文件浏览器应在“ ../AppData/Roaming”中弹出。

返回一个目录,然后导航到“本地/程序/ Python / Python37-32 /脚本”。

注意:版本号可能会有所不同,因此,如果您复制并粘贴上述文件路径,它将无法正常工作。

完成此操作后,您现在具有已下载Python的正确位置。通过在地址栏中选择整个目录来复制文件路径。

在此处输入图片说明

完成此操作后,单击开始图标,然后导航到“控制面板”>“系统和安全性”>“系统”。然后单击面板左侧的“高级系统设置”。

单击右下角的环境变量后,将有两个框,一个上框和一个下框。在上方的框中:单击“路径”变量,然后单击右侧的“编辑”。单击“新建”,然后粘贴目录路径。它看起来应该像这样:

在此处输入图片说明

单击确定三次,打开cmd的新窗口,然后键入:pip。看看是否可行。

祝好运!

As of now, version 3.7.3 I had a little bit of an issue with getting the right system variable.

Try this:

  1. Type ‘start %appdata%’ in cmd.

  2. After that file explorer should pop up in ‘../AppData/Roaming’.

Go back one directory and navigate to ‘Local/Programs/Python/Python37-32/Scripts’.

NOTE: The version number may be different so if you copy and paste the above file path it could not work.

After you do this you now have the correct location of your downloaded Python. Copy your file path by selecting the whole directory in the address bar.

enter image description here

Once you do that click the start icon and navigate to the Control Panel > System and Security > System. Then click “Advanced System Settings” on the left side of the panel.

Once there click environment Variables on the bottom right and there will be two boxes, an upper and a lower box. In the upper box: Click on the ‘Path’ Variable and click ‘edit’ located on the right. Click ‘New’ and paste your directory Path. It should look something like this:

enter image description here

Click Ok three times, open a NEW window of cmd and type: pip. See if it works.

Good Luck!


回答 4

替代方式。

如果您不想添加PATH(如先前的书面回答所指出的那样),

但是您想执行pip作为命令,则可以使用py -m前缀做。

鉴于您必须一次又一次地执行此操作。

例如。

py -m <command>

py -m pip install --upgrade pip setuptools

另外,还要确保有pippy安装

在此处输入图片说明

Alternate way.

If you don’t want to add the PATH as the previous well written answers pointed out,

but you want to execute pip as your command then you can do that with py -m as prefix.

Given that you have to do it again and again.

eg.

py -m <command>

as in

py -m pip install --upgrade pip setuptools

Also make sure to have pip and py installed

enter image description here


回答 5

同样,长方法-在尝试了上述所有项目之后,这是最后的选择:

c:\python27\scripts\pip.exe install [package].whl

cd在车轮所在目录中之后

also, the long method – it was a last resort after trying all items above:

c:\python27\scripts\pip.exe install [package].whl

this after cd in directory where the wheel is located


回答 6

根据Python 3.6文档

默认情况下,可能未安装pip。一种可能的解决方法是:

python -m ensurepip --default-pip

As per Python 3.6 Documentation

It is possible that pip does not get installed by default. One potential fix is:

python -m ensurepip --default-pip

回答 7

转到控制面板>>卸载 更改程序,然后双击Python XXX修改安装。确保已检查并安装了PIP组件。

在此处输入图片说明

Go to control Panel >> Uninstall or change Program and double click on Python XXX to modify install. Make sure PIP component is checked and install.

enter image description here


回答 8

我意识到这是一个古老的问题,但是我刚才也遇到了同样的问题。将正确的文件夹(C:\Python33\Scripts)添加到路径后,我仍然无法pip运行。它只需要运行pip.exe install -package-而不是 运行 pip install -package-。只是一个想法。

I realize this is an old question, but I was having the same problem just now. After adding the proper folder (C:\Python33\Scripts) to the path, I still could not get pip to run. All it took was running pip.exe install -package- instead of pip install -package-. Just a thought.


回答 9

设置路径= %PATH%;C:\Python34\;C:\Python27\Scripts

set Path = %PATH%;C:\Python34\;C:\Python27\Scripts
Source


回答 10

控制面板->添加/删除程序-> Python->修改->可选功能(您可以单击所有内容),然后按下一步->选中“将python添加到环境变量”->安装

在此处输入图片说明

这应该可以解决您的路径问题,因此请跳至命令提示符,您现在可以使用pip。

Control Panel -> add/remove programs -> Python -> Modify -> optional Features (you can click everything) then press next -> Check “Add python to environment variables” -> Install

enter image description here

And that should solve your path issues, so jump to command prompt and you can use pip now.


回答 11

尝试进入Windows powershell或cmd提示符并输入:

python -m pip install openpyxl

Try going to windows powershell or cmd prompt and typing:

python -m pip install openpyxl


回答 12

在最新版本的python 3.6.2及更高版本中,提供了

C:\ Program Files(x86)\ Python36-32 \ Scripts

您可以将路径添加到我们的环境变量路径,如下所示 在此处输入图片说明

设置路径后,请确保关闭命令提示符或git。还应该在管理员模式下打开命令提示符。这是Windows 10的示例。

In latest version python 3.6.2 and above, is available in

C:\Program Files (x86)\Python36-32\Scripts

You can add the path to our environment variable path as below enter image description here

Make sure you close your command prompt or git after setting up your path. Also should you open your command prompt in administrator mode. This is example for windows 10.


回答 13

甚至我都不熟悉,但是C:\ Python34 \ Scripts> pip install django为我工作。路径应设置为Python安装的Script文件夹所在的位置,即C:\ Python34 \ Scripts。我想这是因为django是一个基于python的框架,这就是为什么在安装时必须维护此目录结构的原因。

Even I’m new to this but, C:\Python34\Scripts>pip install django ,worked for me. The path should be set as where the Script folder of Python installation is i.e.C:\Python34\Scripts. I suppose its because django is a framework which is based on python that’s why this directory structure has to be maintained while installing.


回答 14

您可以尝试pip3,例如:

C:> pip3 install pandas

Yo can try pip3, something like:

C:> pip3 install pandas

回答 15

我刚刚安装了python 3.6.2。

我知道了

C:\ Users \ USERNAME \ AppData \ Local \ Programs \ Python \ Python36-32 \ Scripts

I have just installed python 3.6.2.

I got the path as

C:\Users\USERNAME\AppData\Local\Programs\Python\Python36-32\Scripts


回答 16

在Windows中,打开cmd并使用来找到PYTHON_HOME的位置where python,现在使用以下命令将该位置 添加到您的环境PATH中:

set PATH=%PATH%;<PYTHON_HOME>\Scripts

参考此


在Linux中,打开终端并使用来找到PYTHON_HOME的位置which python,现在使用来将其添加PYTHON_HOME/Scripts到PATH变量中:

PATH=$PATH:<PYTHON_HOME>\Scripts
export PATH

In Windows, open cmd and find the location of PYTHON_HOME using where python, now add this location to your Environment PATH using:

set PATH=%PATH%;<PYTHON_HOME>\Scripts

or Refer this


In Linux, open terminal and find the location of PYTHON_HOME using which python, now add the PYTHON_HOME/Scripts to the PATH variable using:

PATH=$PATH:<PYTHON_HOME>\Scripts
export PATH

回答 17

我认为从Python 2.7.9及更高版本开始,预安装了pip,它将位于您的scripts文件夹中。因此,您必须将“ scripts”文件夹添加到路径。我的被​​安装在C:\ Python27 \ Scripts中。检查您的路径,以便您可以相应地更改以下内容,然后转到powershell,将以下代码粘贴到powershell中,然后按Enter键。之后,重新启动,您的问题将得到解决。

[Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\Python27\Scripts", "User")

I think from Python 2.7.9 and higher pip comes pre installed and it will be in your scripts folder. So you have to add the “scripts” folder to the path. Mine is installed in C:\Python27\Scripts. Check yours to see what your path is so that you can alter the below accordingly, then Go to powershell, paste the below code in powershell and hit Enter key. After that, reboot and your issue will be resolved.

[Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\Python27\Scripts", "User")

回答 18

或者,如果您像我一样使用PyCharm(2017.3.3),只需在终端中更改目录并安装:

cd C:\Users\{user}\PycharmProjects\test\venv\Scripts
pip install ..

Or If you are using PyCharm (2017.3.3) like me, just change directory in terminal and install:

cd C:\Users\{user}\PycharmProjects\test\venv\Scripts
pip install ..

回答 19

在Windows环境中,只需在dos shell中执行以下命令即可。

path =%path%; D:\ Program Files \ python3.6.4 \ Scripts; (新路径=当前路径; python脚本文件夹的路径)

In windows environment, just execute below commands in dos shell.

path=%path%;D:\Program Files\python3.6.4\Scripts; (new path=current path;path of the python script folder)


回答 20

我遇到了同样的问题以管理员身份运行Windows PowerShell 可以解决我的问题。

在此处输入图片说明

I was facing same issue Run Windows PowerShell as Administrator it resolved my issue.

enter image description here


回答 21

最常见的是:

cmd.exe

python -m pip install --user [name of your module here without brackets]

Most frequently it is:

in cmd.exe write

python -m pip install --user [name of your module here without brackets]

回答 22

更正PATH之后,我仍然收到此错误。

如果您的代码库要求您具有Python的早期版本(在我的情况下为2.7),则它可能是存在pip之前的版本。

这不是很规范,但是安装一个更新的版本对我有用。(我用的是2.7.13)

I continued to recieve this error after correcting my PATH.

If your codebase requires that you have an earlier version of Python (2.7 in my case), it may have been a version prior to the existence of pip.

It’s not very canonical but installing a more recent version worked for me. (I used 2.7.13)


回答 23

我有同样的问题。你只需要去你的

C:\ Python27 \脚本

并将其添加到环境变量中。设置路径后,只需在C:\ Python27 \ Scripts上运行pip.exe文件,然后在cmd中尝试pip。但是,如果没有任何反应,请尝试运行所有pip应用程序,例如pip2.7和pip2.exe。点子将像魅力一样工作。

I had this same issue. You just need to go to your

C:\Python27\Scripts

and add it to environment variables. After path setting just run pip.exe file on C:\Python27\Scripts and then try pip in cmd. But if nothing happens try running all pip applications like pip2.7 and pip2.exe. And pip will work like a charm.


回答 24

解决此问题的一种非常简单的方法是在文件资源管理器中打开安装pip的路径,然后单击该路径,然后键入cmd,这将设置路径,使您的安装方式更容易。

迟到了4年,但几天前我遇到了同样的问题,其他所有方法对我都不起作用。

A very simple way to get around this is to open the path where pip is installed in file explorer, and click on the path, then type cmd, this sets the path, allowing you to install way easier.

4 years late, but I ran into the same issue a couple days ago and all the other methods didn’t work for me.


回答 25

尝试卸载Python,删除其余程序文件,然后重新全新安装。它为我工作。当我迁移到新笔记本电脑并使用迁移软件将软件从旧笔记本电脑迁移到新笔记本电脑时,发生了此错误。是的,效果不是很好。

Try uninstall Python, delete the remaining program files, then install it again fresh. It worked for me. This error happened to me when I migrate to a new laptop and used a migration software to move my software from the old laptop to the new one. And yeah, didn’t work quite well.


回答 26

小小的澄清:在“ Windows 7 64位PC”中,添加...Python34\Scripts到path变量后,pip install pygame对我不起作用。

所以我检查了“ … Python34 \ Scripts”文件夹,它没有pip,但有pip3pip3.4。所以我跑了pip3.4 install pygame .... .whl。有效。

(进一步在已下载pygame...whl文件所在的文件夹中打开一个命令窗口。)

Small clarification: in “Windows 7 64 bit PC”, after adding ...Python34\Scripts to the path variable, pip install pygame didn’t work for me.

So I checked the “…Python34\Scripts” folder, it didn’t have pip, but it had pip3 and pip3.4. So I ran pip3.4 install pygame .... .whl. It worked.

(Further open a command window in the same folder where you have the downloaded pygame...whl file.)


回答 27

对我来说,问题是,在PATH中添加以下内容后,系统未重新启动。

C:\Users\admin\AppData\Local\Programs\Python\Python37\Scripts

For me the issue was, system was not RESTARTED after adding below in PATH.

C:\Users\admin\AppData\Local\Programs\Python\Python37\Scripts


回答 28

安装SQL 2019 Python时,PIP存在一些已知问题,需要修复(步骤7) https://docs.microsoft.com/zh-cn/sql/advanced-analytics/known-issues-for-sql-server-机器学习服务?view = sql-server-ver15

pip配置了需要TLS / SSL的位置,但是Python中的ssl模块不可用。

Workaround
Copy the following files:

libssl-1_1-x64.dll
libcrypto-1_1-x64.dll

from the folder 
C:\Program Files\Microsoft SQL Server\MSSSQL15.MSSQLSERVER\PYTHON_SERVICES\Library\bin
to the folder 
C:\Program Files\Microsoft SQL Server\MSSSQL15.MSSQLSERVER\PYTHON_SERVICES\DLLs

Then open a new DOS command shell prompt.

When installing SQL 2019 Python, there are known issues for PIP which require a fix (step 7) https://docs.microsoft.com/en-us/sql/advanced-analytics/known-issues-for-sql-server-machine-learning-services?view=sql-server-ver15

pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

Workaround
Copy the following files:

libssl-1_1-x64.dll
libcrypto-1_1-x64.dll

from the folder 
C:\Program Files\Microsoft SQL Server\MSSSQL15.MSSQLSERVER\PYTHON_SERVICES\Library\bin
to the folder 
C:\Program Files\Microsoft SQL Server\MSSSQL15.MSSQLSERVER\PYTHON_SERVICES\DLLs

Then open a new DOS command shell prompt.

回答 29

对于Mac,在终端中运行以下命令

echo  export "PATH=$HOME/Library/Python/2.7/bin:$PATH"

For mac run below command in terminal

echo  export "PATH=$HOME/Library/Python/2.7/bin:$PATH"