Tag archive: Python

Numpy argsort - what is it doing?

Question: Numpy argsort - what is it doing?

Why is numpy giving this result:

import numpy

x = numpy.array([1.48, 1.41, 0.0, 0.1])
print(x.argsort())

>[2 3 1 0]

when I’d expect it to do this:

[3 2 0 1]

Clearly my understanding of the function is lacking.


Answer 0

According to the documentation

Returns the indices that would sort an array.

  • 2 is the index of 0.0.
  • 3 is the index of 0.1.
  • 1 is the index of 1.41.
  • 0 is the index of 1.48.
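
To make the mapping above concrete, here is a minimal pure-Python sketch of what argsort computes for a one-dimensional sequence. It does not use NumPy, and the helper names argsort_py and ranks_py are made up for illustration; NumPy's own implementation is different.

```python
def argsort_py(values):
    # Indices of `values`, ordered by the value each index points at.
    return sorted(range(len(values)), key=lambda i: values[i])


def ranks_py(values):
    # Inverse permutation of the argsort: the position each element
    # would take in the sorted sequence.
    order = argsort_py(values)
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


x = [1.48, 1.41, 0.0, 0.1]
order = argsort_py(x)   # [2, 3, 1, 0]
ranks = ranks_py(x)     # [3, 2, 0, 1]
print(order, ranks)
```

The first list matches what argsort returns for the array in the question, and the second is the result the asker expected.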

Answer 1

[2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.

There are a number of ways to get the result you are looking for:

import numpy as np
import scipy.stats as stats

def using_indexed_assignment(x):
    "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    result[temp] = np.arange(len(x))
    return result

def using_rankdata(x):
    return stats.rankdata(x)-1

def using_argsort_twice(x):
    "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
    return np.argsort(np.argsort(x))

def using_digitize(x):
    unique_vals, index = np.unique(x, return_inverse=True)
    return np.digitize(x, bins=unique_vals) - 1

For example,

In [72]: x = np.array([1.48,1.41,0.0,0.1])

In [73]: using_indexed_assignment(x)
Out[73]: array([3, 2, 0, 1])

This checks that they all produce the same result:

x = np.random.random(10**5)
expected = using_indexed_assignment(x)
for func in (using_argsort_twice, using_digitize, using_rankdata):
    assert np.allclose(expected, func(x))

These IPython %timeit benchmarks suggest that, for large arrays, using_indexed_assignment is the fastest:

In [50]: x = np.random.random(10**5)
In [66]: %timeit using_indexed_assignment(x)
100 loops, best of 3: 9.32 ms per loop

In [70]: %timeit using_rankdata(x)
100 loops, best of 3: 10.6 ms per loop

In [56]: %timeit using_argsort_twice(x)
100 loops, best of 3: 16.2 ms per loop

In [59]: %timeit using_digitize(x)
10 loops, best of 3: 27 ms per loop

For small arrays, using_argsort_twice may be faster:

In [78]: x = np.random.random(10**2)

In [81]: %timeit using_argsort_twice(x)
100000 loops, best of 3: 3.45 µs per loop

In [79]: %timeit using_indexed_assignment(x)
100000 loops, best of 3: 4.78 µs per loop

In [80]: %timeit using_rankdata(x)
100000 loops, best of 3: 19 µs per loop

In [82]: %timeit using_digitize(x)
10000 loops, best of 3: 26.2 µs per loop

Note also that stats.rankdata gives you more control over how to handle elements of equal value.


Answer 2

As the documentation says, argsort:

Returns the indices that would sort an array.

That means the first element of the argsort is the index of the element that should be sorted first, the second element is the index of the element that should be second, etc.

What you seem to want is the rank order of the values, which is what is provided by scipy.stats.rankdata. Note that you need to think about what should happen if there are ties in the ranks.
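
As a rough illustration of why ties matter, the following pure-Python sketch compares two possible conventions. The function names ordinal_ranks and dense_ranks are hypothetical, and this is not how scipy.stats.rankdata is implemented; it only shows that equal values can reasonably receive different ranks depending on the rule chosen.

```python
def ordinal_ranks(values):
    # Ties are broken by original position, because sorted() is stable.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def dense_ranks(values):
    # Equal values share one rank, and the ranks have no gaps.
    rank_of = {value: rank for rank, value in enumerate(sorted(set(values)))}
    return [rank_of[value] for value in values]


scores = [10, 20, 10, 30]
print(ordinal_ranks(scores))  # [0, 2, 1, 3]
print(dense_ranks(scores))    # [0, 1, 0, 2]
```

With the ordinal rule the two 10s get ranks 0 and 1; with the dense rule they both get rank 0.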


Answer 3

numpy.argsort(a, axis=-1, kind='quicksort', order=None)

Returns the indices that would sort an array

Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices, of the same shape as a, that index the data along the given axis in sorted order.

Consider an example in Python with the following list of values:

listExample  = [0 , 2, 2456,  2000, 5000, 0, 1]

Now we use the argsort function:

import numpy as np
list(np.argsort(listExample))

The output will be

[0, 5, 6, 1, 3, 2, 4]

This is the list of indices into listExample in sorted order. If you map these indices to their respective values, you get the following result:

[0, 0, 1, 2, 2000, 2456, 5000]

(I find this function very useful in many places, e.g. if you want to see a list/array in sorted order without calling list.sort(), i.e. without changing the order of the actual values in the list.)

For more details, refer to this link: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.argsort.html


Answer 4

input:
import numpy as np
x = np.array([1.48,1.41,0.0,0.1])
x.argsort().argsort()

output:
array([3, 2, 0, 1])


Answer 5

First, the array is sorted. Then an array is generated that holds the original index of each sorted element.


Answer 6

np.argsort returns the indices that would sort the array, using the sorting algorithm given by 'kind'. np.argmax, by contrast, returns the index of the largest element in a list, while np.sort returns a sorted copy of the given array or list.


Answer 7

Just want to directly contrast the OP’s original understanding against the actual implementation with code.

numpy.argsort is defined such that for 1D arrays:

x[x.argsort()] == numpy.sort(x) # this will be an array of True's

The OP originally thought that it was defined such that for 1D arrays:

x == numpy.sort(x)[x.argsort()] # this will not be True

Note: This code doesn’t work in the general case (only works for 1D), this answer is purely for illustration purposes.


Answer 8

It returns the indices into the given array, [1.48, 1.41, 0.0, 0.1], in sorted order. That means: 0.0 is the first element, at index [2]. 0.1 is the second element, at index [3]. 1.41 is the third element, at index [1]. 1.48 is the fourth element, at index [0]. Output:

[2,3,1,0]

How to log source file name and line number in Python

Question: How to log source file name and line number in Python

Is it possible to decorate/extend the python standard logging system, so that when a logging method is invoked it also logs the file and the line number where it was invoked or maybe the method that invoked it?


Answer 0

Sure, check formatters in logging docs. Specifically the lineno and pathname variables.

%(pathname)s Full pathname of the source file where the logging call was issued(if available).

%(filename)s Filename portion of pathname.

%(module)s Module (name portion of filename).

%(funcName)s Name of function containing the logging call.

%(lineno)d Source line number where the logging call was issued (if available).

Looks something like this:

formatter = logging.Formatter('[%(asctime)s] p%(process)s {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s','%m-%d %H:%M:%S')
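
A formatter only takes effect once it is attached to a handler. The sketch below is one way to wire that up, assuming a hypothetical logger name ("formatter_demo") and function name (do_work), and writing to an in-memory stream so the result can be inspected.

```python
import io
import logging

# Capture log output in memory instead of writing to the console.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s %(filename)s:%(lineno)d in %(funcName)s - %(message)s"))

log = logging.getLogger("formatter_demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger


def do_work():
    # filename, lineno and funcName are filled in from this call site.
    log.warning("something happened")


do_work()
output = stream.getvalue()
print(output, end="")
```

The file name and line number in the output depend on where the code is run, while funcName resolves to do_work because that is where the logging call is made.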

Answer 1

On top of Seb’s very useful answer, here is a handy code snippet that demonstrates the logger usage with a reasonable format:

#!/usr/bin/env python
import logging

logging.basicConfig(format='%(asctime)s,%(msecs)d %(levelname)-8s [%(filename)s:%(lineno)d] %(message)s',
    datefmt='%Y-%m-%d:%H:%M:%S',
    level=logging.DEBUG)

logger = logging.getLogger(__name__)
logger.debug("This is a debug log")
logger.info("This is an info log")
logger.critical("This is critical")
logger.error("An error occurred")

Generates this output:

2017-06-06:17:07:02,158 DEBUG    [log.py:11] This is a debug log
2017-06-06:17:07:02,158 INFO     [log.py:12] This is an info log
2017-06-06:17:07:02,158 CRITICAL [log.py:13] This is critical
2017-06-06:17:07:02,158 ERROR    [log.py:14] An error occurred

Answer 2

To build on the above in a way that sends debug logging to standard out:

import logging
import sys

root = logging.getLogger()
root.setLevel(logging.DEBUG)

ch = logging.StreamHandler(sys.stdout)
ch.setLevel(logging.DEBUG)
FORMAT = "[%(filename)s:%(lineno)s - %(funcName)20s() ] %(message)s"
formatter = logging.Formatter(FORMAT)
ch.setFormatter(formatter)
root.addHandler(ch)

logging.debug("I am sent to standard out.")

Putting the above into a file called debug_logging_example.py produces the output:

[debug_logging_example.py:14 -             <module>() ] I am sent to standard out.

Then, if you want to turn off debug logging, comment out root.setLevel(logging.DEBUG).

For single files (e.g. class assignments) I’ve found this far better than using print() statements, because it allows you to turn the debug output off in a single place before you submit it.
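
The "single place" switch can be sketched as follows. This is only an illustration: it uses a hypothetical named logger ("toggle_demo") and an in-memory stream rather than the root logger and stdout from the answer, and it sets the level explicitly instead of commenting a line out.

```python
import io
import logging

stream = io.StringIO()
log = logging.getLogger("toggle_demo")  # hypothetical logger name
log.addHandler(logging.StreamHandler(stream))
log.propagate = False

log.setLevel(logging.DEBUG)    # debug output on
log.debug("shown")

log.setLevel(logging.WARNING)  # debug output off, changed in one place
log.debug("hidden")

captured = stream.getvalue()
print(repr(captured))
```

Only the first message reaches the handler; the second is dropped because its level is below the logger's threshold.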


Answer 3

For devs using PyCharm or Eclipse pydev, the following will produce a link to the source of the log statement in the console log output:

import logging, sys, os
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, format='%(message)s | \'%(name)s:%(lineno)s\'')
log = logging.getLogger(os.path.basename(__file__))


log.debug("hello logging linked to source")

See Pydev source file hyperlinks in Eclipse console for longer discussion and history.


Answer 4

import logging
# your other imports above ...


logging.basicConfig(
    # The format string is split across two adjacent literals so that it stays valid Python.
    format='%(asctime)s,%(msecs)d %(levelname)-8s '
           '[%(pathname)s:%(lineno)d in function %(funcName)s] %(message)s',
    datefmt='%Y-%m-%d:%H:%M:%S',
    level=logging.DEBUG
)

logger = logging.getLogger(__name__)

# your classes and methods below ...
# A naive sample of usage:
try:
    logger.info('Sample of info log')
    # your code here
except Exception as e:
    logger.error(e)

Unlike the other answers, this logs the full path of the file and the name of the function in which an error might have occurred. This is useful if you have a project with more than one module and several files with the same name distributed across these modules.


How to filter objects for count annotation in Django?

Question: How to filter objects for count annotation in Django?

Consider simple Django models Event and Participant:

class Event(models.Model):
    title = models.CharField(max_length=100)

class Participant(models.Model):
    event = models.ForeignKey(Event, db_index=True)
    is_paid = models.BooleanField(default=False, db_index=True)

It’s easy to annotate events query with total number of participants:

events = Event.objects.all().annotate(participants=models.Count('participant'))

How to annotate with count of participants filtered by is_paid=True?

I need to query all events regardless of number of participants, e.g. I don’t need to filter by annotated result. If there are 0 participants, that’s ok, I just need 0 in annotated value.

The example from documentation doesn’t work here, because it excludes objects from query instead of annotating them with 0.
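
The difference between the two behaviours can be shown without Django at all. The sketch below uses plain Python data with made-up event names; it is not ORM code, only an illustration of "filter, then count" versus "count conditionally".

```python
from collections import Counter

events = ["conference", "meetup"]
# (event, is_paid) pairs; "meetup" has no paid participants.
participants = [("conference", True), ("conference", False), ("meetup", False)]

# Filtering first and then counting: events without a match disappear.
filtered_counts = dict(
    Counter(event for event, is_paid in participants if is_paid))

# Counting conditionally per event: every event is kept, with 0 where
# nothing matches.
conditional_counts = {
    event: sum(1 for e, is_paid in participants if e == event and is_paid)
    for event in events
}

print(filtered_counts)     # {'conference': 1}
print(conditional_counts)  # {'conference': 1, 'meetup': 0}
```

The second dictionary is the shape of result the question asks for: one entry per event, including those with a count of 0.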

Update. Django 1.8 has new conditional expressions feature, so now we can do like this:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0,
        output_field=models.IntegerField()
    )))

Update 2. Django 2.0 has new Conditional aggregation feature, see the accepted answer below.


Answer 0

Conditional aggregation in Django 2.0 allows you to further reduce the amount of faff this has been in the past. This will also use Postgres’ filter logic, which is somewhat faster than a sum-case (I’ve seen numbers like 20-30% bandied around).

Anyway, in your case, we’re looking at something as simple as:

from django.db.models import Q, Count
# 'participant' is the reverse lookup name used in the question's own query.
events = Event.objects.annotate(
    paid_participants=Count('participant', filter=Q(participant__is_paid=True))
)

There’s a separate section in the docs about filtering on annotations. It’s the same stuff as conditional aggregation but more like my example above. Either which way, this is a lot healthier than the gnarly subqueries I was doing before.


Answer 1

Just discovered that Django 1.8 has new conditional expressions feature, so now we can do like this:

events = Event.objects.all().annotate(paid_participants=models.Sum(
    models.Case(
        models.When(participant__is_paid=True, then=1),
        default=0, output_field=models.IntegerField()
    )))

Answer 2

UPDATE

The sub-query approach which I mention is now supported in Django 1.11 via subquery-expressions.

Event.objects.annotate(
    num_paid_participants=Subquery(
        Participant.objects.filter(
            is_paid=True,
            event=OuterRef('pk')
        ).values('event')
        .annotate(cnt=Count('pk'))
        .values('cnt'),
        output_field=models.IntegerField()
    )
)

I prefer this over aggregation (sum+case), because it should be faster and easier to be optimized (with proper indexing).

For older versions, the same can be achieved using .extra:

Event.objects.extra(select={'num_paid_participants': "\
    SELECT COUNT(*) \
    FROM `myapp_participant` \
    WHERE `myapp_participant`.`is_paid` = 1 AND \
            `myapp_participant`.`event_id` = `myapp_event`.`id`"
})

Answer 3

I would suggest using the .values method of your Participant queryset instead.

In short, what you want to do is given by:

Participant.objects\
    .filter(is_paid=True)\
    .values('event')\
    .distinct()\
    .annotate(models.Count('id'))

A complete example is as follows:

  1. Create 2 Events:

    event1 = Event.objects.create(title='event1')
    event2 = Event.objects.create(title='event2')
    
  2. Add Participants to them:

    part1l = [Participant.objects.create(event=event1, is_paid=((_%2) == 0))\
              for _ in range(10)]
    part2l = [Participant.objects.create(event=event2, is_paid=((_%2) == 0))\
              for _ in range(50)]
    
  3. Group all Participants by their event field:

    Participant.objects.values('event')
    > <QuerySet [{'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 1}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, {'event': 2}, '...(remaining elements truncated)...']>
    

    Here distinct is needed:

    Participant.objects.values('event').distinct()
    > <QuerySet [{'event': 1}, {'event': 2}]>
    

    What .values and .distinct are doing here is creating two buckets of Participants grouped by their event field. Note that those buckets contain Participant objects.

  4. You can then annotate those buckets, as they contain the original set of Participant objects. Here we want to count the number of Participants, which is done simply by counting the ids of the elements in those buckets (since those are Participants):

    Participant.objects\
        .values('event')\
        .distinct()\
        .annotate(models.Count('id'))
    > <QuerySet [{'event': 1, 'id__count': 10}, {'event': 2, 'id__count': 50}]>
    
  5. Finally, you want only the Participants with is_paid being True. You can just add a filter in front of the previous expression, which yields the expression shown above:

    Participant.objects\
        .filter(is_paid=True)\
        .values('event')\
        .distinct()\
        .annotate(models.Count('id'))
    > <QuerySet [{'event': 1, 'id__count': 5}, {'event': 2, 'id__count': 25}]>
    

The only drawback is that you have to retrieve the Event afterwards as you only have the id from the method above.


Answer 4

The results I am looking for:

  • People (assignees) who have tasks added to a report: the total unique count of people.
  • People who have tasks added to a report, counting only tasks whose billable efforts are more than 0.

In general, I would have to use two different queries:

Task.objects.filter(billable_efforts__gt=0)
Task.objects.all()

But I want both in one query. Hence:

Task.objects.values('report__title').annotate(withMoreThanZero=Count('assignee', distinct=True, filter=Q(billable_efforts__gt=0))).annotate(totalUniqueAssignee=Count('assignee', distinct=True))

Result:

<QuerySet [{'report__title': 'TestReport', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}, {'report__title': 'Utilization_Report_April_2019', 'withMoreThanZero': 37, 'totalUniqueAssignee': 50}]>

Matplotlib: draw grid lines behind other graph elements

Question: Matplotlib: draw grid lines behind other graph elements

In Matplotlib, I make dashed grid lines as follows:

fig = pylab.figure()    
ax = fig.add_subplot(1,1,1)
ax.yaxis.grid(color='gray', linestyle='dashed')

However, I can’t find out how (or even if it is possible) to make the grid lines be drawn behind other graph elements, such as bars. Changing the order of adding the grid versus adding other elements makes no difference.

Is it possible to make it so that the grid lines appear behind everything else?


Answer 0

According to this – http://matplotlib.1069221.n5.nabble.com/axis-elements-and-zorder-td5346.html – you can use Axis.set_axisbelow(True)

(I am currently installing matplotlib for the first time, so have no idea if that’s correct – I just found it by googling “matplotlib z order grid” – “z order” is typically used to describe this kind of thing (z being the axis “out of the page”))


Answer 1

To me, it was unclear how to apply andrew cooke’s answer, so this is a complete solution based on that:

ax.set_axisbelow(True)
ax.yaxis.grid(color='gray', linestyle='dashed')

Answer 2

If you want to apply the setting to all figures, you may set

plt.rc('axes', axisbelow=True)

or

plt.rcParams['axes.axisbelow'] = True

It works for Matplotlib>=2.0.


Answer 3

I had the same problem and the following worked:

# Raise every plotted line above the grid.
for line in ax.lines:
    line.set_zorder(3)
fig.show() # to update

Increase 3 to a higher value if it does not work.


How to create a DataFrame of random integers with Pandas?

Question: How to create a DataFrame of random integers with Pandas?

I know that if I use randn,

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

gives me what I am looking for, but with elements from a normal distribution. But what if I just wanted random integers?

randint works by providing a range, but not an array like randn does. So how do I do this with random integers between some range?


Answer 0

numpy.random.randint accepts a third argument (size), in which you can specify the size of the output array. You can use this to create your DataFrame:

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

Here, np.random.randint(0,100,size=(100, 4)) creates an output array of shape (100,4) with random integer elements in the range [0,100).


Demo –

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

which produces:

     A   B   C   D
0   45  88  44  92
1   62  34   2  86
2   85  65  11  31
3   74  43  42  56
4   90  38  34  93
5    0  94  45  10
6   58  23  23  60
..  ..  ..  ..  ..

Answer 1

The recommended way to create random integers with NumPy these days is to use numpy.random.Generator.integers. (documentation)

import numpy as np
import pandas as pd

rng = np.random.default_rng()
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list('ABCD'))
df
----------------------
      A    B    C    D
 0   58   96   82   24
 1   21    3   35   36
 2   67   79   22   78
 3   81   65   77   94
 4   73    6   70   96
... ...  ...  ...  ...
95   76   32   28   51
96   33   68   54   77
97   76   43   57   43
98   34   64   12   57
99   81   77   32   50
100 rows × 4 columns

SyntaxError: not a chance

Question: SyntaxError: not a chance

I tried executing the following code in Python IDLE:

from __future__ import braces 

And I got the following error:

SyntaxError: not a chance

What does the above error mean?


Answer 0

You have found an easter egg in Python. It is a joke.

It means that delimiting blocks by braces instead of indentation will never be implemented.

Normally, imports from the special __future__ module enable features that are backwards-incompatible, such as the print() function, or true division.

So the line from __future__ import braces is taken to mean you want to enable the ‘create blocks with braces’ feature, and the exception tells you your chances of that ever happening are nil.

You can add that to the long list of in-jokes included in Python, just like import __hello__, import this and import antigravity. The Python developers have a well-developed sense of humour!
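
The error can also be observed programmatically. The sketch below assumes CPython 3 and uses the built-in compile() so that the SyntaxError is raised at run time and can be caught; the "<demo>" file name is arbitrary.

```python
try:
    # Compiling the statement at run time lets us catch the error,
    # instead of having the whole script fail to parse.
    compile("from __future__ import braces", "<demo>", "exec")
except SyntaxError as error:
    message = error.msg
else:
    message = None

print(message)
```

Wrapping the import in compile() is needed because a SyntaxError in the script's own source would be raised before any try block could run.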


回答 1

__future__模块通常用于提供Python未来版本的功能。

这是一个复活节彩蛋,总结了开发人员在此问题上的感受。

还有更多:

import this 将显示Python的禅宗。

import __hello__将显示Hello World...

在Python 2.7和3.0中,import antigravity将打开浏览器以显示漫画!

The __future__ module is normally used to provide features from future versions of Python.

This is an easter egg that summarizes its developers’ feelings on this issue.

There are several more:

import this will display the zen of Python.

import __hello__ will display Hello World....

In Python 2.7 and 3.0, import antigravity will open the browser to a comic!


Putting an if-elif-else statement on one line?

Question: Putting an if-elif-else statement on one line?

我已阅读以下链接,但未解决我的问题。
Python是否具有三元条件运算符?(问题是将if-else语句压缩到一行)

有没有更简单的方式编写if-elif-else语句,使其适合一行?
例如,

if expression1:
   statement1
elif expression2:
   statement2
else:
   statement3

或一个真实的例子:

if i > 100:
    x = 2
elif i < 100:
    x = 1
else:
    x = 0

我只是觉得,如果上面的示例可以用以下方式编写,则看起来会更加简洁。

x=2 if i>100 elif i<100 1 else 0 [WRONG]

I have read the question linked below, but it doesn’t address mine.
Does Python have a ternary conditional operator? (the question is about condensing if-else statement to one line)

Is there an easier way of writing an if-elif-else statement so it fits on one line?
For example,

if expression1:
   statement1
elif expression2:
   statement2
else:
   statement3

Or a real-world example:

if i > 100:
    x = 2
elif i < 100:
    x = 1
else:
    x = 0

I just feel that the example above would look more concise if it could be written the following way.

x=2 if i>100 elif i<100 1 else 0 [WRONG]

回答 0

不,这是不可能的(至少不能使用任意语句),也不是可取的。将所有内容都放在一行中很可能会违反PEP-8,在这种情况下,行的长度不得超过80个字符。

这也与Python的Zen背道而驰:“可读性很重要”。(import this在Python提示符下键入以读取整个内容)。

可以在Python中使用三元表达式,但只能用于表达式,不能用于语句:

>>> a = "Hello" if foo() else "Goodbye"

编辑:

现在,您修改后的问题表明,除了要分配的值之外,这三个语句是相同的。在这种情况下,链式三元运算符确实可以工作,但是我仍然认为它的可读性较差:

>>> i=100
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
0
>>> i=101
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
2
>>> i=99
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
1

No, it’s not possible (at least not with arbitrary statements), nor is it desirable. Fitting everything on one line would most likely violate PEP 8, which recommends limiting lines to 79 characters.

It’s also against the Zen of Python: “Readability counts”. (Type import this at the Python prompt to read the whole thing).

You can use a ternary expression in Python, but only for expressions, not for statements:

>>> a = "Hello" if foo() else "Goodbye"

Edit:

Your revised question now shows that the three statements are identical except for the value being assigned. In that case, a chained ternary operator does work, but I still think that it’s less readable:

>>> i=100
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
0
>>> i=101
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
2
>>> i=99
>>> a = 1 if i<100 else 2 if i>100 else 0
>>> a
1

回答 1

如果您仅在不同情况下需要不同的表达式,那么这可能对您有用:

expr1 if condition1 else expr2 if condition2 else expr

例如:

a = "neg" if b<0 else "pos" if b>0 else "zero"

If you only need different expressions for different cases then this may work for you:

expr1 if condition1 else expr2 if condition2 else expr

For example:

a = "neg" if b<0 else "pos" if b>0 else "zero"
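A chained conditional expression groups from right to left, so the second condition is only evaluated when the first one is false. The sketch below (function names are illustrative, not from the answer) writes the chain with and without explicit parentheses to show that both forms mean the same thing.

```python
def sign_chained(b):
    # Reads as: "neg" if b < 0, otherwise evaluate the rest of the chain.
    return "neg" if b < 0 else "pos" if b > 0 else "zero"


def sign_explicit(b):
    # The same expression with the implicit grouping spelled out.
    return "neg" if b < 0 else ("pos" if b > 0 else "zero")


for value in (-3, 0, 7):
    print(value, sign_chained(value), sign_explicit(value))
```

The parenthesised form is often easier to read once a chain has more than two conditions.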

回答 2

只需在else语句中嵌套另一个if子句。但这并没有使它看起来更漂亮。

>>> x=5
>>> x if x>0 else ("zero" if x==0 else "invalid value")
5
>>> x = 0
>>> x if x>0 else ("zero" if x==0 else "invalid value")
'zero'
>>> x = -1
>>> x if x>0 else ("zero" if x==0 else "invalid value")
'invalid value'

Just nest another if clause in the else statement. But that doesn’t make it look any prettier.

>>> x=5
>>> x if x>0 else ("zero" if x==0 else "invalid value")
5
>>> x = 0
>>> x if x>0 else ("zero" if x==0 else "invalid value")
'zero'
>>> x = -1
>>> x if x>0 else ("zero" if x==0 else "invalid value")
'invalid value'

回答 3

尽管有其他一些答案:是的,这是可能的

if expression1:
   statement1
elif expression2:
   statement2
else:
   statement3

转换为以下一种衬纸:

statement1 if expression1 else (statement2 if expression2 else statement3)

实际上,您可以将它们嵌套到无限远。请享用 ;)

Despite some other answers: YES it IS possible:

if expression1:
   statement1
elif expression2:
   statement2
else:
   statement3

translates to the following one liner:

statement1 if expression1 else (statement2 if expression2 else statement3)

In fact, you can nest these as deeply as you like, as long as each branch is an expression rather than a statement. Enjoy ;)


回答 4

您可以选择实际使用a的get方法dict

x = {i<100: -1, -10<=i<=10: 0, i>100: 1}.get(True, 2)

get如果其中一个键可以保证计算为,则不需要该方法True

x = {i<0: -1, i==0: 0, i>0: 1}[True]

理想情况下,最多不应将其中一个键评估为True。如果一个以上的键计算为True,则结果似乎不可预测。

You can optionally actually use the get method of a dict:

x = {i<100: -1, -10<=i<=10: 0, i>100: 1}.get(True, 2)

You don’t need the get method if one of the keys is guaranteed to evaluate to True:

x = {i<0: -1, i==0: 0, i>0: 1}[True]

Ideally, at most one of the keys should evaluate to True. If more than one does, the value attached to the last such key wins, because a repeated key in a dict display overwrites the earlier entry.
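To see what happens when several conditions hold at once, here is a small sketch that wraps the first dict expression from this answer in a function (the name classify is an assumption made for illustration). When two keys are both True, the later entry replaces the earlier one.

```python
def classify(i):
    # The keys collapse to True/False; a later duplicate key overwrites an earlier one.
    return {i < 100: -1, -10 <= i <= 10: 0, i > 100: 1}.get(True, 2)


print(classify(50))   # only i < 100 is True
print(classify(5))    # i < 100 and -10 <= i <= 10 are both True; the later entry wins
print(classify(200))  # only i > 100 is True
print(classify(100))  # no key is True, so the default is returned
```

Note that every condition is evaluated before the lookup, unlike an if-elif chain, which stops at the first true branch.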


回答 5

在我看来,还有一种方法是很难理解的,但无论如何我还是会出于好奇而分享:

x = (i>100 and 2) or (i<100 and 1) or 0

此处提供更多信息:https : //docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not

There’s an alternative that’s quite unreadable in my opinion but I’ll share anyway just as a curiosity:

x = (i>100 and 2) or (i<100 and 1) or 0

More info here: https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not
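One caveat with the and/or idiom is worth a sketch: it silently breaks when one of the intended results is falsy (0, an empty string, None), because `or` then skips over it. The example below changes the result values to show this; the function names are illustrative only.

```python
def with_and_or(i):
    # Intended: 0 if i > 100, 1 if i < 100, otherwise 2.
    # But (i > 100 and 0) is falsy, so the chain falls through to 2.
    return (i > 100 and 0) or (i < 100 and 1) or 2


def with_ternary(i):
    # The conditional expression has no such problem.
    return 0 if i > 100 else 1 if i < 100 else 2


for i in (50, 100, 150):
    print(i, with_and_or(i), with_ternary(i))
```

The original one-liner in this answer works only because its two non-default results, 2 and 1, are both truthy.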


回答 6

if i > 100:
    x = 2
elif i < 100:
    x = 1
else:
    x = 0

如果要在一行中使用上述代码,则可以使用以下代码:

x = 2 if i > 100 else 1 if i < 100 else 0

这样做时,如果i> 100,x将被分配为2;如果i <100,则x将被分配;如果i = 100,则x将被分配为0。

if i > 100:
    x = 2
elif i < 100:
    x = 1
else:
    x = 0

If you want to use the above-mentioned code in one line, you can use the following:

x = 2 if i > 100 else 1 if i < 100 else 0

With this, x is assigned 2 if i > 100, 1 if i < 100, and 0 if i == 100.


回答 7

这也取决于您的表情的性质。关于“不做”的其他答案的一般建议对于通用语句和通用表达式非常有效。

但是,如果您只需要一个“ dispatch”表(例如,根据给定选项的值调用一个不同的函数),则可以将这些函数放在字典中进行调用。

就像是:

def save(): 
   ...
def edit():
   ...
options = {"save": save, "edit": edit, "remove": lambda : "Not Implemented"}

option = get_input()
result = options[option]()

代替if-else:

if option=="save":
    save()
...

It also depends on the nature of your expressions. The general advice on the other answers of “not doing it” is quite valid for generic statements and generic expressions.

But if all you need is a “dispatch” table, like, calling a different function depending on the value of a given option, you can put the functions to call inside a dictionary.

Something like:

def save(): 
   ...
def edit():
   ...
options = {"save": save, "edit": edit, "remove": lambda : "Not Implemented"}

option = get_input()
result = options[option]()

Instead of an if-else:

if option=="save":
    save()
...
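A runnable version of the dispatch-table idea might look like the sketch below. The function bodies and the dispatch helper are placeholders invented for illustration; using dict.get with a fallback is one way to play the role of the final else branch.

```python
def save():
    return "saved"


def edit():
    return "edited"


# Map each option name to the function that handles it.
options = {"save": save, "edit": edit, "remove": lambda: "Not Implemented"}


def dispatch(option):
    """Call the handler for `option`, or report an unknown option."""
    handler = options.get(option, lambda: "Unknown option")
    return handler()


print(dispatch("save"))
print(dispatch("remove"))
print(dispatch("quit"))
```

Indexing with options[option] instead would raise KeyError for an unknown option, which may be preferable when an unexpected value should not pass silently.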

回答 8

人们已经提到了三元表达式。有时,以简单的条件分配为例,可以使用数学表达式执行条件分配。这可能不会使您的代码具有很高的可读性,但是确实可以将它放在很短的一行上。您的示例可以这样写:

x = 2*(i>100) | 1*(i<100)

比较将为True或False,然后与数字相乘将为1或0。可以使用+而不是| 在中间。

People have already mentioned ternary expressions. Sometimes with a simple conditional assignment as your example, it is possible to use a mathematical expression to perform the conditional assignment. This may not make your code very readable, but it does get it on one fairly short line. Your example could be written like this:

x = 2*(i>100) | 1*(i<100)

The comparisons would be True or False, and when multiplying with numbers would then be either 1 or 0. One could use a + instead of an | in the middle.
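As a quick check of the arithmetic form, the sketch below compares it with the plain if-elif-else chain from the question over a range of values around 100. The function names are made up for illustration.

```python
def by_arithmetic(i):
    # True and False behave as 1 and 0 in arithmetic, and * binds tighter than |.
    return 2 * (i > 100) | 1 * (i < 100)


def by_if_chain(i):
    if i > 100:
        return 2
    elif i < 100:
        return 1
    else:
        return 0


mismatches = [i for i in range(90, 111) if by_arithmetic(i) != by_if_chain(i)]
print(mismatches)  # an empty list means the two forms agree on this range
```

The trick relies on the two conditions being mutually exclusive; with overlapping conditions the | (or +) would combine both values.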


回答 9

三元运算符是一个简洁的表达的最好方式。语法为variable = value_1 if condition else value_2。因此,对于您的示例,您必须两次应用三元运算符:

i = 23 # set any value for i
x = 2 if i > 100 else 1 if i < 100 else 0

The ternary operator is the best way to write a concise expression. The syntax is variable = value_1 if condition else value_2. So, for your example, you must apply the ternary operator twice:

i = 23 # set any value for i
x = 2 if i > 100 else 1 if i < 100 else 0

回答 10

您可以使用嵌套三元if语句。

# if-else ternary construct
country_code = 'USA'
is_USA = True if country_code == 'USA' else False
print('is_USA:', is_USA)

# if-elif-else ternary construct
# Create function to avoid repeating code.
def get_age_category_name(age):
    age_category_name = 'Young' if age <= 40 else ('Middle Aged' if age > 40 and age <= 65 else 'Senior')
    return age_category_name

print(get_age_category_name(25))
print(get_age_category_name(50))
print(get_age_category_name(75))

You can use nested ternary if statements.

# if-else ternary construct
country_code = 'USA'
is_USA = True if country_code == 'USA' else False
print('is_USA:', is_USA)

# if-elif-else ternary construct
# Create function to avoid repeating code.
def get_age_category_name(age):
    age_category_name = 'Young' if age <= 40 else ('Middle Aged' if age > 40 and age <= 65 else 'Senior')
    return age_category_name

print(get_age_category_name(25))
print(get_age_category_name(50))
print(get_age_category_name(75))

Python requests file upload

Question: Python requests file upload

我正在执行一个使用Python请求库上传文件的简单任务。我搜索了Stack Overflow,似乎没有人遇到相同的问题,即服务器未收到该文件:

import requests
url='http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files={'files': open('file.txt','rb')}
values={'upload_file' : 'file.txt' , 'DB':'photcat' , 'OUT':'csv' , 'SHORT':'short'}
r=requests.post(url,files=files,data=values)

我用文件名填充了’upload_file’关键字的值,因为如果我将其保留为空白,则表示

Error - You must select a file to upload!

现在我明白了

File  file.txt  of size    bytes is  uploaded successfully!
Query service results:  There were 0 lines.

仅当文件为空时才会出现。因此,我对如何成功发送文件感到困惑。我知道该文件有效,因为如果我访问此网站并手动填写表格,它将返回一个很好的匹配对象列表,这就是我想要的。我非常感谢所有提示。

其他一些相关的线程(但不能回答我的问题):

I’m performing a simple task of uploading a file using Python requests library. I searched Stack Overflow and no one seemed to have the same problem, namely, that the file is not received by the server:

import requests
url='http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files={'files': open('file.txt','rb')}
values={'upload_file' : 'file.txt' , 'DB':'photcat' , 'OUT':'csv' , 'SHORT':'short'}
r=requests.post(url,files=files,data=values)

I’m filling the value of ‘upload_file’ keyword with my filename, because if I leave it blank, it says

Error - You must select a file to upload!

And now I get

File  file.txt  of size    bytes is  uploaded successfully!
Query service results:  There were 0 lines.

Which comes up only if the file is empty. So I’m stuck as to how to send my file successfully. I know that the file works because if I go to this website and manually fill in the form it returns a nice list of matched objects, which is what I’m after. I’d really appreciate all hints.

Some other threads related (but not answering my problem):


回答 0

如果upload_file要作为文件,请使用:

files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}

r = requests.post(url, files=files, data=values)

并且requests将派遣一个多部分表单POST体与upload_file字段设置为内容file.txt的文件。

文件名将包含在特定字段的mime标头中:

>>> import requests
>>> open('file.txt', 'wb')  # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"


--c226ce13d09842658ffbd31e0563c6bd--

注意filename="file.txt"参数。

files如果需要更多控制,则可以使用元组作为映射值,其中包含2到4个元素。第一个元素是文件名,其后是内容,以及可选的content-type标头值和可选的附加标头映射:

files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}

这将设置备用文件名和内容类型,而忽略可选的标题。

如果您要从文件中提取整个POST正文(未指定其他字段),则不要使用files参数,只需将文件直接发布为即可data。然后,您可能还需要设置Content-Type标头,否则将不会设置任何标头。请参阅Python请求-文件中的POST数据

If upload_file is meant to be the file, use:

files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}

r = requests.post(url, files=files, data=values)

and requests will send a multi-part form POST body with the upload_file field set to the contents of the file.txt file.

The filename will be included in the mime header for the specific field:

>>> import requests
>>> open('file.txt', 'wb')  # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"


--c226ce13d09842658ffbd31e0563c6bd--

Note the filename="file.txt" parameter.

You can use a tuple for the files mapping value, with between 2 and 4 elements, if you need more control. The first element is the filename, followed by the contents, and an optional content-type header value and an optional mapping of additional headers:

files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}

This sets an alternative filename and content type, leaving out the optional headers.

If you mean for the whole POST body to be taken from a file (with no other fields specified), then don’t use the files parameter; just post the file directly as data. You may then want to set a Content-Type header too, as none will be set otherwise. See Python requests – POST data from a file.
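To check what would actually be sent before involving a real server, the request can be prepared without being sent. The sketch below assumes the requests package is installed; the in-memory file contents and the example.com URL are placeholders, not part of the original question.

```python
import io

import requests

# In-memory stand-in for open('file.txt', 'rb'), so no file on disk is needed.
fake_file = io.BytesIO(b"12.5 -3.2\n")

# Tuple form: (filename, file object, content type).
files = {"upload_file": ("file.txt", fake_file, "text/plain")}
values = {"DB": "photcat", "OUT": "csv", "SHORT": "short"}

# prepare() builds the request, including the multipart body, without sending it.
prepared = requests.Request(
    "POST", "http://example.com/upload", files=files, data=values
).prepare()

print(prepared.headers["Content-Type"])
print(prepared.body.decode("ascii"))
```

The printed body should show one part per form field plus a part for upload_file carrying the filename, which is a convenient way to compare against what a browser form submits.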


回答 1

(2018)新的python请求库简化了此过程,我们可以使用’files’变量表示我们要上传经过多部分编码的文件

url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}

r = requests.post(url, files=files)
r.text

(2018) The requests library makes this simple: pass a ‘files’ mapping to signal that you want to upload a multipart-encoded file:

url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}

r = requests.post(url, files=files)
r.text

回答 2

客户上传

如果要使用Python requests库上传单个文件,则请求lib 支持流上传,这使您无需读取内存即可发送大文件或流。

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

服务器端

然后将文件存储在server.py侧面,这样就可以将流保存到文件中而不加载到内存中。以下是使用Flask文件上传的示例。

@app.route("/upload", methods=['POST'])
def upload_file():
    from werkzeug.datastructures import FileStorage
    FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    return 'OK', 200

或使用修复程序中提到的werkzeug表单数据解析来解决“ 大文件上传占用内存 ”的问题,以避免在大文件上传时(约60秒内无效使用 st 22 GiB文件。) 13 MiB。)。

@app.route("/upload", methods=['POST'])
def upload_file():
    def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
        import tempfile
        tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
        app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
        return tmpfile

    import werkzeug, flask
    stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
    for fil in files.values():
        app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
        # Do whatever with stored file at `fil.stream.name`
    return 'OK', 200

Client Upload

The requests library supports streaming uploads, which let you send a large file or stream without reading it into memory. To upload a single file this way:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

Server Side

On the server side (server.py), save the incoming stream to a file without loading it into memory. The following example uses Flask:

@app.route("/upload", methods=['POST'])
def upload_file():
    from werkzeug.datastructures import FileStorage
    FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    return 'OK', 200

Or use werkzeug’s form data parsing, as mentioned in a fix for the issue of “large file uploads eating up memory”, to avoid using memory inefficiently on large uploads (e.g. a 22 GiB file in ~60 seconds, with memory usage constant at about 13 MiB).

@app.route("/upload", methods=['POST'])
def upload_file():
    def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
        import tempfile
        tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
        app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
        return tmpfile

    import werkzeug, flask
    stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
    for fil in files.values():
        app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
        # Do whatever with stored file at `fil.stream.name`
    return 'OK', 200

回答 3

在Ubuntu中,您可以采用这种方式,

将文件保存在某个位置(临时),然后打开并发送给API

      path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
      path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)
      data={} #can be anything u want to pass along with File
      file1 = open(path12, 'rb')
      header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}
       res= requests.post(url,data,header)

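On Ubuntu (using Django’s storage API) you can do it this way: save the file to a temporary location, then open it and send it to the API.

import os
import requests
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage

path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)
data = {}  # can be anything you want to pass along with the file
file1 = open(path12, 'rb')
header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}
# headers must be passed by keyword (the third positional argument is json);
# the field name 'file' is an assumption, use the one your API expects
res = requests.post(url, data=data, files={'file': file1}, headers=header)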

How do I merge a list of dicts into a single dict?

Question: How do I merge a list of dicts into a single dict?

我该如何列出这样的词典

[{'a':1}, {'b':2}, {'c':1}, {'d':2}]

变成这样的单个字典

{'a':1, 'b':2, 'c':1, 'd':2}

How can I turn a list of dicts like this..

[{'a':1}, {'b':2}, {'c':1}, {'d':2}]

…into a single dict like this:

{'a':1, 'b':2, 'c':1, 'd':2}

回答 0

这适用于任何长度的字典:

>>> result = {}
>>> for d in L:
...    result.update(d)
... 
>>> result
{'a':1,'c':1,'b':2,'d':2}

作为一个理解

# Python >= 2.7
{k: v for d in L for k, v in d.items()}

# Python < 2.7
dict(pair for d in L for pair in d.items())

This works for dictionaries of any length:

>>> result = {}
>>> for d in L:
...    result.update(d)
... 
>>> result
{'a':1,'c':1,'b':2,'d':2}

As a comprehension:

# Python >= 2.7
{k: v for d in L for k, v in d.items()}

# Python < 2.7
dict(pair for d in L for pair in d.items())
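The loop and the comprehension above can be wrapped in functions and compared, as in the sketch below. The function names and the extra {'a': 9} entry are additions for illustration; the extra entry shows that when a key repeats, the dict later in the list wins in both variants.

```python
def merge_with_update(dicts):
    result = {}
    for d in dicts:
        result.update(d)  # later dicts overwrite earlier keys
    return result


def merge_with_comprehension(dicts):
    return {k: v for d in dicts for k, v in d.items()}


L = [{'a': 1}, {'b': 2}, {'c': 1}, {'d': 2}, {'a': 9}]
print(merge_with_update(L))
print(merge_with_comprehension(L))
```

Neither function modifies the input dicts, since each builds a fresh result.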

回答 1

对于Python 3.3+,有一个ChainMap集合

>>> from collections import ChainMap
>>> a = [{'a':1},{'b':2},{'c':1},{'d':2}]
>>> dict(ChainMap(*a))
{'b': 2, 'c': 1, 'a': 1, 'd': 2}

另请参阅:

In case of Python 3.3+, there is a ChainMap collection:

>>> from collections import ChainMap
>>> a = [{'a':1},{'b':2},{'c':1},{'d':2}]
>>> dict(ChainMap(*a))
{'b': 2, 'c': 1, 'a': 1, 'd': 2}
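One difference from the update-based approaches is worth a short sketch: a ChainMap searches its mappings in order, so for a repeated key the first dict in the list wins rather than the last. The sample data below is made up to show this; reversing the list is one way to get last-wins behaviour.

```python
from collections import ChainMap

dicts = [{'a': 1}, {'b': 2}, {'a': 3}]

# Lookups search the mappings left to right, so the first 'a' is found.
first_wins = dict(ChainMap(*dicts))

# Reversing the list gives the same precedence as repeated dict.update calls.
last_wins = dict(ChainMap(*reversed(dicts)))

print(first_wins['a'], last_wins['a'])
```

With no repeated keys, as in the question, both orders give the same result.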

Also see:


回答 2

>>> L=[{'a': 1}, {'b': 2}, {'c': 1}, {'d': 2}]    
>>> dict(i.items()[0] for i in L)
{'a': 1, 'c': 1, 'b': 2, 'd': 2}

注意:“ b”和“ c”的顺序与您的输出不匹配,因为字典是无序的

如果字典可以具有多个键/值

>>> dict(j for i in L for j in i.items())
>>> L=[{'a': 1}, {'b': 2}, {'c': 1}, {'d': 2}]    
>>> dict(i.items()[0] for i in L)
{'a': 1, 'c': 1, 'b': 2, 'd': 2}

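Note: the order of ‘b’ and ‘c’ doesn’t match your output because dicts are unordered in Python 2 (since Python 3.7 they preserve insertion order). Also, i.items()[0] only works in Python 2, where items() returns a list.

If the dicts can have more than one key/value pair: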

>>> dict(j for i in L for j in i.items())

回答 3

对于平面词典,您可以执行以下操作:

from functools import reduce
reduce(lambda a, b: dict(a, **b), list_of_dicts)

For flat dictionaries you can do this:

from functools import reduce
reduce(lambda a, b: dict(a, **b), list_of_dicts)

回答 4

这类似于@delnan,但提供了修改k / v(键/值)项的选项,并且我认为它更具可读性:

new_dict = {k:v for list_item in list_of_dicts for (k,v) in list_item.items()}

例如,替换k / v元素,如下所示:

new_dict = {str(k).replace(" ","_"):v for list_item in list_of_dicts for (k,v) in list_item.items()}

将dict对象从列表中拉出后,从字典.items()生成器中解压缩k,v元组

This is similar to @delnan but offers the option to modify the k/v (key/value) items and I believe is more readable:

new_dict = {k:v for list_item in list_of_dicts for (k,v) in list_item.items()}

for instance, replace k/v elems as follows:

new_dict = {str(k).replace(" ","_"):v for list_item in list_of_dicts for (k,v) in list_item.items()}

The comprehension pulls each dict out of the list, then unpacks the (k, v) tuples produced by its .items() view.


回答 5

dict1.update( dict2 )

这是不对称的,因为您需要选择对重复的密钥进行处理。在这种情况下,dict2将覆盖dict1。换另一种方式。

编辑:啊,对不起,没有看到。

可以在单个表达式中执行此操作:

>>> from itertools import chain
>>> dict( chain( *map( dict.items, theDicts ) ) )
{'a': 1, 'c': 1, 'b': 2, 'd': 2}

最后一点都不归功于我!

但是,我认为通过一个简单的for循环执行此操作可能更像Pythonic(显式>隐式,flat> nested)。YMMV。

dict1.update( dict2 )

This is asymmetrical because you need to choose what to do with duplicate keys; in this case, dict2 will overwrite dict1. Exchange them for the other way.

EDIT: Ah, sorry, didn’t see that.

It is possible to do this in a single expression:

>>> from itertools import chain
>>> dict( chain( *map( dict.items, theDicts ) ) )
{'a': 1, 'c': 1, 'b': 2, 'd': 2}

No credit to me for this last!

However, I’d argue that it might be more Pythonic (explicit > implicit, flat > nested ) to do this with a simple for loop. YMMV.


回答 6

您可以使用来自funcy库的join函数:

from funcy import join
join(list_of_dicts)

You can use join function from funcy library:

from funcy import join
join(list_of_dicts)

回答 7

PEP 448中压缩字典后,@ dietbuddha答案的改进很少,对我来说,这种方式更易读,而且速度更快

from functools import reduce
result_dict = reduce(lambda a, b: {**a, **b}, list_of_dicts)

但是请记住,这仅适用于Python 3.5+版本。

A small improvement on @dietbuddha’s answer, using the dictionary unpacking from PEP 448. To me it is more readable this way, and it is also faster:

from functools import reduce
result_dict = reduce(lambda a, b: {**a, **b}, list_of_dicts)

But keep in mind, this works only with Python 3.5+ versions.
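Besides readability, the two reduce variants differ in which keys they accept. The sketch below (function names are illustrative, and the empty-dict initial value is an addition so that an empty list also works) shows that dict(a, **b) requires string keys in Python 3, while {**a, **b} does not.

```python
from functools import reduce


def merge_unpack(dicts):
    # PEP 448 unpacking: works with keys of any hashable type.
    return reduce(lambda a, b: {**a, **b}, dicts, {})


def merge_kwargs(dicts):
    # dict(a, **b) passes b as keyword arguments, so its keys must be strings.
    return reduce(lambda a, b: dict(a, **b), dicts, {})


print(merge_unpack([{1: 'x'}, {'a': 2}]))

try:
    merge_kwargs([{1: 'x'}, {'a': 2}])
except TypeError as exc:
    print("dict(a, **b) failed:", exc)
```

For dicts with string keys only, such as those in the question, both variants give the same result.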


回答 8

>>> dictlist = [{'a':1},{'b':2},{'c':1},{'d':2, 'e':3}]
>>> dict(kv for d in dictlist for kv in d.iteritems())
{'a': 1, 'c': 1, 'b': 2, 'e': 3, 'd': 2}
>>>

注意,我在最后典中添加了第二个键/值对,以显示它可用于多个条目。此外,列表中较晚位置的字典中的键将覆盖较早版本中的字典相同的键。

>>> dictlist = [{'a':1},{'b':2},{'c':1},{'d':2, 'e':3}]
>>> dict(kv for d in dictlist for kv in d.iteritems())
{'a': 1, 'c': 1, 'b': 2, 'e': 3, 'd': 2}
>>>

Note I added a second key/value pair to the last dictionary to show it works with multiple entries. Also keys from dicts later in the list will overwrite the same key from an earlier dict.


回答 9

dic1 = {‘Maria’:12,’Paco’:22,’Jose’:23} dic2 = {‘Patricia’:25,’Marcos’:22’Tomas’:36}

dic2 = dict(dic1.items()+ dic2.items())

这将是结果:

dic2 {‘Jose’:23,’Marcos’:22,’Patricia’:25,’Tomas’:36,’Paco’:22,’Maria’:12}

dic1 = {'Maria': 12, 'Paco': 22, 'Jose': 23}
dic2 = {'Patricia': 25, 'Marcos': 22, 'Tomas': 36}

# Python 2 only: items() returns lists there; in Python 3 wrap each call in list().
dic2 = dict(dic1.items() + dic2.items())

and this will be the outcome:

>>> dic2
{'Jose': 23, 'Marcos': 22, 'Patricia': 25, 'Tomas': 36, 'Paco': 22, 'Maria': 12}


Disable a method in a ViewSet, django-rest-framework

Question: Disable a method in a ViewSet, django-rest-framework

ViewSets 具有自动列出,检索,创建,更新,删除,…的方法

我想禁用其中一些,我想出的解决方案可能不是一个好方法,因为OPTIONS仍然指出了允许的范围。

关于如何正确执行此操作的任何想法吗?

class SampleViewSet(viewsets.ModelViewSet):
    queryset = api_models.Sample.objects.all()
    serializer_class = api_serializers.SampleSerializer

    def list(self, request):
        return Response(status=status.HTTP_405_METHOD_NOT_ALLOWED)
    def create(self, request):
        return Response(status=status.HTTP_405_METHOD_NOT_ALLOWED)

ViewSets have automatic methods to list, retrieve, create, update, delete, …

I would like to disable some of those, and the solution I came up with is probably not a good one, since OPTIONS still states those as allowed.

Any idea on how to do this the right way?

class SampleViewSet(viewsets.ModelViewSet):
    queryset = api_models.Sample.objects.all()
    serializer_class = api_serializers.SampleSerializer

    def list(self, request):
        return Response(status=status.HTTP_405_METHOD_NOT_ALLOWED)
    def create(self, request):
        return Response(status=status.HTTP_405_METHOD_NOT_ALLOWED)

回答 0

的定义ModelViewSet是:

class ModelViewSet(mixins.CreateModelMixin, 
                   mixins.RetrieveModelMixin, 
                   mixins.UpdateModelMixin,
                   mixins.DestroyModelMixin,
                   mixins.ListModelMixin,
                   GenericViewSet)

因此,除了扩展之外ModelViewSet,为什么不随便使用您需要的东西呢?因此,例如:

from rest_framework import viewsets, mixins

class SampleViewSet(mixins.RetrieveModelMixin,
                    mixins.UpdateModelMixin,
                    mixins.DestroyModelMixin,
                    viewsets.GenericViewSet):
    ...

使用这种方法,路由器应该只为所包含的方法生成路由。

参考

模型视图集

The definition of ModelViewSet is:

class ModelViewSet(mixins.CreateModelMixin, 
                   mixins.RetrieveModelMixin, 
                   mixins.UpdateModelMixin,
                   mixins.DestroyModelMixin,
                   mixins.ListModelMixin,
                   GenericViewSet)

So rather than extending ModelViewSet, why not just use whatever you need? So for example:

from rest_framework import viewsets, mixins

class SampleViewSet(mixins.RetrieveModelMixin,
                    mixins.UpdateModelMixin,
                    mixins.DestroyModelMixin,
                    viewsets.GenericViewSet):
    ...

With this approach, the router should only generate routes for the included methods.

Reference:

ModelViewSet


回答 1

您可以继续使用viewsets.ModelViewSethttp_method_names在ViewSet上进行定义。

class SampleViewSet(viewsets.ModelViewSet):
    queryset = api_models.Sample.objects.all()
    serializer_class = api_serializers.SampleSerializer
    http_method_names = ['get', 'post', 'head']

一旦你加入http_method_names,你将无法做到putpatch了。

如果您想要put但不想要patch,您可以保留http_method_names = ['get', 'post', 'head', 'put']

在内部,DRF视图从Django CBV扩展。Django CBV具有一个名为http_method_names的属性。因此,您也可以在DRF视图中使用http_method_names。

[Shameless Plug]:如果此答案有用,您将喜欢我在DRF上的系列文章,网址https://www.agiliq.com/blog/2019/04/drf-polls/

You could keep using viewsets.ModelViewSet and define http_method_names on your ViewSet.

Example

class SampleViewSet(viewsets.ModelViewSet):
    queryset = api_models.Sample.objects.all()
    serializer_class = api_serializers.SampleSerializer
    http_method_names = ['get', 'post', 'head']

Once you add http_method_names, you will not be able to do put and patch anymore.

If you want put but don’t want patch, you can keep http_method_names = ['get', 'post', 'head', 'put']

Internally, DRF Views extend from Django CBV. Django CBV has an attribute called http_method_names. So you can use http_method_names with DRF views too.

[Shameless Plug]: If this answer was helpful, you will like my series of posts on DRF at https://www.agiliq.com/blog/2019/04/drf-polls/.


回答 2

尽管这篇文章已经有一段时间了,但我突然发现实际上它们是禁用这些功能的一种方法,您可以直接在views.py中对其进行编辑。

资料来源:https : //www.django-rest-framework.org/api-guide/viewsets/#viewset-actions

from rest_framework import viewsets, status
from rest_framework.response import Response

class NameWhateverYouWantViewSet(viewsets.ModelViewSet):

    def create(self, request):
        response = {'message': 'Create function is not offered in this path.'}
        return Response(response, status=status.HTTP_403_FORBIDDEN)

    def update(self, request, pk=None):
        response = {'message': 'Update function is not offered in this path.'}
        return Response(response, status=status.HTTP_403_FORBIDDEN)

    def partial_update(self, request, pk=None):
        response = {'message': 'Update function is not offered in this path.'}
        return Response(response, status=status.HTTP_403_FORBIDDEN)

    def destroy(self, request, pk=None):
        response = {'message': 'Delete function is not offered in this path.'}
        return Response(response, status=status.HTTP_403_FORBIDDEN)

Although this post is a while old, I found out that there is a way to disable those functions: you can override them directly in views.py.

Source: https://www.django-rest-framework.org/api-guide/viewsets/#viewset-actions

from rest_framework import viewsets, status
from rest_framework.response import Response

class NameThisClassWhateverYouWantViewSet(viewsets.ModelViewSet):

    def create(self, request):
        response = {'message': 'Create function is not offered in this path.'}
        return Response(response, status=status.HTTP_403_FORBIDDEN)

    def update(self, request, pk=None):
        response = {'message': 'Update function is not offered in this path.'}
        return Response(response, status=status.HTTP_403_FORBIDDEN)

    def partial_update(self, request, pk=None):
        response = {'message': 'Update function is not offered in this path.'}
        return Response(response, status=status.HTTP_403_FORBIDDEN)

    def destroy(self, request, pk=None):
        response = {'message': 'Delete function is not offered in this path.'}
        return Response(response, status=status.HTTP_403_FORBIDDEN)

回答 3

如果尝试从DRF视图集中禁用PUT方法,则可以创建一个自定义路由器:

from rest_framework.routers import DefaultRouter

class NoPutRouter(DefaultRouter):
    """
    Router class that disables the PUT method.
    """
    def get_method_map(self, viewset, method_map):

        bound_methods = super().get_method_map(viewset, method_map)

        if 'put' in bound_methods.keys():
            del bound_methods['put']

        return bound_methods

通过在路由器上禁用该方法,您的api模式文档将是正确的。

If you are trying to disable the PUT method from a DRF viewset, you can create a custom router:

from rest_framework.routers import DefaultRouter

class NoPutRouter(DefaultRouter):
    """
    Router class that disables the PUT method.
    """
    def get_method_map(self, viewset, method_map):

        bound_methods = super().get_method_map(viewset, method_map)

        if 'put' in bound_methods.keys():
            del bound_methods['put']

        return bound_methods

By disabling the method at the router, your api schema documentation will be correct.


回答 4

如何在DRF中为ViewSet禁用“删除”方法

class YourViewSet(viewsets.ModelViewSet):
    def _allowed_methods(self):
        return [m for m in super(YourViewSet, self)._allowed_methods() if m not in ['DELETE']]

PS这比显式指定所有必需的方法更可靠,因此很少有机会忘记一些重要的方法OPTIONS,HEAD等

默认情况下,DPS的PPS具有 http_method_names = ['get', 'post', 'put', 'patch', 'delete', 'head', 'options', 'trace']

How to disable “DELETE” method for ViewSet in DRF

class YourViewSet(viewsets.ModelViewSet):
    def _allowed_methods(self):
        return [m for m in super(YourViewSet, self)._allowed_methods() if m not in ['DELETE']]

P.S. This is more reliable than explicitly listing all the necessary methods, so there is less chance of forgetting important ones such as OPTIONS and HEAD.

P.P.S. By default DRF has http_method_names = ['get', 'post', 'put', 'patch', 'delete', 'head', 'options', 'trace']


回答 5

在Django Rest Framework 3.xx中,您可以ModelViewSet通过将字典传递给as_view方法来简单地启用要启用的每个方法。在此字典中,键必须包含请求类型(GET,POST,DELETE等),并且值必须包含相应的方法名称(列表,检索,更新等)。例如,假设您要Sample创建或读取模型,但不希望对其进行修改。因此,这意味着你想listretrievecreate方法,是使(和你希望别人被禁用。)

您需要做的就是添加如下路径urlpatterns

path('sample/', SampleViewSet.as_view({
    'get': 'list',
    'post': 'create'
})),
path('sample/<pk>/', SampleViewSet.as_view({  # for get sample by id.
    'get': 'retrieve'
}))

如您所见,上述路由设置中没有no deleteputrequest,因此例如,如果您将put请求发送到url,它将以405响应您Method Not Allowed

{
    "detail": "Method \"PUT\" not allowed."
}

In Django Rest Framework 3.x.x you can enable just the methods you want for a ModelViewSet by passing a dictionary to the as_view method. In this dictionary, each key is a request type (get, post, delete, etc.) and each value is the corresponding method name (list, retrieve, update, etc.). For example, let’s say you want the Sample model to be created or read, but not modified. That means you want the list, retrieve and create methods to be enabled (and the others to be disabled).

All you need to do is to add paths to urlpatterns like these:

path('sample/', SampleViewSet.as_view({
    'get': 'list',
    'post': 'create'
})),
path('sample/<pk>/', SampleViewSet.as_view({  # for get sample by id.
    'get': 'retrieve'
}))

As you can see, there are no delete or put requests in the routing settings above, so if you send a put request to the URL, for example, it responds with 405 Method Not Allowed:

{
    "detail": "Method \"PUT\" not allowed."
}

回答 6

如果您打算禁用放置/发布/销毁方法,则可以使用

viewsets.ReadOnlyModelViewSet https://www.django-rest-framework.org/tutorial/6-viewsets-and-routers/#refactoring-to-use-viewsets

If you are planning to disable put/post/destroy methods, you can use

viewsets.ReadOnlyModelViewSet https://www.django-rest-framework.org/tutorial/6-viewsets-and-routers/#refactoring-to-use-viewsets