
Test if lists share any items in Python

Question: Test if lists share any items in Python

I want to check if any of the items in one list are present in another list. I can do it simply with the code below, but I suspect there might be a library function to do this. If not, is there a more Pythonic way of achieving the same result?

In [78]: a = [1, 2, 3, 4, 5]

In [79]: b = [8, 7, 6]

In [80]: c = [8, 7, 6, 5]

In [81]: def lists_overlap(a, b):
   ....:     for i in a:
   ....:         if i in b:
   ....:             return True
   ....:     return False
   ....: 

In [82]: lists_overlap(a, b)
Out[82]: False

In [83]: lists_overlap(a, c)
Out[83]: True

In [84]: def lists_overlap2(a, b):
   ....:     return len(set(a).intersection(set(b))) > 0
   ....: 

Answer 0

Short answer: use not set(a).isdisjoint(b), it’s generally the fastest.

There are four common ways to test if two lists a and b share any items. The first option is to convert both to sets and check their intersection, as such:

bool(set(a) & set(b))

Because sets are stored using a hash table in Python, searching them is O(1) (see here for more information about complexity of operators in Python). Theoretically, this is O(n+m) on average for n and m objects in lists a and b. But 1) it must first create sets out of the lists, which can take a non-negligible amount of time, and 2) it supposes that hashing collisions are sparse among your data.

The second way to do it is using a generator expression performing iteration on the lists, such as:

any(i in a for i in b)

This allows searching in place, so no new memory is allocated for intermediary variables. It also bails out on the first find. But the in operator is always O(n) on lists (see here).

Another proposed option is a hybrid: iterate through one of the lists, convert the other one to a set and test for membership on this set, like so:

a = set(a); any(i in a for i in b)

A fourth approach is to take advantage of the isdisjoint() method of the (frozen)sets (see here), for example:

not set(a).isdisjoint(b)

If the elements you search for are near the beginning of the list (e.g. it is sorted), the generator expression is favored, as the set intersection method has to allocate new memory for the intermediary variables:

>>> from timeit import timeit
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=list(range(1000))", number=100000)
26.077727576019242
>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=list(range(1000))", number=100000)
0.16220548999262974

Here’s a graph of the execution time for this example as a function of list size:

Note that both axes are logarithmic. This represents the best case for the generator expression. As can be seen, the isdisjoint() method is better for very small list sizes, whereas the generator expression is better for larger list sizes.

On the other hand, as the search starts at the beginning for the hybrid and the generator expression, if the shared elements are systematically at the end of the array (or the two lists do not share any values), the disjoint and set intersection approaches are then way faster than the generator expression and the hybrid approach.

>>> timeit('any(i in a for i in b)', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000)
13.739536046981812
>>> timeit('bool(set(a) & set(b))', setup="a=list(range(1000));b=[x+998 for x in range(999,0,-1)]", number=1000)
0.08102107048034668

It is interesting to note that the generator expression is way slower for bigger list sizes. This is only for 1000 repetitions, instead of the 100000 for the previous figure. This setup also approximates well the case when no elements are shared, and is the best case for the disjoint and set intersection approaches.

Here are two analyses using random numbers (instead of rigging the setup to favor one technique or another):

High chance of sharing: elements are randomly taken from [1, 2*len(a)]. Low chance of sharing: elements are randomly taken from [1, 1000*len(a)].

Up to now, this analysis supposed both lists are of the same size. In case of two lists of different sizes, for example a is much smaller, isdisjoint() is always faster:

Make sure that the a list is the smaller, otherwise the performance decreases. In this experiment, the a list size was set constant to 5.

In summary:

  • If the lists are very small (< 10 elements), not set(a).isdisjoint(b) is always the fastest.
  • If the elements in the lists are sorted or have a regular structure that you can take advantage of, the generator expression any(i in a for i in b) is the fastest on large list sizes.
  • Test the set intersection with not set(a).isdisjoint(b), which is always faster than bool(set(a) & set(b)).
  • The hybrid “iterate through list, test on set” a = set(a); any(i in a for i in b) is generally slower than other methods.
  • The generator expression and the hybrid are much slower than the two other approaches when it comes to lists without sharing elements.

In most cases, using the isdisjoint() method is the best approach as the generator expression will take much longer to execute, as it is very inefficient when no elements are shared.
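
As a rough illustration of the four approaches compared above, here is a minimal sketch that wraps each one in a function. The function names are made up for this example and do not come from any library; which one is fastest will depend on your data and Python version, as discussed above.

```python
def overlap_intersection(a, b):
    # Build two sets and test whether their intersection is non-empty.
    return bool(set(a) & set(b))


def overlap_generator(a, b):
    # Scan list a once per element of b; any() stops at the first match.
    return any(i in a for i in b)


def overlap_hybrid(a, b):
    # Convert one list to a set, then iterate over the other list.
    sa = set(a)
    return any(i in sa for i in b)


def overlap_isdisjoint(a, b):
    # isdisjoint() accepts any iterable, so b does not need to be a set.
    return not set(a).isdisjoint(b)


a = [1, 2, 3, 4, 5]
b = [8, 7, 6]
c = [8, 7, 6, 5]

for f in (overlap_intersection, overlap_generator, overlap_hybrid, overlap_isdisjoint):
    print(f.__name__, f(a, b), f(a, c))
```

All four functions should agree on the result; they differ only in how much work they do to get there.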


Answer 1

def lists_overlap3(a, b):
    return bool(set(a) & set(b))

Note: the above assumes that you want a boolean as the answer. If all you need is an expression to use in an if statement, just use if set(a) & set(b):


Answer 2

def lists_overlap(a, b):
  sb = set(b)
  return any(el in sb for el in a)

This is asymptotically optimal (worst case O(n + m)), and might be better than the intersection approach due to any's short-circuiting.

E.g.:

lists_overlap([3,4,5], [1,2,3])

will return True as soon as it gets to 3 in sb

EDIT: Another variation (with thanks to Dave Kirby):

import itertools  # Python 2 only; itertools.imap does not exist in Python 3

def lists_overlap(a, b):
  sb = set(b)
  return any(itertools.imap(sb.__contains__, a))

This relies on imap's iterator, which is implemented in C, rather than a generator expression. It also uses sb.__contains__ as the mapping function. I don’t know how much performance difference this makes. It will still short-circuit.
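
itertools.imap exists only in Python 2. In Python 3 the built-in map is already lazy, so the same idea can be sketched as follows. The name lists_overlap_map is chosen here only to keep it apart from the versions above.

```python
def lists_overlap_map(a, b):
    # Python 3: map() returns a lazy iterator, so any() can still short-circuit.
    sb = set(b)
    return any(map(sb.__contains__, a))


print(lists_overlap_map([3, 4, 5], [1, 2, 3]))  # True
print(lists_overlap_map([4, 5], [1, 2, 3]))     # False
```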


Answer 3

You could also use any with a list comprehension:

any([item in a for item in b])

Answer 4

In Python 2.6 or later you can do:

return not frozenset(a).isdisjoint(frozenset(b))

Answer 5

You can use the built-in any function with a generator expression:

def list_overlap(a,b): 
     return any(i for i in a if i in b)

As John and Lie have pointed out, this gives incorrect results when bool(i) == False for every i shared by the two lists. It should be:

return any(i in b for i in a)
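
To make the difference concrete, here is a small sketch with lists whose only shared item is 0, which is falsy. The function names are hypothetical and used only for this illustration.

```python
def list_overlap_buggy(a, b):
    # Yields the shared items themselves, so any() tests their truthiness.
    return any(i for i in a if i in b)


def list_overlap_fixed(a, b):
    # Yields the result of each membership test instead.
    return any(i in b for i in a)


a = [0, 1]
b = [0, 2]
print(list_overlap_buggy(a, b))  # False, although 0 is shared
print(list_overlap_fixed(a, b))  # True
```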

Answer 6

This question is pretty old, but I noticed that while people were arguing sets vs. lists, no one thought of using them together. Following Soravux’s example,

Worst case for lists:

>>> timeit('bool(set(a) & set(b))',  setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
100.91506409645081
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
19.746716022491455
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=[x+9999 for x in range(10000)]", number=100000)
0.092626094818115234

And the best case for lists:

>>> timeit('bool(set(a) & set(b))',  setup="a=list(range(10000)); b=list(range(10000))", number=100000)
154.69790101051331
>>> timeit('any(i in a for i in b)', setup="a=list(range(10000)); b=list(range(10000))", number=100000)
0.082653045654296875
>>> timeit('any(i in a for i in b)', setup="a= set(range(10000)); b=list(range(10000))", number=100000)
0.08434605598449707

So even faster than iterating through two lists is iterating through a list to see if each item is in a set, which makes sense since checking if a number is in a set takes constant time, while checking by iterating through a list takes time proportional to the length of the list.

Thus, my conclusion is: iterate through a list, and check if each item is in a set.


Answer 7

If you don’t care what the overlapping element might be, you can simply compare the len of the combined list with the len of the lists combined as a set. If there are overlapping elements, the set will be shorter:

len(set(a+b+c))==len(a+b+c) returns True if there is no overlap, provided that no individual list contains duplicates of its own.
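
One caveat with this length comparison is that a duplicate inside a single list also shortens the set. A minimal sketch, with a hypothetical helper name, shows both the intended behaviour and that edge case:

```python
def no_overlap_by_len(*lists):
    # Concatenate all lists, then compare the length with and without duplicates.
    combined = [x for lst in lists for x in lst]
    return len(set(combined)) == len(combined)


print(no_overlap_by_len([1, 2], [3, 4]))  # True: nothing is shared
print(no_overlap_by_len([1, 2], [2, 3]))  # False: 2 is shared
print(no_overlap_by_len([1, 1], [2, 3]))  # False, although nothing is shared
```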


Answer 8

I’ll throw another one in with a functional programming style:

any(map(lambda x: x in a, b))

Explanation:

map(lambda x: x in a, b)

returns a list of booleans (a lazy iterator in Python 3) indicating which elements of b are found in a. The result is then passed to any, which simply returns True if any of the elements is True.


How do I revert to a previous package in Anaconda?

Question: How do I revert to a previous package in Anaconda?

If I do

conda info pandas

I can see all of the packages available.

I updated my pandas to the latest this morning, but I need to revert to a prior version now. I tried

conda update pandas 0.13.1

but that didn’t work. How do I specify which version to use?


Answer 0

I had to use the install command instead:

conda install pandas=0.13.1

Answer 1

For the case that you wish to revert a recently installed package that made several changes to dependencies (such as tensorflow), you can “roll back” to an earlier installation state via the following method:

conda list --revisions
conda install --revision [revision number]

The first command shows previous installation revisions (with dependencies) and the second reverts to whichever revision number you specify.

Note that if you wish to (re)install a later revision, you may have to sequentially reinstall all intermediate versions. If you had been at revision 23, reinstalled revision 20 and wish to return, you may have to run each:

conda install --revision 21
conda install --revision 22
conda install --revision 23

How to avoid explicit 'self' in Python?

Question: How to avoid explicit 'self' in Python?

I have been learning Python by following some pygame tutorials.

Therein I found extensive use of the keyword self, and coming from a primarily Java background, I find that I keep forgetting to type self. For example, instead of self.rect.centerx I would type rect.centerx, because, to me, rect is already a member variable of the class.

The Java parallel I can think of for this situation is having to prefix all references to member variables with this.

Am I stuck prefixing all member variables with self, or is there a way to declare them that would allow me to avoid having to do so?

Even if what I am suggesting isn’t pythonic, I’d still like to know if it is possible.

I have taken a look at these related SO questions, but they don’t quite answer what I am after:


Answer 0

Python requires specifying self. The result is there’s never any confusion over what’s a member and what’s not, even without the full class definition visible. This leads to useful properties, such as: you can’t add members which accidentally shadow non-members and thereby break code.

One extreme example: you can write a class without any knowledge of what base classes it might have, and always know whether you are accessing a member or not:

class A(some_function()):
  def f(self):
    self.member = 42
    self.method()

That’s the complete code! (some_function returns the type used as a base.)

Another, where the methods of a class are dynamically composed:

class B(object):
  pass

print(B())
# <__main__.B object at 0xb7e4082c>

def B_init(self):
  self.answer = 42
def B_str(self):
  return "<The answer is %s.>" % self.answer
# notice these functions require no knowledge of the actual class
# how hard are they to read and realize that "members" are used?

B.__init__ = B_init
B.__str__ = B_str

print(B())
# <The answer is 42.>

Remember, both of these examples are extreme and you won’t see them every day, nor am I suggesting you should often write code like this, but they do clearly show aspects of self being explicitly required.


Answer 1

Previous answers are all basically variants of “you can’t” or “you shouldn’t”. While I agree with the latter sentiment, the question is technically still unanswered.

Furthermore, there are legitimate reasons why someone might want to do something along the lines of what the actual question is asking. One thing I run into sometimes is lengthy math equations where using long names makes the equation unrecognizable. Here are a couple ways of how you could do this in a canned example:

import numpy as np
class MyFunkyGaussian() :
    def __init__(self, A, x0, w, s, y0) :
        self.A = float(A)
        self.x0 = x0
        self.w = w
        self.y0 = y0
        self.s = s

    # The correct way, but subjectively less readable to some (like me) 
    def calc1(self, x) :
        return (self.A/(self.w*np.sqrt(np.pi))/(1+self.s*self.w**2/2)
                * np.exp( -(x-self.x0)**2/self.w**2)
                * (1+self.s*(x-self.x0)**2) + self.y0 )

    # The correct way if you really don't want to use 'self' in the calculations
    def calc2(self, x) :
        # Explicitly copy variables
        A, x0, w, y0, s = self.A, self.x0, self.w, self.y0, self.s
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

    # Probably a bad idea...
    def calc3(self, x) :
        # Automatically copy every class variable
        for k in self.__dict__ : exec(k+'= self.'+k)
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

g = MyFunkyGaussian(2.0, 1.5, 3.0, 5.0, 0.0)
print(g.calc1(0.5))
print(g.calc2(0.5))
print(g.calc3(0.5))

The third example – i.e. using for k in self.__dict__ : exec(k+'= self.'+k) is basically what the question is actually asking for, but let me be clear that I don’t think it is generally a good idea.

For more info, and ways to iterate through class variables, or even functions, see answers and discussion to this question. For a discussion of other ways to dynamically name variables, and why this is usually not a good idea see this blog post.

UPDATE: There appears to be no way to dynamically update or change locals in a function in Python3, so calc3 and similar variants are no longer possible. The only python3 compatible solution I can think of now is to use globals:

def calc4(self, x) :
        # Automatically copy every class variable in globals
        globals().update(self.__dict__)
        sqrt, exp, pi = np.sqrt, np.exp, np.pi
        return ( A/( w*sqrt(pi) )/(1+s*w**2/2)
                * exp( -(x-x0)**2/w**2 )
                * (1+s*(x-x0)**2) + y0 )

Which, again, would be a terrible practice in general.


Answer 2

Actually self is not a keyword, it’s just the name conventionally given to the first parameter of instance methods in Python. And that first parameter can’t be skipped, as it’s the only mechanism a method has of knowing which instance of your class it’s being called on.
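
A small sketch may help show what that first parameter does. The class Tally is invented for this illustration; the point is that calling a method on an instance is the same as calling the function on the class and passing the instance explicitly.

```python
class Tally:
    def __init__(self, start):
        self.value = start

    def increment(self):
        self.value += 1
        return self.value


t = Tally(10)
print(t.increment())       # 11: the instance is passed as the first argument
print(Tally.increment(t))  # 12: the same call with the instance passed explicitly
```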


Answer 3

You can use whatever name you want, for example

class test(object):
    def function(this, variable):
        this.variable = variable

or even

class test(object):
    def function(s, variable):
        s.variable = variable

but you are stuck with using a name for the scope.

I do not recommend you use something different to self unless you have a convincing reason, as it would make it alien for experienced pythonistas.


Answer 4

Yes, you must always specify self, because explicit is better than implicit, according to Python philosophy.

You will also find that the way you program in Python is very different from the way you program in Java, so the use of self tends to decrease because you don’t put everything inside the object. Rather, you make greater use of module-level functions, which can be tested more easily.

By the way, I hated it at first, now I hate the opposite. Same for indentation-driven flow control.


Answer 5

The “self” is the conventional placeholder for the current object instance of a class. It’s used when you want to refer to the object’s property, field or method inside a class, as if you were referring to “itself”. But to make it shorter, someone in the Python programming realm started to use “self”; other realms use “this”, but they make it a keyword which cannot be replaced. I would rather use “its” to increase the code readability. It’s one of the good things in Python: you have the freedom to choose your own placeholder for the object’s instance other than “self”. Example for self:

class UserAccount():    
    def __init__(self, user_type, username, password):
        self.user_type = user_type
        self.username = username            
        self.password = encrypt(password)        

    def get_password(self):
        return decrypt(self.password)

    def set_password(self, password):
        self.password = encrypt(password)

Now we replace ‘self’ with ‘its’:

class UserAccount():    
    def __init__(its, user_type, username, password):
        its.user_type = user_type
        its.username = username            
        its.password = encrypt(password)        

    def get_password(its):
        return decrypt(its.password)

    def set_password(its, password):
        its.password = encrypt(password)

Which is more readable now?


Answer 6

self is part of the python syntax to access members of objects, so I’m afraid you’re stuck with it


Answer 7

Actually you can use the “Implicit self” recipe from Armin Ronacher’s presentation “5 years of bad ideas” (google it).

It’s a very clever recipe, as almost everything from Armin Ronacher, but I don’t think this idea is very appealing. I think I’d prefer explicit this in C#/Java.

Update. Link to “bad idea recipe”: https://speakerdeck.com/mitsuhiko/5-years-of-bad-ideas?slide=58


Answer 8

Yeah, self is tedious. But, is it better?

class Test:

    def __init__(_):
        _.test = 'test'

    def run(_):
        print(_.test)

Answer 9

From: Self Hell – More stateful functions.

…a hybrid approach works best. All of your class methods that actually do computation should be moved into closures, and extensions to clean up syntax should be kept in classes. Stuff the closures into classes, treating the class much like a namespace. The closures are essentially static functions, and so do not require selfs*, even in the class…


Answer 10

I think that it would be easier and more readable if there was a “member” statement, just as there is “global”, so you can tell the interpreter which objects are members of the class.


What is the difference between json.dump() and json.dumps() in Python?

Question: What is the difference between json.dump() and json.dumps() in Python?

I searched in this official document to find the difference between json.dump() and json.dumps() in Python. It is clear that they are related to the file write option.
But what is the detailed difference between them, and in what situations does one have an advantage over the other?


Answer 0

There isn’t much else to add other than what the docs say. If you want to dump the JSON into a file/socket or whatever, then you should go with dump(). If you only need it as a string (for printing, parsing or whatever) then use dumps() (dump string)

As mentioned by Antti Haapala in this answer, there are some minor differences on the ensure_ascii behaviour. This is mostly due to how the underlying write() function works, being that it operates on chunks rather than the whole string. Check his answer for more details on that.

json.dump()

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object)

If ensure_ascii is False, some chunks written to fp may be unicode instances

json.dumps()

Serialize obj to a JSON formatted str

If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance
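
A minimal Python 3 sketch of the two calls side by side, using io.StringIO as a stand-in for a real file or socket (the sample data is made up for this example):

```python
import io
import json

data = {"name": "example", "values": [1, 2, 3]}

# dumps() returns the JSON text as a str.
text = json.dumps(data)

# dump() writes the JSON text to any object with a .write() method.
buffer = io.StringIO()
json.dump(data, buffer)

print(text)
print(buffer.getvalue() == text)  # True
```

With the same arguments both calls produce the same JSON text; the difference is only where it ends up.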


Answer 1

The functions with an s work with strings: dumps() returns one and loads() takes one. The others work with file streams.


Answer 2

In memory usage and speed.

When you call jsonstr = json.dumps(mydata) it first creates a full copy of your data in memory and only then you file.write(jsonstr) it to disk. So this is a faster method but can be a problem if you have a big piece of data to save.

When you call json.dump(mydata, file) — without ‘s’, new memory is not used, as the data is dumped by chunks. But the whole process is about 2 times slower.

Source: I checked the source code of json.dump() and json.dumps() and also tested both the variants measuring the time with time.time() and watching the memory usage in htop.


Answer 3

One notable difference in Python 2 is that if you’re using ensure_ascii=False, dump will properly write UTF-8 encoded data into the file (unless you used 8-bit strings with extended characters that are not UTF-8):

dumps on the other hand, with ensure_ascii=False can produce a str or unicode just depending on what types you used for strings:

Serialize obj to a JSON formatted str using this conversion table. If ensure_ascii is False, the result may contain non-ASCII characters and the return value may be a unicode instance.

(emphasis mine). Note that it may still be a str instance as well.

Thus you cannot use its return value to save the structure into file without checking which format was returned and possibly playing with unicode.encode.

This is of course no longer a valid concern in Python 3, since this 8-bit/Unicode confusion no longer exists.


As for load vs loads, load considers the whole file to be one JSON document, so you cannot use it to read multiple newline-delimited JSON documents from a single file.
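One common way around that limitation is to read the file line by line and call loads on each line. A minimal sketch, with io.StringIO standing in for an open file and invented sample records:

```python
import io
import json

# Stand-in for an open file containing one JSON document per line.
fp = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')

# Parse each non-blank line as its own JSON document.
records = [json.loads(line) for line in fp if line.strip()]
print(records)
```

This assumes no document spans more than one line, which is what the newline-delimited convention requires.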


Accessing dict_keys element by index in Python3

Question: Accessing dict_keys element by index in Python3

我正在尝试通过其索引访问dict_key的元素:

test = {'foo': 'bar', 'hello': 'world'}
keys = test.keys()  # dict_keys object

keys.index(0)
AttributeError: 'dict_keys' object has no attribute 'index'

我想得到foo

与:

keys[0]
TypeError: 'dict_keys' object does not support indexing

我怎样才能做到这一点?

I’m trying to access a dict_key’s element by its index:

test = {'foo': 'bar', 'hello': 'world'}
keys = test.keys()  # dict_keys object

keys.index(0)
AttributeError: 'dict_keys' object has no attribute 'index'

I want to get foo.

same with:

keys[0]
TypeError: 'dict_keys' object does not support indexing

How can I do this?


回答 0

list()而是调用字典:

keys = list(test)

在Python 3中,该dict.keys()方法返回一个字典视图对象,它作为一个集合。直接迭代字典也会产生键,因此将字典转换为列表会得到所有键的列表:

>>> test = {'foo': 'bar', 'hello': 'world'}
>>> list(test)
['foo', 'hello']
>>> list(test)[0]
'foo'

Call list() on the dictionary instead:

keys = list(test)

In Python 3, the dict.keys() method returns a dictionary view object, which acts as a set. Iterating over the dictionary directly also yields keys, so turning a dictionary into a list results in a list of all the keys:

>>> test = {'foo': 'bar', 'hello': 'world'}
>>> list(test)
['foo', 'hello']
>>> list(test)[0]
'foo'

回答 1

不是完整的答案,但可能是有用的提示。如果它确实是您想要的第一项*,那么

next(iter(q))

比快得多

list(q)[0]

对于大词典,因为不必将整个内容存储在内存中。

对于10.000.000物品,我发现它快了将近40.000倍。

*如果dict只是Python 3.6之前的伪随机项目,则第一项(此后它在标准实现中已订购,尽管不建议依赖它)。

Not a full answer but perhaps a useful hint. If it is really the first item you want*, then

next(iter(q))

is much faster than

list(q)[0]

for large dicts, since the whole thing doesn’t have to be stored in memory.

For 10.000.000 items I found it to be almost 40.000 times faster.

*Before Python 3.6 the “first” item of a dict was effectively arbitrary. CPython 3.6 preserves insertion order as an implementation detail, and from Python 3.7 onwards insertion order is guaranteed by the language.
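The same idea extends to an arbitrary position with itertools.islice, which skips ahead without building a list. The helper name nth_key is made up for this sketch, and it relies on insertion ordering (Python 3.7+).

```python
from itertools import islice

def nth_key(d, n):
    """Return the key at position n in iteration order without building a list."""
    try:
        return next(islice(d, n, None))
    except StopIteration:
        raise IndexError("dict has fewer than %d keys" % (n + 1)) from None

test = {'foo': 'bar', 'hello': 'world'}
print(next(iter(test)))   # first key
print(nth_key(test, 1))   # second key
```

Each call still walks the dict from the start, so for repeated positional access list(test) is likely the better choice.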


回答 2

我想要第一个字典项的“键”和“值”对。我用下面的代码。

 key, val = next(iter(my_dict.items()))

I wanted “key” & “value” pair of a first dictionary item. I used the following code.

 key, val = next(iter(my_dict.items()))

回答 3

test = {'foo': 'bar', 'hello': 'world'}
ls = []
for key in test.keys():
    ls.append(key)
print(ls[0])

将键附加到静态定义的列表然后对其进行索引的常规方式

test = {'foo': 'bar', 'hello': 'world'}
ls = []
for key in test.keys():
    ls.append(key)
print(ls[0])

The conventional way: append the keys to an explicitly created list, then index into that list.


回答 4

在许多情况下,这可能是XY问题。为什么要按位置索引字典键?您真的需要吗?直到最近,字典甚至还没有在Python中排序,因此访问第一个元素是任意的。

我刚刚将一些Python 2代码翻译为Python 3:

keys = d.keys()
for (i, res) in enumerate(some_list):
    k = keys[i]
    # ...

这不是很漂亮,但也不是很糟糕。起初,我正要用可怕的东西代替它

    k = next(itertools.islice(iter(keys), i, None))

在我意识到这一切写成更好之前

for (k, res) in zip(d.keys(), some_list):

效果很好。

我相信在许多其他情况下,可以避免按位置索引字典关键字。尽管字典在Python 3.7中是有序的,但是依靠它并不是很漂亮。上面的代码仅起作用,因为的内容some_list是最近从的内容中产生的d

如果您确实需要按索引访问dict_keys元素,请仔细看一下代码。也许您不需要。

In many cases, this may be an XY Problem. Why are you indexing your dictionary keys by position? Do you really need to? Until recently, dictionaries were not even ordered in Python, so accessing the first element was arbitrary.

I just translated some Python 2 code to Python 3:

keys = d.keys()
for (i, res) in enumerate(some_list):
    k = keys[i]
    # ...

which is not pretty, but not very bad either. At first, I was about to replace it by the monstrous

    k = next(itertools.islice(iter(keys), i, None))

before I realised this is all much better written as

for (k, res) in zip(d.keys(), some_list):

which works just fine.

I believe that in many other cases, indexing dictionary keys by position can be avoided. Although dictionaries are ordered in Python 3.7, relying on that is not pretty. The code above only works because the contents of some_list had been recently produced from the contents of d.

Take a hard look at your code if you really need to access a dict_keys element by index. Perhaps you don’t need to.


回答 5

试试这个

keys = [next(iter(x.keys())) for x in test]
print(list(keys))

结果看起来像这样。[‘foo’,’hello’]

您可以在此处找到更多可能的解决方案。

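Try this, assuming test is a list of single-key dicts such as [{'foo': 'bar'}, {'hello': 'world'}] (for the plain dict from the question, list(test) is enough):

keys = [next(iter(x.keys())) for x in test]
print(keys)

The result looks like this: ['foo', 'hello']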

You can find more possible solutions here.


Python date of the previous month

Question: Python date of the previous month

我正在尝试使用python获取上个月的日期。这是我尝试过的:

str( time.strftime('%Y') ) + str( int(time.strftime('%m'))-1 )

但是,这种方法很糟糕,原因有两个:首先,它返回2012年2月的20122(而不是201202),其次它将返回0而不是1月的12。

我已经用bash解决了这个麻烦

echo $(date -d"3 month ago" "+%G%m%d")

我认为,如果bash为此目的提供了一种内置方式,那么功能更强大的python应该比强迫编写自己的脚本来实现此目标更好。我当然可以做类似的事情:

if int(time.strftime('%m')) == 1:
    return '12'
else:
    if int(time.strftime('%m')) < 10:
        return '0'+str(time.strftime('%m')-1)
    else:
        return str(time.strftime('%m') -1)

我没有测试过此代码,也不想使用它(除非我找不到其他方法:/)

谢谢你的帮助!

I am trying to get the date of the previous month with Python. Here is what I’ve tried:

str( time.strftime('%Y') ) + str( int(time.strftime('%m'))-1 )

However, this way is bad for 2 reasons: First it returns 20122 for the February of 2012 (instead of 201202) and secondly it will return 0 instead of 12 on January.

I have solved this trouble in bash with

echo $(date -d"3 month ago" "+%G%m%d")

I think that if bash has a built-in way to do this, then Python, which is much better equipped, should provide something better than forcing me to write my own script. Of course I could do something like:

if int(time.strftime('%m')) == 1:
    return '12'
else:
    if int(time.strftime('%m')) < 10:
        return '0'+str(time.strftime('%m')-1)
    else:
        return str(time.strftime('%m') -1)

I have not tested this code and I don’t want to use it anyway (unless I can’t find any other way :/)

Thanks for your help!


回答 0

datetime和datetime.timedelta类是您的朋友。

  1. 找到今天。
  2. 用它来查找本月的第一天。
  3. 使用timedelta备份一天,直到上个月的最后一天。
  4. 打印您要查找的YYYYMM字符串。

像这样:

 import datetime
 today = datetime.date.today()
 first = today.replace(day=1)
 lastMonth = first - datetime.timedelta(days=1)
 print(lastMonth.strftime("%Y%m"))

201202 打印。

datetime and the datetime.timedelta classes are your friends.

  1. Find today.
  2. Use that to find the first day of this month.
  3. Use timedelta to back up a single day, to the last day of the previous month.
  4. Print the YYYYMM string you’re looking for.

Like this:

 import datetime
 today = datetime.date.today()
 first = today.replace(day=1)
 lastMonth = first - datetime.timedelta(days=1)
 print(lastMonth.strftime("%Y%m"))

201202 is printed.
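The same steps can be wrapped in a small function that accepts the reference date as a parameter, which makes the year boundary easy to check. The function name is made up for this sketch.

```python
import datetime

def previous_month_yyyymm(today=None):
    """Return the previous month as a YYYYMM string."""
    if today is None:
        today = datetime.date.today()
    first = today.replace(day=1)                           # first day of this month
    last_of_previous = first - datetime.timedelta(days=1)  # last day of previous month
    return last_of_previous.strftime("%Y%m")

print(previous_month_yyyymm(datetime.date(2012, 3, 15)))  # → 201202
print(previous_month_yyyymm(datetime.date(2012, 1, 1)))   # → 201112
```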


回答 1

您应该使用dateutil。这样,您就可以使用relativedelta,它是timedelta的改进版本。

>>> import datetime 
>>> import dateutil.relativedelta
>>> now = datetime.datetime.now()
>>> print now
2012-03-15 12:33:04.281248
>>> print now + dateutil.relativedelta.relativedelta(months=-1)
2012-02-15 12:33:04.281248

You should use dateutil. With that, you can use relativedelta, which is an improved version of timedelta.

>>> import datetime
>>> import dateutil.relativedelta
>>> now = datetime.datetime.now()
>>> print(now)
2012-03-15 12:33:04.281248
>>> print(now + dateutil.relativedelta.relativedelta(months=-1))
2012-02-15 12:33:04.281248

回答 2

from datetime import date, timedelta

first_day_of_current_month = date.today().replace(day=1)
last_day_of_previous_month = first_day_of_current_month - timedelta(days=1)

print("Previous month:", last_day_of_previous_month.month)

Or:

from datetime import date, timedelta

prev = date.today().replace(day=1) - timedelta(days=1)
print(prev.month)

回答 3

bgporter的答案为基础

def prev_month_range(when = None): 
    """Return (previous month's start date, previous month's end date)."""
    if not when:
        # Default to today.
        when = datetime.datetime.today()
    # Find previous month: https://stackoverflow.com/a/9725093/564514
    # Find today.
    first = datetime.date(day=1, month=when.month, year=when.year)
    # Use that to find the first day of this month.
    prev_month_end = first - datetime.timedelta(days=1)
    prev_month_start = datetime.date(day=1, month= prev_month_end.month, year= prev_month_end.year)
    # Return previous month's start and end dates in YY-MM-DD format.
    return (prev_month_start.strftime('%Y-%m-%d'), prev_month_end.strftime('%Y-%m-%d'))

Building on bgporter’s answer.

import datetime

def prev_month_range(when=None):
    """Return (previous month's start date, previous month's end date)."""
    if not when:
        # Default to today.
        when = datetime.datetime.today()
    # Find previous month: https://stackoverflow.com/a/9725093/564514
    # First day of the month that contains `when`.
    first = datetime.date(day=1, month=when.month, year=when.year)
    # Step back one day to reach the last day of the previous month.
    prev_month_end = first - datetime.timedelta(days=1)
    prev_month_start = datetime.date(day=1, month=prev_month_end.month, year=prev_month_end.year)
    # Return previous month's start and end dates in YYYY-MM-DD format.
    return (prev_month_start.strftime('%Y-%m-%d'), prev_month_end.strftime('%Y-%m-%d'))

回答 4

它非常容易和简单。做这个

from dateutil.relativedelta import relativedelta
from datetime import datetime

today_date = datetime.today()
print "todays date time: %s" %today_date

one_month_ago = today_date - relativedelta(months=1)
print "one month ago date time: %s" % one_month_ago
print "one month ago date: %s" % one_month_ago.date()

输出如下:$ python2.7 main.py

todays date time: 2016-09-06 02:13:01.937121
one month ago date time: 2016-08-06 02:13:01.937121
one month ago date: 2016-08-06

It’s very easy and simple. Do this:

from dateutil.relativedelta import relativedelta
from datetime import datetime

today_date = datetime.today()
print("todays date time: %s" % today_date)

one_month_ago = today_date - relativedelta(months=1)
print("one month ago date time: %s" % one_month_ago)
print("one month ago date: %s" % one_month_ago.date())

Here is the output: $python2.7 main.py

todays date time: 2016-09-06 02:13:01.937121
one month ago date time: 2016-08-06 02:13:01.937121
one month ago date: 2016-08-06

回答 5

对于到达这里并希望获得上个月的第一天和最后一天的人:

from datetime import date, timedelta

last_day_of_prev_month = date.today().replace(day=1) - timedelta(days=1)

start_day_of_prev_month = date.today().replace(day=1) - timedelta(days=last_day_of_prev_month.day)

# For printing results
print("First day of prev month:", start_day_of_prev_month)
print("Last day of prev month:", last_day_of_prev_month)

输出:

First day of prev month: 2019-02-01
Last day of prev month: 2019-02-28

For anyone who got here looking for both the first and last day of the previous month:

from datetime import date, timedelta

last_day_of_prev_month = date.today().replace(day=1) - timedelta(days=1)

start_day_of_prev_month = date.today().replace(day=1) - timedelta(days=last_day_of_prev_month.day)

# For printing results
print("First day of prev month:", start_day_of_prev_month)
print("Last day of prev month:", last_day_of_prev_month)

Output:

First day of prev month: 2019-02-01
Last day of prev month: 2019-02-28

回答 6

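import datetime

def prev_month(date=None):
    # Resolve the default at call time; datetime.datetime.today() in the
    # signature would be frozen when the function is defined.
    if date is None:
        date = datetime.datetime.today()
    if date.month == 1:
        return date.replace(month=12, year=date.year - 1)
    try:
        return date.replace(month=date.month - 1)
    except ValueError:
        # The day does not exist in the previous month (e.g. March 31):
        # step back one day and retry.
        return prev_month(date=date.replace(day=date.day - 1))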

回答 7

只是为了好玩,一个使用divmod的纯数学答案。由于相乘,效率很低,也可以对月份数进行简单检查(如果等于12,则增加年份等)

year = today.year
month = today.month

nm = list(divmod(year * 12 + month + 1, 12))
if nm[1] == 0:
    nm[1] = 12
    nm[0] -= 1
pm = list(divmod(year * 12 + month - 1, 12))
if pm[1] == 0:
    pm[1] = 12
    pm[0] -= 1

next_month = nm
previous_month = pm

Just for fun, a pure math answer using divmod. It is pretty inefficient because of the multiplication; a simple check on the month number would do just as well (if equal to 12, increase the year, etc.).

from datetime import date

today = date.today()
year = today.year
month = today.month

nm = list(divmod(year * 12 + month + 1, 12))
if nm[1] == 0:
    nm[1] = 12
    nm[0] -= 1
pm = list(divmod(year * 12 + month - 1, 12))
if pm[1] == 0:
    pm[1] = 12
    pm[0] -= 1

next_month = nm
previous_month = pm

回答 8

使用Pendulum非常完整的库,我们有了subtract方法(而不是“ subStract”):

import pendulum
today = pendulum.datetime.today()  # 2020, january
lastmonth = today.subtract(months=1)
lastmonth.strftime('%Y%m')
# '201912'

我们看到它可以应对跳跃的岁月。

反向等效为add

https://pendulum.eustace.io/docs/#addition-and-subtraction

With the very complete Pendulum library, we have the subtract method (and not “subStract”):

import pendulum
today = pendulum.today()  # 2020, January
lastmonth = today.subtract(months=1)
lastmonth.strftime('%Y%m')
# '201912'

We see that it handles year boundaries.

The reverse equivalent is add.

https://pendulum.eustace.io/docs/#addition-and-subtraction


回答 9

以@JF Sebastian的注释为基础,您可以将replace()函数链接起来以返回一个“月”。由于一个月不是固定的时间段,因此此解决方案尝试返回到上个月的同一日期,这当然不能在所有月份都有效。在这种情况下,此算法默认为上个月的最后一天。

from datetime import datetime, timedelta

d = datetime(2012, 3, 31) # A problem date as an example

# last day of last month
one_month_ago = (d.replace(day=1) - timedelta(days=1))
try:
    # try to go back to same day last month
    one_month_ago = one_month_ago.replace(day=d.day)
except ValueError:
    pass
print("one_month_ago: {0}".format(one_month_ago))

输出:

one_month_ago: 2012-02-29 00:00:00

Building off the comment of @J.F. Sebastian, you can chain the replace() function to go back one “month”. Since a month is not a constant time period, this solution tries to go back to the same date the previous month, which of course does not work for all months. In such a case, this algorithm defaults to the last day of the prior month.

from datetime import datetime, timedelta

d = datetime(2012, 3, 31) # A problem date as an example

# last day of last month
one_month_ago = (d.replace(day=1) - timedelta(days=1))
try:
    # try to go back to same day last month
    one_month_ago = one_month_ago.replace(day=d.day)
except ValueError:
    pass
print("one_month_ago: {0}".format(one_month_ago))

Output:

one_month_ago: 2012-02-29 00:00:00
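The clamping behaviour can also be written without try/except by asking calendar.monthrange for the length of the target month. This is a sketch of an alternative formulation, not the code from the answer above; the name months_ago and its months parameter are made up here.

```python
import calendar
from datetime import date

def months_ago(d, months=1):
    """Go back `months` months, clamping the day to the target month's length."""
    # Count months from year 0 so that year boundaries need no special case.
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))

print(months_ago(date(2012, 3, 31)))  # → 2012-02-29
print(months_ago(date(2020, 1, 15)))  # → 2019-12-15
```

Because it only uses replace(), the function works for datetime objects as well as date objects.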

回答 10

如果要在LINUX / UNIX环境中查看EXE类型文件中的ASCII字母,请尝试“ od -c’filename’| more”

您可能会得到很多无法识别的项目,但它们都会全部显示出来,并且将显示HEX表示形式,并且ASCII等效字符(如果适用)将跟随十六进制代码行。在您知道的已编译代码上尝试一下。您可能会在其中识别出一些东西。

If you want to look at the ASCII letters in a EXE type file in a LINUX/UNIX Environment, try “od -c ‘filename’ |more”

You will likely get a lot of unrecognizable items, but they will all be presented, and the HEX representations will be displayed, and the ASCII equivalent characters (if appropriate) will follow the line of hex codes. Try it on a compiled piece of code that you know. You might see things in it you recognize.


回答 11

有一个高级库dateparser可以确定给定自然语言的过去日期,并返回相应的Python datetime对象

from dateparser import parse
parse('4 months ago')

There is a high level library dateparser that can determine the past date given natural language, and return the corresponding Python datetime object

from dateparser import parse
parse('4 months ago')

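What is the most efficient way of counting occurrences in pandas?

Question: What is the most efficient way of counting occurrences in pandas?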

我有一个大的(约1200万行)数据帧df,说:

df.columns = ['word','documents','frequency']

因此,以下及时运行:

word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']

但是,这要花费很长的时间才能运行:

Occurrences_of_Words = word_grouping[['word']].count().reset_index()

我在这里做错了什么?有没有更好的方法来计算大型数据框中的出现次数?

df.word.describe()

运行良好,所以我真的没想到这个Occurrences_of_Words数据框会花费很长时间。

ps:如果答案很明显,并且您觉得有必要因提出这个问题而对我不利,请同时提供答案。谢谢。

I have a large (about 12M rows) dataframe df with say:

df.columns = ['word','documents','frequency']

So the following ran in a timely fashion:

word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']

However, this is taking an unexpectedly long time to run:

Occurrences_of_Words = word_grouping[['word']].count().reset_index()

What am I doing wrong here? Is there a better way to count occurrences in a large dataframe?

df.word.describe()

ran pretty well, so I really did not expect this Occurrences_of_Words dataframe to take very long to build.

ps: If the answer is obvious and you feel the need to penalize me for asking this question, please include the answer as well. thank you.


回答 0

我认为df['word'].value_counts()应该服务。通过跳过groupby机制,您可以节省一些时间。我不知道为什么count要慢于max。两者都需要一些时间来避免丢失值。(与相比size。)

无论如何,对value_counts进行了专门优化以处理像您的单词这样的对象类型,因此我怀疑您会做得更好。

I think df['word'].value_counts() should serve. By skipping the groupby machinery, you’ll save some time. I’m not sure why count should be much slower than max. Both take some time to avoid missing values. (Compare with size.)

In any case, value_counts has been specifically optimized to handle object type, like your words, so I doubt you’ll do much better than that.


回答 1

当您想统计pandas dataFrame中一列中分类数据的频率时,请使用: df['Column_Name'].value_counts()

来源

When you want to count the frequency of categorical data in a column of a pandas DataFrame, use: df['Column_Name'].value_counts()

Source.


回答 2

只是先前答案的补充。别忘了,在处理实际数据时,可能会有空值,因此使用选项将默认值包括在内也很有用dropna=False默认值为True

一个例子:

>>> df['Embarked'].value_counts(dropna=False)
S      644
C      168
Q       77
NaN      2

Just an addition to the previous answers. Let’s not forget that when dealing with real data there might be null values, so it’s useful to also include those in the counting by using the option dropna=False (default is True)

An example:

>>> df['Embarked'].value_counts(dropna=False)
S      644
C      168
Q       77
NaN      2
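To tie the answers together, here is a small sketch on a toy frame shaped like the one in the question. The data is invented, and the sketch makes no claim about timings on 12M rows.

```python
import pandas as pd

df = pd.DataFrame({
    "word": ["a", "b", "a", None, "a", "b"],
    "frequency": [1, 2, 3, 4, 5, 6],
})

counts = df["word"].value_counts()                        # missing values dropped
counts_with_nan = df["word"].value_counts(dropna=False)   # missing values counted
sizes = df.groupby("word").size()                         # same counts via groupby

print(counts)
print(counts_with_nan)
```

value_counts returns a Series indexed by the distinct words, so counts["a"] gives the number of rows for that word.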

Insert at first position of a list in Python [closed]

Question: Insert at first position of a list in Python [closed]

如何在列表的第一个索引处插入元素?如果我使用list.insert(0,elem),elem是否会修改第一个索引的内容?还是我必须使用第一个元素创建一个新列表,然后将旧列表复制到这个新列表中?

How can I insert an element at the first index of a list? If I use list.insert(0, elem), does elem overwrite the content of the first index? Or do I have to create a new list with the first elem and then copy the old list into this new one?


回答 0

用途insert

In [1]: ls = [1,2,3]

In [2]: ls.insert(0, "new")

In [3]: ls
Out[3]: ['new', 1, 2, 3]

Use insert:

In [1]: ls = [1,2,3]

In [2]: ls.insert(0, "new")

In [3]: ls
Out[3]: ['new', 1, 2, 3]

回答 1

从文档中:

list.insert(i,x)
在给定位置插入项目。第一个参数是要在其之前插入的元素的索引,因此a.insert(0, x)将其插入 到列表的开头,并且a.insert(len(a),x)等效于a.append(x)

http://docs.python.org/2/tutorial/datastructures.html#more-on-lists

From the documentation:

list.insert(i, x)
Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a),x) is equivalent to a.append(x)

http://docs.python.org/2/tutorial/datastructures.html#more-on-lists
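To answer the original worry directly: insert modifies the list in place and shifts the existing items to the right, so nothing is overwritten and no new list is needed. Since that shift costs time proportional to the list length, collections.deque is a common alternative when items are added at the front frequently. A small sketch:

```python
from collections import deque

ls = [1, 2, 3]
same_object = ls
ls.insert(0, "new")      # existing items shift right; nothing is overwritten
print(ls)                # → ['new', 1, 2, 3]
print(same_object is ls) # → True, the list was modified in place

d = deque([1, 2, 3])
d.appendleft("new")      # constant-time insertion at the front
print(list(d))           # → ['new', 1, 2, 3]
```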


Tensorflow 2.0 - AttributeError: module 'tensorflow' has no attribute 'Session'

Question: Tensorflow 2.0 - AttributeError: module 'tensorflow' has no attribute 'Session'

sess = tf.Session()在Tensorflow 2.0环境中执行命令时,出现如下错误消息:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'Session'

系统信息:

  • 操作系统平台和发行版:Windows 10
  • python版本:3.7.1
  • Tensorflow版本:2.0.0-alpha0(随pip一起安装)

重现步骤:

安装:

  1. pip install --upgrade pip
  2. pip install tensorflow==2.0.0-alpha0
  3. pip install keras
  4. pip install numpy==1.16.2

执行:

  1. 执行命令:import tensorflow as tf
  2. 执行命令:sess = tf.Session()

When I am executing the command sess = tf.Session() in Tensorflow 2.0 environment, I am getting an error message as below:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'tensorflow' has no attribute 'Session'

System Information:

  • OS Platform and Distribution: Windows 10
  • Python Version: 3.7.1
  • Tensorflow Version: 2.0.0-alpha0 (installed with pip)

Steps to reproduce:

Installation:

  1. pip install --upgrade pip
  2. pip install tensorflow==2.0.0-alpha0
  3. pip install keras
  4. pip install numpy==1.16.2

Execution:

  1. Execute command: import tensorflow as tf
  2. Execute command: sess = tf.Session()

回答 0

根据TF 1:1 Symbols Map,在TF 2.0中,您应该使用tf.compat.v1.Session()而不是tf.Session()

https://docs.google.com/spreadsheets/d/1FLFJLzg7WNP6JHODX5q8BDgptKafq_slHpnHVbJIteQ/edit#gid=0

要获得TF 2.0中类似TF 1.x的行为,可以运行

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

但后来人们无法受益于TF 2.0所做的许多改进。有关更多详细信息,请参阅迁移指南 https://www.tensorflow.org/guide/migrate

According to TF 1:1 Symbols Map, in TF 2.0 you should use tf.compat.v1.Session() instead of tf.Session()

https://docs.google.com/spreadsheets/d/1FLFJLzg7WNP6JHODX5q8BDgptKafq_slHpnHVbJIteQ/edit#gid=0

To get TF 1.x like behaviour in TF 2.0 one can run

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

but then one cannot benefit from many of the improvements made in TF 2.0. For more details please refer to the migration guide https://www.tensorflow.org/guide/migrate


回答 1

TF2默认情况下运行急切执行,因此无需会话。如果要运行静态图,则更正确的方法是tf.function()在TF2中使用。虽然仍然可以通过tf.compat.v1.Session()TF2访问Session ,但我不建议使用它。通过比较问候世界中的差异来证明这种差异可能会有所帮助:

TF1.x你好世界:

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(msg))

TF2.x你好世界:

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
tf.print(msg)

有关更多信息,请参见Effective TensorFlow 2

TF2 runs Eager Execution by default, thus removing the need for Sessions. If you want to run static graphs, the more proper way is to use tf.function() in TF2. While Session can still be accessed via tf.compat.v1.Session() in TF2, I would discourage using it. It may be helpful to demonstrate this difference by comparing the difference in hello worlds:

TF1.x hello world:

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(msg))

TF2.x hello world:

import tensorflow as tf
msg = tf.constant('Hello, TensorFlow!')
tf.print(msg)

For more info, see Effective TensorFlow 2
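For the tf.function route mentioned above, a minimal sketch might look like this, assuming a TensorFlow 2.x installation. The function name add and the constants are illustrative only.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a graph on first call
def add(a, b):
    return a + b

result = add(tf.constant(2), tf.constant(3))
print(result.numpy())  # → 5
```

No Session is involved: the decorated function is called like ordinary Python, and the result is an eager tensor whose value can be read with .numpy().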


回答 2

安装后第一次尝试python时遇到了这个问题 windows10 + python3.7(64bit) + anacconda3 + jupyter notebook.

我通过参考“ https://vispud.blogspot.com/2019/05/tensorflow200a0-attributeerror-module.html ”解决了此问题

我同意

我相信TF 2.0已删除了“ Session()”。

我插入了两行。一个是tf.compat.v1.disable_eager_execution(),另一个是sess = tf.compat.v1.Session()

我的Hello.py如下:

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

hello = tf.constant('Hello, TensorFlow!')

sess = tf.compat.v1.Session()

print(sess.run(hello))

I faced this problem when I first tried Python after installing Windows 10 + Python 3.7 (64-bit) + Anaconda3 + Jupyter Notebook.

I solved this problem by referring to “https://vispud.blogspot.com/2019/05/tensorflow200a0-attributeerror-module.html”

I agree with

I believe “Session()” has been removed with TF 2.0.

I inserted two lines. One is tf.compat.v1.disable_eager_execution() and the other is sess = tf.compat.v1.Session()

My Hello.py is as follows:

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

hello = tf.constant('Hello, TensorFlow!')

sess = tf.compat.v1.Session()

print(sess.run(hello))

回答 3

对于TF2.x,您可以这样做。

import tensorflow as tf
with tf.compat.v1.Session() as sess:
    hello = tf.constant('hello world')
    print(sess.run(hello))

>>> b'hello world'

For TF2.x, you can do it like this.

import tensorflow as tf
with tf.compat.v1.Session() as sess:
    hello = tf.constant('hello world')
    print(sess.run(hello))

>>> b'hello world'


回答 4

尝试这个

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

hello = tf.constant('Hello, TensorFlow!')

sess = tf.compat.v1.Session()

print(sess.run(hello))

try this

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

hello = tf.constant('Hello, TensorFlow!')

sess = tf.compat.v1.Session()

print(sess.run(hello))

回答 5

如果这是您的代码,则正确的解决方案是将其重写为不使用Session(),因为在TensorFlow 2中不再需要

如果这只是您正在运行的代码,则可以通过运行降级到TensorFlow 1

pip3 install --upgrade --force-reinstall tensorflow-gpu==1.15.0 

(或TensorFlow 1最新版本

If this is your code, the correct solution is to rewrite it to not use Session(), since that’s no longer necessary in TensorFlow 2

If this is just code you’re running, you can downgrade to TensorFlow 1 by running

pip3 install --upgrade --force-reinstall tensorflow-gpu==1.15.0 

(or whatever the latest version of TensorFlow 1 is)


回答 6

Tensorflow 2.x支持默认执行Eager Execution,因此不支持Session。

TensorFlow 2.x uses Eager Execution by default, so tf.Session is no longer part of the top-level API (it remains available as tf.compat.v1.Session).


回答 7

使用Anaconda + Spyder(Python 3.7)

[码]

import tensorflow as tf
valor1 = tf.constant(2)
valor2 = tf.constant(3)
type(valor1)
print(valor1)
soma=valor1+valor2
type(soma)
print(soma)
sess = tf.compat.v1.Session()
with sess:
    print(sess.run(soma))

[安慰]

import tensorflow as tf
valor1 = tf.constant(2)
valor2 = tf.constant(3)
type(valor1)
print(valor1)
soma=valor1+valor2
type(soma)
Tensor("Const_8:0", shape=(), dtype=int32)
Out[18]: tensorflow.python.framework.ops.Tensor

print(soma)
Tensor("add_4:0", shape=(), dtype=int32)

sess = tf.compat.v1.Session()

with sess:
    print(sess.run(soma))
5

Using Anaconda + Spyder (Python 3.7)

[code]

import tensorflow as tf
valor1 = tf.constant(2)
valor2 = tf.constant(3)
type(valor1)
print(valor1)
soma=valor1+valor2
type(soma)
print(soma)
sess = tf.compat.v1.Session()
with sess:
    print(sess.run(soma))

[console]

import tensorflow as tf
valor1 = tf.constant(2)
valor2 = tf.constant(3)
type(valor1)
print(valor1)
soma=valor1+valor2
type(soma)
Tensor("Const_8:0", shape=(), dtype=int32)
Out[18]: tensorflow.python.framework.ops.Tensor

print(soma)
Tensor("add_4:0", shape=(), dtype=int32)

sess = tf.compat.v1.Session()

with sess:
    print(sess.run(soma))
5

回答 8

TF v2.0支持Eager模式和v1.0的Graph模式。因此,v2.0不支持tf.session()。因此,建议您重写代码以在Eager模式下工作。

TF v2.0 defaults to Eager mode, as opposed to the Graph mode of v1.x. Hence tf.Session() is not available in v2.0, and I would suggest rewriting your code to work in Eager mode.


回答 9

import tensorflow as tf
sess = tf.Session()

此代码将在版本2.x上显示属性错误

在版本2.x中使用版本1.x代码

尝试这个

import tensorflow.compat.v1 as tf
sess = tf.Session()

import tensorflow as tf
sess = tf.Session()

this code will show an Attribute error on version 2.x

to use version 1.x code in version 2.x

try this

import tensorflow.compat.v1 as tf
sess = tf.Session()

Django FileField with upload_to determined at runtime

Question: Django FileField with upload_to determined at runtime

我正在尝试设置我的上传文件,以便如果用户joe上传文件,则文件将转到MEDIA_ROOT / joe,而不是让每个人的文件都转到MEDIA_ROOT。问题是我不知道如何在模型中定义它。这是当前的外观:

class Content(models.Model):
    name = models.CharField(max_length=200)
    user = models.ForeignKey(User)
    file = models.FileField(upload_to='.')

所以我想要的不是“。” 作为upload_to,将其作为用户名。

我知道从Django 1.0开始,您可以定义自己的函数来处理upload_to,但是该函数也不知道谁将成为谁,所以我有点迷失了。

谢谢您的帮助!

I’m trying to set up my uploads so that if user joe uploads a file it goes to MEDIA_ROOT/joe as opposed to having everyone’s files go to MEDIA_ROOT. The problem is I don’t know how to define this in the model. Here is how it currently looks:

class Content(models.Model):
    name = models.CharField(max_length=200)
    user = models.ForeignKey(User)
    file = models.FileField(upload_to='.')

So what I want is instead of ‘.’ as the upload_to, have it be the user’s name.

I understand that as of Django 1.0 you can define your own function to handle the upload_to but that function has no idea of who the user will be either so I’m a bit lost.

Thanks for the help!


回答 0

您可能已经阅读了文档,所以这里有一个简单的示例可以使之有意义:

def content_file_name(instance, filename):
    return '/'.join(['content', instance.user.username, filename])

class Content(models.Model):
    name = models.CharField(max_length=200)
    user = models.ForeignKey(User)
    file = models.FileField(upload_to=content_file_name)

如您所见,您甚至不需要使用给定的文件名-如果愿意,您也可以覆盖您可调用的upload_to中的文件名。

You’ve probably read the documentation, so here’s an easy example to make it make sense:

def content_file_name(instance, filename):
    return '/'.join(['content', instance.user.username, filename])

class Content(models.Model):
    name = models.CharField(max_length=200)
    user = models.ForeignKey(User)
    file = models.FileField(upload_to=content_file_name)

As you can see, you don’t even need to use the filename given – you could override that in your upload_to callable too if you liked.
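Because an upload_to callable is just a function of (instance, filename), it can be exercised without a database or a Django project. The sketch below uses SimpleNamespace objects as stand-ins for a model instance and its user; the function name, the stand-ins, and the extra basename step are illustrative assumptions and not part of the answer above.

```python
import os
from types import SimpleNamespace

def user_directory_path(instance, filename):
    """Build 'content/<username>/<basename>', relative to MEDIA_ROOT."""
    # Keep only the final path component so a client-supplied name
    # cannot add extra directories.
    return "/".join(["content", instance.user.username, os.path.basename(filename)])

# Stand-in for a model instance; in Django this would be a Content object.
fake_instance = SimpleNamespace(user=SimpleNamespace(username="joe"))
print(user_directory_path(fake_instance, "report.pdf"))  # → content/joe/report.pdf
```

In a real model the function would be passed as upload_to=user_directory_path on the FileField, exactly as content_file_name is above.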


回答 1

这确实有帮助。为了简洁起见,决定在我的情况下使用lambda:

file = models.FileField(
    upload_to=lambda instance, filename: '/'.join(['mymodel', str(instance.pk), filename]),
)

This really helped. For brevity’s sake, I decided to use a lambda in my case (note that Django 1.7+ migrations cannot serialize lambdas, so a module-level function is needed there):

file = models.FileField(
    upload_to=lambda instance, filename: '/'.join(['mymodel', str(instance.pk), filename]),
)

回答 2

关于使用“实例”对象的pk值的注释。根据文档:

在大多数情况下,此对象尚未保存到数据库,因此,如果使用默认的AutoField,则它的主键字段可能尚未具有值。

因此,使用pk的有效性取决于特定模型的定义。

A note on using the ‘instance’ object’s pk value. According to the documentation:

In most cases, this object will not have been saved to the database yet, so if it uses the default AutoField, it might not yet have a value for its primary key field.

Therefore the validity of using pk depends on how your particular model is defined.


回答 3

如果您在迁移时遇到问题,则可能应该使用@deconstructible装饰器。

import datetime
import os
import unicodedata

from django.core.files.storage import default_storage
from django.utils.deconstruct import deconstructible
from django.utils.encoding import force_text, force_str


@deconstructible
class UploadToPath(object):
    def __init__(self, upload_to):
        self.upload_to = upload_to

    def __call__(self, instance, filename):
        return self.generate_filename(filename)

    def get_directory_name(self):
        return os.path.normpath(force_text(datetime.datetime.now().strftime(force_str(self.upload_to))))

    def get_filename(self, filename):
        filename = default_storage.get_valid_name(os.path.basename(filename))
        filename = force_text(filename)
        filename = unicodedata.normalize('NFKD', filename).encode('ascii', 'ignore').decode('ascii')
        return os.path.normpath(filename)

    def generate_filename(self, filename):
        return os.path.join(self.get_directory_name(), self.get_filename(filename))

用法:

class MyModel(models.Model):
    file = models.FileField(upload_to=UploadToPath('files/%Y/%m/%d'), max_length=255)

If you have problems with migrations, you should probably be using the @deconstructible decorator.

import datetime
import os
import unicodedata

from django.core.files.storage import default_storage
from django.utils.deconstruct import deconstructible
from django.utils.encoding import force_text, force_str


@deconstructible
class UploadToPath(object):
    def __init__(self, upload_to):
        self.upload_to = upload_to

    def __call__(self, instance, filename):
        return self.generate_filename(filename)

    def get_directory_name(self):
        return os.path.normpath(force_text(datetime.datetime.now().strftime(force_str(self.upload_to))))

    def get_filename(self, filename):
        filename = default_storage.get_valid_name(os.path.basename(filename))
        filename = force_text(filename)
        filename = unicodedata.normalize('NFKD', filename).encode('ascii', 'ignore').decode('ascii')
        return os.path.normpath(filename)

    def generate_filename(self, filename):
        return os.path.join(self.get_directory_name(), self.get_filename(filename))

Usage:

class MyModel(models.Model):
    file = models.FileField(upload_to=UploadToPath('files/%Y/%m/%d'), max_length=255)