Tag Archives: python-2.7

Filter items in a Python dictionary where keys contain a specific string

Question: Filter items in a Python dictionary where keys contain a specific string

I’m a C coder developing something in python. I know how to do the following in C (and hence in C-like logic applied to python), but I’m wondering what the ‘Python’ way of doing it is.

I have a dictionary d, and I'd like to operate on a subset of the items, only those whose key (string) contains a specific substring.

i.e. the C logic would be:

for key in d:
    if filter_string in key:
        pass  # do something
    else:
        pass  # do nothing, continue

I’m imagining the python version would be something like

filtered_dict = crazy_python_syntax(d, substring)
for key,value in filtered_dict.iteritems():
    # do something

I’ve found a lot of posts on here regarding filtering dictionaries, but couldn’t find one which involved exactly this.

My dictionary is not nested and I'm using Python 2.7.


Answer 0

How about a dict comprehension:

filtered_dict = {k:v for k,v in d.iteritems() if filter_string in k}

Once you see it, it should be self-explanatory, as it reads like English pretty well.

This syntax requires Python 2.7 or greater.

In Python 3, there is only dict.items(), not iteritems() so you would use:

filtered_dict = {k:v for (k,v) in d.items() if filter_string in k}
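
As a quick sanity check, here is a minimal sketch with made-up sample data (items() is used so the same line runs on both Python 2.7 and 3):

d = {'apple_pie': 1, 'banana': 2, 'apple_tart': 3}
filtered_dict = {k: v for k, v in d.items() if 'apple' in k}
print(filtered_dict)  # {'apple_pie': 1, 'apple_tart': 3} (order may vary)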

Answer 1

Go for whatever is most readable and easily maintainable. Just because you can write it out in a single line doesn't mean that you should. Your existing solution is close to what I would use, except that I would use iteritems to skip the value lookup, and I hate nested ifs if I can avoid them:

for key, val in d.iteritems():
    if filter_string not in key:
        continue
    # do something

However, if you really want something to let you iterate through a filtered dict, then I would not do the two-step process of building the filtered dict and then iterating through it, but instead use a generator, because what is more pythonic (and awesome) than a generator?

First we create our generator, and good design dictates that we make it abstract enough to be reusable:

# The implementation of my generator may look vaguely familiar, no?
def filter_dict(d, filter_string):
    for key, val in d.iteritems():
        if filter_string not in key:
            continue
        yield key, val

And then we can use the generator to solve your problem nice and cleanly with simple, understandable code:

for key, val in filter_dict(d, some_string):
    # do something

In short: generators are awesome.


Answer 2

You can use the built-in filter function to filter dictionaries, lists, etc. based on specific conditions.

filtered_dict = dict(filter(lambda item: filter_str in item[0], d.items()))

The advantage is that you can use it for different data structures.
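
A minimal sketch of how this might be used; the dictionary and filter string are made up. Note that item is a (key, value) tuple, so item[0] is the key:

d = {'cat_a': 1, 'dog_b': 2, 'cat_c': 3}
filter_str = 'cat'
filtered_dict = dict(filter(lambda item: filter_str in item[0], d.items()))
print(filtered_dict)  # {'cat_a': 1, 'cat_c': 3} (order may vary)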


Answer 3

input = {"A":"a", "B":"b", "C":"c"}
output = {k:v for (k,v) in input.items() if key_satisfies_condition(k)}

Answer 4

Jonathon gave you an approach using dict comprehensions in his answer. Here is an approach that deals with your do something part.

If you want to do something with the values of the dictionary, you don’t need a dictionary comprehension at all:

I'm using iteritems() since you tagged your question with python-2.7.

results = map(some_function, [(k,v) for k,v in a_dict.iteritems() if 'foo' in k])

Now the result will be in a list, with some_function applied to each key/value pair of the dictionary that has foo in its key.

If you just want to deal with the values and ignore the keys, just change the list comprehension:

results = map(some_function, [v for k,v in a_dict.iteritems() if 'foo' in k])

some_function can be any callable, so a lambda would work as well:

results = map(lambda x: x*2, [v for k,v in a_dict.iteritems() if 'foo' in k])

The inner list is actually not required, as you can pass a generator expression to map as well:

>>> map(lambda a: a[0]*a[1], ((k,v) for k,v in {2:2, 3:2}.iteritems() if k == 2))
[4]

RuntimeWarning: invalid value encountered in divide

Question: RuntimeWarning: invalid value encountered in divide

I have to make a program using Euler’s method for the “ball in a spring” model

from pylab import*
from math import*
m=0.1
L0=1
tt=30
k=200
t=20
g=9.81
dt=0.01
n=int((ceil(t/dt)))
km=k/m
r0=[-5,5*sqrt(3)]
v0=[-5,5*sqrt(3)]
a=zeros((n,2))
r=zeros((n,2))
v=zeros((n,2))
t=zeros((n,2))
r[1,:]=r0
v[1,:]=v0
for i in range(n-1):
    rr=dot(r[i,:],r[i,:])**0.5
    a=-g+km*cos(tt)*(rr-L0)*r[i,:]/rr
    v[i+1,:]=v[i,:]+a*dt
    r[i+1,:]=r[i,:]+v[i+1,:]*dt
    t[i+1]=t[i]+dt

    #print norm(r[i,:])

plot(r[:,0],r[:,1])
xlim(-100,100)
ylim(-100,100)
xlabel('x [m]')
ylabel('y [m]')

show()

I keep getting this error:

a=-g+km*cos(tt)*(rr-L0)*r[i,:]/rr
RuntimeWarning: invalid value encountered in divide

I can’t figure it out, what is wrong with the code?


Answer 0

I think your code is trying to “divide by zero” or “divide by NaN”. If you are aware of that and don’t want it to bother you, then you can try:

import numpy as np
np.seterr(divide='ignore', invalid='ignore')
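
If you would rather suppress the warnings only around a specific computation instead of globally, numpy also provides errstate as a context manager; a minimal sketch:

import numpy as np

with np.errstate(divide='ignore', invalid='ignore'):
    # 1/0 -> inf (divide), 0/0 -> nan (invalid); neither warns inside this block
    result = np.array([1.0, 0.0]) / np.array([0.0, 0.0])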

For more details see:


Answer 1

Python indexing starts at 0 (rather than 1), so your assignment “r[1,:] = r0” defines the second (i.e. index 1) element of r and leaves the first (index 0) element as a pair of zeros. The first value of i in your for loop is 0, so rr gets the square root of the dot product of the first entry in r with itself (which is 0), and the division by rr in the subsequent line throws the error.
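
A minimal reproduction of that first iteration (shapes chosen to mirror the question):

import numpy as np

r = np.zeros((3, 2))
r[1, :] = [-5, 5 * np.sqrt(3)]        # index 0 is never assigned and stays [0, 0]
rr = np.dot(r[0, :], r[0, :]) ** 0.5  # 0.0 on the loop's first pass (i = 0)
print(r[0, :] / rr)                   # [nan nan] plus the RuntimeWarning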


Answer 2

To prevent division by zero you can pre-initialize the output 'out' where the div0 error would happen; e.g. np.where does not cut it, since the complete expression is evaluated regardless of the condition.

example with pre-initialization:

a = np.arange(10).reshape(2,5)
a[1,3] = 0
print(a)    #[[0 1 2 3 4], [5 6 7 0 9]]
a[0]/a[1]   # errors at 3/0
out = np.ones( (5) )  #preinit
np.divide(a[0],a[1], out=out, where=a[1]!=0) #only divide nonzeros else 1

Answer 3

You are dividing by rr which may be 0.0. Check if rr is zero and do something reasonable other than using it in the denominator.
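
Sketched against the question's loop; treating a zero-length vector as exerting no spring force is an assumption made here for illustration, not part of the original answer:

rr = dot(r[i, :], r[i, :]) ** 0.5
if rr == 0.0:
    a = -g  # assumed fallback: skip the spring term when the direction is undefined
else:
    a = -g + km * cos(tt) * (rr - L0) * r[i, :] / rr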


Cleanest way to get the last item from a Python iterator

Question: Cleanest way to get the last item from a Python iterator

What’s the best way of getting the last item from an iterator in Python 2.6? For example, say

my_iter = iter(range(5))

What is the shortest-code / cleanest way of getting 4 from my_iter?

I could do this, but it doesn’t seem very efficient:

[x for x in my_iter][-1]

Answer 0

item = defaultvalue
for item in my_iter:
    pass

Answer 1

Use a deque of size 1.

from collections import deque

#aa is an iterator
aa = iter('apple')

dd = deque(aa, maxlen=1)
last_element = dd.pop()

Answer 2

If you are using Python 3.x:

*_, last = iterator # for a better understanding check PEP 3132
print(last)

If you are using Python 2.7:

last = next(iterator)
for last in iterator:
    continue
print last


Side Note:

Usually, the solution presented above is what you need for regular cases, but if you are dealing with a big amount of data, it's more efficient to use a deque of size 1. (source)

from collections import deque

#aa is an iterator
aa = iter('apple')

dd = deque(aa, maxlen=1)
last_element = dd.pop()

Answer 3

Probably worth using __reversed__ if it is available

if hasattr(my_iter,'__reversed__'):
    last = next(reversed(my_iter))
else:
    for last in my_iter:
        pass

Answer 4

As simple as:

max(enumerate(the_iter))[1]

Answer 5

This is unlikely to be faster than the empty for loop due to the lambda, but maybe it will give someone else an idea

reduce(lambda x,y:y,my_iter)

If the iter is empty, a TypeError is raised
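
A short sketch: reduce's optional third argument (an initializer) is one way to get a default value instead of the TypeError; the None default here is made up:

from functools import reduce  # builtin in Python 2, needs the import in Python 3

my_iter = iter(range(5))
print(reduce(lambda x, y: y, my_iter))         # 4
print(reduce(lambda x, y: y, iter([]), None))  # None instead of TypeError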


Answer 6

There’s this

list( the_iter )[-1]

If the length of the iteration is truly epic — so long that materializing the list will exhaust memory — then you really need to rethink the design.


Answer 7

I would use reversed, except that it only takes sequences instead of iterators, which seems rather arbitrary.

Any way you do it, you’ll have to run through the entire iterator. At maximum efficiency, if you don’t need the iterator ever again, you could just trash all the values:

for last in my_iter:
    pass
# last is now the last item

I think this is a sub-optimal solution, though.


Answer 8

The toolz library provides a nice solution:

from toolz.itertoolz import last
last(values)

But adding a non-core dependency might not be worth it for using it only in this case.


Answer 9

See this code for something similar:

http://excamera.com/sphinx/article-islast.html

you might use it to pick up the last item with:

[(last, e) for (last, e) in islast(the_iter) if last]

Answer 10

I would just use next(reversed(myiter))


Answer 11

The question is about getting the last element of an iterator, but if your iterator is created by applying conditions to a sequence, then reversed can be used to find the “first” of a reversed sequence, only looking at the needed elements, by applying reverse to the sequence itself.

A contrived example,

>>> seq = list(range(10))
>>> last_even = next(_ for _ in reversed(seq) if _ % 2 == 0)
>>> last_even
8

Answer 12

Alternatively for infinite iterators you can use:

from itertools import islice 
last = list(islice(iterator(), 1000))[-1] # where 1000 is number of samples 

I thought it would be slower than deque, but it's as fast, and it's actually faster than the for-loop method (somehow).


Answer 13

The question is wrong and can only lead to an answer that is complicated and inefficient. To get an iterator, you of course start out from something that is iterable, which will in most cases offer a more direct way of accessing the last element.

Once you create an iterator from an iterable you are stuck in going through the elements, because that is the only thing an iterable provides.

So, the most efficient and clear way is not to create the iterator in the first place but to use the native access methods of the iterable.
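
For instance, if the iterator was made from a sequence you still hold, indexing the sequence is direct (a trivial sketch):

data = list(range(5))
it = iter(data)   # once only `it` is left, you must walk it to the end
last = data[-1]   # the original sequence hands over the last item directly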


pip freeze vs pip list

Question: pip freeze vs pip list

A comparison of outputs reveals differences:

user@user-VirtualBox:~$ pip list
feedparser (5.1.3)
pip (1.4.1)
setuptools (1.1.5)
wsgiref (0.1.2)
user@user-VirtualBox:~$ pip freeze
feedparser==5.1.3
wsgiref==0.1.2

Pip’s documentation states

freeze                      Output installed packages in requirements format.
list                        List installed packages.

but what is “requirements format,” and why does pip list generate a more comprehensive list than pip freeze?


Answer 0

When you are using a virtualenv, you can specify a requirements.txt file to install all the dependencies.

A typical usage:

$ pip install -r requirements.txt

The packages need to be in a specific format for pip to understand, which is

feedparser==5.1.3
wsgiref==0.1.2
django==1.4.2
...

That is the “requirements format”.

Here, django==1.4.2 implies install django version 1.4.2 (even though the latest is 1.6.x). If you do not specify ==1.4.2, the latest version available would be installed.

You can read more in “Virtualenv and pip Basics“, and the official “Requirements File Format” documentation.


Answer 1

To answer the second part of this question, the two packages shown in pip list but not pip freeze are setuptools (which is easy_install) and pip itself.

It looks like pip freeze just doesn't list packages that pip itself depends on. You may use the --all flag to also show those packages.

From the documentation:

--all

Do not skip these packages in the output: pip, setuptools, distribute, wheel


Answer 2

The main difference is that the output of pip freeze can be dumped into a requirements.txt file and used later to re-construct the “frozen” environment.

In other words, you can run pip freeze > frozen-requirements.txt on one machine, and then later, on a different machine or in a clean environment, run pip install -r frozen-requirements.txt to get an identical environment with the exact same dependencies installed as in the original environment where you generated frozen-requirements.txt.


Answer 3

Look at the pip documentation, which describes the functionality of both as:

pip list

List installed packages, including editables.

pip freeze

Output installed packages in requirements format.

So there are two differences:

  1. Output format, freeze gives us the standard requirement format that may be used later with pip install -r to install requirements from.

  2. Output content: pip list includes editables, which pip freeze does not.


Answer 4

pip list shows ALL installed packages.

pip freeze shows the packages YOU installed via the pip (or pipenv, if you use that tool) command, in requirements format.

Note below that setuptools, pip, and wheel are installed when pipenv shell creates my virtualenv. These packages were NOT installed by me using pip:

test1 % pipenv shell
Creating a virtualenv for this project…
Pipfile: /Users/terrence/Development/Python/Projects/test1/Pipfile
Using /usr/local/Cellar/pipenv/2018.11.26_3/libexec/bin/python3.8 (3.8.1) to create virtualenv…
⠹ Creating virtual environment...
<SNIP>
Installing setuptools, pip, wheel...
done.
✔ Successfully created virtual environment! 
<SNIP>

Now review & compare the output of the respective commands where I’ve only installed cool-lib and sampleproject (of which peppercorn is a dependency):

test1 % pip freeze       <== Packages I'VE installed w/ pip

-e git+https://github.com/gdamjan/hello-world-python-package.git@10<snip>71#egg=cool_lib
peppercorn==0.6
sampleproject==1.3.1


test1 % pip list         <== All packages, incl. ones I've NOT installed w/ pip

Package       Version Location                                                                    
------------- ------- --------------------------------------------------------------------------
cool-lib      0.1  /Users/terrence/.local/share/virtualenvs/test1-y2Zgz1D2/src/cool-lib           <== Installed w/ `pip` command
peppercorn    0.6       <== Dependency of "sampleproject"
pip           20.0.2  
sampleproject 1.3.1     <== Installed w/ `pip` command
setuptools    45.1.0  
wheel         0.34.2

Invalid HTTP_HOST header

Question: Invalid HTTP_HOST header

I am trying to develop a website using Django framework and launched using DigitalOcean.com and deployed the necessary files into django-project.

I had to include static files into Django-project and After collecting static files, I tried to refresh my ip

I am including the tutorials which I have used to create the website. https://www.pythonprogramming.net/django-web-server-publish-tutorial/

I am getting the following error :

DisallowedHost at / Invalid HTTP_HOST header: '198.211.99.20'. You may need to add u'198.211.99.20' to ALLOWED_HOSTS.

Can somebody help me to fix this ? This is my first website using Django framework.


Answer 0

The error log is straightforward. As it suggests, you need to add 198.211.99.20 to your ALLOWED_HOSTS setting.

In your project's settings.py file, set ALLOWED_HOSTS like this:

ALLOWED_HOSTS = ['198.211.99.20', 'localhost', '127.0.0.1']

For further reading, see here.


Answer 1

settings.py

ALLOWED_HOSTS = ['*']

pydot and graphviz error: Couldn't import dot_parser, loading of dot files will not be possible

Question: pydot and graphviz error: Couldn't import dot_parser, loading of dot files will not be possible

When I run a very simple code with pydot

import pydot
graph = pydot.Dot(graph_type='graph')

for i in range(3):

  edge = pydot.Edge("king", "lord%d" % i)
  graph.add_edge(edge)

vassal_num = 0
for i in range(3):
  for j in range(2):
    edge = pydot.Edge("lord%d" % i, "vassal%d" % vassal_num)
    graph.add_edge(edge)
    vassal_num += 1

graph.write_png('example1_graph.png')

It prints the error message:

Couldn't import dot_parser, loading of dot files will not be possible.

I'm using Python 2.7.3.


Answer 0

Answer for pydot >= 1.1:

The incompatibility of (upstream) pydot has been fixed by 6dff94b3f1, and thus pydot >= 1.1 will be compatible with pyparsing >= 1.5.7.


Answer applicable to pydot <= 1.0.28:

For anyone else who comes across this, it is due to the changes in pyparsing from 1.x to the 2.x release. To install pydot using pip, first install the older version of pyparsing:

pip install pyparsing==1.5.7
pip install pydot==1.0.28

If you did not install pyparsing using pip, but instead used setup.py, then have a look at this solution to uninstall the package. Thanks @qtips.


Answer 1

There is a new package in the pip repo called pydot2 that functions correctly with pyparsing2. I couldn’t downgrade my packages because matplotlib depends on the newer pyparsing package.

Note: python2.7 from macports


Answer 2

pydot used a private module variable (_noncomma) from pyparsing. The diff below fixes it to work with pyparsing 2.0.1:

diff --git a/dot_parser.py b/dot_parser.py
index dedd61a..138d152 100644
--- a/dot_parser.py
+++ b/dot_parser.py
@@ -25,8 +25,9 @@ from pyparsing import __version__ as pyparsing_version
 from pyparsing import ( nestedExpr, Literal, CaselessLiteral, Word, Upcase, OneOrMore, ZeroOrMore,
     Forward, NotAny, delimitedList, oneOf, Group, Optional, Combine, alphas, nums,
     restOfLine, cStyleComment, nums, alphanums, printables, empty, quotedString,
-    ParseException, ParseResults, CharsNotIn, _noncomma, dblQuotedString, QuotedString, ParserElement )
+    ParseException, ParseResults, CharsNotIn, dblQuotedString, QuotedString, ParserElement )

+_noncomma = "".join( [ c for c in printables if c != "," ] )

 class P_AttrList:

Answer 3

I forked the pydot repository [1], applied the Gabi Davar patch and some changes to support python-3. The package is available in the PyPI [2].

Cheers


Answer 4

$ sudo pip uninstall pydot

$ sudo pip install pydot2

See the following link: http://infidea.net/troubleshooting-couldnt-import-dot_parser-loading-of-dot-files-will-not-be-possible/


Answer 5

The solution was not to install pydot from somewhere, but “python-pydot” from official ubuntu repositories.


Answer 6

There are now at least 2 more versions that appear to support PyParsing-2 and Python-3:


Answer 7

I had the problem again and my above solution did not work. If that is true for you and you are also using Anaconda on a Mac with El Capitan, try this:

conda install --channel https://conda.anaconda.org/RMG graphviz
conda install --channel https://conda.anaconda.org/RMG pydot

Answer 8

What I did in the end, after many tries based on what I saw here (pseudo-sequence to get it working for networkx):

apt-get remove python-pydot
pip install pydotplus
apt-get install libcgraph6
apt-get install python-pygraphviz


# pip freeze | grep pydot
 pydotplus==2.0.2
# pip freeze | grep pyparsing
pyparsing==2.2.0
# pip freeze | grep graphviz
pygraphviz==1.2
# python -c 'import pydotplus'
#

Answer 9

This worked for me (Mac OS X 10.9 with Python 2.7.10 on Anaconda):

conda uninstall pydot

Then,

conda install pydot

Pyparsing is then downgraded (from 2.x to 1.5.7) upon pydot’s installation. Future Googlers: this allowed me to install and import Theano correctly.


Answer 10

On OSX Mavericks the following did the trick… I got the same error, but at the bottom there was also a complaint that the graphviz executable was not present… I think the problem was I had installed graphviz prior to the other modules.

brew uninstall graphviz
brew install graphviz

Answer 11

When other solutions do not work, this is a quick and dirty method to solve the problem:

This example is from Python 2.7 on Ubuntu 16.04.

Edit the file python2.7/site-packages/keras/utils/visualize_util.py and comment out the code segment below.

if not pydot.find_graphviz():
    raise ImportError('Failed to import pydot. You must install pydot'
                      ' and graphviz for `pydotprint` to work.')

find_graphviz() is redundant on newer versions of pydot, and the above call does not work.


Answer 12

I also met this problem, with pydot==1.0.28 and pyparsing==2.2.0. I fixed it by downloading the newest pydot 1.2.3 (tar.gz) from Google and then installing it offline. When I updated pydot in Ubuntu 14.04, it said pydot 1.0.28 was the newest version, so I downloaded the 1.2.3 version from Google instead.


Answer 13

You need to downgrade pyparsing from version 2.x to version 1.5.7 to get pydot to work correctly.

For win-64, using Conda, this worked for me:

conda install -c https://conda.anaconda.org/Trentonoliphant pyparsing=1.5.7

I then disabled/uninstalled the 2.x version and reloaded pyparsing in my script:

pyparsing = reload(pyparsing)
pydot = reload(pydot)

To check whether you have the right version running:

print pyparsing.__version__

Is it possible to modify a variable in Python that is in an outer, but not global, scope?

Question: Is it possible to modify a variable in Python that is in an outer, but not global, scope?

Given following code:

def A() :
    b = 1

    def B() :
        # I can access 'b' from here.
        print( b )
        # But can i modify 'b' here? 'global' and assignment will not work.

    B()
A()

For the code in B() function variable b is in outer scope, but not in global scope. Is it possible to modify b variable from within B() function? Surely I can read it from here and print(), but how to modify it?


Answer 0

Python 3.x has the nonlocal keyword. I think this does what you want, but I’m not sure if you are running python 2 or 3.

The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope. This is important because the default behavior for binding is to search the local namespace first. The statement allows encapsulated code to rebind variables outside of the local scope besides the global (module) scope.

For python 2, I usually just use a mutable object (like a list, or dict), and mutate the value instead of reassign.

example:

def foo():
    a = []
    def bar():
        a.append(1)
    bar()
    bar()
    print a

foo()

Outputs:

[1, 1]
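
For reference, a minimal sketch of the same example in Python 3 using nonlocal:

def foo():
    a = 0
    def bar():
        nonlocal a  # rebind `a` in the enclosing foo() scope
        a += 1
    bar()
    bar()
    print(a)

foo()  # prints 2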

Answer 1

You can use an empty class to hold a temporary scope. It's like the mutable-object workaround, but a bit prettier.

def outer_fn():
    class FnScope:
        b = 5
        c = 6
    def inner_fn():
        FnScope.b += 1
        FnScope.c += FnScope.b
    inner_fn()
    inner_fn()
    inner_fn()
    print FnScope.b, FnScope.c

This yields the following interactive output:

>>> outer_fn()
8 27
>>> fs = FnScope()
NameError: name 'FnScope' is not defined

Answer 2

I’m a little new to Python, but I’ve read a bit about this. I believe the best you’re going to get is similar to the Java work-around, which is to wrap your outer variable in a list.

def A():
   b = [1]
   def B():
      b[0] = 2
   B()
   print(b[0])

# The output is '2'

Edit: I guess this was probably true before Python 3. Looks like nonlocal is your answer.


Answer 3

No you cannot, at least in this way.

Because the "set operation" will create a new name in the current scope, which shadows the outer one.
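
A short sketch of that shadowing behavior, using the question's names:

def A():
    b = 1
    def B():
        b = 2  # assignment creates a brand-new local `b` inside B
    B()
    print(b)   # still 1; the outer `b` was shadowed, not modified

A()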


Answer 4

For anyone looking at this much later: a safer but heavier workaround, without the need to pass variables as parameters:

def outer():
    a = [1]
    def inner(a=a):
        a[0] += 1
    inner()
    return a[0]

Answer 5

The short answer that will just work automagically

I created a Python library for solving this specific problem. It is released under the Unlicense, so use it however you wish. You can install it with pip install seapie or check out the home page at https://github.com/hirsimaki-markus/SEAPIE

user@pc:home$ pip install seapie

from seapie import Seapie as seapie
def A():
    b = 1

    def B():
        seapie(1, "b=2")
        print(b)

    B()
A()

outputs

2

The arguments have the following meaning:

  • The first argument is execution scope. 0 would mean local B(), 1 means parent A() and 2 would mean grandparent <module> aka global
  • The second argument is a string or code object you want to execute in the given scope
  • You can also call it without arguments for interactive shell inside your program

The long answer

This is more complicated. Seapie works by editing the frames in the call stack using the CPython API. CPython is the de facto standard, so most people don't have to worry about it.

The magic words you are probably most interested in, if you are reading this, are the following:

frame = sys._getframe(1)          # 1 stands for previous frame
parent_locals = frame.f_locals    # true dictionary of parent locals
parent_globals = frame.f_globals  # true dictionary of parent globals

exec(codeblock, parent_globals, parent_locals)

ctypes.pythonapi.PyFrame_LocalsToFast(ctypes.py_object(frame),ctypes.c_int(1))
# the magic value 1 stands for ability to introduce new variables. 0 for update-only

The latter will force updates to pass into the local scope. Local scopes are however optimized differently than the global scope, so introducing new objects has some problems when you try to call them directly if they are not initialized in any way. I will copy a few ways to circumvent these problems from the github page:

  • Assign, import and define your objects beforehand
  • Assign a placeholder to your objects beforehand
  • Reassign object to itself in main program to update symbol table: x = locals()[“x”]
  • Use exec() in main program instead of directly calling to avoid optimization. Instead of calling x do: exec(“x”)

If you feel that using exec() is not something you want to go with, you can emulate the behaviour by updating the true local dictionary (not the one returned by locals()). I will copy an example from https://faster-cpython.readthedocs.io/mutable.html

import sys
import ctypes

def hack():
    # Get the frame object of the caller
    frame = sys._getframe(1)
    frame.f_locals['x'] = "hack!"
    # Force an update of locals array from locals dict
    ctypes.pythonapi.PyFrame_LocalsToFast(ctypes.py_object(frame),
                                          ctypes.c_int(0))

def func():
    x = 1
    hack()
    print(x)

func()

Output:

hack!

Answer 6

I don’t think you should want to do this. Functions that can alter things in their enclosing context are dangerous, as that context may be written without the knowledge of the function.

You could make it explicit, either by making B a public method and the helper a private method in a class (probably the best way); or by using a mutable type such as a list and passing it explicitly to B:

def A():
    x = [0]
    def B(var): 
        var[0] = 1
    B(x)
    print x

A()

Answer 7

You can, but you'll have to use the global statement (not a really good solution, as always when using global variables, but it works):

def A():
    global b
    b = 1

    def B():
      global b
      print( b )
      b = 2

    B()
A()

Answer 8

I don’t know if there is an attribute of a function that gives the __dict__ of the outer space of the function when this outer space isn’t the global space == the module, which is the case when the function is a nested function, in Python 3.

But in Python 2, as far as I know, there isn’t such an attribute.

So the only possibilities to do what you want is:

1) using a mutable object, as said by others

2)

def A() :
    b = 1
    print 'b before B() ==', b

    def B() :
        b = 10
        print 'b ==', b
        return b

    b = B()
    print 'b after B() ==', b

A()

result

b before B() == 1
b == 10
b after B() == 10

Nota

The solution of Cédric Julien has a drawback:

def A() :
    global b # N1
    b = 1
    print '   b in function B before executing B() :', b

    def B() :
        global b # N2
        print '     b in function B before assigning b = 2 :', b
        b = 2
        print '     b in function B after  assigning b = 2 :', b

    B()
    print '   b in function A , after execution of B()', b

b = 450
print 'global b , before execution of A() :', b
A()
print 'global b , after execution of A() :', b

result

global b , before execution of A() : 450
   b in function B before executing B() : 1
     b in function B before assigning b = 2 : 1
     b in function B after  assigning b = 2 : 2
   b in function A , after execution of B() 2
global b , after execution of A() : 2

The global b after execution of A() has been modified, which may not be wished.

That's the case only if there is an object with identifier b in the global namespace.


Reading a huge .csv file

Question: Reading a huge .csv file

I’m currently trying to read data from .csv files in Python 2.7 with up to 1 million rows, and 200 columns (files range from 100mb to 1.6gb). I can do this (very slowly) for the files with under 300,000 rows, but once I go above that I get memory errors. My code looks like this:

def getdata(filename, criteria):
    data=[]
    for criterion in criteria:
        data.append(getstuff(filename, criterion))
    return data

def getstuff(filename, criterion):
    import csv
    data=[]
    with open(filename, "rb") as csvfile:
        datareader=csv.reader(csvfile)
        for row in datareader: 
            if row[3]=="column header":
                data.append(row)
            elif len(data)<2 and row[3]!=criterion:
                pass
            elif row[3]==criterion:
                data.append(row)
            else:
                return data

The reason for the else clause in the getstuff function is that all the elements which fit the criterion will be listed together in the csv file, so I leave the loop when I get past them to save time.

My questions are:

  1. How can I manage to get this to work with the bigger files?

  2. Is there any way I can make it faster?

My computer has 8gb RAM, running 64bit Windows 7, and the processor is 3.40 GHz (not certain what information you need).


Answer 0

You are reading all rows into a list, then processing that list. Don’t do that.

Process your rows as you produce them. If you need to filter the data first, use a generator function:

import csv

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        count = 0
        for row in datareader:
            if row[3] == criterion:
                yield row
                count += 1
            elif count:
                # done when having read a consecutive series of rows 
                return

I also simplified your filter test; the logic is the same but more concise.

Because you are only matching a single sequence of rows matching the criterion, you could also use:

import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # first row, plus any subsequent rows that match, then stop
        # reading altogether
        # Python 2: use `for row in takewhile(...): yield row` instead
        # instead of `yield from takewhile(...)`.
        yield from takewhile(
            lambda r: r[3] == criterion,
            dropwhile(lambda r: r[3] != criterion, datareader))
        return

You can now loop over getstuff() directly. Do the same in getdata():

def getdata(filename, criteria):
    for criterion in criteria:
        for row in getstuff(filename, criterion):
            yield row

Now loop directly over getdata() in your code:

for row in getdata(somefilename, sequence_of_criteria):
    # process row

You now only hold one row in memory, instead of your thousands of lines per criterion.

yield makes a function a generator function, which means it won’t do any work until you start looping over it.
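
A tiny illustration of that laziness (names made up):

def gen():
    print("started")  # the body does not run at call time
    yield 1

g = gen()        # nothing printed yet
first = next(g)  # prints "started" and returns 1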


Answer 1

Although Martijn's answer is probably best, here is a more intuitive way to process large csv files for beginners. This allows you to process groups of rows, or chunks, at a time.

import pandas as pd
chunksize = 10 ** 8
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)
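
For example, the question's filter could be applied per chunk roughly like this (the filename, column position and criterion are made up):

import pandas as pd

chunksize = 10 ** 5
matches = []
for chunk in pd.read_csv('huge.csv', chunksize=chunksize):
    # keep only rows whose 4th column equals the criterion, as in the question
    matches.append(chunk[chunk.iloc[:, 3] == 'some_criterion'])
result = pd.concat(matches)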

Answer 2

I do a fair amount of vibration analysis and look at large data sets (tens and hundreds of millions of points). My testing showed the pandas.read_csv() function to be 20 times faster than numpy.genfromtxt(). And the genfromtxt() function is 3 times faster than the numpy.loadtxt(). It seems that you need pandas for large data sets.

I posted the code and data sets I used in this testing on a blog discussing MATLAB vs Python for vibration analysis.


Answer 3

What worked for me, and is super fast, is:

import pandas as pd
import dask.dataframe as dd
import time
t=time.clock()
df_train = dd.read_csv('../data/train.csv', usecols=[col1, col2])
df_train=df_train.compute()
print("load train: " , time.clock()-t)

Another working solution is:

import pandas as pd 
from tqdm import tqdm

PATH = '../data/train.csv'
chunksize = 500000 
traintypes = {
'col1':'category',
'col2':'str'}

cols = list(traintypes.keys())

df_list = [] # list to hold the batch dataframe

for df_chunk in tqdm(pd.read_csv(PATH, usecols=cols, dtype=traintypes, chunksize=chunksize)):
    # Can process each chunk of dataframe here
    # clean_data(), feature_engineer(),fit()

    # Alternatively, append the chunk to list and merge all
    df_list.append(df_chunk) 

# Merge all dataframes into one dataframe
X = pd.concat(df_list)

# Delete the dataframe list to release memory
del df_list
del df_chunk

Answer 4

For anyone who lands on this question: using pandas with 'chunksize' and 'usecols' helped me read a huge zipped file faster than the other proposed options.

import pandas as pd

sample_cols_to_keep =['col_1', 'col_2', 'col_3', 'col_4','col_5']

# First set up the dataframe iterator; the 'usecols' parameter filters the columns, and 'chunksize' sets the number of rows per chunk in the csv (you can change these parameters as you wish)
df_iter = pd.read_csv('../data/huge_csv_file.csv.gz', compression='gzip', chunksize=20000, usecols=sample_cols_to_keep) 

# this list will store the filtered dataframes for later concatenation 
df_lst = [] 

# Iterate over the file based on the criteria and append to the list
for df_ in df_iter: 
        tmp_df = (df_.rename(columns={col: col.lower() for col in df_.columns}) # filter e.g. rows where 'col_1' value is greater than zero
                                  .pipe(lambda x:  x[x.col_1 > 0] ))
        df_lst += [tmp_df.copy()] 

# And finally combine the filtered df_lst into the final larger output, say the 'df_final' dataframe
df_final = pd.concat(df_lst)

Answer 5

Here's another solution for Python 3:

import csv
with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    count = 0
    for row in datareader:
        if row[3] in ("column header", criterion):
            doSomething(row)
            count += 1
        elif count > 2:
            break

Here datareader is an iterator, so the file is read one row at a time.


Answer 6

If you are using pandas and have lots of RAM (enough to read the whole file into memory) try using pd.read_csv with low_memory=False, e.g.:

import pandas as pd
data = pd.read_csv('file.csv', low_memory=False)

Installing Numpy on 64-bit Windows 7 with Python 2.7.3

Question: Installing Numpy on 64-bit Windows 7 with Python 2.7.3

It looks like the only 64-bit Windows installer for Numpy is for Numpy version 1.3.0, which only works with Python 2.6.

http://sourceforge.net/projects/numpy/files/NumPy/

It strikes me as strange that I would have to roll back to Python 2.6 to use Numpy on Windows, which makes me think I’m missing something.

Am I?


Answer 0

Try the (unofficial) binaries in this site:

http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy

You can get the newest numpy x64 with or without Intel MKL libs for Python 2.7 or Python 3.


Answer 1

Assuming you have 64-bit Python 2.7 on your computer and have downloaded numpy from here, follow the steps below (changing numpy‑1.9.2+mkl‑cp27‑none‑win_amd64.whl as appropriate).

  1. Download get-pip.py (by right-clicking and choosing “save target”) to a local drive.

  2. At the command prompt, navigate to the directory containing get-pip.py and run

    python get-pip.py

    which creates files in C:\Python27\Scripts, including pip2, pip2.7 and pip.

  3. Copy the downloaded numpy‑1.9.2+mkl‑cp27‑none‑win_amd64.whl into the above directory (C:\Python27\Scripts)

  4. Still at the command prompt, navigate to the above directory and run:

    pip2.7.exe install "numpy‑1.9.2+mkl‑cp27‑none‑win_amd64.whl"
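
Afterwards, a quick sanity check from Python confirms the wheel was picked up (a minimal check, not part of the original answer):

import numpy as np

print(np.__version__)  # should print 1.9.2 for the wheel above
np.show_config()       # lists linked libraries, e.g. whether MKL was found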


Answer 2

Download numpy-1.9.2+mkl-cp27-none-win32.whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy.

Copy the file to C:\Python27\Scripts

Run cmd from the above location and type

pip install numpy-1.9.2+mkl-cp27-none-win32.whl

You will hopefully get the below output:

Processing c:\python27\scripts\numpy-1.9.2+mkl-cp27-none-win32.whl
Installing collected packages: numpy
Successfully installed numpy-1.9.2

Hope that works for you.

EDIT 1
Adding @oneleggedmule’s suggestion:

You can also run the following command in the cmd:

pip2.7 install numpy-1.9.2+mkl-cp27-none-win_amd64.whl

Plain pip also works (as in the original answer); spelling out the version as pip2.7 just makes explicit which interpreter the package is installed for.


Answer 3

The (unofficial) binaries (http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy) worked for me.
I tried MinGW and Cygwin; both failed for various reasons. I am on Windows 7 Enterprise, 64-bit.


Answer 4

You may also try Anaconda: http://continuum.io/downloads

But you need to modify your PATH environment variable so that the Anaconda folder comes before the original Python folder.
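
A quick way to confirm the PATH change took effect (a minimal check; the exact path depends on where Anaconda was installed):

import sys

# Should point into the Anaconda folder, not the original Python install
print(sys.executable)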


Answer 5

It is not improbable that programmers looking for Python on Windows also use the Python Tools for Visual Studio. In that case it is easy to install additional packages through the included “Python Environments” window. “Overview” is selected in that window by default; you can select “Pip” there.

Then you can install numpy without additional work by entering numpy into the search box; the corresponding “install numpy” instruction is already suggested.

Nevertheless, I had two easy-to-solve problems at the beginning:

  • “error: Unable to find vcvarsall.bat”: this problem has been solved here, although I did not find that solution at the time and instead installed the C++ Compiler for Python.
  • The installation then continued but failed because of an additional inner exception. Installing .NET 3.5 solved this.

Finally the installation completed. It took some time (about 5 minutes), so don’t cancel the process too early.


How to save and load cookies using Python + Selenium WebDriver

Question: How to save and load cookies using Python + Selenium WebDriver

How can I save all cookies in Python’s Selenium WebDriver to a text file, then load them later? The documentation doesn’t say much of anything about the getCookies function.


Answer 0

You can save the current cookies as a Python object using pickle. For example:

import pickle
import selenium.webdriver

driver = selenium.webdriver.Firefox()
driver.get("http://www.google.com")

# Write the session's cookies to disk
with open("cookies.pkl", "wb") as f:
    pickle.dump(driver.get_cookies(), f)

and later to add them back:

import pickle
import selenium.webdriver

driver = selenium.webdriver.Firefox()
driver.get("http://www.google.com")

# Read the cookies back and attach them to the new session
with open("cookies.pkl", "rb") as f:
    cookies = pickle.load(f)
for cookie in cookies:
    driver.add_cookie(cookie)
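
Wrapped up as two small helpers (a sketch; the file path is arbitrary):

import pickle

def save_cookies(driver, path="cookies.pkl"):
    # Persist the current session's cookies to disk
    with open(path, "wb") as f:
        pickle.dump(driver.get_cookies(), f)

def load_cookies(driver, path="cookies.pkl"):
    # The driver must already be on the cookies' domain,
    # otherwise add_cookie will reject them
    with open(path, "rb") as f:
        for cookie in pickle.load(f):
            driver.add_cookie(cookie)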

Answer 1

When you need cookies from session to session, there is another way to do it: use the Chrome option user-data-dir to keep a folder as a profile. I run:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("user-data-dir=selenium")  # profile folder; created if missing
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get("http://www.google.com")

You can do the logins that check for human interaction here; after that, every time I start the WebDriver with that folder, the cookies I need are already there. You can also manually install extensions and have them in every session. The second time I run it, all the cookies are present:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("user-data-dir=selenium")
driver = webdriver.Chrome(chrome_options=chrome_options)
# The cookies, settings, extensions, and logins from the previous session are all present
driver.get("http://www.google.com")

The advantage is that you can use multiple folders, each with different settings, cookies, and extensions, without having to load and unload cookies, install and uninstall extensions, or change settings and logins via code, so there is no way for the program logic to break. It is also faster than doing it all by code.
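
Since a profile is just a folder, separate sessions are separate folders. A sketch (the folder names are arbitrary):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def chrome_with_profile(profile_dir):
    # Each folder keeps its own cookies, logins, settings, and extensions
    opts = Options()
    opts.add_argument("user-data-dir=" + profile_dir)
    return webdriver.Chrome(chrome_options=opts)

work = chrome_with_profile("profile-work")
personal = chrome_with_profile("profile-personal")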


Answer 2

Remember, you can only add a cookie for the current domain. If you want to add a cookie for your Google account, do

browser.get('http://google.com')
for cookie in cookies:  # cookies loaded earlier, e.g. from a pickle file
    browser.add_cookie(cookie)
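
If the saved cookies cover several sites, one way to avoid errors is to filter by domain before adding (a sketch; the substring test is a simplification):

browser.get('http://google.com')
for cookie in cookies:
    # add_cookie raises for cookies whose domain doesn't match the current page
    if 'google' in cookie.get('domain', ''):
        browser.add_cookie(cookie)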

Answer 3

Based on the answer by @Eduard Florinescu, but with newer code and the missing import added:

$ cat work-auth.py 
#!/usr/bin/python3

# Setup:
# sudo apt-get install chromium-chromedriver
# sudo -H python3 -m pip install selenium

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--user-data-dir=chrome-data")
driver = webdriver.Chrome('/usr/bin/chromedriver', options=chrome_options)
driver.get('https://www.somedomainthatrequireslogin.com')
time.sleep(30)  # Time to enter credentials
driver.quit()

$ cat work.py 
#!/usr/bin/python3

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--user-data-dir=chrome-data")
driver = webdriver.Chrome('/usr/bin/chromedriver', options=chrome_options)
driver.get('https://www.somedomainthatrequireslogin.com')  # Already authenticated
time.sleep(10)
driver.quit()
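
The two scripts differ only in the sleep time, so they could be folded into one (a sketch, keeping the same hypothetical URL and profile folder; pass auth on the first run to log in manually):

#!/usr/bin/python3
import sys
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# "python3 work.py auth" on the first run leaves time to log in by hand
auth = len(sys.argv) > 1 and sys.argv[1] == "auth"

chrome_options = Options()
chrome_options.add_argument("--user-data-dir=chrome-data")
driver = webdriver.Chrome('/usr/bin/chromedriver', options=chrome_options)
driver.get('https://www.somedomainthatrequireslogin.com')
time.sleep(30 if auth else 10)
driver.quit()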

Answer 4

Just a slight modification of the code written by @Roel Van de Paar; all credit goes to him. I am using this on Windows and it is working perfectly, both for setting and adding cookies:

import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--user-data-dir=chrome-data")
driver = webdriver.Chrome('chromedriver.exe', options=chrome_options)
driver.get('https://web.whatsapp.com')  # Already authenticated
time.sleep(30)

Answer 5

This is the code I used on Windows; it works.

for item in COOKIES.split(';'):
    name, value = item.split('=', 1)
    name = name.replace(' ', '').replace('\r', '').replace('\n', '')
    value = value.replace(' ', '').replace('\r', '').replace('\n', '')
    cookie_dict = {
        'name': name,
        'value': value,
        'domain': '',  # google chrome
        'expires': '',
        'path': '/',
        'httpOnly': False,
        'HostOnly': False,
        'Secure': False,
    }
    self.driver_.add_cookie(cookie_dict)
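
For reference, COOKIES here is a raw header-style "name=value; name=value" string. A standalone version of the same idea (a sketch; the cookie string and URL are made up):

from selenium import webdriver

COOKIES = "sessionid=abc123; theme=dark"

driver = webdriver.Firefox()
driver.get("http://example.com")  # must be on the target domain first
for item in COOKIES.split(';'):
    name, value = item.split('=', 1)
    driver.add_cookie({'name': name.strip(), 'value': value.strip(), 'path': '/'})
driver.quit()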

Answer 6

My OS is Windows 10, and the Chrome version is 75.0.3770.100. I tried the ‘user-data-dir’ solution; it didn’t work. The solution by @Eric Klien failed too. Finally, I changed the Chrome setting shown below, and it works! It did not work on Windows Server 2012, though.

[setting — screenshot not preserved]