分类目录归档:知识问答

如何在同一目录或子目录中导入类?

问题:如何在同一目录或子目录中导入类?

我有一个存储所有.py文件的目录。

bin/
   main.py
   user.py # where class User resides
   dir.py # where class Dir resides

我想从使用类user.pydir.pymain.py
如何将这些Python类导入main.py
此外,User如果user.py位于子目录中,如何导入类?

bin/
    dir.py
    main.py
    usr/
        user.py

I have a directory that stores all the .py files.

bin/
   main.py
   user.py # where class User resides
   dir.py # where class Dir resides

I want to use classes from user.py and dir.py in main.py.
How can I import these Python classes into main.py?
Furthermore, how can I import class User if user.py is in a sub directory?

bin/
    dir.py
    main.py
    usr/
        user.py

回答 0

Python 2

在与文件__init__.py相同的目录中创建一个名为的空文件。这将向Python表示“可以从此目录导入”。

然后做…

from user import User
from dir import Dir

如果文件位于子目录中,则同样适用-也将一个__init__.py放在子目录中,然后使用带点标记的常规import语句。对于每个级别的目录,您都需要添加到导入路径。

bin/
    main.py
    classes/
        user.py
        dir.py

因此,如果目录被命名为“ classes”,则可以这样做:

from classes.user import User
from classes.dir import Dir

Python 3

与上一个相同,但是.如果不使用子目录,则在模块名称前添加一个:

from .user import User
from .dir import Dir

Python 2

Make an empty file called __init__.py in the same directory as the files. That will signify to Python that it’s “ok to import from this directory”.

Then just do…

from user import User
from dir import Dir

The same holds true if the files are in a subdirectory – put an __init__.py in the subdirectory as well, and then use regular import statements, with dot notation. For each level of directory, you need to add to the import path.

bin/
    main.py
    classes/
        user.py
        dir.py

So if the directory was named “classes”, then you’d do this:

from classes.user import User
from classes.dir import Dir

Python 3

Same as previous, but prefix the module name with a . if not using a subdirectory:

from .user import User
from .dir import Dir

回答 1

我刚刚了解到(感谢martineau的评论),为了从同一目录中的文件导入类,您现在应该使用Python 3编写代码:

from .user import User
from .dir import Dir

I just learned (thanks to martineau’s comment) that, in order to import classes from files within the same directory, you would now write in Python 3:

from .user import User
from .dir import Dir

回答 2

在您的main.py

from user import Class

Class您要导入的类的名称在哪里。

如果要调用的方法Class,则可以使用以下方法调用它:

Class.method

请注意,__init__.py同一目录中应该有一个空文件。

In your main.py:

from user import Class

where Class is the name of the class you want to import.

If you want to call a method of Class, you can call it using:

Class.method

Note that there should be an empty __init__.py file in the same directory.


回答 3

如果不想将函数和类与模块混合使用,则可以导入模块并通过其名称进行访问

import util # imports util.py

util.clean()
util.setup(4)

或者您可以将函数和类导入代码中

from util import clean, setup
clean()
setup(4)

您可以使用wildchar *将模块中的所有内容导入代码

from util import *
clean()
setup(4)

You can import the module and have access through its name if you don’t want to mix functions and classes with yours

import util # imports util.py

util.clean()
util.setup(4)

or you can import the functions and classes to your code

from util import clean, setup
clean()
setup(4)

you can use wildchar * to import everything in that module to your code

from util import *
clean()
setup(4)

回答 4

在python3,__init__.py不再需要。如果控制台的当前目录是python脚本所在的目录,则一切正常

import user

但是,如果从不包含的其他目录中调用,则此方法将无效user.py
在这种情况下,请使用

from . import user

即使您要导入整个文件,而不是仅从那里导入一个类,也可以使用该方法。

In python3, __init__.py is no longer necessary. If the current directory of the console is the directory where the python script is located, everything works fine with

import user

However, this won’t work if called from a different directory, which does not contain user.py.
In that case, use

from . import user

This works even if you want to import the whole file instead of just a class from there.


回答 5

为了更容易理解:

第1步:转到一个目录,其中将包含所有目录

$ cd /var/tmp

步骤2:现在让我们制作一个class1.py文件,其文件名为Class1,并带有一些代码

$ cat > class1.py <<\EOF
class Class1:
    OKBLUE = '\033[94m'
    ENDC = '\033[0m'
    OK = OKBLUE + "[Class1 OK]: " + ENDC
EOF

步骤3:现在让我们制作一个class2.py文件,其文件名为Class2并带有一些代码

$ cat > class2.py <<\EOF
class Class2:
    OKBLUE = '\033[94m'
    ENDC = '\033[0m'
    OK = OKBLUE + "[Class2 OK]: " + ENDC
EOF

步骤4:现在让一个main.py可以执行一次以使用来自2个不同文件的Class1和Class2

$ cat > main.py <<\EOF
"""this is how we are actually calling class1.py and  from that file loading Class1"""
from class1 import Class1 
"""this is how we are actually calling class2.py and  from that file loading Class2"""
from class2 import Class2

print Class1.OK
print Class2.OK
EOF

步骤5:运行程序

$ python main.py

输出将是

[Class1 OK]: 
[Class2 OK]:

To make it more simple to understand:

Step 1: lets go to one directory, where all will be included

$ cd /var/tmp

Step 2: now lets make a class1.py file which has a class name Class1 with some code

$ cat > class1.py <<\EOF
class Class1:
    OKBLUE = '\033[94m'
    ENDC = '\033[0m'
    OK = OKBLUE + "[Class1 OK]: " + ENDC
EOF

Step 3: now lets make a class2.py file which has a class name Class2 with some code

$ cat > class2.py <<\EOF
class Class2:
    OKBLUE = '\033[94m'
    ENDC = '\033[0m'
    OK = OKBLUE + "[Class2 OK]: " + ENDC
EOF

Step 4: now lets make one main.py which will be execute once to use Class1 and Class2 from 2 different files

$ cat > main.py <<\EOF
"""this is how we are actually calling class1.py and  from that file loading Class1"""
from class1 import Class1 
"""this is how we are actually calling class2.py and  from that file loading Class2"""
from class2 import Class2

print Class1.OK
print Class2.OK
EOF

Step 5: Run the program

$ python main.py

The output would be

[Class1 OK]: 
[Class2 OK]:

回答 6

Python 3


一样directory

导入文件:log.py

导入类:SampleApp()

import log
if __name__ == "__main__":
    app = log.SampleApp()
    app.mainloop()

要么

目录是basic

导入文件:log.py

导入类:SampleApp()

from basic import log
if __name__ == "__main__":
    app = log.SampleApp()
    app.mainloop()

Python 3


Same directory.

import file:log.py

import class: SampleApp().

import log
if __name__ == "__main__":
    app = log.SampleApp()
    app.mainloop()

or

directory is basic.

import in file: log.py.

import class: SampleApp().

from basic import log
if __name__ == "__main__":
    app = log.SampleApp()
    app.mainloop()

回答 7

from user import User 
from dir import Dir 
from user import User 
from dir import Dir 

回答 8

如果user.py和dir.py不包含类,则

from .user import User
from .dir import Dir

不管用。然后,您应该导入为

from . import user
from . import dir

If user.py and dir.py are not including classes then

from .user import User
from .dir import Dir

is not working. You should then import as

from . import user
from . import dir

回答 9

我不确定为什么可以使用Pycharm build from file_in_same_dir import class_name

IDE对此有所抱怨,但似乎仍然可以使用。我正在使用Python 3.7

I’m not sure why this work but using Pycharm build from file_in_same_dir import class_name

The IDE complained about it but it seems it still worked. I’m using Python 3.7


回答 10

太简单了,创建一个文件__init__.py为classes目录,然后将其导入到脚本中,如下所示(全部导入)

from classes.myscript import *

仅导入选定的类

from classes.myscript import User
from classes.myscript import Dir

Just too brief, Create a file __init__.py is classes directory and then import it to your script like following (Import all case)

from classes.myscript import *

Import selected classes only

from classes.myscript import User
from classes.myscript import Dir

回答 11

从同一目录导入

from . import the_file_you_want_to_import 

从子目录导入目录应该包含

初始化 .py

档案以外的档案,然后

从目录导入your_file

to import from the same directory

from . import the_file_you_want_to_import 

to import from sub directory the directory should contain

init.py

file other than you files then

from directory import your_file


回答 12

Python3

采用

from .user import User inside dir.py file

use from class.dir import Dir inside main.py
or from class.usr import User inside main.py

像这样

Python3

use

from .user import User inside dir.py file

and

use from class.dir import Dir inside main.py
or from class.usr import User inside main.py

like so


在“ if”语句中设置多行条件的样式?[关闭]

问题:在“ if”语句中设置多行条件的样式?[关闭]

有时我将ifs中的长条条件分解为几行。最明显的方法是:

  if (cond1 == 'val1' and cond2 == 'val2' and
      cond3 == 'val3' and cond4 == 'val4'):
      do_something

在视觉上不是很吸引人,因为动作与条件融为一体。但是,这是使用正确的4个空格的Python缩进的自然方法。

目前,我正在使用:

  if (    cond1 == 'val1' and cond2 == 'val2' and
          cond3 == 'val3' and cond4 == 'val4'):
      do_something

但这不是很漂亮。:-)

您能推荐一种替代方法吗?

Sometimes I break long conditions in ifs onto several lines. The most obvious way to do this is:

  if (cond1 == 'val1' and cond2 == 'val2' and
      cond3 == 'val3' and cond4 == 'val4'):
      do_something

Isn’t very very appealing visually, because the action blends with the conditions. However, it is the natural way using correct Python indentation of 4 spaces.

For the moment I’m using:

  if (    cond1 == 'val1' and cond2 == 'val2' and
          cond3 == 'val3' and cond4 == 'val4'):
      do_something

But this isn’t very pretty. :-)

Can you recommend an alternative way?


回答 0

您不需要在第二条条件行上使用4个空格。可能使用:

if (cond1 == 'val1' and cond2 == 'val2' and 
       cond3 == 'val3' and cond4 == 'val4'):
    do_something

另外,不要忘记空白比您想象的更灵活:

if (   
       cond1 == 'val1' and cond2 == 'val2' and 
       cond3 == 'val3' and cond4 == 'val4'
   ):
    do_something
if    (cond1 == 'val1' and cond2 == 'val2' and 
       cond3 == 'val3' and cond4 == 'val4'):
    do_something

两者都相当丑陋。

也许丢了括号(尽管《风格指南》不鼓励这样做)?

if cond1 == 'val1' and cond2 == 'val2' and \
   cond3 == 'val3' and cond4 == 'val4':
    do_something

这至少使您与众不同。

甚至:

if cond1 == 'val1' and cond2 == 'val2' and \
                       cond3 == 'val3' and \
                       cond4 == 'val4':
    do_something

我想我更喜欢:

if cond1 == 'val1' and \
   cond2 == 'val2' and \
   cond3 == 'val3' and \
   cond4 == 'val4':
    do_something

这是《样式指南》,(自2010年起)建议使用括号。

You don’t need to use 4 spaces on your second conditional line. Maybe use:

if (cond1 == 'val1' and cond2 == 'val2' and 
       cond3 == 'val3' and cond4 == 'val4'):
    do_something

Also, don’t forget the whitespace is more flexible than you might think:

if (   
       cond1 == 'val1' and cond2 == 'val2' and 
       cond3 == 'val3' and cond4 == 'val4'
   ):
    do_something
if    (cond1 == 'val1' and cond2 == 'val2' and 
       cond3 == 'val3' and cond4 == 'val4'):
    do_something

Both of those are fairly ugly though.

Maybe lose the brackets (the Style Guide discourages this though)?

if cond1 == 'val1' and cond2 == 'val2' and \
   cond3 == 'val3' and cond4 == 'val4':
    do_something

This at least gives you some differentiation.

Or even:

if cond1 == 'val1' and cond2 == 'val2' and \
                       cond3 == 'val3' and \
                       cond4 == 'val4':
    do_something

I think I prefer:

if cond1 == 'val1' and \
   cond2 == 'val2' and \
   cond3 == 'val3' and \
   cond4 == 'val4':
    do_something

Here’s the Style Guide, which (since 2010) recommends using brackets.


回答 1

在简并的情况下,我采用了以下内容:简而言之,其为AND或OR。

if all( [cond1 == 'val1', cond2 == 'val2', cond3 == 'val3', cond4 == 'val4'] ):

if any( [cond1 == 'val1', cond2 == 'val2', cond3 == 'val3', cond4 == 'val4'] ):

它可以刮掉几个字符,并清楚表明该条件没有任何微妙之处。

I’ve resorted to the following in the degenerate case where it’s simply AND’s or OR’s.

if all( [cond1 == 'val1', cond2 == 'val2', cond3 == 'val3', cond4 == 'val4'] ):

if any( [cond1 == 'val1', cond2 == 'val2', cond3 == 'val3', cond4 == 'val4'] ):

It shaves a few characters and makes it clear that there’s no subtlety to the condition.


回答 2

有人必须在这里提倡使用垂直空格!:)

if (     cond1 == val1
     and cond2 == val2
     and cond3 == val3
   ):
    do_stuff()

这使得每个条件都清晰可见。它还可以更清晰地表达更复杂的条件:

if (    cond1 == val1
     or 
        (     cond2_1 == val2_1
          and cond2_2 >= val2_2
          and cond2_3 != bad2_3
        )
   ):
    do_more_stuff()

是的,为了清楚起见,我们将权衡一些垂直房地产。IMO值得。

Someone has to champion use of vertical whitespace here! :)

if (     cond1 == val1
     and cond2 == val2
     and cond3 == val3
   ):
    do_stuff()

This makes each condition clearly visible. It also allows cleaner expression of more complex conditions:

if (    cond1 == val1
     or 
        (     cond2_1 == val2_1
          and cond2_2 >= val2_2
          and cond2_3 != bad2_3
        )
   ):
    do_more_stuff()

Yes, we’re trading off a bit of vertical real estate for clarity. Well worth it IMO.


回答 3

当我有一个非常大的if条件时,我更喜欢这种风格:

if (
    expr1
    and (expr2 or expr3)
    and hasattr(thingy1, '__eq__')
    or status=="HappyTimes"
):
    do_stuff()
else:
    do_other_stuff()

I prefer this style when I have a terribly large if-condition:

if (
    expr1
    and (expr2 or expr3)
    and hasattr(thingy1, '__eq__')
    or status=="HappyTimes"
):
    do_stuff()
else:
    do_other_stuff()

回答 4

这是我非常个人的看法:(在我看来)长时间条件是一种代码气味,建议将其重构为布尔返回函数/方法。例如:

def is_action__required(...):
    return (cond1 == 'val1' and cond2 == 'val2'
            and cond3 == 'val3' and cond4 == 'val4')

现在,如果我找到一种使多行条件看起来不错的方法,那么我可能会发现自己对满足这些条件感到满意,而无需进行重构。

另一方面,让它们扰乱我的审美意识可以促进重构。

因此,我的结论是,多个线路条件看起来很丑陋,这是避免它们的诱因。

Here’s my very personal take: long conditions are (in my view) a code smell that suggests refactoring into a boolean-returning function/method. For example:

def is_action__required(...):
    return (cond1 == 'val1' and cond2 == 'val2'
            and cond3 == 'val3' and cond4 == 'val4')

Now, if I found a way to make multi-line conditions look good, I would probably find myself content with having them and skip the refactoring.

On the other hand, having them perturb my aesthetic sense acts as an incentive for refactoring.

My conclusion, therefore, is that multiple line conditions should look ugly and this is an incentive to avoid them.


回答 5

这并没有太大改善,但是…

allCondsAreOK = (cond1 == 'val1' and cond2 == 'val2' and
                 cond3 == 'val3' and cond4 == 'val4')

if allCondsAreOK:
   do_something

This doesn’t improve so much but…

allCondsAreOK = (cond1 == 'val1' and cond2 == 'val2' and
                 cond3 == 'val3' and cond4 == 'val4')

if allCondsAreOK:
   do_something

回答 6

我建议将and关键字移至第二行,并将包含条件的所有行缩进,而不是四个空格:

if (cond1 == 'val1' and cond2 == 'val2'
  and cond3 == 'val3' and cond4 == 'val4'):
    do_something

这正是我在代码中解决此问题的方式。将关键字作为该行中的第一个单词使该条件更具可读性,并且减少空格的数量进一步将条件与动作区分开。

I suggest moving the and keyword to the second line and indenting all lines containing conditions with two spaces instead of four:

if (cond1 == 'val1' and cond2 == 'val2'
  and cond3 == 'val3' and cond4 == 'val4'):
    do_something

This is exactly how I solve this problem in my code. Having a keyword as the first word in the line makes the condition a lot more readable, and reducing the number of spaces further distinguishes condition from action.


回答 7

似乎值得引用PEP 0008(Python的官方样式指南),因为它以适度的长度评论了这个问题:

当- if语句的条件部分足够长以至于需要将其写成多行时,值得注意的是,两个字符关键字(即if),一个空格和一个左括号的组合会产生自然的4-多行条件的后续行的空格缩进。这可能与嵌套在if-statement中的缩进代码套件产生视觉冲突,该缩进代码套件自然也会缩进4个空格。对于如何(或是否)在视觉上进一步将这些条件行与if-statement 内的嵌套套件区分开,此PEP没有明确的位置。在这种情况下,可接受的选项包括但不限于:

# No extra indentation.
if (this_is_one_thing and
    that_is_another_thing):
    do_something()

# Add a comment, which will provide some distinction in editors
# supporting syntax highlighting.
if (this_is_one_thing and
    that_is_another_thing):
    # Since both conditions are true, we can frobnicate.
    do_something()

# Add some extra indentation on the conditional continuation line.
if (this_is_one_thing
        and that_is_another_thing):
    do_something()

注意上面引用中的“不限于”;除了在样式指南中建议的方法外,在此问题的其他答案中建议的一些方法也是可以接受的。

It seems worth quoting PEP 0008 (Python’s official style guide), since it comments upon this issue at modest length:

When the conditional part of an if -statement is long enough to require that it be written across multiple lines, it’s worth noting that the combination of a two character keyword (i.e. if ), plus a single space, plus an opening parenthesis creates a natural 4-space indent for the subsequent lines of the multiline conditional. This can produce a visual conflict with the indented suite of code nested inside the if -statement, which would also naturally be indented to 4 spaces. This PEP takes no explicit position on how (or whether) to further visually distinguish such conditional lines from the nested suite inside the if -statement. Acceptable options in this situation include, but are not limited to:

# No extra indentation.
if (this_is_one_thing and
    that_is_another_thing):
    do_something()

# Add a comment, which will provide some distinction in editors
# supporting syntax highlighting.
if (this_is_one_thing and
    that_is_another_thing):
    # Since both conditions are true, we can frobnicate.
    do_something()

# Add some extra indentation on the conditional continuation line.
if (this_is_one_thing
        and that_is_another_thing):
    do_something()

Note the “not limited to” in the quote above; besides the approaches suggested in the style guide, some of the ones suggested in other answers to this question are acceptable too.


回答 8

这就是我的工作,请记住“ all”和“ any”接受迭代,因此我将一个长条件放在列表中,然后让“ all”完成工作。

condition = [cond1 == 'val1', cond2 == 'val2', cond3 == 'val3', cond4 == 'val4']

if all(condition):
   do_something

Here’s what I do, remember that “all” and “any” accepts an iterable, so I just put a long condition in a list and let “all” do the work.

condition = [cond1 == 'val1', cond2 == 'val2', cond3 == 'val3', cond4 == 'val4']

if all(condition):
   do_something

回答 9

我很惊讶没有看到我的首选解决方案,

if (cond1 == 'val1' and cond2 == 'val2'
    and cond3 == 'val3' and cond4 == 'val4'):
    do_something

由于and是一个关键字,因此我的编辑器将其突出显示,并且看起来与其下方的do_something完全不同。

I’m surprised not to see my preferred solution,

if (cond1 == 'val1' and cond2 == 'val2'
    and cond3 == 'val3' and cond4 == 'val4'):
    do_something

Since and is a keyword, it gets highlighted by my editor, and looks sufficiently different from the do_something below it.


回答 10

加上@krawyoti所说的话…长时间的气味会散发出来,因为它们难以阅读且难以理解。使用函数或变量可使代码更清晰。在Python中,我更喜欢使用垂直空间,将括号括起来,并将逻辑运算符放在每行的开头,以使表达式看起来不像“浮动”。

conditions_met = (
    cond1 == 'val1' 
    and cond2 == 'val2' 
    and cond3 == 'val3' 
    and cond4 == 'val4'
    )
if conditions_met:
    do_something

如果需要多次评估条件(例如在while循环中),则最好使用局部函数。

Adding to what @krawyoti said… Long conditions smell because they are difficult to read and difficult to understand. Using a function or a variable makes the code clearer. In Python, I prefer to use vertical space, enclose parenthesis, and place the logical operators at the beginning of each line so the expressions don’t look like “floating”.

conditions_met = (
    cond1 == 'val1' 
    and cond2 == 'val2' 
    and cond3 == 'val3' 
    and cond4 == 'val4'
    )
if conditions_met:
    do_something

If the conditions need to be evaluated more than once, as in a while loop, then using a local function is best.


回答 11

就个人而言,我喜欢在长if语句中添加含义。我将不得不在代码中进行搜索以找到合适的示例,但这是我想到的第一个示例:假设我碰巧遇到了一些古怪的逻辑,我想根据许多变量来显示特定页面。

英语:“如果登录的用户不是管理员教师,而是普通教师,而不是学生本身…”

if not user.isAdmin() and user.isTeacher() and not user.isStudent():
    doSomething()

当然,这看起来不错,但是如果需要大量阅读,请阅读这些内容。我们如何将逻辑分配给有意义的标签。“标签”实际上是变量名:

displayTeacherPanel = not user.isAdmin() and user.isTeacher() and not user.isStudent()
if displayTeacherPanel:
    showTeacherPanel()

这看起来似乎很愚蠢,但是您可能还有另一种情况,您仅在显示教师面板或默认情况下用户有权访问其他特定面板时才要显示其他项目:

if displayTeacherPanel or user.canSeeSpecialPanel():
    showSpecialPanel()

尝试在不使用变量来存储和标记逻辑的情况下编写上述条件,不仅会得到非常混乱,难以理解的逻辑语句,而且还会重复自己。尽管有合理的exceptions情况,但请记住:不要重复自己(DRY)。

Personally, I like to add meaning to long if-statements. I would have to search through code to find an appropriate example, but here’s the first example that comes to mind: let’s say I happen to run into some quirky logic where I want to display a certain page depending on many variables.

English: “If the logged-in user is NOT an administrator teacher, but is just a regular teacher, and is not a student themselves…”

if not user.isAdmin() and user.isTeacher() and not user.isStudent():
    doSomething()

Sure this might look fine, but reading those if statements is a lot of work. How about we assign the logic to label that makes sense. The “label” is actually the variable name:

displayTeacherPanel = not user.isAdmin() and user.isTeacher() and not user.isStudent()
if displayTeacherPanel:
    showTeacherPanel()

This may seem silly, but you might have yet another condition where you ONLY want to display another item if, and only if, you’re displaying the teacher panel OR if the user has access to that other specific panel by default:

if displayTeacherPanel or user.canSeeSpecialPanel():
    showSpecialPanel()

Try writing the above condition without using variables to store and label your logic, and not only do you end up with a very messy, hard-to-read logical statement, but you also just repeated yourself. While there are reasonable exceptions, remember: Don’t Repeat Yourself (DRY).


回答 12

“ all”和“ any”对于相同类型案例的许多条件都很好。但是他们总是评估所有条件。如本例所示:

def c1():
    print " Executed c1"
    return False
def c2():
    print " Executed c2"
    return False


print "simple and (aborts early!)"
if c1() and c2():
    pass

print

print "all (executes all :( )"
if all((c1(),c2())):
    pass

print

“all” and “any” are nice for the many conditions of same type case. BUT they always evaluates all conditions. As shown in this example:

def c1():
    print " Executed c1"
    return False
def c2():
    print " Executed c2"
    return False


print "simple and (aborts early!)"
if c1() and c2():
    pass

print

print "all (executes all :( )"
if all((c1(),c2())):
    pass

print

回答 13

(我对标识符进行了小幅修改,因为固定宽度的名称不代表真实代码-至少不代表我遇到的真实代码-这将掩盖示例的可读性。)

if (cond1 == "val1" and cond22 == "val2"
and cond333 == "val3" and cond4444 == "val4"):
    do_something

这对于“ and”和“ or”非常有效(重要的是它们排在第二行),但对于其他较长条件则远不如此。幸运的是,前者似乎是更常见的情况,而后者通常很容易用一个临时变量重写。(通常并不难,但是在重写时保留“和” /“或”的短路可能很困难或不太明显/可读性。)

由于我从您的博客文章中找到了有关C ++的问题,因此我将介绍我的C ++样式是相同的:

if (cond1 == "val1" and cond22 == "val2"
and cond333 == "val3" and cond4444 == "val4") {
    do_something
}

(I’ve lightly modified the identifiers as fixed-width names aren’t representative of real code – at least not real code that I encounter – and will belie an example’s readability.)

if (cond1 == "val1" and cond22 == "val2"
and cond333 == "val3" and cond4444 == "val4"):
    do_something

This works well for “and” and “or” (it’s important that they’re first on the second line), but much less so for other long conditions. Fortunately, the former seem to be the more common case while the latter are often easily rewritten with a temporary variable. (It’s usually not hard, but it can be difficult or much less obvious/readable to preserve the short-circuiting of “and”/”or” when rewriting.)

Since I found this question from your blog post about C++, I’ll include that my C++ style is identical:

if (cond1 == "val1" and cond22 == "val2"
and cond333 == "val3" and cond4444 == "val4") {
    do_something
}

回答 14

简单明了,还通过了pep8检查:

if (
    cond1 and
    cond2
):
    print("Hello World!")

近年来,我一直偏爱alland any函数,因为我很少将And和Or混合使用,因此效果很好,并且具有生成器理解失败的额外优势:

if all([
    cond1,
    cond2,
]):
    print("Hello World!")

只记得传递一个可迭代的值!传递N参数是不正确的。

注意:any就像很多or比较,all就像很多and比较。


这与生成器理解很好地结合在一起,例如:

# Check if every string in a list contains a substring:
my_list = [
    'a substring is like a string', 
    'another substring'
]

if all('substring' in item for item in my_list):
   print("Hello World!")

# or

if all(
    'substring' in item
    for item in my_list
):
    print("Hello World!")

更多信息:生成器理解

Plain and simple, also passes pep8 checks:

if (
    cond1 and
    cond2
):
    print("Hello World!")

In recent times I have been preferring the all and any functions, since I rarely mix And and Or comparisons this works well, and has the additional advantage of Failing Early with generators comprehension:

if all([
    cond1,
    cond2,
]):
    print("Hello World!")

Just remember to pass in a single iterable! Passing in N-arguments is not correct.

Note: any is like many or comparisons, all is like many and comparisons.


This combines nicely with generator comprehensions, for example:

# Check if every string in a list contains a substring:
my_list = [
    'a substring is like a string', 
    'another substring'
]

if all('substring' in item for item in my_list):
   print("Hello World!")

# or

if all(
    'substring' in item
    for item in my_list
):
    print("Hello World!")

More on: generator comprehension


回答 15

如果我们仅在条件和身体之间插入一条额外的空白行,而其余部分以规范的方式进行,该怎么办?

if (cond1 == 'val1' and cond2 == 'val2' and
    cond3 == 'val3' and cond4 == 'val4'):

    do_something

ps我总是使用制表符,而不是空格。我无法微调…

What if we only insert an additional blank line between the condition and the body and do the rest in the canonical way?

if (cond1 == 'val1' and cond2 == 'val2' and
    cond3 == 'val3' and cond4 == 'val4'):

    do_something

p.s. I always use tabs, not spaces; I cannot fine-tune…


回答 16

我通常要做的是:

if (cond1 == 'val1' and cond2 == 'val2' and
    cond3 == 'val3' and cond4 == 'val4'
   ):
    do_something

这样,右括号和冒号在视觉上标志着我们病情的结束。

What I usually do is:

if (cond1 == 'val1' and cond2 == 'val2' and
    cond3 == 'val3' and cond4 == 'val4'
   ):
    do_something

this way the closing brace and colon visually mark the end of our condition.


回答 17

也为if语句提供多条件的所有受访者与所提出的问题一样丑陋。您不能通过做同样的事情来解决此问题。

甚至PEP 0008的答案也是令人反感的。

这是一种更具可读性的方法

condition = random.randint(0, 100) # to demonstrate
anti_conditions = [42, 67, 12]
if condition not in anti_conditions:
    pass

要我吃我的话吗?说服我,您需要多条件,并且我会逐字打印出来,并出于娱乐目的而食用。

All respondents that also provide multi-conditionals for the if statement is just as ugly as the problem presented. You don’t solve this problem by doing the same thing..

Even the PEP 0008 answer is repulsive.

Here is a far more readable approach

condition = random.randint(0, 100) # to demonstrate
anti_conditions = [42, 67, 12]
if condition not in anti_conditions:
    pass

Want me to eat my words? Convince me you need multi-conditionals and I’ll literally print this and eat it for your amusement.


回答 18

我认为@zkanda的解决方案稍作改动就可以了。如果您在各自的列表中有条件和值,则可以使用列表推导进行比较,这将使添加条件/值对的过程更加通用。

conditions = [1, 2, 3, 4]
values = [1, 2, 3, 4]
if all([c==v for c, v in zip(conditions, values)]):
    # do something

如果我确实想对这样的语句进行硬编码,出于可读性考虑,我会这样写:

if (condition1==value1) and (condition2==value2) and \
   (condition3==value3) and (condition4==value4):

并与iand操作员一起提出另一个解决方案:

proceed = True
for c, v in zip(conditions, values):
    proceed &= c==v

if proceed:
    # do something

I think @zkanda’s solution would be good with a minor twist. If you had your conditions and values in their own respective lists, you could use a list comprehension to do the comparison, which would make things a bit more general for adding condition/value pairs.

conditions = [1, 2, 3, 4]
values = [1, 2, 3, 4]
if all([c==v for c, v in zip(conditions, values)]):
    # do something

If I did want to hard-code a statement like this, I would write it like this for legibility:

if (condition1==value1) and (condition2==value2) and \
   (condition3==value3) and (condition4==value4):

And just to throw another solution out there with an iand operator:

proceed = True
for c, v in zip(conditions, values):
    proceed &= c==v

if proceed:
    # do something

回答 19

出于完整性考虑,仅提供了一些其他随机想法。如果它们对您有用,请使用它们。否则,您最好尝试其他方法。

您也可以使用字典来做到这一点:

>>> x = {'cond1' : 'val1', 'cond2' : 'val2'}
>>> y = {'cond1' : 'val1', 'cond2' : 'val2'}
>>> x == y
True

这个选项比较复杂,但是您可能还会发现它有用:

class Klass(object):
    def __init__(self, some_vars):
        #initialize conditions here
    def __nonzero__(self):
        return (self.cond1 == 'val1' and self.cond2 == 'val2' and
                self.cond3 == 'val3' and self.cond4 == 'val4')

foo = Klass()
if foo:
    print "foo is true!"
else:
    print "foo is false!"

邓诺(Dunno)是否适合您,但这是您可以考虑的另一种选择。这是另一种方式:

class Klass(object):
    def __init__(self):
        #initialize conditions here
    def __eq__(self):
        return (self.cond1 == 'val1' and self.cond2 == 'val2' and
               self.cond3 == 'val3' and self.cond4 == 'val4')

x = Klass(some_values)
y = Klass(some_other_values)
if x == y:
    print 'x == y'
else:
    print 'x!=y'

我还没有测试过最后两个,但是如果您想使用这些概念,那么它们的概念应该足以让您开始工作。

(从记录来看,如果这只是一次性的事情,那么使用最初介绍的方法可能会更好。如果在很多地方进行比较,这些方法可能会增强可读性,从而使您不会因为它们有点hacky而感到难过。)

Just a few other random ideas for completeness’s sake. If they work for you, use them. Otherwise, you’re probably better off trying something else.

You could also do this with a dictionary:

>>> x = {'cond1' : 'val1', 'cond2' : 'val2'}
>>> y = {'cond1' : 'val1', 'cond2' : 'val2'}
>>> x == y
True

This option is more complicated, but you may also find it useful:

class Klass(object):
    def __init__(self, some_vars):
        #initialize conditions here
    def __nonzero__(self):
        return (self.cond1 == 'val1' and self.cond2 == 'val2' and
                self.cond3 == 'val3' and self.cond4 == 'val4')

foo = Klass()
if foo:
    print "foo is true!"
else:
    print "foo is false!"

Dunno if that works for you, but it’s another option to consider. Here’s one more way:

class Klass(object):
    def __init__(self):
        #initialize conditions here
    def __eq__(self):
        return (self.cond1 == 'val1' and self.cond2 == 'val2' and
               self.cond3 == 'val3' and self.cond4 == 'val4')

x = Klass(some_values)
y = Klass(some_other_values)
if x == y:
    print 'x == y'
else:
    print 'x!=y'

The last two I haven’t tested, but the concepts should be enough to get you going if that’s what you want to go with.

(And for the record, if this is just a one time thing, you’re probably just better off using the method you presented at first. If you’re doing the comparison in lots of places, these methods may enhance readability enough to make you not feel so bad about the fact that they are kind of hacky.)


回答 20

我一直在努力寻找一种合适的方法来做到这一点,所以我只是想出了一个主意(不是灵丹妙药,因为这主要是一个品味问题)。

if bool(condition1 and
        condition2 and
        ...
        conditionN):
    foo()
    bar()

与我见过的其他解决方案相比,我发现此解决方案有一些优点,即,您恰好获得了额外的4个缩进空间(布尔),允许所有条件垂直排列,并且if语句的主体可以缩进一种清晰的方式 这也保留了对布尔运算符进行短路评估的好处,但是当然会增加基本上不执行任何操作的函数调用的开销。您可能会(有效地)争辩说,可以在此处使用任何返回其参数的函数而不是bool,但是就像我说的那样,这只是一个主意,最终是一个品味问题。

有趣的是,当我写这篇文章并思考“问题”时,我想到了另一个想法,它消除了函数调用的开销。为什么不通过使用额外的圆括号来表明我们将要输入复杂的条件?再说2个,相对于if语句的主体,给子条件一个2的空格缩进。例:

if (((foo and
      bar and
      frob and
      ninja_bear))):
    do_stuff()

我之所以喜欢这样,是因为当您看着它时,头上的铃铛立即响起,说:“嘿,这是一件复杂的事情!” 。是的,我知道括号并不能帮助提高可读性,但是这些条件应该很少出现,并且当它们出现时,您无论如何都要停下来仔细阅读它们(因为它们很复杂))。

无论如何,只有两个我在这里没有看到的建议。希望这可以帮助某人:)

I’ve been struggling to find a decent way to do this as well, so I just came up with an idea (not a silver bullet, since this is mainly a matter of taste).

if bool(condition1 and
        condition2 and
        ...
        conditionN):
    foo()
    bar()

I find a few merits in this solution compared to others I’ve seen, namely, you get exactly an extra 4 spaces of indentation (bool), allowing all conditions to line up vertically, and the body of the if statement can be indented in a clear(ish) way. This also keeps the benefits of short-circuit evaluation of boolean operators, but of course adds the overhead of a function call that basically does nothing. You could argue (validly) that any function returning its argument could be used here instead of bool, but like I said, it’s just an idea and it’s ultimately a matter of taste.

Funny enough, as I was writing this and thinking about the “problem”, I came up with yet another idea, which removes the overhead of a function call. Why not indicate that we’re about to enter a complex condition by using extra pairs of parentheses? Say, 2 more, to give a nice 2 space indent of the sub-conditions relative to the body of the if statement. Example:

if (((foo and
      bar and
      frob and
      ninja_bear))):
    do_stuff()

I kind of like this because when you look at it, a bell immediatelly rings in your head saying “hey, there’s a complex thing going on here!”. Yes, I know that parentheses don’t help readability, but these conditions should appear rarely enough, and when they do show up, you are going to have to stop and read them carefuly anyway (because they’re complex).

Anyway, just two more proposals that I haven’t seen here. Hope this helps someone :)


回答 21

您可以将其分为两行

total = cond1 == 'val' and cond2 == 'val2' and cond3 == 'val3' and cond4 == val4
if total:
    do_something()

甚至一次添加一个条件。这样,至少可以将混乱与混乱分开if

You could split it into two lines

total = cond1 == 'val' and cond2 == 'val2' and cond3 == 'val3' and cond4 == val4
if total:
    do_something()

Or even add on one condition at a time. That way, at least it separates the clutter from the if.


回答 22

我知道这个线程很旧,但是我有一些Python 2.7代码,PyCharm(4.5)仍然抱怨这种情况:

if foo is not None:
    if (cond1 == 'val1' and cond2 == 'val2' and
        cond3 == 'val3' and cond4 == 'val4'):
            # some comment about do_something
            do_something

即使PEP8警告“带有与下一个逻辑行相同的缩进的可视缩进行”,实际代码也完全可以吗?这不是“缩进过度”吗?

…有时候我希望Python会咬住子弹,并且用花括号将其消失。我不知道这些年来由于偶然的错误压痕而意外引入了多少个错误…

I know this thread is old, but I have some Python 2.7 code and PyCharm (4.5) still complains about this case:

if foo is not None:
    if (cond1 == 'val1' and cond2 == 'val2' and
        cond3 == 'val3' and cond4 == 'val4'):
            # some comment about do_something
            do_something

Even with the PEP8 warning “visually indented line with same indent as next logical line”, the actual code is completely OK? It’s not “over-indenting?”

…there are times I wish Python would’ve bit the bullet and just gone with curly braces. I wonder how many bugs have been accidentally introduced over the years due to accidental mis-indentation…


回答 23

将您的条件打包到一个列表中,然后再做。喜欢:

if False not in Conditions:
    do_something

Pack your conditions into a list, then do smth. like:

if False not in Conditions:
    do_something

回答 24

我发现当我的条件很长时,我通常会有一个简短的代码主体。在这种情况下,我只是将身体缩进两次,因此:

if (cond1 == 'val1' and cond2 == 'val2' and
    cond3 == 'val3' and cond4 == 'val4'):
        do_something

I find that when I have long conditions, I often have a short code body. In that case, I just double-indent the body, thus:

if (cond1 == 'val1' and cond2 == 'val2' and
    cond3 == 'val3' and cond4 == 'val4'):
        do_something

回答 25

  if cond1 == 'val1' and \
     cond2 == 'val2' and \
     cond3 == 'val3' and \
     cond4 == 'val4':
      do_something

或者,如果更清楚:

  if cond1 == 'val1'\
     and cond2 == 'val2'\
     and cond3 == 'val3'\
     and cond4 == 'val4':
      do_something

在这种情况下,缩进没有理由应为4的倍数,例如,请参见“与开孔定界符对齐”:

http://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Indentation#Indentation

  if cond1 == 'val1' and \
     cond2 == 'val2' and \
     cond3 == 'val3' and \
     cond4 == 'val4':
      do_something

or if this is clearer:

  if cond1 == 'val1'\
     and cond2 == 'val2'\
     and cond3 == 'val3'\
     and cond4 == 'val4':
      do_something

There is no reason indent should be a multiple of 4 in this case, e.g. see “Aligned with opening delimiter”:

http://google-styleguide.googlecode.com/svn/trunk/pyguide.html?showone=Indentation#Indentation


回答 26

这是另一种方法:

cond_list = ['cond1 == "val1"','cond2=="val2"','cond3=="val3"','cond4=="val4"']
if all([eval(i) for i in cond_list]):
 do something

这也使您可以轻松地添加另一个条件,而无需更改if语句,只需将另一个条件附加到列表即可:

cond_list.append('cond5=="val5"')

Here’s another approach:

cond_list = ['cond1 == "val1"','cond2=="val2"','cond3=="val3"','cond4=="val4"']
if all([eval(i) for i in cond_list]):
 do something

This also makes it easy to add another condition easily without changing the if statement by simply appending another condition to the list:

cond_list.append('cond5=="val5"')

回答 27

我通常使用:

if ((cond1 == 'val1' and cond2 == 'val2' and
     cond3 == 'val3' and cond4 == 'val4')):
    do_something()

I usually use:

if ((cond1 == 'val1' and cond2 == 'val2' and
     cond3 == 'val3' and cond4 == 'val4')):
    do_something()

回答 28

如果我们的if&else条件必须在其中执行多个语句,则我们可以像下面这样编写。每当我们有if else例子时,里面都有一个语句。

谢谢它为我工作。

#!/usr/bin/python
import sys
numberOfArgument =len(sys.argv)
weblogic_username =''
weblogic_password = ''
weblogic_admin_server_host =''
weblogic_admin_server_port =''


if numberOfArgument == 5:
        weblogic_username = sys.argv[1]
        weblogic_password = sys.argv[2]
        weblogic_admin_server_host =sys.argv[3]
        weblogic_admin_server_port=sys.argv[4]
elif numberOfArgument <5:
        print " weblogic UserName, weblogic Password and weblogic host details are Mandatory like, defalutUser, passwordForDefaultUser, t3s://server.domainname:7001 ."
        weblogic_username = raw_input("Enter Weblogic user Name")
        weblogic_password = raw_input('Enter Weblogic user Password')
        weblogic_admin_server_host = raw_input('Enter Weblogic admin host ')
        weblogic_admin_server_port = raw_input('Enter Weblogic admin port')
#enfelif
#endIf

if our if & an else condition has to execute multiple statement inside of it than we can write like below. Every when we have if else example with one statement inside of it .

Thanks it work for me.

#!/usr/bin/python
import sys
numberOfArgument =len(sys.argv)
weblogic_username =''
weblogic_password = ''
weblogic_admin_server_host =''
weblogic_admin_server_port =''


if numberOfArgument == 5:
        weblogic_username = sys.argv[1]
        weblogic_password = sys.argv[2]
        weblogic_admin_server_host =sys.argv[3]
        weblogic_admin_server_port=sys.argv[4]
elif numberOfArgument <5:
        print " weblogic UserName, weblogic Password and weblogic host details are Mandatory like, defalutUser, passwordForDefaultUser, t3s://server.domainname:7001 ."
        weblogic_username = raw_input("Enter Weblogic user Name")
        weblogic_password = raw_input('Enter Weblogic user Password')
        weblogic_admin_server_host = raw_input('Enter Weblogic admin host ')
        weblogic_admin_server_port = raw_input('Enter Weblogic admin port')
#enfelif
#endIf

回答 29

请原谅我,但是碰巧我对#Python的了解不如在座的任何人,但是碰巧我在3D BIM建模中编写自己的对象时发现了类似的东西,因此我将使算法适应python的

我在这里发现的问题是双面的:

  1. 对于可能会尝试解密脚本的人,我认为我似乎很陌生。
  2. 如果更改了这些值(最可能的话),或者必须添加新的条件(损坏的架构),则代码维护的成本将很高。

要绕过所有这些问题,您的脚本必须像这样

param_Val01 = Value 01   #give a meaningful name for param_Val(i) preferable an integer
param_Val02 = Value 02
param_Val03 = Value 03
param_Val04 = Value 04   # and ... etc

conditions = 0           # this is a value placeholder

########
Add script that if true will make:

conditions = conditions + param_Val01   #value of placeholder is updated
########

### repeat as needed


if conditions = param_Val01 + param_Val02 + param_Val03 + param_Val04:
    do something

这种方法的优点:

  1. 脚本可读。

  2. 脚本可以轻松维护。

  3. 条件是对表示所需条件的值之和进行的1比较操作。
  4. 无需多级条件

希望对大家有帮助

Pardon my noobness, but it happens that I’m not as knowledgeable of #Python as anyone of you here, but it happens that I have found something similar when scripting my own objects in a 3D BIM modeling, so I will adapt my algorithm to that of python.

The problem that I find here, is double sided:

  1. Values my seem foreign for someone who may try to decipher the script.
  2. Code maintenance will come at a high cost, if those values are changed (most probable), or if new conditions must be added (broken schema)

Do to bypass all these problems, your script must go like this

param_Val01 = Value 01   #give a meaningful name for param_Val(i) preferable an integer
param_Val02 = Value 02
param_Val03 = Value 03
param_Val04 = Value 04   # and ... etc

conditions = 0           # this is a value placeholder

########
Add script that if true will make:

conditions = conditions + param_Val01   #value of placeholder is updated
########

### repeat as needed


if conditions = param_Val01 + param_Val02 + param_Val03 + param_Val04:
    do something

Pros of this method:

  1. Script is readable.

  2. Script can be easy maintained.

  3. conditions is a 1 comparison operation to a sum of values that represents the desired conditions.
  4. No need for multilevel conditions

Hope it help you all


使用Python将列表写入文件

问题:使用Python将列表写入文件

因为writelines()不插入换行符,这是将列表写入文件的最干净的方法吗?

file.writelines(["%s\n" % item  for item in list])

似乎会有一种标准的方法…

Is this the cleanest way to write a list to a file, since writelines() doesn’t insert newline characters?

file.writelines(["%s\n" % item  for item in list])

It seems like there would be a standard way…


回答 0

您可以使用循环:

with open('your_file.txt', 'w') as f:
    for item in my_list:
        f.write("%s\n" % item)

在Python 2中,您也可以使用

with open('your_file.txt', 'w') as f:
    for item in my_list:
        print >> f, item

如果您热衷于单个函数调用,请至少移除方括号[],以使要打印的字符串一次生成一个(一个genexp而不是一个listcomp)-没有理由占用所有需要的内存具体化整个字符串列表。

You can use a loop:

with open('your_file.txt', 'w') as f:
    for item in my_list:
        f.write("%s\n" % item)

In Python 2, you can also use

with open('your_file.txt', 'w') as f:
    for item in my_list:
        print >> f, item

If you’re keen on a single function call, at least remove the square brackets [], so that the strings to be printed get made one at a time (a genexp rather than a listcomp) — no reason to take up all the memory required to materialize the whole list of strings.


回答 1

您将如何处理该文件?该文件是否存在于人类或具有明确互操作性要求的其他程序中?

如果您只是尝试将列表序列化到磁盘以供同一python应用程序稍后使用,则应该对列表进行腌制

import pickle

with open('outfile', 'wb') as fp:
    pickle.dump(itemlist, fp)

读回:

with open ('outfile', 'rb') as fp:
    itemlist = pickle.load(fp)

What are you going to do with the file? Does this file exist for humans, or other programs with clear interoperability requirements?

If you are just trying to serialize a list to disk for later use by the same python app, you should be pickleing the list.

import pickle

with open('outfile', 'wb') as fp:
    pickle.dump(itemlist, fp)

To read it back:

with open ('outfile', 'rb') as fp:
    itemlist = pickle.load(fp)

回答 2

比较简单的方法是:

with open("outfile", "w") as outfile:
    outfile.write("\n".join(itemlist))

您可以使用生成器表达式来确保项目列表中的所有项目都是字符串:

with open("outfile", "w") as outfile:
    outfile.write("\n".join(str(item) for item in itemlist))

请记住,所有itemlist列表都必须在内存中,因此,请注意内存消耗。

The simpler way is:

with open("outfile", "w") as outfile:
    outfile.write("\n".join(itemlist))

You could ensure that all items in item list are strings using a generator expression:

with open("outfile", "w") as outfile:
    outfile.write("\n".join(str(item) for item in itemlist))

Remember that all itemlist list need to be in memory, so, take care about the memory consumption.


回答 3

使用Python 3Python 2.6+语法:

with open(filepath, 'w') as file_handler:
    for item in the_list:
        file_handler.write("{}\n".format(item))

这是与平台无关的。它还以换行符结束最后一行,这是UNIX的最佳实践

从Python 3.6开始,"{}\n".format(item)可以用f字符串替换:f"{item}\n"

Using Python 3 and Python 2.6+ syntax:

with open(filepath, 'w') as file_handler:
    for item in the_list:
        file_handler.write("{}\n".format(item))

This is platform-independent. It also terminates the final line with a newline character, which is a UNIX best practice.

Starting with Python 3.6, "{}\n".format(item) can be replaced with an f-string: f"{item}\n".


回答 4

还有另一种方式。使用simplejson(在python 2.6中包含为json)序列化为json

>>> import simplejson
>>> f = open('output.txt', 'w')
>>> simplejson.dump([1,2,3,4], f)
>>> f.close()

如果您检查output.txt:

[1、2、3、4]

这很有用,因为语法是pythonic的,它是人类可读的,并且可以由其他语言的其他程序读取。

Yet another way. Serialize to json using simplejson (included as json in python 2.6):

>>> import simplejson
>>> f = open('output.txt', 'w')
>>> simplejson.dump([1,2,3,4], f)
>>> f.close()

If you examine output.txt:

[1, 2, 3, 4]

This is useful because the syntax is pythonic, it’s human readable, and it can be read by other programs in other languages.


回答 5

我认为探索使用genexp的好处会很有趣,所以这是我的看法。

问题中的示例使用方括号创建临时列表,因此等效于:

file.writelines( list( "%s\n" % item for item in list ) )

它不必要地构造了将要写出的所有行的临时列表,这可能会消耗大量内存,具体取决于列表的大小以及输出的详细str(item)程度。

放下方括号(相当于删除list()上面的包装调用)将改为将临时生成器传递给file.writelines()

file.writelines( "%s\n" % item for item in list )

该生成器将item按需创建换行终止的对象表示形式(即,当对象被写出时)。这样做有几个方面的好处:

  • 内存开销很小,即使列表很大
  • 如果str(item)速度较慢,则在处理每个项目时文件中都有可见的进度

这样可以避免出现内存问题,例如:

In [1]: import os

In [2]: f = file(os.devnull, "w")

In [3]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 385 ms per loop

In [4]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
ERROR: Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
...
MemoryError

(通过,我通过将Python的最大虚拟内存限制为〜100MB触发了此错误ulimit -v 102400)。

一方面,此方法实际上并没有比原始方法快:

In [4]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 370 ms per loop

In [5]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
1 loops, best of 3: 360 ms per loop

(Linux上的Python 2.6.2)

I thought it would be interesting to explore the benefits of using a genexp, so here’s my take.

The example in the question uses square brackets to create a temporary list, and so is equivalent to:

file.writelines( list( "%s\n" % item for item in list ) )

Which needlessly constructs a temporary list of all the lines that will be written out, this may consume significant amounts of memory depending on the size of your list and how verbose the output of str(item) is.

Drop the square brackets (equivalent to removing the wrapping list() call above) will instead pass a temporary generator to file.writelines():

file.writelines( "%s\n" % item for item in list )

This generator will create newline-terminated representation of your item objects on-demand (i.e. as they are written out). This is nice for a couple of reasons:

  • Memory overheads are small, even for very large lists
  • If str(item) is slow there’s visible progress in the file as each item is processed

This avoids memory issues, such as:

In [1]: import os

In [2]: f = file(os.devnull, "w")

In [3]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 385 ms per loop

In [4]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
ERROR: Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
...
MemoryError

(I triggered this error by limiting Python’s max. virtual memory to ~100MB with ulimit -v 102400).

Putting memory usage to one side, this method isn’t actually any faster than the original:

In [4]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 370 ms per loop

In [5]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
1 loops, best of 3: 360 ms per loop

(Python 2.6.2 on Linux)


回答 6

因为我很懒…

import json
a = [1,2,3]
with open('test.txt', 'w') as f:
    f.write(json.dumps(a))

#Now read the file back into a Python list object
with open('test.txt', 'r') as f:
    a = json.loads(f.read())

Because i’m lazy….

import json
a = [1,2,3]
with open('test.txt', 'w') as f:
    f.write(json.dumps(a))

#Now read the file back into a Python list object
with open('test.txt', 'r') as f:
    a = json.loads(f.read())

回答 7

将列表序列化为带有逗号分隔值的文本文件

mylist = dir()
with open('filename.txt','w') as f:
    f.write( ','.join( mylist ) )

Serialize list into text file with comma sepparated value

mylist = dir()
with open('filename.txt','w') as f:
    f.write( ','.join( mylist ) )

回答 8

一般来说

以下是writelines()方法的语法

fileObject.writelines( sequence )

#!/usr/bin/python

# Open a file
fo = open("foo.txt", "rw+")
seq = ["This is 6th line\n", "This is 7th line"]

# Write sequence of lines at the end of the file.
line = fo.writelines( seq )

# Close opend file
fo.close()

参考

http://www.tutorialspoint.com/python/file_writelines.htm

In General

Following is the syntax for writelines() method

fileObject.writelines( sequence )

Example

#!/usr/bin/python

# Open a file
fo = open("foo.txt", "rw+")
seq = ["This is 6th line\n", "This is 7th line"]

# Write sequence of lines at the end of the file.
line = fo.writelines( seq )

# Close opend file
fo.close()

Reference

http://www.tutorialspoint.com/python/file_writelines.htm


回答 9

file.write('\n'.join(list))
file.write('\n'.join(list))

回答 10

with open ("test.txt","w")as fp:
   for line in list12:
       fp.write(line+"\n")
with open ("test.txt","w")as fp:
   for line in list12:
       fp.write(line+"\n")

回答 11

如果您使用的是python3,则还可以使用print函数,如下所示。

f = open("myfile.txt","wb")
print(mylist, file=f)

You can also use the print function if you’re on python3 as follows.

f = open("myfile.txt","wb")
print(mylist, file=f)

回答 12

你为什么不尝试

file.write(str(list))

Why don’t you try

file.write(str(list))

回答 13

此逻辑将首先将list中的项目转换为string(str)。有时列表中包含一个元组,例如

alist = [(i12,tiger), 
(113,lion)]

此逻辑将在新行中写入文件每个元组。我们稍后可以eval在读取文件时加载每个元组时使用:

outfile = open('outfile.txt', 'w') # open a file in write mode
for item in list_to_persistence:    # iterate over the list items
   outfile.write(str(item) + '\n') # write to the file
outfile.close()   # close the file 

This logic will first convert the items in list to string(str). Sometimes the list contains a tuple like

alist = [(i12,tiger), 
(113,lion)]

This logic will write to file each tuple in a new line. We can later use eval while loading each tuple when reading the file:

outfile = open('outfile.txt', 'w') # open a file in write mode
for item in list_to_persistence:    # iterate over the list items
   outfile.write(str(item) + '\n') # write to the file
outfile.close()   # close the file 

回答 14

迭代和添加换行符的另一种方法:

for item in items:
    filewriter.write(f"{item}" + "\n")

Another way of iterating and adding newline:

for item in items:
    filewriter.write(f"{item}" + "\n")

回答 15

在python> 3中,您可以将print*用于参数解包:

with open("fout.txt", "w") as fout:
    print(*my_list, sep="\n", file=fout)

In python>3 you can use print and * for argument unpacking:

with open("fout.txt", "w") as fout:
    print(*my_list, sep="\n", file=fout)

回答 16

Python3中,您可以使用此循环

with open('your_file.txt', 'w') as f:
    for item in list:
        f.print("", item)

In Python3 You Can use this loop

with open('your_file.txt', 'w') as f:
    for item in list:
        f.print("", item)

回答 17

让avg作为列表,然后:

In [29]: a = n.array((avg))
In [31]: a.tofile('avgpoints.dat',sep='\n',dtype = '%f')

您可以根据需要使用%e%s

Let avg be the list, then:

In [29]: a = n.array((avg))
In [31]: a.tofile('avgpoints.dat',sep='\n',dtype = '%f')

You can use %e or %s depending on your requirement.


回答 18

poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
use Python!
'''
f = open('poem.txt', 'w') # open for 'w'riting
f.write(poem) # write text to file
f.close() # close the file

工作原理:首先,使用内置的打开功能打开文件,并指定文件名称和我们要打开文件的方式。该模式可以是读取模式(’r’),写入模式(’w’)或追加模式(’a’)。我们还可以指定是以文本模式(’t’)还是二进制模式(’b’)阅读,书写或追加内容。实际上,还有更多可用的模式,help(open)将为您提供有关它们的更多详细信息。默认情况下,open()将文件视为“ t”扩展文件,并以“ r’ead”模式将其打开。在我们的示例中,我们首先以写文本模式打开文件,然后使用文件对象的write方法写入文件,然后最终关闭文件。

上面的示例来自Swaroop C H. swaroopch.com 的书“ A Byte of Python”。

poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
use Python!
'''
f = open('poem.txt', 'w') # open for 'w'riting
f.write(poem) # write text to file
f.close() # close the file

How It Works: First, open a file by using the built-in open function and specifying the name of the file and the mode in which we want to open the file. The mode can be a read mode (’r’), write mode (’w’) or append mode (’a’). We can also specify whether we are reading, writing, or appending in text mode (’t’) or binary mode (’b’). There are actually many more modes available and help(open) will give you more details about them. By default, open() considers the file to be a ’t’ext file and opens it in ’r’ead mode. In our example, we first open the file in write text mode and use the write method of the file object to write to the file and then we finally close the file.

The above example is from the book “A Byte of Python” by Swaroop C H. swaroopch.com


与Project Euler的速度比较:C,Python,Erlang,Haskell

问题:与Project Euler的速度比较:C,Python,Erlang,Haskell

我已经采取了问题#12项目欧拉作为编程锻炼和比较我的(肯定不是最优的)实现在C,Python和Erlang和Haskell的。为了获得更高的执行时间,我搜索的第一个三角形数的除数大于1000,而不是原始问题中所述的500。

结果如下:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

带有PyPy的Python:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

摘要:

  • C:100%
  • Python:692%(使用PyPy时为118%)
  • Erlang:436%(135%感谢RichardC)
  • 哈斯克尔:1421%

我认为C具有很大的优势,因为它使用long进行计算,而不使用其他任意三个整数。另外,它不需要先加载运行时(其他加载项吗?)。

问题1: Erlang,Python和Haskell是否会由于使用任意长度的整数而导致速度降低,或者只要值小于,它们是否会失去速度MAXINT

问题2: 为什么Haskell这么慢?是否有编译器标志可以使您刹车?或者它是我的实现?(后者很有可能是因为Haskell是一本对我有七个印章的书。)

问题3: 您能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

编辑:

问题4: 我的功能实现是否允许LCO(最后一次调用优化,又称为尾部递归消除),从而避免在调用堆栈上添加不必要的帧?

尽管我不得不承认我的Haskell和Erlang知识非常有限,但我确实试图在四种语言中尽可能地实现相同的算法。


使用的源代码:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

I have taken Problem #12 from Project Euler as a programming exercise and to compare my (surely not optimal) implementations in C, Python, Erlang and Haskell. In order to get some higher execution times, I search for the first triangle number with more than 1000 divisors instead of 500 as stated in the original problem.

The result is the following:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

Python with PyPy:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

Summary:

  • C: 100%
  • Python: 692% (118% with PyPy)
  • Erlang: 436% (135% thanks to RichardC)
  • Haskell: 1421%

I suppose that C has a big advantage as it uses long for the calculations and not arbitrary length integers as the other three. Also it doesn’t need to load a runtime first (Do the others?).

Question 1: Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Question 2: Why is Haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as Haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

EDIT:

Question 4: Do my functional implementations permit LCO (last call optimization, a.k.a tail recursion elimination) and hence avoid adding unnecessary frames onto the call stack?

I really tried to implement the same algorithm as similar as possible in the four languages, although I have to admit that my Haskell and Erlang knowledge is very limited.


Source codes used:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

回答 0

使用GHC 7.0.3gcc 4.4.6Linux 2.6.29一个x86_64的Core2双核(2.5GHz的)机器上,编译使用ghc -O2 -fllvm -fforce-recomp用于Haskell和gcc -O3 -lm为C.

  • 您的C例程运行8.4秒(比您的运行速度快,原因可能是-O3
  • Haskell解决方案可在36秒内运行(由于显示-O2标记)
  • 您的factorCount'代码未明确输入且默认为Integer(感谢Daniel在这里纠正我的误诊!)。使用给出显式类型签名(无论如何都是标准做法)Int,时间更改为11.1秒
  • factorCount'你不必要的呼唤fromIntegral。修复不会导致任何变化(编译器很聪明,对您来说很幸运)。
  • modrem更快更充分的地方使用了。这将时间更改为8.5秒
  • factorCount'不断应用两个永不更改的自变量(numbersqrt)。工人/包装工人的转型为我们提供了:
 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s  

没错,7.95秒。始终比C解决方案快半秒。没有-fllvm标记,我仍然会收到消息8.182 seconds,因此NCG后端在这种情况下也表现良好。

结论:Haskell很棒。

结果代码

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

编辑:现在我们已经探索了,让我们解决问题

问题1:erlang,python和haskell是否由于使用任意长度的整数而失去速度,还是只要它们的值小于MAXINT,它们就不会丢失吗?

在Haskell中,使用Integer速度慢于Int但慢多少取决于所执行的计算。幸运的是(对于64位计算机)Int就足够了。出于可移植性的考虑,您可能应该重写我的代码以使用Int64Word64(C不是唯一带有的语言long)。

问题2:为什么haskell这么慢?是否有编译器标志可以使您刹车,或者它是我的实现?(后者很可能因为haskell是一本对我有七个印章的书。)

问题3:能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

这就是我上面回答的。答案是

  • 0)通过使用优化 -O2
  • 1)尽可能使用快速(特别是:不可装箱)类型
  • 2)rem不是mod(经常被遗忘的优化),并且
  • 3)worker / wrapper转换(也许是最常见的优化)。

问题4:我的功能实现是否允许LCO,从而避免在调用堆栈中添加不必要的帧?

是的,这不是问题。做得好,很高兴您考虑了这一点。

Using GHC 7.0.3, gcc 4.4.6, Linux 2.6.29 on an x86_64 Core2 Duo (2.5GHz) machine, compiling using ghc -O2 -fllvm -fforce-recomp for Haskell and gcc -O3 -lm for C.

  • Your C routine runs in 8.4 seconds (faster than your run probably because of -O3)
  • The Haskell solution runs in 36 seconds (due to the -O2 flag)
  • Your factorCount' code isn’t explicitly typed and defaulting to Integer (thanks to Daniel for correcting my misdiagnosis here!). Giving an explicit type signature (which is standard practice anyway) using Int and the time changes to 11.1 seconds
  • in factorCount' you have needlessly called fromIntegral. A fix results in no change though (the compiler is smart, lucky for you).
  • You used mod where rem is faster and sufficient. This changes the time to 8.5 seconds.
  • factorCount' is constantly applying two extra arguments that never change (number, sqrt). A worker/wrapper transformation gives us:
 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s  

That’s right, 7.95 seconds. Consistently half a second faster than the C solution. Without the -fllvm flag I’m still getting 8.182 seconds, so the NCG backend is doing well in this case too.

Conclusion: Haskell is awesome.

Resulting Code

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

EDIT: So now that we’ve explored that, lets address the questions

Question 1: Do erlang, python and haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

In Haskell, using Integer is slower than Int but how much slower depends on the computations performed. Luckily (for 64 bit machines) Int is sufficient. For portability sake you should probably rewrite my code to use Int64 or Word64 (C isn’t the only language with a long).

Question 2: Why is haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

That was what I answered above. The answer was

  • 0) Use optimization via -O2
  • 1) Use fast (notably: unbox-able) types when possible
  • 2) rem not mod (a frequently forgotten optimization) and
  • 3) worker/wrapper transformation (perhaps the most common optimization).

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

Yes, that wasn’t the issue. Good work and glad you considered this.


回答 1

Erlang实现存在一些问题。作为以下内容的基准,我测得的未经修改的Erlang程序的执行时间为47.6秒,而C代码为12.7秒。

如果要运行计算密集型的Erlang代码,您应该做的第一件事就是使用本机代码。使用进行编译,erlc +native euler12时间缩短至41.3秒。但是,这比这种代码的本机编译要低得多(仅为15%),问题是您使用-compile(export_all)。这对于实验很有用,但是所有功能都可以从外部访问的事实导致本机编译器非常保守。(普通的BEAM仿真器不会受到太大影响。)用替换该声明可以-export([solve/0]).提供更好的加速:31.5秒(比基线快将近35%)。

但是代码本身有一个问题:对于factorCount循环中的每个迭代,您都可以执行以下测试:

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

C代码不会执行此操作。通常,在同一代码的不同实现之间进行公平的比较可能比较棘手,特别是如果算法是数字的,因为您需要确保它们实际上在做相同的事情。尽管某个实现最终会达到相同的结果,但由于某个地方的某些类型转换而导致的一种实现中的轻微舍入错误可能会导致其执行的迭代次数要多于另一个实现。

为了消除这种可能的错误源(并消除每次迭代中的额外测试),我重新编写了factorCount函数,如下所示,该函数非常类似于C代码:

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

此重写(no export_all)和本机编译为我提供了以下运行时间:

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

与C代码相比,还算不错:

$ time ./a.out 
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

考虑到Erlang根本不适合编写数字代码,在这样的程序上,它仅比C慢50%。

最后,关于您的问题:

问题1:erlang,python和haskell是否由于使用任意长度的整数而导致速度变慢,或者只要值小于MAXINT,它们是否会变慢?

是的,有点。在Erlang中,没有办法说“结合使用32/64位算术”,因此,除非编译器可以证明整数的某些界限(通常不能),否则它必须检查所有计算以查看是否可以将它们放入单个标记的单词中,或者是否必须将它们转换为堆分配的bignum。即使在运行时实际上没有使用大数,也必须执行这些检查。另一方面,这意味着您知道,如果您突然给它比以前更大的输入,该算法将永远不会因为意外的整数环绕而失败。

问题4:我的功能实现是否允许LCO,从而避免在调用堆栈中添加不必要的帧?

是的,您的Erlang代码在上次通话优化方面是正确的。

There are some problems with the Erlang implementation. As baseline for the following, my measured execution time for your unmodified Erlang program was 47.6 seconds, compared to 12.7 seconds for the C code.

The first thing you should do if you want to run computationally intensive Erlang code is to use native code. Compiling with erlc +native euler12 got the time down to 41.3 seconds. This is however a much lower speedup (just 15%) than expected from native compilation on this kind of code, and the problem is your use of -compile(export_all). This is useful for experimentation, but the fact that all functions are potentially reachable from the outside causes the native compiler to be very conservative. (The normal BEAM emulator is not that much affected.) Replacing this declaration with -export([solve/0]). gives a much better speedup: 31.5 seconds (almost 35% from the baseline).

But the code itself has a problem: for each iteration in the factorCount loop, you perform this test:

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

The C code doesn’t do this. In general, it can be tricky to make a fair comparison between different implementations of the same code, and in particular if the algorithm is numerical, because you need to be sure that they are actually doing the same thing. A slight rounding error in one implementation due to some typecast somewhere may cause it to do many more iterations than the other even though both eventually reach the same result.

To eliminate this possible error source (and get rid of the extra test in each iteration), I rewrote the factorCount function as follows, closely modelled on the C code:

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

This rewrite, no export_all, and native compilation, gave me the following run time:

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

which is not too bad compared to the C code:

$ time ./a.out 
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

considering that Erlang is not at all geared towards writing numerical code, being only 50% slower than C on a program like this is pretty good.

Finally, regarding your questions:

Question 1: Do erlang, python and haskell loose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Yes, somewhat. In Erlang, there is no way of saying “use 32/64-bit arithmetic with wrap-around”, so unless the compiler can prove some bounds on your integers (and it usually can’t), it must check all computations to see if they can fit in a single tagged word or if it has to turn them into heap-allocated bignums. Even if no bignums are ever used in practice at runtime, these checks will have to be performed. On the other hand, that means you know that the algorithm will never fail because of an unexpected integer wraparound if you suddenly give it larger inputs than before.

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

Yes, your Erlang code is correct with respect to last call optimization.


回答 2

关于Python优化,除了使用PyPy(在代码零更改的情况下实现相当不错的加速)之外,您还可以使用PyPy的翻译工具链来编译兼容RPython的版本,或者使用Cython来构建扩展模块,两者Cython模块比我的测试中的C版本要快,而Cython模块的速度几乎是后者的两倍。作为参考,我还包括C和PyPy基准测试结果:

C(与编译gcc -O3 -lm

% time ./euler12-c 
842161320

./euler12-c  11.95s 
 user 0.00s 
 system 99% 
 cpu 11.959 total

PyPy 1.5

% time pypy euler12.py
842161320
pypy euler12.py  
16.44s user 
0.01s system 
99% cpu 16.449 total

RPython(使用最新的PyPy修订版c2f583445aee

% time ./euler12-rpython-c
842161320
./euler12-rpy-c  
10.54s user 0.00s 
system 99% 
cpu 10.540 total

Cython 0.15

% time python euler12-cython.py
842161320
python euler12-cython.py  
6.27s user 0.00s 
system 99% 
cpu 6.274 total

RPython版本有几个关键更改。要转换为独立程序,您需要定义您的target,在这种情况下为main函数。它应该接受,sys.argv因为它是唯一的参数,并且需要返回一个int。您可以使用translate.py对其进行翻译,% translate.py euler12-rpython.py后者将转换为C并为您编译。

# euler12-rpython.py

import math, sys

def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in xrange(1, isquare + 1):
        if not n % candidate: count += 2
    return count

def main(argv):
    triangle = 1
    index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle
    return 0

if __name__ == '__main__':
    main(sys.argv)

def target(*args):
    return main, None

Cython版本被重写为扩展模块_euler12.pyx,我从普通的python文件导入并调用了该模块。在_euler12.pyx本质上是相同的版本,有一些额外的静态类型声明。setup.py具有使用来构建扩展的普通样板python setup.py build_ext --inplace

# _euler12.pyx
from libc.math cimport sqrt

cdef int factorCount(int n):
    cdef int candidate, isquare, count
    cdef double square
    square = sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in range(1, isquare + 1):
        if not n % candidate: count += 2
    return count

cpdef main():
    cdef int triangle = 1, index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle

# euler12-cython.py
import _euler12
_euler12.main()

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("_euler12", ["_euler12.pyx"])]

setup(
  name = 'Euler12-Cython',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

老实说,我对RPython或Cython的经验很少,并对结果感到惊喜。如果您使用的是CPython,则在Cython扩展模块中编写CPU密集型代码似乎是优化程序的一种简便方法。

In regards to Python optimization, in addition to using PyPy (for pretty impressive speed-ups with zero change to your code), you could use PyPy’s translation toolchain to compile an RPython-compliant version, or Cython to build an extension module, both of which are faster than the C version in my testing, with the Cython module nearly twice as fast. For reference I include C and PyPy benchmark results as well:

C (compiled with gcc -O3 -lm)

% time ./euler12-c 
842161320

./euler12-c  11.95s 
 user 0.00s 
 system 99% 
 cpu 11.959 total

PyPy 1.5

% time pypy euler12.py
842161320
pypy euler12.py  
16.44s user 
0.01s system 
99% cpu 16.449 total

RPython (using latest PyPy revision, c2f583445aee)

% time ./euler12-rpython-c
842161320
./euler12-rpy-c  
10.54s user 0.00s 
system 99% 
cpu 10.540 total

Cython 0.15

% time python euler12-cython.py
842161320
python euler12-cython.py  
6.27s user 0.00s 
system 99% 
cpu 6.274 total

The RPython version has a couple of key changes. To translate into a standalone program you need to define your target, which in this case is the main function. It’s expected to accept sys.argv as it’s only argument, and is required to return an int. You can translate it by using translate.py, % translate.py euler12-rpython.py which translates to C and compiles it for you.

# euler12-rpython.py

import math, sys

def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in xrange(1, isquare + 1):
        if not n % candidate: count += 2
    return count

def main(argv):
    triangle = 1
    index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle
    return 0

if __name__ == '__main__':
    main(sys.argv)

def target(*args):
    return main, None

The Cython version was rewritten as an extension module _euler12.pyx, which I import and call from a normal python file. The _euler12.pyx is essentially the same as your version, with some additional static type declarations. The setup.py has the normal boilerplate to build the extension, using python setup.py build_ext --inplace.

# _euler12.pyx
from libc.math cimport sqrt

cdef int factorCount(int n):
    cdef int candidate, isquare, count
    cdef double square
    square = sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in range(1, isquare + 1):
        if not n % candidate: count += 2
    return count

cpdef main():
    cdef int triangle = 1, index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle

# euler12-cython.py
import _euler12
_euler12.main()

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("_euler12", ["_euler12.pyx"])]

setup(
  name = 'Euler12-Cython',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

I honestly have very little experience with either RPython or Cython, and was pleasantly surprised at the results. If you are using CPython, writing your CPU-intensive bits of code in a Cython extension module seems like a really easy way to optimize your program.


回答 3

问题3:您能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

C实现是次优的(由Thomas M. DuBuisson暗示),该版本使用64位整数(即long数据类型)。稍后我将研究程序集列表,但经过深思熟虑的猜测,编译后的代码中存在一些内存访问,这使得使用64位整数的速度大大降低。这就是生成的代码(因为可以在SSE寄存器中容纳更少的64位整数,或者将双精度数舍入为64位整数的事实比较慢)。

这是修改后的代码(只需将int替换为long,并且我显式内联了factorCount,尽管我认为gcc -O3不需要这样做):

#include <stdio.h>
#include <math.h>

static inline int factorCount(int n)
{
    double square = sqrt (n);
    int isquare = (int)square;
    int count = isquare == square ? -1 : 0;
    int candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    int triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index++;
        triangle += index;
    }
    printf ("%d\n", triangle);
}

运行+计时它给出:

$ gcc -O3 -lm -o euler12 euler12.c; time ./euler12
842161320
./euler12  2.95s user 0.00s system 99% cpu 2.956 total

作为参考,Thomas在前面的答案中的haskell实现给出:

$ ghc -O2 -fllvm -fforce-recomp euler12.hs; time ./euler12                                                                                      [9:40]
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12 ...
842161320
./euler12  9.43s user 0.13s system 99% cpu 9.602 total

结论:ghc是一个很棒的编译器,它没有任何缺点,但是gcc通常会生成更快的代码。

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

The C implementation is suboptimal (as hinted at by Thomas M. DuBuisson), the version uses 64-bit integers (i.e. long datatype). I’ll investigate the assembly listing later, but with an educated guess, there are some memory accesses going on in the compiled code, which make using 64-bit integers significantly slower. It’s that or generated code (be it the fact that you can fit less 64-bit ints in a SSE register or round a double to a 64-bit integer is slower).

Here is the modified code (simply replace long with int and I explicitly inlined factorCount, although I do not think that this is necessary with gcc -O3):

#include <stdio.h>
#include <math.h>

static inline int factorCount(int n)
{
    double square = sqrt (n);
    int isquare = (int)square;
    int count = isquare == square ? -1 : 0;
    int candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    int triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index++;
        triangle += index;
    }
    printf ("%d\n", triangle);
}

Running + timing it gives:

$ gcc -O3 -lm -o euler12 euler12.c; time ./euler12
842161320
./euler12  2.95s user 0.00s system 99% cpu 2.956 total

For reference, the haskell implementation by Thomas in the earlier answer gives:

$ ghc -O2 -fllvm -fforce-recomp euler12.hs; time ./euler12                                                                                      [9:40]
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12 ...
842161320
./euler12  9.43s user 0.13s system 99% cpu 9.602 total

Conclusion: Taking nothing away from ghc, its a great compiler, but gcc normally generates faster code.


回答 4

看一下这个博客。在过去的一年左右的时间里,他用Haskell和Python解决了一些Project Euler问题,并且他通常发现Haskell的速度要快得多。我认为在这些语言之间,它与您的流畅程度和编码风格有关。

说到Python速度,您使用的是错误的实现!尝试使用PyPy,对于这样的事情,您会发现它的运行速度要快得多。

Take a look at this blog. Over the past year or so he’s done a few of the Project Euler problems in Haskell and Python, and he’s generally found Haskell to be much faster. I think that between those languages it has more to do with your fluency and coding style.

When it comes to Python speed, you’re using the wrong implementation! Try PyPy, and for things like this you’ll find it to be much, much faster.


回答 5

通过使用Haskell软件包中的某些功能,可以大大加快Haskell的实现。在这种情况下,我使用了素数,它仅与“ cabal install primes”一起安装;)

import Data.Numbers.Primes
import Data.List

triangleNumbers = scanl1 (+) [1..]
nDivisors n = product $ map ((+1) . length) (group (primeFactors n))
answer = head $ filter ((> 500) . nDivisors) triangleNumbers

main :: IO ()
main = putStrLn $ "First triangle number to have over 500 divisors: " ++ (show answer)

时间:

您的原始程序:

PS> measure-command { bin\012_slow.exe }

TotalSeconds      : 16.3807409
TotalMilliseconds : 16380.7409

改进的实施

PS> measure-command { bin\012.exe }

TotalSeconds      : 0.0383436
TotalMilliseconds : 38.3436

如您所见,这台计算机在您运行16秒钟的同一台计算机上运行的时间为38毫秒:)

编译命令:

ghc -O2 012.hs -o bin\012.exe
ghc -O2 012_slow.hs -o bin\012_slow.exe

Your Haskell implementation could be greatly sped up by using some functions from Haskell packages. In this case I used primes, which is just installed with ‘cabal install primes’ ;)

import Data.Numbers.Primes
import Data.List

triangleNumbers = scanl1 (+) [1..]
nDivisors n = product $ map ((+1) . length) (group (primeFactors n))
answer = head $ filter ((> 500) . nDivisors) triangleNumbers

main :: IO ()
main = putStrLn $ "First triangle number to have over 500 divisors: " ++ (show answer)

Timings:

Your original program:

PS> measure-command { bin\012_slow.exe }

TotalSeconds      : 16.3807409
TotalMilliseconds : 16380.7409

Improved implementation

PS> measure-command { bin\012.exe }

TotalSeconds      : 0.0383436
TotalMilliseconds : 38.3436

As you can see, this one runs in 38 milliseconds on the same machine where yours ran in 16 seconds :)

Compilation commands:

ghc -O2 012.hs -o bin\012.exe
ghc -O2 012_slow.hs -o bin\012_slow.exe

回答 6

纯娱乐。以下是更“原生”的Haskell实现:

import Control.Applicative
import Control.Monad
import Data.Either
import Math.NumberTheory.Powers.Squares

isInt :: RealFrac c => c -> Bool
isInt = (==) <$> id <*> fromInteger . round

intSqrt :: (Integral a) => a -> Int
--intSqrt = fromIntegral . floor . sqrt . fromIntegral
intSqrt = fromIntegral . integerSquareRoot'

factorize :: Int -> [Int]
factorize 1 = []
factorize n = first : factorize (quot n first)
  where first = (!! 0) $ [a | a <- [2..intSqrt n], rem n a == 0] ++ [n]

factorize2 :: Int -> [(Int,Int)]
factorize2 = foldl (\ls@((val,freq):xs) y -> if val == y then (val,freq+1):xs else (y,1):ls) [(0,0)] . factorize

numDivisors :: Int -> Int
numDivisors = foldl (\acc (_,y) -> acc * (y+1)) 1 <$> factorize2

nextTriangleNumber :: (Int,Int) -> (Int,Int)
nextTriangleNumber (n,acc) = (n+1,acc+n+1)

forward :: Int -> (Int, Int) -> Either (Int, Int) (Int, Int)
forward k val@(n,acc) = if numDivisors acc > k then Left val else Right (nextTriangleNumber val)

problem12 :: Int -> (Int, Int)
problem12 n = (!!0) . lefts . scanl (>>=) (forward n (1,1)) . repeat . forward $ n

main = do
  let (n,val) = problem12 1000
  print val

使用 ghc -O3,这可以在我的机器(1.73GHz Core i7)上持续运行0.55-0.58秒。

对于C版本,更有效的factorCount函数:

int factorCount (int n)
{
  int count = 1;
  int candidate,tmpCount;
  while (n % 2 == 0) {
    count++;
    n /= 2;
  }
    for (candidate = 3; candidate < n && candidate * candidate < n; candidate += 2)
    if (n % candidate == 0) {
      tmpCount = 1;
      do {
        tmpCount++;
        n /= candidate;
      } while (n % candidate == 0);
       count*=tmpCount;
      }
  if (n > 1)
    count *= 2;
  return count;
}

使用来将main中的longs更改为int gcc -O3 -lm,这始终在0.31-0.35秒内运行。

如果利用第n个三角形数= n *(n + 1)/ 2,并且n和(n + 1)具有完全不同的素因式分解这一事实,这两个函数都可以运行得更快。可以乘以一半以找出整体的因素数量。以下:

int main ()
{
  int triangle = 0,count1,count2 = 1;
  do {
    count1 = count2;
    count2 = ++triangle % 2 == 0 ? factorCount(triangle+1) : factorCount((triangle+1)/2);
  } while (count1*count2 < 1001);
  printf ("%lld\n", ((long long)triangle)*(triangle+1)/2);
}

会将c代码运行时间减少到0.17-0.19秒,并且它可以处理更大的搜索-大于10000的因素在我的计算机上花费大约43秒。我对感兴趣的读者留下了类似的haskell提速。

Just for fun. The following is a more ‘native’ Haskell implementation:

import Control.Applicative
import Control.Monad
import Data.Either
import Math.NumberTheory.Powers.Squares

isInt :: RealFrac c => c -> Bool
isInt = (==) <$> id <*> fromInteger . round

intSqrt :: (Integral a) => a -> Int
--intSqrt = fromIntegral . floor . sqrt . fromIntegral
intSqrt = fromIntegral . integerSquareRoot'

factorize :: Int -> [Int]
factorize 1 = []
factorize n = first : factorize (quot n first)
  where first = (!! 0) $ [a | a <- [2..intSqrt n], rem n a == 0] ++ [n]

factorize2 :: Int -> [(Int,Int)]
factorize2 = foldl (\ls@((val,freq):xs) y -> if val == y then (val,freq+1):xs else (y,1):ls) [(0,0)] . factorize

numDivisors :: Int -> Int
numDivisors = foldl (\acc (_,y) -> acc * (y+1)) 1 <$> factorize2

nextTriangleNumber :: (Int,Int) -> (Int,Int)
nextTriangleNumber (n,acc) = (n+1,acc+n+1)

forward :: Int -> (Int, Int) -> Either (Int, Int) (Int, Int)
forward k val@(n,acc) = if numDivisors acc > k then Left val else Right (nextTriangleNumber val)

problem12 :: Int -> (Int, Int)
problem12 n = (!!0) . lefts . scanl (>>=) (forward n (1,1)) . repeat . forward $ n

main = do
  let (n,val) = problem12 1000
  print val

Using ghc -O3, this consistently runs in 0.55-0.58 seconds on my machine (1.73GHz Core i7).

A more efficient factorCount function for the C version:

int factorCount (int n)
{
  int count = 1;
  int candidate,tmpCount;
  while (n % 2 == 0) {
    count++;
    n /= 2;
  }
    for (candidate = 3; candidate < n && candidate * candidate < n; candidate += 2)
    if (n % candidate == 0) {
      tmpCount = 1;
      do {
        tmpCount++;
        n /= candidate;
      } while (n % candidate == 0);
       count*=tmpCount;
      }
  if (n > 1)
    count *= 2;
  return count;
}

Changing longs to ints in main, using gcc -O3 -lm, this consistently runs in 0.31-0.35 seconds.

Both can be made to run even faster if you take advantage of the fact that the nth triangle number = n*(n+1)/2, and n and (n+1) have completely disparate prime factorizations, so the number of factors of each half can be multiplied to find the number of factors of the whole. The following:

int main ()
{
  int triangle = 0,count1,count2 = 1;
  do {
    count1 = count2;
    count2 = ++triangle % 2 == 0 ? factorCount(triangle+1) : factorCount((triangle+1)/2);
  } while (count1*count2 < 1001);
  printf ("%lld\n", ((long long)triangle)*(triangle+1)/2);
}

will reduce the c code run time to 0.17-0.19 seconds, and it can handle much larger searches — greater than 10000 factors takes about 43 seconds on my machine. I leave a similar haskell speedup to the interested reader.


回答 7

问题1:erlang,python和haskell是否由于使用任意长度的整数而导致速度变慢,或者只要值小于MAXINT,它们是否会变慢?

这不太可能。关于Erlang和Haskell,我不能说太多(嗯,下面可能是关于Haskell的话),但是我可以指出Python中的许多其他瓶颈。程序每次尝试使用Python中的某些值执行操作时,都应验证这些值是否来自正确的类型,这会花费一些时间。您的factorCount函数只是分配一个具有range (1, isquare + 1)不同时间的列表,而运行时malloc风格的内存分配要比在C语言中对带有计数器的范围进行迭代要慢得多。factorCount()方法被多次调用,因此分配了很多列表。另外,让我们不要忘记Python是被解释的,而CPython解释器并没有将重点放在优化上。

编辑:哦,嗯,我注意到您正在使用Python 3,因此range()不会返回列表,而是返回一个生成器。在这种情况下,我关于分配列表的观点是错误的:该函数只是分配range对象,尽管这些对象效率不高,但效率却不如分配包含许多项目的列表。

问题2:为什么haskell这么慢?是否有编译器标志可以使您刹车,或者它是我的实现?(后者很可能因为haskell是一本对我有七个印章的书。)

您在使用拥抱吗?拥抱是一个相当慢的解释器。如果您正在使用它,也许您可​​以获得GHC的美好时光 -但我只是混淆假设,一个好的Haskell编译器在幕后所做的工作非常有趣,并且超出了我的理解范围:)

问题3:能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

我会说你在玩一个有趣的游戏。了解各种语言的最好的部分就是尽可能以最不同的方式使用它们:)但是我离题,我只是对此没有任何建议。抱歉,在这种情况下,希望有人能为您提供帮助:)

问题4:我的功能实现是否允许LCO,从而避免在调用堆栈中添加不必要的帧?

据我所记得,您只需要确保递归调用是返回值之前的最后一个命令。换句话说,下面的函数可以使用这种优化:

def factorial(n, acc=1):
    if n > 1:
        acc = acc * n
        n = n - 1
        return factorial(n, acc)
    else:
        return acc

但是,如果您的函数如下所示,则您将没有这样的优化,因为在递归调用之后有一个运算(乘法):

def factorial2(n):
    if n > 1:
        f = factorial2(n-1)
        return f*n
    else:
        return 1

我将操作分为一些局部变量,以明确执行哪些操作。但是,最常见的是如下所示查看这些功能,但对于我要指出的内容,它们是等效的:

def factorial(n, acc=1):
    if n > 1:
        return factorial(n-1, acc*n)
    else:
        return acc

def factorial2(n):
    if n > 1:
        return n*factorial(n-1)
    else:
        return 1

注意,由编译器/解释器决定是否进行尾递归。例如,如果我没有记错的话,Python解释器就不会这样做(我之所以使用Python是因为它的语法很流利)。无论如何,如果您发现奇怪的东西,例如带有两个参数的阶乘函数(并且其中一个参数的名称具有诸如accaccumulator等等),现在您知道人们为什么这样做了:)

Question 1: Do erlang, python and haskell loose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

This is unlikely. I cannot say much about Erlang and Haskell (well, maybe a bit about Haskell below) but I can point a lot of other bottlenecks in Python. Every time the program tries to execute an operation with some values in Python, it should verify whether the values are from the proper type, and it costs a bit of time. Your factorCount function just allocates a list with range (1, isquare + 1) various times, and runtime, malloc-styled memory allocation is way slower than iterating on a range with a counter as you do in C. Notably, the factorCount() is called multiple times and so allocates a lot of lists. Also, let us not forget that Python is interpreted and the CPython interpreter has no great focus on being optimized.

EDIT: oh, well, I note that you are using Python 3 so range() does not return a list, but a generator. In this case, my point about allocating lists is half-wrong: the function just allocates range objects, which are inefficient nonetheless but not as inefficient as allocating a list with a lot of items.

Question 2: Why is haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as haskell is a book with seven seals to me.)

Are you using Hugs? Hugs is a considerably slow interpreter. If you are using it, maybe you can get a better time with GHC – but I am only cogitating hypotesis, the kind of stuff a good Haskell compiler does under the hood is pretty fascinating and way beyond my comprehension :)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

I’d say you are playing an unfunny game. The best part of knowing various languages is to use them the most different way possible :) But I digress, I just do not have any recommendation for this point. Sorry, I hope someone can help you in this case :)

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

As far as I remember, you just need to make sure that your recursive call is the last command before returning a value. In other words, a function like the one below could use such optimization:

def factorial(n, acc=1):
    if n > 1:
        acc = acc * n
        n = n - 1
        return factorial(n, acc)
    else:
        return acc

However, you would not have such optimization if your function were such as the one below, because there is an operation (multiplication) after the recursive call:

def factorial2(n):
    if n > 1:
        f = factorial2(n-1)
        return f*n
    else:
        return 1

I separated the operations in some local variables for make it clear which operations are executed. However, the most usual is to see these functions as below, but they are equivalent for the point I am making:

def factorial(n, acc=1):
    if n > 1:
        return factorial(n-1, acc*n)
    else:
        return acc

def factorial2(n):
    if n > 1:
        return n*factorial(n-1)
    else:
        return 1

Note that it is up to the compiler/interpreter to decide if it will make tail recursion. For example, the Python interpreter does not do it if I remember well (I used Python in my example only because of its fluent syntax). Anyway, if you find strange stuff such as factorial functions with two parameters (and one of the parameters has names such as acc, accumulator etc.) now you know why people do it :)


回答 8

使用Haskell,您实际上不需要显式考虑递归。

factorCount number = foldr factorCount' 0 [1..isquare] -
                     (fromEnum $ square == fromIntegral isquare)
    where
      square = sqrt $ fromIntegral number
      isquare = floor square
      factorCount' candidate
        | number `rem` candidate == 0 = (2 +)
        | otherwise = id

triangles :: [Int]
triangles = scanl1 (+) [1,2..]

main = print . head $ dropWhile ((< 1001) . factorCount) triangles

在上面的代码中,我用普通的列表操作替换了@Thomas答案中的显式递归。该代码仍然执行完全相同的操作,而无需我们担心尾部递归。它运行(〜7.49s)比@Thomas的答案(〜7.04s)在我的装有GHC 7.6.2的机器上的速度慢约6%,而@Raedwulf的C版本的运行速度约为3.15s。一年以来,GHC似乎有所改善。

PS。我知道这是一个老问题,我在Google搜索中偶然发现了这个问题(现在我忘记了正在搜索的内容…)。只是想评论有关LCO的问题,并总体上表达我对Haskell的看法。我想对最佳答案发表评论,但评论不允许使用代码块。

With Haskell, you really don’t need to think in recursions explicitly.

factorCount number = foldr factorCount' 0 [1..isquare] -
                     (fromEnum $ square == fromIntegral isquare)
    where
      square = sqrt $ fromIntegral number
      isquare = floor square
      factorCount' candidate
        | number `rem` candidate == 0 = (2 +)
        | otherwise = id

triangles :: [Int]
triangles = scanl1 (+) [1,2..]

main = print . head $ dropWhile ((< 1001) . factorCount) triangles

In the above code, I have replaced explicit recursions in @Thomas’ answer with common list operations. The code still does exactly the same thing without us worrying about tail recursion. It runs (~ 7.49s) about 6% slower than the version in @Thomas’ answer (~ 7.04s) on my machine with GHC 7.6.2, while the C version from @Raedwulf runs ~ 3.15s. It seems GHC has improved over the year.

PS. I know it is an old question, and I stumble upon it from google searches (I forgot what I was searching, now…). Just wanted to comment on the question about LCO and express my feelings about Haskell in general. I wanted to comment on the top answer, but comments do not allow code blocks.


回答 9

有关C版本的更多数字和说明。显然,这些年来没有人这样做。记住要对这个答案进行投票,这样它才能在每个人的视野和学习中获得领先。

第一步:作者程序的基准

笔记本电脑规格:

  • CPU i3 M380(931 MHz-最大省电模式)
  • 4GB内存
  • Win7 64位
  • Microsoft Visual Studio 2012旗舰版
  • Cygwin与GCC 4.9.3
  • Python 2.7.10

命令:

compiling on VS x64 command prompt > `for /f %f in ('dir /b *.c') do cl /O2 /Ot /Ox %f -o %f_x64_vs2012.exe`
compiling on cygwin with gcc x64   > `for f in ./*.c; do gcc -m64 -O3 $f -o ${f}_x64_gcc.exe ; done`
time (unix tools) using cygwin > `for f in ./*.exe; do  echo "----------"; echo $f ; time $f ; done`

----------
$ time python ./original.py

real    2m17.748s
user    2m15.783s
sys     0m0.093s
----------
$ time ./original_x86_vs2012.exe

real    0m8.377s
user    0m0.015s
sys     0m0.000s
----------
$ time ./original_x64_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./original_x64_gcc.exe

real    0m20.951s
user    0m20.732s
sys     0m0.030s

文件名是: integertype_architecture_compiler.exe

  • 现在,integertype与原始程序相同(稍后会详细介绍)
  • 架构是x86还是x64,具体取决于编译器设置
  • 编译器是gcc还是vs2012

第二步:再次调查,改进并进行基准测试

VS比gcc快250%。两种编译器的速度应该相似。显然,代码或编译器选项出了点问题。让我们调查一下!

第一个关注点是整数类型。转换可能会很昂贵,并且一致性对于更好的代码生成和优化很重要。所有整数应为同一类型。

这是一个混合的混乱intlong现在。我们将对此进行改进。使用什么类型?最快的。要对它们进行基准测试!

----------
$ time ./int_x86_vs2012.exe

real    0m8.440s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int_x64_vs2012.exe

real    0m8.408s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int32_x86_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int32_x64_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x86_vs2012.exe

real    0m18.112s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x64_vs2012.exe

real    0m18.611s
user    0m0.000s
sys     0m0.015s
----------
$ time ./long_x86_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.000s
----------
$ time ./long_x64_vs2012.exe

real    0m8.440s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x86_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x64_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.015s
----------
$ time ./uint64_x86_vs2012.exe

real    0m15.428s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint64_x64_vs2012.exe

real    0m15.725s
user    0m0.015s
sys     0m0.015s
----------
$ time ./int_x64_gcc.exe

real    0m8.531s
user    0m8.329s
sys     0m0.015s
----------
$ time ./int32_x64_gcc.exe

real    0m8.471s
user    0m8.345s
sys     0m0.000s
----------
$ time ./int64_x64_gcc.exe

real    0m20.264s
user    0m20.186s
sys     0m0.015s
----------
$ time ./long_x64_gcc.exe

real    0m20.935s
user    0m20.809s
sys     0m0.015s
----------
$ time ./uint32_x64_gcc.exe

real    0m8.393s
user    0m8.346s
sys     0m0.015s
----------
$ time ./uint64_x64_gcc.exe

real    0m16.973s
user    0m16.879s
sys     0m0.030s

整数类型为int long int32_t uint32_t int64_tand uint64_tfrom#include <stdint.h>

在C中有很多整数类型,还有一些带符号/无符号可玩,以及选择编译为x86或x64(不要与实际整数大小混淆)的选择。有很多版本可以编译和运行^^

第三步:了解数字

明确的结论:

  • 32位整数比64位等效项快200%
  • 无符号的64位整数比有符号的64位快25%(不幸的是,我对此没有任何解释)

技巧问题:“ C中int和long的大小是多少?”
正确的答案是:int的大小和C中的long大小没有明确定义!

从C规范:

int至少为32位
长至少是一个int

从gcc手册页(-m32和-m64标志):

32位环境将int,long和指针设置为32位,并生成可在任何i386系统上运行的代码。
64位环境将int设置为32位长,将指针设置为64位,并为AMD的x86-64体系结构生成代码。

从MSDN文档(数据类型范围)https://msdn.microsoft.com/zh-cn/library/s3f49ktz%28v=vs.110%29.aspx

int,4字节,也称为有符号
long,4字节,也称为long int和有符号long int

结论:吸取的教训

  • 32位整数比64位整数快。

  • 标准整数类型在C和C ++中都没有很好的定义,它们随编译器和体系结构的不同而不同。需要一致性和可预测性时,请使用中的uint32_t整数族#include <stdint.h>

  • 速度问题解决了。所有其他语言都落后了数百%,C&C ++再次获胜!他们总是这样做。下一个改进将是使用OpenMP:D的多线程

Some more numbers and explanations for the C version. Apparently noone did it during all those years. Remember to upvote this answer so it can get on top for everyone to see and learn.

Step One: Benchmark of the author’s programs

Laptop Specifications:

  • CPU i3 M380 (931 MHz – maximum battery saving mode)
  • 4GB memory
  • Win7 64 bits
  • Microsoft Visual Studio 2012 Ultimate
  • Cygwin with gcc 4.9.3
  • Python 2.7.10

Commands:

compiling on VS x64 command prompt > `for /f %f in ('dir /b *.c') do cl /O2 /Ot /Ox %f -o %f_x64_vs2012.exe`
compiling on cygwin with gcc x64   > `for f in ./*.c; do gcc -m64 -O3 $f -o ${f}_x64_gcc.exe ; done`
time (unix tools) using cygwin > `for f in ./*.exe; do  echo "----------"; echo $f ; time $f ; done`

.

----------
$ time python ./original.py

real    2m17.748s
user    2m15.783s
sys     0m0.093s
----------
$ time ./original_x86_vs2012.exe

real    0m8.377s
user    0m0.015s
sys     0m0.000s
----------
$ time ./original_x64_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./original_x64_gcc.exe

real    0m20.951s
user    0m20.732s
sys     0m0.030s

Filenames are: integertype_architecture_compiler.exe

  • integertype is the same as the original program for now (more on that later)
  • architecture is x86 or x64 depending on the compiler settings
  • compiler is gcc or vs2012

Step Two: Investigate, Improve and Benchmark Again

VS is 250% faster than gcc. The two compilers should give a similar speed. Obviously, something is wrong with the code or the compiler options. Let’s investigate!

The first point of interest is the integer types. Conversions can be expensive and consistency is important for better code generation & optimizations. All integers should be the same type.

It’s a mixed mess of int and long right now. We’re going to improve that. What type to use? The fastest. Gotta benchmark them’all!

----------
$ time ./int_x86_vs2012.exe

real    0m8.440s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int_x64_vs2012.exe

real    0m8.408s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int32_x86_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int32_x64_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x86_vs2012.exe

real    0m18.112s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x64_vs2012.exe

real    0m18.611s
user    0m0.000s
sys     0m0.015s
----------
$ time ./long_x86_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.000s
----------
$ time ./long_x64_vs2012.exe

real    0m8.440s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x86_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x64_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.015s
----------
$ time ./uint64_x86_vs2012.exe

real    0m15.428s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint64_x64_vs2012.exe

real    0m15.725s
user    0m0.015s
sys     0m0.015s
----------
$ time ./int_x64_gcc.exe

real    0m8.531s
user    0m8.329s
sys     0m0.015s
----------
$ time ./int32_x64_gcc.exe

real    0m8.471s
user    0m8.345s
sys     0m0.000s
----------
$ time ./int64_x64_gcc.exe

real    0m20.264s
user    0m20.186s
sys     0m0.015s
----------
$ time ./long_x64_gcc.exe

real    0m20.935s
user    0m20.809s
sys     0m0.015s
----------
$ time ./uint32_x64_gcc.exe

real    0m8.393s
user    0m8.346s
sys     0m0.015s
----------
$ time ./uint64_x64_gcc.exe

real    0m16.973s
user    0m16.879s
sys     0m0.030s

Integer types are int long int32_t uint32_t int64_t and uint64_t from #include <stdint.h>

There are LOTS of integer types in C, plus some signed/unsigned to play with, plus the choice to compile as x86 or x64 (not to be confused with the actual integer size). That is a lot of versions to compile and run ^^

Step Three: Understanding the Numbers

Definitive conclusions:

  • 32 bits integers are ~200% faster than 64 bits equivalents
  • unsigned 64 bits integers are 25 % faster than signed 64 bits (Unfortunately, I have no explanation for that)

Trick question: “What are the sizes of int and long in C?”
The right answer is: The size of int and long in C are not well-defined!

From the C spec:

int is at least 32 bits
long is at least an int

From the gcc man page (-m32 and -m64 flags):

The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system.
The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD’s x86-64 architecture.

From MSDN documentation (Data Type Ranges) https://msdn.microsoft.com/en-us/library/s3f49ktz%28v=vs.110%29.aspx :

int, 4 bytes, also knows as signed
long, 4 bytes, also knows as long int and signed long int

To Conclude This: Lessons Learned

  • 32 bits integers are faster than 64 bits integers.

  • Standard integers types are not well defined in C nor C++, they vary depending on compilers and architectures. When you need consistency and predictability, use the uint32_t integer family from #include <stdint.h>.

  • Speed issues solved. All other languages are back hundreds percent behind, C & C++ win again ! They always do. The next improvement will be multithreading using OpenMP :D


回答 10

查看您的Erlang实现。时间包括启动整个虚拟机,运行程序以及停止虚拟机。非常确定,设置和停止erlang vm会花费一些时间。

如果计时是在erlang虚拟机本身内部完成的,则结果将有所不同,因为在这种情况下,我们仅拥有有关程序的实际时间。否则,我相信启动和加载Erlang Vm的过程加上停止它(如您在程序中所述)所花费的总时间都包含在您用于计时的方法的总时间中。程序正在输出。考虑使用erlang计时本身,当我们想在虚拟机本身中计时程序时使用 timer:tc/1 or timer:tc/2 or timer:tc/3。这样,来自erlang的结果将排除启动和停止/杀死/停止虚拟机所花费的时间。那就是我的理由,考虑一下,然后再次尝试基准测试。

我实际上建议我们尝试在那些语言的运行时内对那些具有运行时的语言计时(对于具有运行时的语言),以便获得精确的值。例如,C与Erlang,Python和Haskell一样,没有启动和关闭运行时系统的开销(对此有98%的把握-我可以纠正)。因此(基于此推理),我最后说,该基准对于在运行时系统之上运行的语言而言不够精确/不够合理。让我们再次进行这些更改。

编辑:即使所有语言都有运行时系统,启动每种语言和停止它的开销也会有所不同。所以我建议我们从运行时系统中进行计时(适用于这种语言)。众所周知,Erlang VM在启动时会有相当大的开销!

Looking at your Erlang implementation. The timing has included the start up of the entire virtual machine, running your program and halting the virtual machine. Am pretty sure that setting up and halting the erlang vm takes some time.

If the timing was done within the erlang virtual machine itself, results would be different as in that case we would have the actual time for only the program in question. Otherwise, i believe that the total time taken by the process of starting and loading of the Erlang Vm plus that of halting it (as you put it in your program) are all included in the total time which the method you are using to time the program is outputting. Consider using the erlang timing itself which we use when we want to time our programs within the virtual machine itself timer:tc/1 or timer:tc/2 or timer:tc/3. In this way, the results from erlang will exclude the time taken to start and stop/kill/halt the virtual machine. That is my reasoning there, think about it, and then try your bench mark again.

I actually suggest that we try to time the program (for languages that have a runtime), within the runtime of those languages in order to get a precise value. C for example has no overhead of starting and shutting down a runtime system as does Erlang, Python and Haskell (98% sure of this – i stand correction). So (based on this reasoning) i conclude by saying that this benchmark wasnot precise /fair enough for languages running on top of a runtime system. Lets do it again with these changes.

EDIT: besides even if all the languages had runtime systems, the overhead of starting each and halting it would differ. so i suggest we time from within the runtime systems (for the languages for which this applies). The Erlang VM is known to have considerable overhead at start up!


回答 11

问题1:Erlang,Python和Haskell是否由于使用任意长度的整数而导致速度下降,还是只要它们的值小于MAXINT,它们就不会丢失吗?

对于Erlang,可以否定回答第一个问题。通过适当地使用Erlang可以回答最后一个问题,如下所示:

http://bredsaal.dk/learning-erlang-using-projecteuler-net

因为它比您最初的C示例要快,所以我想会有很多问题,因为其他问题已经详细介绍了。

这个Erlang模块在便宜的上网本上执行大约需要5秒钟的时间。它使用erlang中的网络线程模型,从而演示了如何利用事件模型。它可以分布在许多节点上。而且速度很快。不是我的代码。

-module(p12dist).  
-author("Jannich Brendle, jannich@bredsaal.dk, http://blog.bredsaal.dk").  
-compile(export_all).

server() ->  
  server(1).

server(Number) ->  
  receive {getwork, Worker_PID} -> Worker_PID ! {work,Number,Number+100},  
  server(Number+101);  
  {result,T} -> io:format("The result is: \~w.\~n", [T]);  
  _ -> server(Number)  
  end.

worker(Server_PID) ->  
  Server_PID ! {getwork, self()},  
  receive {work,Start,End} -> solve(Start,End,Server_PID)  
  end,  
  worker(Server_PID).

start() ->  
  Server_PID = spawn(p12dist, server, []),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]).

solve(N,End,_) when N =:= End -> no_solution;

solve(N,End,Server_PID) ->  
  T=round(N*(N+1)/2),
  case (divisor(T,round(math:sqrt(T))) > 500) of  
    true ->  
      Server_PID ! {result,T};  
    false ->  
      solve(N+1,End,Server_PID)  
  end.

divisors(N) ->  
  divisor(N,round(math:sqrt(N))).

divisor(_,0) -> 1;  
divisor(N,I) ->  
  case (N rem I) =:= 0 of  
  true ->  
    2+divisor(N,I-1);  
  false ->  
    divisor(N,I-1)  
  end.

以下测试在以下环境下进行:Intel(R)Atom(TM)CPU N270 @ 1.60GHz

~$ time erl -noshell -s p12dist start

The result is: 76576500.

^C

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
a

real    0m5.510s
user    0m5.836s
sys 0m0.152s

Question 1: Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Question one can be answered in the negative for Erlang. The last question is answered by using Erlang appropriately, as in:

http://bredsaal.dk/learning-erlang-using-projecteuler-net

Since it’s faster than your initial C example, I would guess there are numerous problems as others have already covered in detail.

This Erlang module executes on a cheap netbook in about 5 seconds … It uses the network threads model in erlang and, as such demonstrates how to take advantage of the event model. It could be distributed over many nodes. And it’s fast. Not my code.

-module(p12dist).  
-author("Jannich Brendle, jannich@bredsaal.dk, http://blog.bredsaal.dk").  
-compile(export_all).

server() ->  
  server(1).

server(Number) ->  
  receive {getwork, Worker_PID} -> Worker_PID ! {work,Number,Number+100},  
  server(Number+101);  
  {result,T} -> io:format("The result is: \~w.\~n", [T]);  
  _ -> server(Number)  
  end.

worker(Server_PID) ->  
  Server_PID ! {getwork, self()},  
  receive {work,Start,End} -> solve(Start,End,Server_PID)  
  end,  
  worker(Server_PID).

start() ->  
  Server_PID = spawn(p12dist, server, []),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]).

solve(N,End,_) when N =:= End -> no_solution;

solve(N,End,Server_PID) ->  
  T=round(N*(N+1)/2),
  case (divisor(T,round(math:sqrt(T))) > 500) of  
    true ->  
      Server_PID ! {result,T};  
    false ->  
      solve(N+1,End,Server_PID)  
  end.

divisors(N) ->  
  divisor(N,round(math:sqrt(N))).

divisor(_,0) -> 1;  
divisor(N,I) ->  
  case (N rem I) =:= 0 of  
  true ->  
    2+divisor(N,I-1);  
  false ->  
    divisor(N,I-1)  
  end.

The test below took place on an: Intel(R) Atom(TM) CPU N270 @ 1.60GHz

~$ time erl -noshell -s p12dist start

The result is: 76576500.

^C

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
a

real    0m5.510s
user    0m5.836s
sys 0m0.152s

回答 12

C ++ 11,<我20ms的运行在这里

我了解您想获得一些技巧来帮助您改善特定于语言的知识,但是由于本文对此已作了详尽介绍,因此我想为可能看过您对问题的数学评论等内容的人添加一些上下文,并想知道为什么这样做代码太慢了。

该答案主要是提供上下文,以希望帮助人们更轻松地评估您的问题/其他答案中的代码。

这段代码仅基于以下几点使用了与丑陋语言无关的(丑陋的)优化:

  1. 每个训练序列号的形式为n(n + 1)/ 2
  2. n和n + 1是互质的
  3. 除数是一个乘法函数

#include <iostream>
#include <cmath>
#include <tuple>
#include <chrono>

using namespace std;

// Calculates the divisors of an integer by determining its prime factorisation.

int get_divisors(long long n)
{
    int divisors_count = 1;

    for(long long i = 2;
        i <= sqrt(n);
        /* empty */)
    {
        int divisions = 0;
        while(n % i == 0)
        {
            n /= i;
            divisions++;
        }

        divisors_count *= (divisions + 1);

        //here, we try to iterate more efficiently by skipping
        //obvious non-primes like 4, 6, etc
        if(i == 2)
            i++;
        else
            i += 2;
    }

    if(n != 1) //n is a prime
        return divisors_count * 2;
    else
        return divisors_count;
}

long long euler12()
{
    //n and n + 1
    long long n, n_p_1;

    n = 1; n_p_1 = 2;

    // divisors_x will store either the divisors of x or x/2
    // (the later iff x is divisible by two)
    long long divisors_n = 1;
    long long divisors_n_p_1 = 2;

    for(;;)
    {
        /* This loop has been unwound, so two iterations are completed at a time
         * n and n + 1 have no prime factors in common and therefore we can
         * calculate their divisors separately
         */

        long long total_divisors;                 //the divisors of the triangle number
                                                  // n(n+1)/2

        //the first (unwound) iteration

        divisors_n_p_1 = get_divisors(n_p_1 / 2); //here n+1 is even and we

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1);   //n_p_1 is now odd!

        //now the second (unwound) iteration

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1 / 2);   //n_p_1 is now even!
    }

    return (n * n_p_1) / 2;
}

int main()
{
    for(int i = 0; i < 1000; i++)
    {
        using namespace std::chrono;
        auto start = high_resolution_clock::now();
        auto result = euler12();
        auto end = high_resolution_clock::now();

        double time_elapsed = duration_cast<milliseconds>(end - start).count();

        cout << result << " " << time_elapsed << '\n';
    }
    return 0;
}

我的台式机平均要花19毫秒,而笔记本电脑平均要花80毫秒,这与我在这里看到的大多数其他代码相去甚远。毫无疑问,仍有许多优化措施可用。

C++11, < 20ms for meRun it here

I understand that you want tips to help improve your language specific knowledge, but since that has been well covered here, I thought I would add some context for people who may have looked at the mathematica comment on your question, etc, and wondered why this code was so much slower.

This answer is mainly to provide context to hopefully help people evaluate the code in your question / other answers more easily.

This code uses only a couple of (uglyish) optimisations, unrelated to the language used, based on:

  1. every traingle number is of the form n(n+1)/2
  2. n and n+1 are coprime
  3. the number of divisors is a multiplicative function

#include <iostream>
#include <cmath>
#include <tuple>
#include <chrono>

using namespace std;

// Calculates the divisors of an integer by determining its prime factorisation.

int get_divisors(long long n)
{
    int divisors_count = 1;

    for(long long i = 2;
        i <= sqrt(n);
        /* empty */)
    {
        int divisions = 0;
        while(n % i == 0)
        {
            n /= i;
            divisions++;
        }

        divisors_count *= (divisions + 1);

        //here, we try to iterate more efficiently by skipping
        //obvious non-primes like 4, 6, etc
        if(i == 2)
            i++;
        else
            i += 2;
    }

    if(n != 1) //n is a prime
        return divisors_count * 2;
    else
        return divisors_count;
}

long long euler12()
{
    //n and n + 1
    long long n, n_p_1;

    n = 1; n_p_1 = 2;

    // divisors_x will store either the divisors of x or x/2
    // (the later iff x is divisible by two)
    long long divisors_n = 1;
    long long divisors_n_p_1 = 2;

    for(;;)
    {
        /* This loop has been unwound, so two iterations are completed at a time
         * n and n + 1 have no prime factors in common and therefore we can
         * calculate their divisors separately
         */

        long long total_divisors;                 //the divisors of the triangle number
                                                  // n(n+1)/2

        //the first (unwound) iteration

        divisors_n_p_1 = get_divisors(n_p_1 / 2); //here n+1 is even and we

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1);   //n_p_1 is now odd!

        //now the second (unwound) iteration

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1 / 2);   //n_p_1 is now even!
    }

    return (n * n_p_1) / 2;
}

int main()
{
    for(int i = 0; i < 1000; i++)
    {
        using namespace std::chrono;
        auto start = high_resolution_clock::now();
        auto result = euler12();
        auto end = high_resolution_clock::now();

        double time_elapsed = duration_cast<milliseconds>(end - start).count();

        cout << result << " " << time_elapsed << '\n';
    }
    return 0;
}

That takes around 19ms on average for my desktop and 80ms for my laptop, a far cry from most of the other code I’ve seen here. And there are, no doubt, many optimisations still available.


回答 13

尝试去:

package main

import "fmt"
import "math"

func main() {
    var n, m, c int
    for i := 1; ; i++ {
        n, m, c = i * (i + 1) / 2, int(math.Sqrt(float64(n))), 0
        for f := 1; f < m; f++ {
            if n % f == 0 { c++ }
    }
    c *= 2
    if m * m == n { c ++ }
    if c > 1001 {
        fmt.Println(n)
        break
        }
    }
}

我得到:

原始C版本:9.1690 100%
执行:8.2520 111%

但是使用:

package main

import (
    "math"
    "fmt"
 )

// Sieve of Eratosthenes
func PrimesBelow(limit int) []int {
    switch {
        case limit < 2:
            return []int{}
        case limit == 2:
            return []int{2}
    }
    sievebound := (limit - 1) / 2
    sieve := make([]bool, sievebound+1)
    crosslimit := int(math.Sqrt(float64(limit))-1) / 2
    for i := 1; i <= crosslimit; i++ {
        if !sieve[i] {
            for j := 2 * i * (i + 1); j <= sievebound; j += 2*i + 1 {
                sieve[j] = true
            }
        }
    }
    plimit := int(1.3*float64(limit)) / int(math.Log(float64(limit)))
    primes := make([]int, plimit)
    p := 1
    primes[0] = 2
    for i := 1; i <= sievebound; i++ {
        if !sieve[i] {
            primes[p] = 2*i + 1
            p++
            if p >= plimit {
                break
            }
        }
    }
    last := len(primes) - 1
    for i := last; i > 0; i-- {
        if primes[i] != 0 {
            break
        }
        last = i
    }
    return primes[0:last]
}



func main() {
    fmt.Println(p12())
}
// Requires PrimesBelow from utils.go
func p12() int {
    n, dn, cnt := 3, 2, 0
    primearray := PrimesBelow(1000000)
    for cnt <= 1001 {
        n++
        n1 := n
        if n1%2 == 0 {
            n1 /= 2
        }
        dn1 := 1
        for i := 0; i < len(primearray); i++ {
            if primearray[i]*primearray[i] > n1 {
                dn1 *= 2
                break
            }
            exponent := 1
            for n1%primearray[i] == 0 {
                exponent++
                n1 /= primearray[i]
            }
            if exponent > 1 {
                dn1 *= exponent
            }
            if n1 == 1 {
                break
            }
        }
        cnt = dn * dn1
        dn = dn1
    }
    return n * (n - 1) / 2
}

我得到:

原始c版本:9.1690 100%
thaumkid的c版本:0.1060 8650%
初版:8.2520 111%
次版:0.0230 39865%

我也尝试了Python3.6和pypy3.3-5.5-alpha:

原始C版本:8.629 100%
thaumkid的C版本:0.109 7916%
Python3.6:54.795 16%
pypy3.3-5.5-alpha:13.291 65%

然后用下面的代码我得到:

原始c版本:8.629 100%
thaumkid的c版本:0.109 8650%
Python3.6:1.489 580%
pypy3.3-5.5-alpha:0.582 1483%

def D(N):
    if N == 1: return 1
    sqrtN = int(N ** 0.5)
    nf = 1
    for d in range(2, sqrtN + 1):
        if N % d == 0:
            nf = nf + 1
    return 2 * nf - (1 if sqrtN**2 == N else 0)

L = 1000
Dt, n = 0, 0

while Dt <= L:
    t = n * (n + 1) // 2
    Dt = D(n/2)*D(n+1) if n%2 == 0 else D(n)*D((n+1)/2)
    n = n + 1

print (t)

Trying GO:

package main

import "fmt"
import "math"

func main() {
    var n, m, c int
    for i := 1; ; i++ {
        n, m, c = i * (i + 1) / 2, int(math.Sqrt(float64(n))), 0
        for f := 1; f < m; f++ {
            if n % f == 0 { c++ }
    }
    c *= 2
    if m * m == n { c ++ }
    if c > 1001 {
        fmt.Println(n)
        break
        }
    }
}

I get:

original c version: 9.1690 100%
go: 8.2520 111%

But using:

package main

import (
    "math"
    "fmt"
 )

// Sieve of Eratosthenes
func PrimesBelow(limit int) []int {
    switch {
        case limit < 2:
            return []int{}
        case limit == 2:
            return []int{2}
    }
    sievebound := (limit - 1) / 2
    sieve := make([]bool, sievebound+1)
    crosslimit := int(math.Sqrt(float64(limit))-1) / 2
    for i := 1; i <= crosslimit; i++ {
        if !sieve[i] {
            for j := 2 * i * (i + 1); j <= sievebound; j += 2*i + 1 {
                sieve[j] = true
            }
        }
    }
    plimit := int(1.3*float64(limit)) / int(math.Log(float64(limit)))
    primes := make([]int, plimit)
    p := 1
    primes[0] = 2
    for i := 1; i <= sievebound; i++ {
        if !sieve[i] {
            primes[p] = 2*i + 1
            p++
            if p >= plimit {
                break
            }
        }
    }
    last := len(primes) - 1
    for i := last; i > 0; i-- {
        if primes[i] != 0 {
            break
        }
        last = i
    }
    return primes[0:last]
}



func main() {
    fmt.Println(p12())
}
// Requires PrimesBelow from utils.go
func p12() int {
    n, dn, cnt := 3, 2, 0
    primearray := PrimesBelow(1000000)
    for cnt <= 1001 {
        n++
        n1 := n
        if n1%2 == 0 {
            n1 /= 2
        }
        dn1 := 1
        for i := 0; i < len(primearray); i++ {
            if primearray[i]*primearray[i] > n1 {
                dn1 *= 2
                break
            }
            exponent := 1
            for n1%primearray[i] == 0 {
                exponent++
                n1 /= primearray[i]
            }
            if exponent > 1 {
                dn1 *= exponent
            }
            if n1 == 1 {
                break
            }
        }
        cnt = dn * dn1
        dn = dn1
    }
    return n * (n - 1) / 2
}

I get:

original c version: 9.1690 100%
thaumkid’s c version: 0.1060 8650%
first go version: 8.2520 111%
second go version: 0.0230 39865%

I also tried Python3.6 and pypy3.3-5.5-alpha:

original c version: 8.629 100%
thaumkid’s c version: 0.109 7916%
Python3.6: 54.795 16%
pypy3.3-5.5-alpha: 13.291 65%

and then with following code I got:

original c version: 8.629 100%
thaumkid’s c version: 0.109 8650%
Python3.6: 1.489 580%
pypy3.3-5.5-alpha: 0.582 1483%

def D(N):
    if N == 1: return 1
    sqrtN = int(N ** 0.5)
    nf = 1
    for d in range(2, sqrtN + 1):
        if N % d == 0:
            nf = nf + 1
    return 2 * nf - (1 if sqrtN**2 == N else 0)

L = 1000
Dt, n = 0, 0

while Dt <= L:
    t = n * (n + 1) // 2
    Dt = D(n/2)*D(n+1) if n%2 == 0 else D(n)*D((n+1)/2)
    n = n + 1

print (t)

回答 14

更改: case (divisor(T,round(math:sqrt(T))) > 500) of

至: case (divisor(T,round(math:sqrt(T))) > 1000) of

这将为Erlang多进程示例提供正确的答案。

Change: case (divisor(T,round(math:sqrt(T))) > 500) of

To: case (divisor(T,round(math:sqrt(T))) > 1000) of

This will produce the correct answer for the Erlang multi-process example.


回答 15

我假设只有在涉及的因素有很多小因素的情况下,因素的数量才会很大。因此,我使用了thaumkid出色的算法,但首先使用了对因数的近似值,该值永远不会太小。这很简单:检查最多29个主要因子,然后检查剩余数量并计算n个因子的上限。用它来计算因子数量的上限,如果该数量足够高,请计算因子的确切数量。

下面的代码不需要出于正确性的这种假设,而是要快速。似乎可行;十万分之一的数字中只有大约一个给出的估计值足够高,需要进行全面检查。

这是代码:

// Return at least the number of factors of n.
static uint64_t approxfactorcount (uint64_t n)
{
    uint64_t count = 1, add;

#define CHECK(d)                            \
    do {                                    \
        if (n % d == 0) {                   \
            add = count;                    \
            do { n /= d; count += add; }    \
            while (n % d == 0);             \
        }                                   \
    } while (0)

    CHECK ( 2); CHECK ( 3); CHECK ( 5); CHECK ( 7); CHECK (11); CHECK (13);
    CHECK (17); CHECK (19); CHECK (23); CHECK (29);
    if (n == 1) return count;
    if (n < 1ull * 31 * 31) return count * 2;
    if (n < 1ull * 31 * 31 * 37) return count * 4;
    if (n < 1ull * 31 * 31 * 37 * 37) return count * 8;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41) return count * 16;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43) return count * 32;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47) return count * 64;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53) return count * 128;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59) return count * 256;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61) return count * 512;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67) return count * 1024;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71) return count * 2048;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71 * 73) return count * 4096;
    return count * 1000000;
}

// Return the number of factors of n.
static uint64_t factorcount (uint64_t n)
{
    uint64_t count = 1, add;

    CHECK (2); CHECK (3);

    uint64_t d = 5, inc = 2;
    for (; d*d <= n; d += inc, inc = (6 - inc))
        CHECK (d);

    if (n > 1) count *= 2; // n must be a prime number
    return count;
}

// Prints triangular numbers with record numbers of factors.
static void printrecordnumbers (uint64_t limit)
{
    uint64_t record = 30000;

    uint64_t count1, factor1;
    uint64_t count2 = 1, factor2 = 1;

    for (uint64_t n = 1; n <= limit; ++n)
    {
        factor1 = factor2;
        count1 = count2;

        factor2 = n + 1; if (factor2 % 2 == 0) factor2 /= 2;
        count2 = approxfactorcount (factor2);

        if (count1 * count2 > record)
        {
            uint64_t factors = factorcount (factor1) * factorcount (factor2);
            if (factors > record)
            {
                printf ("%lluth triangular number = %llu has %llu factors\n", n, factor1 * factor2, factors);
                record = factors;
            }
        }
    }
}

这将在0.7秒内找到第14,753,024个三角形,其中有13824个因数;在34秒内找到了879,207,615个三角形,其中有61,440个因数;在10分钟5秒内发现了第123,524,486,975个三角形,具有138,240个因数;在第26,467,792,064个三角形中,具有172,032个因数。 21分25秒(2.4 GHz Core2 Duo),因此该代码平均每个数字仅需要116个处理器周期。最后一个三角形数字本身大于2 ^ 68,因此

I made the assumption that the number of factors is only large if the numbers involved have many small factors. So I used thaumkid’s excellent algorithm, but first used an approximation to the factor count that is never too small. It’s quite simple: Check for prime factors up to 29, then check the remaining number and calculate an upper bound for the nmber of factors. Use this to calculate an upper bound for the number of factors, and if that number is high enough, calculate the exact number of factors.

The code below doesn’t need this assumption for correctness, but to be fast. It seems to work; only about one in 100,000 numbers gives an estimate that is high enough to require a full check.

Here’s the code:

// Return at least the number of factors of n.
static uint64_t approxfactorcount (uint64_t n)
{
    uint64_t count = 1, add;

#define CHECK(d)                            \
    do {                                    \
        if (n % d == 0) {                   \
            add = count;                    \
            do { n /= d; count += add; }    \
            while (n % d == 0);             \
        }                                   \
    } while (0)

    CHECK ( 2); CHECK ( 3); CHECK ( 5); CHECK ( 7); CHECK (11); CHECK (13);
    CHECK (17); CHECK (19); CHECK (23); CHECK (29);
    if (n == 1) return count;
    if (n < 1ull * 31 * 31) return count * 2;
    if (n < 1ull * 31 * 31 * 37) return count * 4;
    if (n < 1ull * 31 * 31 * 37 * 37) return count * 8;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41) return count * 16;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43) return count * 32;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47) return count * 64;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53) return count * 128;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59) return count * 256;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61) return count * 512;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67) return count * 1024;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71) return count * 2048;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71 * 73) return count * 4096;
    return count * 1000000;
}

// Return the number of factors of n.
static uint64_t factorcount (uint64_t n)
{
    uint64_t count = 1, add;

    CHECK (2); CHECK (3);

    uint64_t d = 5, inc = 2;
    for (; d*d <= n; d += inc, inc = (6 - inc))
        CHECK (d);

    if (n > 1) count *= 2; // n must be a prime number
    return count;
}

// Prints triangular numbers with record numbers of factors.
static void printrecordnumbers (uint64_t limit)
{
    uint64_t record = 30000;

    uint64_t count1, factor1;
    uint64_t count2 = 1, factor2 = 1;

    for (uint64_t n = 1; n <= limit; ++n)
    {
        factor1 = factor2;
        count1 = count2;

        factor2 = n + 1; if (factor2 % 2 == 0) factor2 /= 2;
        count2 = approxfactorcount (factor2);

        if (count1 * count2 > record)
        {
            uint64_t factors = factorcount (factor1) * factorcount (factor2);
            if (factors > record)
            {
                printf ("%lluth triangular number = %llu has %llu factors\n", n, factor1 * factor2, factors);
                record = factors;
            }
        }
    }
}

This finds the 14,753,024th triangular with 13824 factors in about 0.7 seconds, the 879,207,615th triangular number with 61,440 factors in 34 seconds, the 12,524,486,975th triangular number with 138,240 factors in 10 minutes 5 seconds, and the 26,467,792,064th triangular number with 172,032 factors in 21 minutes 25 seconds (2.4GHz Core2 Duo), so this code takes only 116 processor cycles per number on average. The last triangular number itself is larger than 2^68, so


回答 16

我将“ Jannich Brendle”版本修改为1000,而不是500。并列出了euler12.bin,euler12.erl,p12dist.erl的结果。两种erl代码都使用“ + native”进行编译。

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$

I modified “Jannich Brendle” version to 1000 instead 500. And list the result of euler12.bin, euler12.erl, p12dist.erl. Both erl codes use ‘+native’ to compile.

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$

回答 17

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square+1;
    long candidate = 2;
    int count = 1;
    while(candidate <= isquare && candidate<=n){
        int c = 1;
        while (n % candidate == 0) {
           c++;
           n /= candidate;
        }
        count *= c;
        candidate++;
    }
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

gcc -lm -Ofast euler.c

超时时间

2.79s用户0.00s系统99%cpu 2.794

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square+1;
    long candidate = 2;
    int count = 1;
    while(candidate <= isquare && candidate<=n){
        int c = 1;
        while (n % candidate == 0) {
           c++;
           n /= candidate;
        }
        count *= c;
        candidate++;
    }
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

gcc -lm -Ofast euler.c

time ./a.out

2.79s user 0.00s system 99% cpu 2.794 total


如何在Python中使用“ with open”打开多个文件?

问题:如何在Python中使用“ with open”打开多个文件?

我想一次更改几个文件,前提是我可以写入所有文件。我想知道是否可以将多个打开的调用与该with语句组合:

try:
  with open('a', 'w') as a and open('b', 'w') as b:
    do_something()
except IOError as e:
  print 'Operation failed: %s' % e.strerror

如果不可能,那么解决该问题的优雅解决方案会是什么样?

I want to change a couple of files at one time, iff I can write to all of them. I’m wondering if I somehow can combine the multiple open calls with the with statement:

try:
  with open('a', 'w') as a and open('b', 'w') as b:
    do_something()
except IOError as e:
  print 'Operation failed: %s' % e.strerror

If that’s not possible, what would an elegant solution to this problem look like?


回答 0

从Python 2.7(或分别为3.1)开始,您可以编写

with open('a', 'w') as a, open('b', 'w') as b:
    do_something()

在Python的早期版本中,有时可以使用 contextlib.nested()嵌套上下文管理器。但是,这对于打开多个文件无法正常工作-有关详细信息,请参见链接的文档。


在极少数情况下,您想一次同时打开可变数量的文件,可以使用contextlib.ExitStack从Python 3.3版开始的:

with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # Do something with "files"

大多数情况下,您拥有一组可变的文件,但是您可能想要一个接一个地打开它们。

As of Python 2.7 (or 3.1 respectively) you can write

with open('a', 'w') as a, open('b', 'w') as b:
    do_something()

In earlier versions of Python, you can sometimes use contextlib.nested() to nest context managers. This won’t work as expected for opening multiples files, though — see the linked documentation for details.


In the rare case that you want to open a variable number of files all at the same time, you can use contextlib.ExitStack, starting from Python version 3.3:

with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # Do something with "files"

Most of the time you have a variable set of files, you likely want to open them one after the other, though.


回答 1

只需替换and,,您就可以完成:

try:
    with open('a', 'w') as a, open('b', 'w') as b:
        do_something()
except IOError as e:
    print 'Operation failed: %s' % e.strerror

Just replace and with , and you’re done:

try:
    with open('a', 'w') as a, open('b', 'w') as b:
        do_something()
except IOError as e:
    print 'Operation failed: %s' % e.strerror

回答 2

对于一次打开多个文件或较长的文件路径,将内容分解成多行可能很有用。@Sven Marnach在《Python样式指南》中对另一个答案的评论中建议:

with open('/path/to/InFile.ext', 'r') as file_1, \
     open('/path/to/OutFile.ext', 'w') as file_2:
    file_2.write(file_1.read())

For opening many files at once or for long file paths, it may be useful to break things up over multiple lines. From the Python Style Guide as suggested by @Sven Marnach in comments to another answer:

with open('/path/to/InFile.ext', 'r') as file_1, \
     open('/path/to/OutFile.ext', 'w') as file_2:
    file_2.write(file_1.read())

回答 3

嵌套语句可以完成相同的工作,我认为处理起来更简单。

假设您有inFile.txt,并想同时将其写入两个outFile。

with open("inFile.txt", 'r') as fr:
    with open("outFile1.txt", 'w') as fw1:
        with open("outFile2.txt", 'w') as fw2:
            for line in fr.readlines():
                fw1.writelines(line)
                fw2.writelines(line)

编辑:

我不了解拒绝投票的原因。我在发布答案之前测试了我的代码,它可以按预期工作:就像问题所要求的那样,它写入所有outFile。无重复写入或写入失败。所以我真的很想知道为什么我的答案被认为是错误的,次优的或类似的东西。

Nested with statements will do the same job, and in my opinion, are more straightforward to deal with.

Let’s say you have inFile.txt, and want to write it into two outFile’s simultaneously.

with open("inFile.txt", 'r') as fr:
    with open("outFile1.txt", 'w') as fw1:
        with open("outFile2.txt", 'w') as fw2:
            for line in fr.readlines():
                fw1.writelines(line)
                fw2.writelines(line)

EDIT:

I don’t understand the reason of the downvote. I tested my code before publishing my answer, and it works as desired: It writes to all of outFile’s, just as the question asks. No duplicate writing or failing to write. So I am really curious to know why my answer is considered to be wrong, suboptimal or anything like that.


回答 4

因为Python 3.3,你可以使用类ExitStackcontextlib模块到安全地
打开文件的任意数量

它可以管理动态数量的上下文感知对象,这意味着如果您不知道要处理多少文件,它将证明特别有用。

实际上,文档中提到的规范用例正在管理动态数量的文件。

with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # All opened files will automatically be closed at the end of
    # the with statement, even if attempts to open files later
    # in the list raise an exception

如果您对这些细节感兴趣,下面是一个通用示例,以说明ExitStack操作方式:

from contextlib import ExitStack

class X:
    num = 1

    def __init__(self):
        self.num = X.num
        X.num += 1

    def __repr__(self):
        cls = type(self)
        return '{cls.__name__}{self.num}'.format(cls=cls, self=self)

    def __enter__(self):
        print('enter {!r}'.format(self))
        return self.num

    def __exit__(self, exc_type, exc_value, traceback):
        print('exit {!r}'.format(self))
        return True

xs = [X() for _ in range(3)]

with ExitStack() as stack:
    print(len(stack._exit_callbacks)) # number of callbacks called on exit
    nums = [stack.enter_context(x) for x in xs]
    print(len(stack._exit_callbacks))

print(len(stack._exit_callbacks))
print(nums)

输出:

0
enter X1
enter X2
enter X3
3
exit X3
exit X2
exit X1
0
[1, 2, 3]

Since Python 3.3, you can use the class ExitStack from the contextlib module to safely
open an arbitrary number of files.

It can manage a dynamic number of context-aware objects, which means that it will prove especially useful if you don’t know how many files you are going to handle.

In fact, the canonical use-case that is mentioned in the documentation is managing a dynamic number of files.

with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # All opened files will automatically be closed at the end of
    # the with statement, even if attempts to open files later
    # in the list raise an exception

If you are interested in the details, here is a generic example in order to explain how ExitStack operates:

from contextlib import ExitStack

class X:
    num = 1

    def __init__(self):
        self.num = X.num
        X.num += 1

    def __repr__(self):
        cls = type(self)
        return '{cls.__name__}{self.num}'.format(cls=cls, self=self)

    def __enter__(self):
        print('enter {!r}'.format(self))
        return self.num

    def __exit__(self, exc_type, exc_value, traceback):
        print('exit {!r}'.format(self))
        return True

xs = [X() for _ in range(3)]

with ExitStack() as stack:
    print(len(stack._exit_callbacks)) # number of callbacks called on exit
    nums = [stack.enter_context(x) for x in xs]
    print(len(stack._exit_callbacks))

print(len(stack._exit_callbacks))
print(nums)

Output:

0
enter X1
enter X2
enter X3
3
exit X3
exit X2
exit X1
0
[1, 2, 3]

回答 5

使用python 2.6它将无法正常工作,我们必须使用以下方式打开多个文件:

with open('a', 'w') as a:
    with open('b', 'w') as b:

With python 2.6 It will not work, we have to use below way to open multiple files:

with open('a', 'w') as a:
    with open('b', 'w') as b:

回答 6

回答较晚(8年),但对于希望将多个文件合并为一个文件的人,以下功能可能会有所帮助:

def multi_open(_list):
    out=""
    for x in _list:
        try:
            with open(x) as f:
                out+=f.read()
        except:
            pass
            # print(f"Cannot open file {x}")
    return(out)

fl = ["C:/bdlog.txt", "C:/Jts/tws.vmoptions", "C:/not.exist"]
print(multi_open(fl))

2018-10-23 19:18:11.361 PROFILE  [Stop Drivers] [1ms]
2018-10-23 19:18:11.361 PROFILE  [Parental uninit] [0ms]
...
# This file contains VM parameters for Trader Workstation.
# Each parameter should be defined in a separate line and the
...

Late answer (8 yrs), but for someone looking to join multiple files into one, the following function may be of help:

def multi_open(_list):
    out=""
    for x in _list:
        try:
            with open(x) as f:
                out+=f.read()
        except:
            pass
            # print(f"Cannot open file {x}")
    return(out)

fl = ["C:/bdlog.txt", "C:/Jts/tws.vmoptions", "C:/not.exist"]
print(multi_open(fl))

2018-10-23 19:18:11.361 PROFILE  [Stop Drivers] [1ms]
2018-10-23 19:18:11.361 PROFILE  [Parental uninit] [0ms]
...
# This file contains VM parameters for Trader Workstation.
# Each parameter should be defined in a separate line and the
...

将字符串拆分为具有多个单词边界定界符的单词

问题:将字符串拆分为具有多个单词边界定界符的单词

我认为我想做的是一项相当普通的任务,但是我在网络上找不到任何参考。我的文字带有标点符号,我想要一个单词列表。

"Hey, you - what are you doing here!?"

应该

['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

但是Python str.split()只能使用一个参数,因此在用空格分割后,所有单词都带有标点符号。有任何想法吗?

I think what I want to do is a fairly common task but I’ve found no reference on the web. I have text with punctuation, and I want a list of the words.

"Hey, you - what are you doing here!?"

should be

['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

But Python’s str.split() only works with one argument, so I have all words with the punctuation after I split with whitespace. Any ideas?


回答 0

正则表达式合理的情况:

import re
DATA = "Hey, you - what are you doing here!?"
print re.findall(r"[\w']+", DATA)
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

A case where regular expressions are justified:

import re
DATA = "Hey, you - what are you doing here!?"
print re.findall(r"[\w']+", DATA)
# Prints ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

回答 1

re.split()

re.split(pattern,string [,maxsplit = 0])

按模式分割字符串。如果在模式中使用了捕获括号,则模式中所有组的文本也将作为结果列表的一部分返回。如果maxsplit不为零,则最多会发生maxsplit分割,并将字符串的其余部分作为列表的最后一个元素返回。(不兼容说明:在原始的Python 1.5发行版中,maxsplit被忽略。此问题已在以后的发行版中修复。)

>>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split('(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split('\W+', 'Words, words, words.', 1)
['Words', 'words, words.']

re.split()

re.split(pattern, string[, maxsplit=0])

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list. If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list. (Incompatibility note: in the original Python 1.5 release, maxsplit was ignored. This has been fixed in later releases.)

>>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split('(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split('\W+', 'Words, words, words.', 1)
['Words', 'words, words.']

回答 2

另一种无需使用正则表达式的快速方法是首先替换字符,如下所示:

>>> 'a;bcd,ef g'.replace(';',' ').replace(',',' ').split()
['a', 'bcd', 'ef', 'g']

Another quick way to do this without a regexp is to replace the characters first, as below:

>>> 'a;bcd,ef g'.replace(';',' ').replace(',',' ').split()
['a', 'bcd', 'ef', 'g']

回答 3

如此众多的答案,但我找不到有效解决问题标题真正要求的解决方案(拆分多个可能的分隔符,相反,许多答案拆分成一个单词而不是单词,这是不同的)。因此,这是标题中问题的答案,该问题依赖于Python的标准高效re模块:

>>> import re  # Will be splitting on: , <space> - ! ? :
>>> filter(None, re.split("[, \-!?:]+", "Hey, you - what are you doing here!?"))
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

哪里:

  • […]比赛一个隔板内上市,
  • \-在正则表达式是在这里以防止特殊解释-为字符范围指示器(如在A-Z),
  • +跳过一个或多个分隔符(它可以省略感谢filter(),但是这将不必要地产生匹配隔板之间空字符串),并
  • filter(None, …) 删除可能由前导和尾随分隔符创建的空字符串(因为空字符串具有错误的布尔值)。

re.split()正如问题标题所要求的那样,这恰好是“用多个分隔符分隔”。

此外,该解决方案还可以避免在其他一些解决方案中发现的单词中非ASCII字符的问题(请参见ghostdog74的答案的第一条评论)。

re模块比“手动”执行Python循环和测试要高效得多(在速度和简洁性方面)!

So many answers, yet I can’t find any solution that does efficiently what the title of the questions literally asks for (splitting on multiple possible separators—instead, many answers split on anything that is not a word, which is different). So here is an answer to the question in the title, that relies on Python’s standard and efficient re module:

>>> import re  # Will be splitting on: , <space> - ! ? :
>>> filter(None, re.split("[, \-!?:]+", "Hey, you - what are you doing here!?"))
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

where:

  • the […] matches one of the separators listed inside,
  • the \- in the regular expression is here to prevent the special interpretation of - as a character range indicator (as in A-Z),
  • the + skips one or more delimiters (it could be omitted thanks to the filter(), but this would unnecessarily produce empty strings between matched separators), and
  • filter(None, …) removes the empty strings possibly created by leading and trailing separators (since empty strings have a false boolean value).

This re.split() precisely “splits with multiple separators”, as asked for in the question title.

This solution is furthermore immune to the problems with non-ASCII characters in words found in some other solutions (see the first comment to ghostdog74’s answer).

The re module is much more efficient (in speed and concision) than doing Python loops and tests “by hand”!


回答 4

另一种方式,没有正则表达式

import string
punc = string.punctuation
thestring = "Hey, you - what are you doing here!?"
s = list(thestring)
''.join([o for o in s if not o in punc]).split()

Another way, without regex

import string
punc = string.punctuation
thestring = "Hey, you - what are you doing here!?"
s = list(thestring)
''.join([o for o in s if not o in punc]).split()

回答 5

专业提示:使用 string.translate用于Python最快的字符串操作。

一些证明…

首先,慢速的方式(对不起pprzemek):

>>> import timeit
>>> S = 'Hey, you - what are you doing here!?'
>>> def my_split(s, seps):
...     res = [s]
...     for sep in seps:
...         s, res = res, []
...         for seq in s:
...             res += seq.split(sep)
...     return res
... 
>>> timeit.Timer('my_split(S, punctuation)', 'from __main__ import S,my_split; from string import punctuation').timeit()
54.65477919578552

接下来,我们使用re.findall()(由建议的答案给出)。快多了:

>>> timeit.Timer('findall(r"\w+", S)', 'from __main__ import S; from re import findall').timeit()
4.194725036621094

最后,我们使用translate

>>> from string import translate,maketrans,punctuation 
>>> T = maketrans(punctuation, ' '*len(punctuation))
>>> timeit.Timer('translate(S, T).split()', 'from __main__ import S,T,translate').timeit()
1.2835021018981934

说明:

string.translate是用C实现的,与Python中的许多字符串操作函数不同,string.translate 它不会产生新的字符串。因此,它与字符串替换一样快。

不过,这有点尴尬,因为它需要翻译表才能执行此操作。您可以使用maketrans()便利功能制作翻译表。此处的目的是将所有不需要的字符转换为空格。一对一的替代品。同样,不会产生任何新数据。所以这很快

接下来,我们使用好old split()split()默认情况下,它将对所有空白字符起作用,将它们分组在一起以进行拆分。结果将是您想要的单词列表。而且这种方法的速度几乎快了4倍re.findall()

Pro-Tip: Use string.translate for the fastest string operations Python has.

Some proof…

First, the slow way (sorry pprzemek):

>>> import timeit
>>> S = 'Hey, you - what are you doing here!?'
>>> def my_split(s, seps):
...     res = [s]
...     for sep in seps:
...         s, res = res, []
...         for seq in s:
...             res += seq.split(sep)
...     return res
... 
>>> timeit.Timer('my_split(S, punctuation)', 'from __main__ import S,my_split; from string import punctuation').timeit()
54.65477919578552

Next, we use re.findall() (as given by the suggested answer). MUCH faster:

>>> timeit.Timer('findall(r"\w+", S)', 'from __main__ import S; from re import findall').timeit()
4.194725036621094

Finally, we use translate:

>>> from string import translate,maketrans,punctuation 
>>> T = maketrans(punctuation, ' '*len(punctuation))
>>> timeit.Timer('translate(S, T).split()', 'from __main__ import S,T,translate').timeit()
1.2835021018981934

Explanation:

string.translate is implemented in C and unlike many string manipulation functions in Python, string.translate does not produce a new string. So it’s about as fast as you can get for string substitution.

It’s a bit awkward, though, as it needs a translation table in order to do this magic. You can make a translation table with the maketrans() convenience function. The objective here is to translate all unwanted characters to spaces. A one-for-one substitute. Again, no new data is produced. So this is fast!

Next, we use good old split(). split() by default will operate on all whitespace characters, grouping them together for the split. The result will be the list of words that you want. And this approach is almost 4x faster than re.findall()!


回答 6

我遇到了类似的难题,不想使用’re’模块。

def my_split(s, seps):
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res

print my_split('1111  2222 3333;4444,5555;6666', [' ', ';', ','])
['1111', '', '2222', '3333', '4444', '5555', '6666']

I had a similar dilemma and didn’t want to use ‘re’ module.

def my_split(s, seps):
    res = [s]
    for sep in seps:
        s, res = res, []
        for seq in s:
            res += seq.split(sep)
    return res

print my_split('1111  2222 3333;4444,5555;6666', [' ', ';', ','])
['1111', '', '2222', '3333', '4444', '5555', '6666']

回答 7

首先,我想与其他人同意,正则表达式或str.translate(...)基于基础的解决方案性能最高。对于我的用例,此功能的性能并不重要,因此我想添加我考虑的该标准的想法。

我的主要目标是将其他一些答案中的想法归纳为一个解决方案,该解决方案可用于包含不仅仅是正则表达式单词的字符串(即,将标点字符的显式子集列入黑名单而将单词字符列入白名单)。

请注意,在任何方法中,都可能会考虑使用 string.punctuation代替手动定义的列表。

选项1-重新订阅

我很惊讶地发现到目前为止没有答案使用re.sub(…)。我发现这是解决此问题的一种简单自然的方法。

import re

my_str = "Hey, you - what are you doing here!?"

words = re.split(r'\s+', re.sub(r'[,\-!?]', ' ', my_str).strip())

在此解决方案中,我将调用嵌套到re.sub(...)内部re.split(...)-但如果性能至关重要,则在外部编译正则表达式可能会有所益处-对于我的用例而言,差异并不明显,因此我更喜欢简单性和可读性。

选项2-str.replace

这是另外几行,但是它具有可扩展的优点,而不必检查是否需要在正则表达式中转义某个字符。

my_str = "Hey, you - what are you doing here!?"

replacements = (',', '-', '!', '?')
for r in replacements:
    my_str = my_str.replace(r, ' ')

words = my_str.split()

能够将str.replace映射到字符串本来会很好,但是我不认为可以使用不可变的字符串来完成,并且在映射到字符列表时可以工作,对每个字符运行每个替换听起来太过分了。(编辑:有关功能示例,请参阅下一个选项。)

选项3-functools.reduce

(在Python 2中,reduce它可以在全局命名空间中使用,而无需从functools导入。)

import functools

my_str = "Hey, you - what are you doing here!?"

replacements = (',', '-', '!', '?')
my_str = functools.reduce(lambda s, sep: s.replace(sep, ' '), replacements, my_str)
words = my_str.split()

First, I want to agree with others that the regex or str.translate(...) based solutions are most performant. For my use case the performance of this function wasn’t significant, so I wanted to add ideas that I considered with that criteria.

My main goal was to generalize ideas from some of the other answers into one solution that could work for strings containing more than just regex words (i.e., blacklisting the explicit subset of punctuation characters vs whitelisting word characters).

Note that, in any approach, one might also consider using string.punctuation in place of a manually defined list.

Option 1 – re.sub

I was surprised to see no answer so far uses re.sub(…). I find it a simple and natural approach to this problem.

import re

my_str = "Hey, you - what are you doing here!?"

words = re.split(r'\s+', re.sub(r'[,\-!?]', ' ', my_str).strip())

In this solution, I nested the call to re.sub(...) inside re.split(...) — but if performance is critical, compiling the regex outside could be beneficial — for my use case, the difference wasn’t significant, so I prefer simplicity and readability.

Option 2 – str.replace

This is a few more lines, but it has the benefit of being expandable without having to check whether you need to escape a certain character in regex.

my_str = "Hey, you - what are you doing here!?"

replacements = (',', '-', '!', '?')
for r in replacements:
    my_str = my_str.replace(r, ' ')

words = my_str.split()

It would have been nice to be able to map the str.replace to the string instead, but I don’t think it can be done with immutable strings, and while mapping against a list of characters would work, running every replacement against every character sounds excessive. (Edit: See next option for a functional example.)

Option 3 – functools.reduce

(In Python 2, reduce is available in global namespace without importing it from functools.)

import functools

my_str = "Hey, you - what are you doing here!?"

replacements = (',', '-', '!', '?')
my_str = functools.reduce(lambda s, sep: s.replace(sep, ' '), replacements, my_str)
words = my_str.split()

回答 8

join = lambda x: sum(x,[])  # a.k.a. flatten1([[1],[2,3],[4]]) -> [1,2,3,4]
# ...alternatively...
join = lambda lists: [x for l in lists for x in l]

然后这变成了三层:

fragments = [text]
for token in tokens:
    fragments = join(f.split(token) for f in fragments)

说明

这就是在Haskell中被称为List monad的东西。monad背后的想法是,一旦“在monad中”,您就“停留在monad中”,直到有东西将您带出。例如在Haskell中,假设您将python range(n) -> [1,2,...,n]函数映射到List上。如果结果是一个列表,它将被原地追加到列表中,因此您将获得类似map(range, [3,4,1]) -> [0,1,2,0,1,2,3,0]。这称为map-append(或mappend,或类似的东西)。这里的想法是,您要执行此操作(拆分令牌),并且每当执行此操作时,您都将结果加入列表。

您可以将其抽象为一个函数,并且tokens=string.punctuation默认情况下具有。

这种方法的优点:

  • 这种方法(与基于朴素的基于正则表达式的方法不同)可以与任意长度的令牌一起使用(正则表达式也可以使用更高级的语法)。
  • 您不仅限于代币;您可以使用任意逻辑代替每个标记,例如,“标记”之一可以是根据嵌套括号的拆分方式进行拆分的函数。
join = lambda x: sum(x,[])  # a.k.a. flatten1([[1],[2,3],[4]]) -> [1,2,3,4]
# ...alternatively...
join = lambda lists: [x for l in lists for x in l]

Then this becomes a three-liner:

fragments = [text]
for token in tokens:
    fragments = join(f.split(token) for f in fragments)

Explanation

This is what in Haskell is known as the List monad. The idea behind the monad is that once “in the monad” you “stay in the monad” until something takes you out. For example in Haskell, say you map the python range(n) -> [1,2,...,n] function over a List. If the result is a List, it will be append to the List in-place, so you’d get something like map(range, [3,4,1]) -> [0,1,2,0,1,2,3,0]. This is known as map-append (or mappend, or maybe something like that). The idea here is that you’ve got this operation you’re applying (splitting on a token), and whenever you do that, you join the result into the list.

You can abstract this into a function and have tokens=string.punctuation by default.

Advantages of this approach:

  • This approach (unlike naive regex-based approaches) can work with arbitrary-length tokens (which regex can also do with more advanced syntax).
  • You are not restricted to mere tokens; you could have arbitrary logic in place of each token, for example one of the “tokens” could be a function which splits according to how nested parentheses are.

回答 9

尝试这个:

import re

phrase = "Hey, you - what are you doing here!?"
matches = re.findall('\w+', phrase)
print matches

这将打印 ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

try this:

import re

phrase = "Hey, you - what are you doing here!?"
matches = re.findall('\w+', phrase)
print matches

this will print ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']


回答 10

两次使用替换:

a = '11223FROM33344INTO33222FROM3344'
a.replace('FROM', ',,,').replace('INTO', ',,,').split(',,,')

结果是:

['11223', '33344', '33222', '3344']

Use replace two times:

a = '11223FROM33344INTO33222FROM3344'
a.replace('FROM', ',,,').replace('INTO', ',,,').split(',,,')

results in:

['11223', '33344', '33222', '3344']

回答 11

我喜欢re,但是这是我的解决方案:

from itertools import groupby
sep = ' ,-!?'
s = "Hey, you - what are you doing here!?"
print [''.join(g) for k, g in groupby(s, sep.__contains__) if not k]

sep .__ contains__是’in’运算符使用的方法。基本上和

lambda ch: ch in sep

但是这里比较方便。

groupby获取我们的字符串和函数。它使用该函数将字符串分成几组:每当函数值更改时,就会生成一个新的组。因此,sep .__ contains__正是我们需要的。

groupby返回一对对的序列,其中pair [0]是我们函数的结果,而pair [1]是一个组。使用‘if not k’我们用分隔符过滤掉组(因为sep .__ contains__在分隔符上为True 的结果)。好了,就是这样-现在我们有了一系列的组,每个组都是一个单词(组实际上是一个可迭代的,因此我们使用join将其转换为字符串)。

该解决方案非常通用,因为它使用一个函数来分隔字符串(可以按需要的任何条件进行拆分)。另外,它不会创建中间字符串/列表(您可以删除联接,并且表达式将变得很懒,因为每个组都是迭代器)

I like re, but here is my solution without it:

from itertools import groupby
sep = ' ,-!?'
s = "Hey, you - what are you doing here!?"
print [''.join(g) for k, g in groupby(s, sep.__contains__) if not k]

sep.__contains__ is a method used by ‘in’ operator. Basically it is the same as

lambda ch: ch in sep

but is more convenient here.

groupby gets our string and function. It splits string in groups using that function: whenever a value of function changes – a new group is generated. So, sep.__contains__ is exactly what we need.

groupby returns a sequence of pairs, where pair[0] is a result of our function and pair[1] is a group. Using ‘if not k’ we filter out groups with separators (because a result of sep.__contains__ is True on separators). Well, that’s all – now we have a sequence of groups where each one is a word (group is actually an iterable so we use join to convert it to string).

This solution is quite general, because it uses a function to separate string (you can split by any condition you need). Also, it doesn’t create intermediate strings/lists (you can remove join and the expression will become lazy, since each group is an iterator)


回答 12

您可以使用pandas的series.str.split方法来获得相同的结果,而不是使用re模块功能re.split。

首先,使用上面的字符串创建一个系列,然后将该方法应用于该系列。

thestring = pd.Series("Hey, you - what are you doing here!?") thestring.str.split(pat = ',|-')

参数pat接受定界符,并将拆分后的字符串作为数组返回。这里,两个定界符使用|传递。(或运算符)。输出如下:

[Hey, you , what are you doing here!?]

Instead of using a re module function re.split you can achieve the same result using the series.str.split method of pandas.

First, create a series with the above string and then apply the method to the series.

thestring = pd.Series("Hey, you - what are you doing here!?") thestring.str.split(pat = ',|-')

parameter pat takes the delimiters and returns the split string as an array. Here the two delimiters are passed using a | (or operator). The output is as follows:

[Hey, you , what are you doing here!?]


回答 13

我正在重新熟悉Python,并需要同样的东西。findall解决方案可能更好,但是我想到了:

tokens = [x.strip() for x in data.split(',')]

I’m re-acquainting myself with Python and needed the same thing. The findall solution may be better, but I came up with this:

tokens = [x.strip() for x in data.split(',')]

回答 14

使用maketrans和翻译,您可以轻松整齐地进行操作

import string
specials = ',.!?:;"()<>[]#$=-/'
trans = string.maketrans(specials, ' '*len(specials))
body = body.translate(trans)
words = body.strip().split()

using maketrans and translate you can do it easily and neatly

import string
specials = ',.!?:;"()<>[]#$=-/'
trans = string.maketrans(specials, ' '*len(specials))
body = body.translate(trans)
words = body.strip().split()

回答 15

在Python 3中,您可以使用PY4E-Python for Everybody中的方法

我们可以通过使用字符串的方法解决这两个问题lowerpunctuationtranslate。该translate是最微妙的方法。这是有关以下内容的文档translate

your_string.translate(your_string.maketrans(fromstr, tostr, deletestr))

将中的字符替换为中fromstr相同位置的tostr字符,并删除中的所有字符deletestr。该fromstrtostr可以为空字符串和deletestr可以省略参数。

您可以看到“标点符号”:

In [10]: import string

In [11]: string.punctuation
Out[11]: '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'  

例如:

In [12]: your_str = "Hey, you - what are you doing here!?"

In [13]: line = your_str.translate(your_str.maketrans('', '', string.punctuation))

In [14]: line = line.lower()

In [15]: words = line.split()

In [16]: print(words)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

有关更多信息,您可以参考:

In Python 3, your can use the method from PY4E – Python for Everybody.

We can solve both these problems by using the string methods lower, punctuation, and translate. The translate is the most subtle of the methods. Here is the documentation for translate:

your_string.translate(your_string.maketrans(fromstr, tostr, deletestr))

Replace the characters in fromstr with the character in the same position in tostr and delete all characters that are in deletestr. The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.

Your can see the “punctuation”:

In [10]: import string

In [11]: string.punctuation
Out[11]: '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'  

For your example:

In [12]: your_str = "Hey, you - what are you doing here!?"

In [13]: line = your_str.translate(your_str.maketrans('', '', string.punctuation))

In [14]: line = line.lower()

In [15]: words = line.split()

In [16]: print(words)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

For more information, you can refer:


回答 16

实现此目的的另一种方法是使用自然语言工具包(nltk)。

import nltk
data= "Hey, you - what are you doing here!?"
word_tokens = nltk.tokenize.regexp_tokenize(data, r'\w+')
print word_tokens

打印: ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

这种方法的最大缺点是您需要安装nltk软件包

好处是,一旦获得令牌,您就可以使用其余的nltk软件包做很多有趣的事情

Another way to achieve this is to use the Natural Language Tool Kit (nltk).

import nltk
data= "Hey, you - what are you doing here!?"
word_tokens = nltk.tokenize.regexp_tokenize(data, r'\w+')
print word_tokens

This prints: ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

The biggest drawback of this method is that you need to install the nltk package.

The benefits are that you can do a lot of fun stuff with the rest of the nltk package once you get your tokens.


回答 17

首先,我不认为您的意图是在拆分函数中实际使用标点符号作为分隔符。您的描述表明您只是想从结果字符串中消除标点符号。

我经常遇到这种情况,而我通常的解决方案不需要重新输入。

具有列表理解功能的单行lambda函数:

(要求import string):

split_without_punc = lambda text : [word.strip(string.punctuation) for word in 
    text.split() if word.strip(string.punctuation) != '']

# Call function
split_without_punc("Hey, you -- what are you doing?!")
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']


功能(传统)

作为传统函数,这仍然只有两行具有列表理解功能(除了import string):

def split_without_punctuation2(text):

    # Split by whitespace
    words = text.split()

    # Strip punctuation from each word
    return [word.strip(ignore) for word in words if word.strip(ignore) != '']

split_without_punctuation2("Hey, you -- what are you doing?!")
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']

它自然也会使收缩和带连字符的单词保持完整。您总是可以text.replace("-", " ")在分割之前使用连字符将其转换为空格。

没有Lambda或列表理解的常规功能

对于更通用的解决方案(您可以在其中指定要消除的字符),并且无需列表理解,您将获得:

def split_without(text: str, ignore: str) -> list:

    # Split by whitespace
    split_string = text.split()

    # Strip any characters in the ignore string, and ignore empty strings
    words = []
    for word in split_string:
        word = word.strip(ignore)
        if word != '':
            words.append(word)

    return words

# Situation-specific call to general function
import string
final_text = split_without("Hey, you - what are you doing?!", string.punctuation)
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']

当然,您也可以始终将lambda函数概括为任何指定的字符串。

First of all, I don’t think that your intention is to actually use punctuation as delimiters in the split functions. Your description suggests that you simply want to eliminate punctuation from the resultant strings.

I come across this pretty frequently, and my usual solution doesn’t require re.

One-liner lambda function w/ list comprehension:

(requires import string):

split_without_punc = lambda text : [word.strip(string.punctuation) for word in 
    text.split() if word.strip(string.punctuation) != '']

# Call function
split_without_punc("Hey, you -- what are you doing?!")
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']


Function (traditional)

As a traditional function, this is still only two lines with a list comprehension (in addition to import string):

def split_without_punctuation2(text):

    # Split by whitespace
    words = text.split()

    # Strip punctuation from each word
    return [word.strip(ignore) for word in words if word.strip(ignore) != '']

split_without_punctuation2("Hey, you -- what are you doing?!")
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']

It will also naturally leave contractions and hyphenated words intact. You can always use text.replace("-", " ") to turn hyphens into spaces before the split.

General Function w/o Lambda or List Comprehension

For a more general solution (where you can specify the characters to eliminate), and without a list comprehension, you get:

def split_without(text: str, ignore: str) -> list:

    # Split by whitespace
    split_string = text.split()

    # Strip any characters in the ignore string, and ignore empty strings
    words = []
    for word in split_string:
        word = word.strip(ignore)
        if word != '':
            words.append(word)

    return words

# Situation-specific call to general function
import string
final_text = split_without("Hey, you - what are you doing?!", string.punctuation)
# returns ['Hey', 'you', 'what', 'are', 'you', 'doing']

Of course, you can always generalize the lambda function to any specified string of characters as well.


回答 18

首先,在循环中执行任何RegEx操作之前,请始终使用re.compile(),因为它比常规操作更快。

因此对于您的问题,请先编译模式,然后对其执行操作。

import re
DATA = "Hey, you - what are you doing here!?"
reg_tok = re.compile("[\w']+")
print reg_tok.findall(DATA)

First of all, always use re.compile() before performing any RegEx operation in a loop because it works faster than normal operation.

so for your problem first compile the pattern and then perform action on it.

import re
DATA = "Hey, you - what are you doing here!?"
reg_tok = re.compile("[\w']+")
print reg_tok.findall(DATA)

回答 19

这是一些解释的答案。

st = "Hey, you - what are you doing here!?"

# replace all the non alpha-numeric with space and then join.
new_string = ''.join([x.replace(x, ' ') if not x.isalnum() else x for x in st])
# output of new_string
'Hey  you  what are you doing here  '

# str.split() will remove all the empty string if separator is not provided
new_list = new_string.split()

# output of new_list
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

# we can join it to get a complete string without any non alpha-numeric character
' '.join(new_list)
# output
'Hey you what are you doing'

或者一行,我们可以这样:

(''.join([x.replace(x, ' ') if not x.isalnum() else x for x in st])).split()

# output
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

更新的答案

Here is the answer with some explanation.

st = "Hey, you - what are you doing here!?"

# replace all the non alpha-numeric with space and then join.
new_string = ''.join([x.replace(x, ' ') if not x.isalnum() else x for x in st])
# output of new_string
'Hey  you  what are you doing here  '

# str.split() will remove all the empty string if separator is not provided
new_list = new_string.split()

# output of new_list
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

# we can join it to get a complete string without any non alpha-numeric character
' '.join(new_list)
# output
'Hey you what are you doing'

or in one line, we can do like this:

(''.join([x.replace(x, ' ') if not x.isalnum() else x for x in st])).split()

# output
['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

updated answer


回答 20

创建一个函数,将两个字符串(要拆分的源字符串和定界符的splitlist字符串)作为输入,并输出一个拆分词列表:

def split_string(source, splitlist):
    output = []  # output list of cleaned words
    atsplit = True
    for char in source:
        if char in splitlist:
            atsplit = True
        else:
            if atsplit:
                output.append(char)  # append new word after split
                atsplit = False
            else: 
                output[-1] = output[-1] + char  # continue copying characters until next split
    return output

Create a function that takes as input two strings (the source string to be split and the splitlist string of delimiters) and outputs a list of split words:

def split_string(source, splitlist):
    output = []  # output list of cleaned words
    atsplit = True
    for char in source:
        if char in splitlist:
            atsplit = True
        else:
            if atsplit:
                output.append(char)  # append new word after split
                atsplit = False
            else: 
                output[-1] = output[-1] + char  # continue copying characters until next split
    return output

回答 21

我喜欢pprzemek的解决方案,因为它不假定定界符是单个字符,并且不尝试利用正则表达式(如果分隔符的数目太长了,这将不能很好地工作)。

为了清楚起见,以下是上述解决方案的可读性更高的版本:

def split_string_on_multiple_separators(input_string, separators):
    buffer = [input_string]
    for sep in separators:
        strings = buffer
        buffer = []  # reset the buffer
        for s in strings:
            buffer = buffer + s.split(sep)

    return buffer

I like pprzemek’s solution because it does not assume that the delimiters are single characters and it doesn’t try to leverage a regex (which would not work well if the number of separators got to be crazy long).

Here’s a more readable version of the above solution for clarity:

def split_string_on_multiple_separators(input_string, separators):
    buffer = [input_string]
    for sep in separators:
        strings = buffer
        buffer = []  # reset the buffer
        for s in strings:
            buffer = buffer + s.split(sep)

    return buffer

回答 22

遇到了与@ooboo相同的问题,并找到了这个主题@ ghostdog74启发了我,也许有人觉得我的解决方案有用

str1='adj:sg:nom:m1.m2.m3:pos'
splitat=':.'
''.join([ s if s not in splitat else ' ' for s in str1]).split()

如果您不想在空格处分割,请在空格处输入内容并使用相同的字符分割。

got same problem as @ooboo and find this topic @ghostdog74 inspired me, maybe someone finds my solution usefull

str1='adj:sg:nom:m1.m2.m3:pos'
splitat=':.'
''.join([ s if s not in splitat else ' ' for s in str1]).split()

input something in space place and split using same character if you dont want to split at spaces.


回答 23

这是我与多个决策者共同努力的结果:

def msplit( str, delims ):
  w = ''
  for z in str:
    if z not in delims:
        w += z
    else:
        if len(w) > 0 :
            yield w
        w = ''
  if len(w) > 0 :
    yield w

Here is my go at a split with multiple deliminaters:

def msplit( str, delims ):
  w = ''
  for z in str:
    if z not in delims:
        w += z
    else:
        if len(w) > 0 :
            yield w
        w = ''
  if len(w) > 0 :
    yield w

回答 24

我认为以下是满足您需求的最佳答案:

\W+ 可能适合这种情况,但可能不适合其他情况。

filter(None, re.compile('[ |,|\-|!|?]').split( "Hey, you - what are you doing here!?")

I think the following is the best answer to suite your needs :

\W+ maybe suitable for this case, but may not be suitable for other cases.

filter(None, re.compile('[ |,|\-|!|?]').split( "Hey, you - what are you doing here!?")

回答 25

这是我的看法。

def split_string(source,splitlist):
    splits = frozenset(splitlist)
    l = []
    s1 = ""
    for c in source:
        if c in splits:
            if s1:
                l.append(s1)
                s1 = ""
        else:
            print s1
            s1 = s1 + c
    if s1:
        l.append(s1)
    return l

>>>out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
>>>print out
>>>['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

Heres my take on it….

def split_string(source,splitlist):
    splits = frozenset(splitlist)
    l = []
    s1 = ""
    for c in source:
        if c in splits:
            if s1:
                l.append(s1)
                s1 = ""
        else:
            print s1
            s1 = s1 + c
    if s1:
        l.append(s1)
    return l

>>>out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
>>>print out
>>>['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

回答 26

我喜欢replace()最好的方式。以下过程将字符串中定义的所有分隔符更改splitlist为第一个分隔符splitlist,然后在该分隔符上拆分文本。它还说明是否splitlist碰巧是一个空字符串。它返回单词列表,其中没有空字符串。

def split_string(text, splitlist):
    for sep in splitlist:
        text = text.replace(sep, splitlist[0])
    return filter(None, text.split(splitlist[0])) if splitlist else [text]

I like the replace() way the best. The following procedure changes all separators defined in a string splitlist to the first separator in splitlist and then splits the text on that one separator. It also accounts for if splitlist happens to be an empty string. It returns a list of words, with no empty strings in it.

def split_string(text, splitlist):
    for sep in splitlist:
        text = text.replace(sep, splitlist[0])
    return filter(None, text.split(splitlist[0])) if splitlist else [text]

回答 27

def get_words(s):
    l = []
    w = ''
    for c in s.lower():
        if c in '-!?,. ':
            if w != '': 
                l.append(w)
            w = ''
        else:
            w = w + c
    if w != '': 
        l.append(w)
    return l

这是用法:

>>> s = "Hey, you - what are you doing here!?"
>>> print get_words(s)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
def get_words(s):
    l = []
    w = ''
    for c in s.lower():
        if c in '-!?,. ':
            if w != '': 
                l.append(w)
            w = ''
        else:
            w = w + c
    if w != '': 
        l.append(w)
    return l

Here is the usage:

>>> s = "Hey, you - what are you doing here!?"
>>> print get_words(s)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']

回答 28

如果要进行可逆操作(保留定界符),则可以使用以下功能:

def tokenizeSentence_Reversible(sentence):
    setOfDelimiters = ['.', ' ', ',', '*', ';', '!']
    listOfTokens = [sentence]

    for delimiter in setOfDelimiters:
        newListOfTokens = []
        for ind, token in enumerate(listOfTokens):
            ll = [([delimiter, w] if ind > 0 else [w]) for ind, w in enumerate(token.split(delimiter))]
            listOfTokens = [item for sublist in ll for item in sublist] # flattens.
            listOfTokens = filter(None, listOfTokens) # Removes empty tokens: ''
            newListOfTokens.extend(listOfTokens)

        listOfTokens = newListOfTokens

    return listOfTokens

If you want a reversible operation (preserve the delimiters), you can use this function:

def tokenizeSentence_Reversible(sentence):
    setOfDelimiters = ['.', ' ', ',', '*', ';', '!']
    listOfTokens = [sentence]

    for delimiter in setOfDelimiters:
        newListOfTokens = []
        for ind, token in enumerate(listOfTokens):
            ll = [([delimiter, w] if ind > 0 else [w]) for ind, w in enumerate(token.split(delimiter))]
            listOfTokens = [item for sublist in ll for item in sublist] # flattens.
            listOfTokens = filter(None, listOfTokens) # Removes empty tokens: ''
            newListOfTokens.extend(listOfTokens)

        listOfTokens = newListOfTokens

    return listOfTokens

回答 29

我最近需要执行此操作,但需要一个与标准库str.split函数有些匹配的函数,当使用0或1个参数调用时,该函数的行为与标准库相同。

def split_many(string, *separators):
    if len(separators) == 0:
        return string.split()
    if len(separators) > 1:
        table = {
            ord(separator): ord(separator[0])
            for separator in separators
        }
        string = string.translate(table)
    return string.split(separators[0])

注意:仅当分隔符由单个字符组成时(如我的用例),此功能才有用。

I recently needed to do this but wanted a function that somewhat matched the standard library str.split function, this function behaves the same as standard library when called with 0 or 1 arguments.

def split_many(string, *separators):
    if len(separators) == 0:
        return string.split()
    if len(separators) > 1:
        table = {
            ord(separator): ord(separator[0])
            for separator in separators
        }
        string = string.translate(table)
    return string.split(separators[0])

NOTE: This function is only useful when your separators consist of a single character (as was my usecase).


建议使用哪个Python内存分析器?[关闭]

问题:建议使用哪个Python内存分析器?[关闭]

我想知道我的Python应用程序的内存使用情况,尤其想知道哪些代码块/部分或对象消耗了最多的内存。Google搜索显示商用的是Python Memory Validator(仅限Windows)。

开源的是PySizerHeapy

我没有尝试过任何人,所以我想知道哪个是最好的考虑因素:

  1. 提供大多数细节。

  2. 我必须要做最少的更改,也可以不做任何更改。

I want to know the memory usage of my Python application and specifically want to know what code blocks/portions or objects are consuming most memory. Google search shows a commercial one is Python Memory Validator (Windows only).

And open source ones are PySizer and Heapy.

I haven’t tried anyone, so I wanted to know which one is the best considering:

  1. Gives most details.

  2. I have to do least or no changes to my code.


回答 0

很容易使用。在代码中的某些时候,您必须编写以下代码:

from guppy import hpy
h = hpy()
print(h.heap())

这将为您提供如下输出:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

您还可以从何处引用对象,并获取有关该对象的统计信息,但是以某种方式,该文档上的文档很少。

还有一个用Tk编写的图形浏览器。

Heapy is quite simple to use. At some point in your code, you have to write the following:

from guppy import hpy
h = hpy()
print(h.heap())

This gives you some output like this:

Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
0  35144  27  2140412  26   2140412  26 str
1  38397  29  1309020  16   3449432  42 tuple
2    530   0   739856   9   4189288  50 dict (no owner)

You can also find out from where objects are referenced and get statistics about that, but somehow the docs on that are a bit sparse.

There is a graphical browser as well, written in Tk.


回答 1

由于没有人提到它,因此我将指向我的模块memory_profiler,该能够打印内存使用情况的报告,并且可以在Unix和Windows上运行(在最后一个版本中需要psutil)。输出不是很详细,但是目标是让您概述代码在哪里消耗了更多的内存,而不是对分配的对象进行详尽的分析。

在用函数修饰功能@profile并使用-m memory_profiler标记运行代码后,它将打印如下一行报告:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a

Since nobody has mentioned it I’ll point to my module memory_profiler which is capable of printing line-by-line report of memory usage and works on Unix and Windows (needs psutil on this last one). Output is not very detailed but the goal is to give you an overview of where the code is consuming more memory and not a exhaustive analysis on allocated objects.

After decorating your function with @profile and running your code with the -m memory_profiler flag it will print a line-by-line report like this:

Line #    Mem usage  Increment   Line Contents
==============================================
     3                           @profile
     4      5.97 MB    0.00 MB   def my_func():
     5     13.61 MB    7.64 MB       a = [1] * (10 ** 6)
     6    166.20 MB  152.59 MB       b = [2] * (2 * 10 ** 7)
     7     13.61 MB -152.59 MB       del b
     8     13.61 MB    0.00 MB       return a

回答 2

我推荐Dowser。设置非常容易,您需要对代码进行零更改。您可以通过简单的Web界面随时查看每种类型的对象计数,查看活动对象列表,查看对活动对象的引用。

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

您导入memdebug,然后调用memdebug.start。就这样。

我没有尝试过PySizer或Heapy。我会很感激别人的评论。

更新

上面的代码用于CherryPy 2.XCherryPy 3.Xserver.quickstart方法已删除,并且engine.start不带有该blocking标志。因此,如果您正在使用CherryPy 3.X

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()

I recommend Dowser. It is very easy to setup, and you need zero changes to your code. You can view counts of objects of each type through time, view list of live objects, view references to live objects, all from the simple web interface.

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.server.quickstart()
    cherrypy.engine.start(blocking=False)

You import memdebug, and call memdebug.start. That’s all.

I haven’t tried PySizer or Heapy. I would appreciate others’ reviews.

UPDATE

The above code is for CherryPy 2.X, CherryPy 3.X the server.quickstart method has been removed and engine.start does not take the blocking flag. So if you are using CherryPy 3.X

# memdebug.py

import cherrypy
import dowser

def start(port):
    cherrypy.tree.mount(dowser.Root())
    cherrypy.config.update({
        'environment': 'embedded',
        'server.socket_port': port
    })
    cherrypy.engine.start()

回答 3

考虑objgraph库(请参阅http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks以获得示例用例)。


回答 4

up(还有另一个)是Python的Memory Usage Profiler。该工具集的重点在于识别内存泄漏。

Muppy试图帮助开发人员识别Python应用程序的内存泄漏。它可以跟踪运行时的内存使用情况,并标识泄漏的对象。另外,提供了一些工具,这些工具可以定位未释放对象的来源。

Muppy is (yet another) Memory Usage Profiler for Python. The focus of this toolset is laid on the identification of memory leaks.

Muppy tries to help developers to identity memory leaks of Python applications. It enables the tracking of memory usage during runtime and the identification of objects which are leaking. Additionally, tools are provided which allow to locate the source of not released objects.


回答 5

我正在为Python开发一个名为memprof的内存分析器:

http://jmdana.github.io/memprof/

它允许您在执行装饰方法期间记录和绘制变量的内存使用情况。您只需要使用以下方法导入库:

from memprof import memprof

并使用以下方法装饰您的方法:

@memprof

这是有关情节外观的示例:

在此处输入图片说明

该项目托管在GitHub中:

https://github.com/jmdana/memprof

I’m developing a memory profiler for Python called memprof:

http://jmdana.github.io/memprof/

It allows you to log and plot the memory usage of your variables during the execution of the decorated methods. You just have to import the library using:

from memprof import memprof

And decorate your method using:

@memprof

This is an example on how the plots look like:

enter image description here

The project is hosted in GitHub:

https://github.com/jmdana/memprof


回答 6

我发现苹果酱比Heapy或PySizer更具功能。如果您恰巧正在运行wsgi Webapp,则Dozer是Dowser的一个很好的中间件包装

I found meliae to be much more functional than Heapy or PySizer. If you happen to be running a wsgi webapp, then Dozer is a nice middleware wrapper of Dowser


回答 7

也尝试pytracemalloc项目,该项目提供每个Python行号的内存使用情况。

编辑(2014/04):现在它具有Qt GUI来分析快照。

Try also the pytracemalloc project which provides the memory usage per Python line number.

EDIT (2014/04): It now has a Qt GUI to analyze snapshots.


如何在Django queryset过滤中执行不等于?

问题:如何在Django queryset过滤中执行不等于?

在Django模型QuerySets中,我看到比较值存在__gt__lt,但是存在__ne// !=/ <>不等于?)。

我想使用不等于过滤掉:

例:

Model:
    bool a;
    int x;

我想要

results = Model.objects.exclude(a=true, x!=5)

!=不正确的语法。我试过__ne<>

我最终使用:

results = Model.objects.exclude(a=true, x__lt=5).exclude(a=true, x__gt=5)

In Django model QuerySets, I see that there is a __gt and __lt for comparitive values, but is there a __ne/!=/<> (not equals?)

I want to filter out using a not equals:

Example:

Model:
    bool a;
    int x;

I want

results = Model.objects.exclude(a=true, x!=5)

The != is not correct syntax. I tried __ne, <>.

I ended up using:

results = Model.objects.exclude(a=true, x__lt=5).exclude(a=true, x__gt=5)

回答 0

也许Q对象可以解决这个问题。我从未使用过它们,但似乎可以将它们取反并组合起来,就像普通的python表达式一样。

更新:我只是尝试了一下,它似乎工作得很好:

>>> from myapp.models import Entry
>>> from django.db.models import Q

>>> Entry.objects.filter(~Q(id = 3))

[<Entry: Entry object>, <Entry: Entry object>, <Entry: Entry object>, ...]

Maybe Q objects could be of help for this problem. I’ve never used them but it seems they can be negated and combined much like normal python expressions.

Update: I Just tried it out, it seems to work pretty well:

>>> from myapp.models import Entry
>>> from django.db.models import Q

>>> Entry.objects.filter(~Q(id = 3))

[<Entry: Entry object>, <Entry: Entry object>, <Entry: Entry object>, ...]

回答 1

您的查询似乎有一个双重否定,您想排除x不为5的所有行,换句话说,您想包括x为5的所有行。我相信这可以解决问题。

results = Model.objects.filter(x=5).exclude(a=true)

要回答您的特定问题,不存在“不等于”的问题,但这可能是因为django同时提供了“过滤器”和“排除”方法,因此您始终可以切换逻辑回合以获得所需的结果。

Your query appears to have a double negative, you want to exclude all rows where x is not 5, so in other words you want to include all rows where x IS 5. I believe this will do the trick.

results = Model.objects.filter(x=5).exclude(a=true)

To answer your specific question, there is no “not equal to” but that’s probably because django has both “filter” and “exclude” methods available so you can always just switch the logic round to get the desired result.


回答 2

field=value查询中的语法是的简写field__exact=value。也就是说,Django将查询运算符放在标识符中的查询字段上。Django支持以下运算符:

exact
iexact
contains
icontains
in
gt
gte
lt
lte
startswith
istartswith
endswith
iendswith
range
year
month
day
week_day
isnull
search
regex
iregex

我敢肯定,如Dave Vogt所建议的那样,将它们与Q对象结合使用,filter()或者exclude()Jason Baker所建议的那样使用,您将获得几乎所有可能查询所需的确切信息。

the field=value syntax in queries is a shorthand for field__exact=value. That is to say that Django puts query operators on query fields in the identifiers. Django supports the following operators:

exact
iexact
contains
icontains
in
gt
gte
lt
lte
startswith
istartswith
endswith
iendswith
range
year
month
day
week_day
isnull
search
regex
iregex

I’m sure by combining these with the Q objects as Dave Vogt suggests and using filter() or exclude() as Jason Baker suggests you’ll get exactly what you need for just about any possible query.


回答 3

使用Django 1.7创建自定义查找很容易。Django官方文档中有一个__ne查找示例。

您需要先创建查找本身:

from django.db.models import Lookup

class NotEqual(Lookup):
    lookup_name = 'ne'

    def as_sql(self, qn, connection):
        lhs, lhs_params = self.process_lhs(qn, connection)
        rhs, rhs_params = self.process_rhs(qn, connection)
        params = lhs_params + rhs_params
        return '%s <> %s' % (lhs, rhs), params

然后,您需要注册:

from django.db.models.fields import Field
Field.register_lookup(NotEqual)

现在,您可以__ne像下面这样在查询中使用查找:

results = Model.objects.exclude(a=True, x__ne=5)

It’s easy to create a custom lookup with Django 1.7. There’s an __ne lookup example in Django official documentation.

You need to create the lookup itself first:

from django.db.models import Lookup

class NotEqual(Lookup):
    lookup_name = 'ne'

    def as_sql(self, qn, connection):
        lhs, lhs_params = self.process_lhs(qn, connection)
        rhs, rhs_params = self.process_rhs(qn, connection)
        params = lhs_params + rhs_params
        return '%s <> %s' % (lhs, rhs), params

Then you need to register it:

from django.db.models.fields import Field
Field.register_lookup(NotEqual)

And now you can use the __ne lookup in your queries like this:

results = Model.objects.exclude(a=True, x__ne=5)

回答 4

Django 1.9 / 1.10中,有三个选项。

  1. excludefilter

    results = Model.objects.exclude(a=true).filter(x=5)
  2. 使用Q()对象~运算符

    from django.db.models import Q
    object_list = QuerySet.filter(~Q(a=True), x=5)
  3. 注册自定义查找功能

    from django.db.models import Lookup
    from django.db.models.fields import Field
    
    @Field.register_lookup
    class NotEqual(Lookup):
        lookup_name = 'ne'
    
        def as_sql(self, compiler, connection):
            lhs, lhs_params = self.process_lhs(compiler, connection)
            rhs, rhs_params = self.process_rhs(compiler, connection)
            params = lhs_params + rhs_params
            return '%s <> %s' % (lhs, rhs), params

    register_lookup在加装饰的Django 1.8和启用自定义查找和往常一样:

    results = Model.objects.exclude(a=True, x__ne=5)

In Django 1.9/1.10 there are three options.

  1. Chain exclude and filter

    results = Model.objects.exclude(a=true).filter(x=5)
    
  2. Use Q() objects and the ~ operator

    from django.db.models import Q
    object_list = QuerySet.filter(~Q(a=True), x=5)
    
  3. Register a custom lookup function

    from django.db.models import Lookup
    from django.db.models.fields import Field
    
    @Field.register_lookup
    class NotEqual(Lookup):
        lookup_name = 'ne'
    
        def as_sql(self, compiler, connection):
            lhs, lhs_params = self.process_lhs(compiler, connection)
            rhs, rhs_params = self.process_rhs(compiler, connection)
            params = lhs_params + rhs_params
            return '%s <> %s' % (lhs, rhs), params
    

    The register_lookup decorator was added in Django 1.8 and enables custom lookup as usual:

    results = Model.objects.exclude(a=True, x__ne=5)
    

回答 5

虽然与模型,您可以用过滤=__gt__gte__lt__lte,你不能使用ne!=或者<>。但是,使用Q对象可以实现更好的过滤。

您可以避免链接QuerySet.filter()QuerySet.exlude(),并使用以下代码:

from django.db.models import Q
object_list = QuerySet.filter(~Q(field='not wanted'), field='wanted')

While with the Models, you can filter with =, __gt, __gte, __lt, __lte, you cannot use ne, != or <>. However, you can achieve better filtering on using the Q object.

You can avoid chaining QuerySet.filter() and QuerySet.exlude(), and use this:

from django.db.models import Q
object_list = QuerySet.filter(~Q(field='not wanted'), field='wanted')

回答 6

等待设计决策。同时,使用exclude()

Django问题追踪器具有引人注目的条目#5763,标题为“ Queryset没有“不相等”的过滤运算符”。值得注意的是,(截至2016年4月)它是​​“在9年前开放”(在Django石器时代),“在4年前关闭”和“在5个月前最后一次更改”。

通读讨论,这很有趣。基本上,一些人认为__ne应增加也有人说exclude()是清晰的,因此__ne 应该添加。

(我同意前者,因为后者的论点大致相当于说Python不应该拥有!===,因为它已经拥有not了……)

Pending design decision. Meanwhile, use exclude()

The Django issue tracker has the remarkable entry #5763, titled “Queryset doesn’t have a “not equal” filter operator”. It is remarkable because (as of April 2016) it was “opened 9 years ago” (in the Django stone age), “closed 4 years ago”, and “last changed 5 months ago”.

Read through the discussion, it is interesting. Basically, some people argue __ne should be added while others say exclude() is clearer and hence __ne should not be added.

(I agree with the former, because the latter argument is roughly equivalent to saying Python should not have != because it has == and not already…)


回答 7

使用排除和过滤

results = Model.objects.filter(x=5).exclude(a=true)

Using exclude and filter

results = Model.objects.filter(x=5).exclude(a=true)

回答 8

您应该使用filterexclude喜欢这个

results = Model.objects.exclude(a=true).filter(x=5)

You should use filter and exclude like this

results = Model.objects.exclude(a=true).filter(x=5)

回答 9

代码的最后一位将排除x!= 5并且a为True的所有对象。尝试这个:

results = Model.objects.filter(a=False, x=5)

请记住,上一行中的=符号将False赋给参数a,并将数字5赋给参数x。它不是在检查是否相等。因此,实际上没有任何方法可以在查询调用中使用!=符号。

The last bit of code will exclude all objects where x!=5 and a is True. Try this:

results = Model.objects.filter(a=False, x=5)

Remember, the = sign in the above line is assigning False to the parameter a and the number 5 to the parameter x. It’s not checking for equality. Thus, there isn’t really any way to use the != symbol in a query call.


回答 10

结果= Model.objects.filter(a = True).exclude(x = 5)
生成此sql:
从tablex中选择*,其中a = 0且x!= 5
sql取决于您的True / False字段的表示方式以及数据库引擎。Django代码是您所需要的。

results = Model.objects.filter(a = True).exclude(x = 5)
Generetes this sql:
select * from tablex where a != 0 and x !=5
The sql depends on how your True/False field is represented, and the database engine. The django code is all you need though.

回答 11

Django-model-values(公开:作者)提供了NotEqual查找的实现,如此答案所示。它还为此提供了语法支持:

from model_values import F
Model.objects.exclude(F.x != 5, a=True)

Django-model-values (disclosure: author) provides an implementation of the NotEqual lookup, as in this answer. It also provides syntactic support for it:

from model_values import F
Model.objects.exclude(F.x != 5, a=True)

回答 12

您要查找的所有具有a=false 或的 对象x=5。在Django中,|充当查询集OR之间的运算符:

results = Model.objects.filter(a=false)|Model.objects.filter(x=5)

What you are looking for are all objects that have either a=false or x=5. In Django, | serves as OR operator between querysets:

results = Model.objects.filter(a=false)|Model.objects.filter(x=5)

回答 13

这将给您想要的结果。

from django.db.models import Q
results = Model.objects.exclude(Q(a=True) & ~Q(x=5))

对于不相等,可以~在相等查询上使用。显然,Q可以用来达到相等的查询。

This will give your desired result.

from django.db.models import Q
results = Model.objects.exclude(Q(a=True) & ~Q(x=5))

for not equal you can use ~ on an equal query. obviously, Q can be used to reach the equal query.


回答 14

注意此问题的许多错误答案!

Gerard的逻辑是正确的,尽管它将返回一个列表而不是一个查询集(这可能无关紧要)。

如果需要查询集,请使用Q:

from django.db.models import Q
results = Model.objects.filter(Q(a=false) | Q(x=5))

Watch out for lots of incorrect answers to this question!

Gerard’s logic is correct, though it will return a list rather than a queryset (which might not matter).

If you need a queryset, use Q:

from django.db.models import Q
results = Model.objects.filter(Q(a=false) | Q(x=5))

如果不存在,Python中的open()不会创建文件

问题:如果不存在,Python中的open()不会创建文件

如果文件以读/写方式打开,或者以不存在的方式创建,然后以读/写方式打开,最好的方法是什么?根据我的阅读,file = open('myfile.dat', 'rw')应该这样做吗?

它对我不起作用(Python 2.6.2),我想知道这是否是版本问题,或者不应该那样工作或做什么。

最重要的是,我只需要解决这个问题。我对其他东西很好奇,但是我所需要的只是做开始部分的好方法。

封闭目录可由用户和组而非其他用户(我在Linux系统上…因此权限775)可写,确切的错误是:

IOError:没有这样的文件或目录。

What is the best way to open a file as read/write if it exists, or if it does not, then create it and open it as read/write? From what I read, file = open('myfile.dat', 'rw') should do this, right?

It is not working for me (Python 2.6.2) and I’m wondering if it is a version problem, or not supposed to work like that or what.

The bottom line is, I just need a solution for the problem. I am curious about the other stuff, but all I need is a nice way to do the opening part.

The enclosing directory was writeable by user and group, not other (I’m on a Linux system… so permissions 775 in other words), and the exact error was:

IOError: no such file or directory.


回答 0

您应该使用open以下w+模式:

file = open('myfile.dat', 'w+')

You should use open with the w+ mode:

file = open('myfile.dat', 'w+')

回答 1

以下方法的优点是,即使在途中引发了异常,文件也会在块的末尾正确关闭。它等效于try-finally,但要短得多。

with open("file.dat","a+") as f:
    f.write(...)
    ...

a +打开一个文件以进行附加和读取。如果文件存在,则文件指针位于文件的末尾。该文件以追加模式打开。如果该文件不存在,它将创建一个用于读取和写入的新文件。- Python的文件模式

seek()方法设置文件的当前位置。

f.seek(pos [, (0|1|2)])
pos .. position of the r/w pointer
[] .. optionally
() .. one of ->
  0 .. absolute position
  1 .. relative position to current
  2 .. relative position from end

只允许使用“ rwab +”字符;必须完全是“ rwa”之一-请参阅Stack Overflow问题Python文件模式详细信息

The advantage of the following approach is that the file is properly closed at the block’s end, even if an exception is raised on the way. It’s equivalent to try-finally, but much shorter.

with open("file.dat","a+") as f:
    f.write(...)
    ...

a+ Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing. –Python file modes

seek() method sets the file’s current position.

f.seek(pos [, (0|1|2)])
pos .. position of the r/w pointer
[] .. optionally
() .. one of ->
  0 .. absolute position
  1 .. relative position to current
  2 .. relative position from end

Only “rwab+” characters are allowed; there must be exactly one of “rwa” – see Stack Overflow question Python file modes detail.


回答 2

好的做法是使用以下方法:

import os

writepath = 'some/path/to/file.txt'

mode = 'a' if os.path.exists(writepath) else 'w'
with open(writepath, mode) as f:
    f.write('Hello, world!\n')

Good practice is to use the following:

import os

writepath = 'some/path/to/file.txt'

mode = 'a' if os.path.exists(writepath) else 'w'
with open(writepath, mode) as f:
    f.write('Hello, world!\n')

回答 3

将“ rw”更改为“ w +”

或使用“ a +”进行附加(不删除现有内容)

Change “rw” to “w+”

Or use ‘a+’ for appending (not erasing existing content)


回答 4

>>> import os
>>> if os.path.exists("myfile.dat"):
...     f = file("myfile.dat", "r+")
... else:
...     f = file("myfile.dat", "w")

r +表示读/写

>>> import os
>>> if os.path.exists("myfile.dat"):
...     f = file("myfile.dat", "r+")
... else:
...     f = file("myfile.dat", "w")

r+ means read/write


回答 5

从python 3.4开始,您应该使用pathlib“触摸”文件。
与该线程中提出的解决方案相比,这是一种更为优雅的解决方案。

from pathlib import Path

filename = Path('myfile.txt')
filename.touch(exist_ok=True)  # will create file, if it exists will do nothing
file = open(filename)

与目录相同:

filename.mkdir(parents=True, exist_ok=True)

Since python 3.4 you should use pathlib to “touch” files.
It is a much more elegant solution than the proposed ones in this thread.

from pathlib import Path

filename = Path('myfile.txt')
filename.touch(exist_ok=True)  # will create file, if it exists will do nothing
file = open(filename)

Same thing with directories:

filename.mkdir(parents=True, exist_ok=True)

回答 6

我的答案:

file_path = 'myfile.dat'
try:
    fp = open(file_path)
except IOError:
    # If not exists, create the file
    fp = open(file_path, 'w+')

My answer:

file_path = 'myfile.dat'
try:
    fp = open(file_path)
except IOError:
    # If not exists, create the file
    fp = open(file_path, 'w+')

回答 7

'''
w  write mode
r  read mode
a  append mode

w+  create file if it doesn't exist and open it in write mode
r+  open for reading and writing. Does not create file.
a+  create file if it doesn't exist and open it in append mode
'''

例:

file_name = 'my_file.txt'
f = open(file_name, 'w+')  # open file in write mode
f.write('python rules')
f.close()

我希望这有帮助。[仅供参考,请使用python 3.6.2版]

'''
w  write mode
r  read mode
a  append mode

w+  create file if it doesn't exist and open it in write mode
r+  open for reading and writing. Does not create file.
a+  create file if it doesn't exist and open it in append mode
'''

example:

file_name = 'my_file.txt'
f = open(file_name, 'w+')  # open file in write mode
f.write('python rules')
f.close()

I hope this helps. [FYI am using python version 3.6.2]


回答 8

open('myfile.dat', 'a') 为我工作,就好。

在py3k中,您的代码将引发ValueError

>>> open('myfile.dat', 'rw')
Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    open('myfile.dat', 'rw')
ValueError: must have exactly one of read/write/append mode

在python-2.6中它引发了IOError

open('myfile.dat', 'a') works for me, just fine.

in py3k your code raises ValueError:

>>> open('myfile.dat', 'rw')
Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    open('myfile.dat', 'rw')
ValueError: must have exactly one of read/write/append mode

in python-2.6 it raises IOError.


回答 9

采用:

import os

f_loc = r"C:\Users\Russell\Desktop\myfile.dat"

# Create the file if it does not exist
if not os.path.exists(f_loc):
    open(f_loc, 'w').close()

# Open the file for appending and reading
with open(f_loc, 'a+') as f:
    #Do stuff

注意:打开文件后必须将其关闭,with上下文管理器是让Python为您解决此问题的一种好方法。

Use:

import os

f_loc = r"C:\Users\Russell\Desktop\myfile.dat"

# Create the file if it does not exist
if not os.path.exists(f_loc):
    open(f_loc, 'w').close()

# Open the file for appending and reading
with open(f_loc, 'a+') as f:
    #Do stuff

Note: Files have to be closed after you open them, and the with context manager is a nice way of letting Python take care of this for you.


回答 10

您想对文件做什么?只写它还是读和写?

'w''a'将允许写入,如果文件不存在,则会创建该文件。

如果需要读取文件,则在打开文件之前必须先将其存在。您可以在打开它之前测试它的存在或使用try / except。

What do you want to do with file? Only writing to it or both read and write?

'w', 'a' will allow write and will create the file if it doesn’t exist.

If you need to read from a file, the file has to be exist before open it. You can test its existence before opening it or use a try/except.


回答 11

我认为r+不是rw。我只是一个入门者,这就是我在文档中看到的内容。

I think it’s r+, not rw. I’m just a starter, and that’s what I’ve seen in the documentation.


回答 12

使用w +写入文件,如果存在则截断,使用r +读取文件,如果文件不存在则创建一个,但不写入(并返回null),或者使用a +创建新文件或追加到现有文件。

Put w+ for writing the file, truncating if it exist, r+ to read the file, creating one if it don’t exist but not writing (and returning null) or a+ for creating a new file or appending to a existing one.


回答 13

因此,您想将数据写入文件,但前提是该文件尚不存在?

通过使用鲜为人知的x模式open()而不是通常的w模式,可以轻松解决此问题。例如:

 >>> with open('somefile', 'wt') as f:
 ...     f.write('Hello\n')
...
>>> with open('somefile', 'xt') as f:
...     f.write('Hello\n')
...
 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'somefile'
  >>>

如果文件是二进制模式,请使用模式xb而不是xt。

So You want to write data to a file, but only if it doesn’t already exist?.

This problem is easily solved by using the little-known x mode to open() instead of the usual w mode. For example:

 >>> with open('somefile', 'wt') as f:
 ...     f.write('Hello\n')
...
>>> with open('somefile', 'xt') as f:
...     f.write('Hello\n')
...
 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'somefile'
  >>>

If the file is binary mode, use mode xb instead of xt.


回答 14

如果您想打开它进行读写,那么我假设您不想在打开文件时截断它,并且希望在打开文件后立即读取文件。所以这是我正在使用的解决方案:

file = open('myfile.dat', 'a+')
file.seek(0, 0)

If you want to open it to read and write, I’m assuming you don’t want to truncate it as you open it and you want to be able to read the file right after opening it. So this is the solution I’m using:

file = open('myfile.dat', 'a+')
file.seek(0, 0)

回答 15

也许这会有所帮助

首先将os模块导入py文件

import os

然后创建一个名为save_file的变量并将其设置为您要制作html或txt的文件,在这种情况下,将其设置为txt文件

save_file = "history.txt"

然后定义一个函数,该函数将使用os.path.is file方法检查文件是否存在,如果不存在,它将创建一个文件

def check_into():
if os.path.isfile(save_file):
    print("history file exists..... \nusing for writting....")
else:
    print("history file not exists..... \ncreating it..... ")
    file = open(save_file, 'w')
    time.sleep(2)
    print('file created ')
    file.close()

最后调用函数

check_into()

may be this will help

first import os module into your py file

import os

then create a variable named save_file and set it to file you want to make html or txt in this case a txt file

save_file = "history.txt"

then define a function that will use os.path.is file method to check if file exist and if not it will create a file

def check_into():
if os.path.isfile(save_file):
    print("history file exists..... \nusing for writting....")
else:
    print("history file not exists..... \ncreating it..... ")
    file = open(save_file, 'w')
    time.sleep(2)
    print('file created ')
    file.close()

and at last call the function

check_into()

回答 16

import os, platform
os.chdir('c:\\Users\\MS\\Desktop')

try :
    file = open("Learn Python.txt","a")
    print('this file is exist')
except:
    print('this file is not exist')
file.write('\n''Hello Ashok')

fhead = open('Learn Python.txt')

for line in fhead:

    words = line.split()
print(words)
import os, platform
os.chdir('c:\\Users\\MS\\Desktop')

try :
    file = open("Learn Python.txt","a")
    print('this file is exist')
except:
    print('this file is not exist')
file.write('\n''Hello Ashok')

fhead = open('Learn Python.txt')

for line in fhead:

    words = line.split()
print(words)

反转/反转字典映射

问题:反转/反转字典映射

给定这样的字典:

my_map = {'a': 1, 'b': 2}

如何将这张地图倒置即可:

inv_map = {1: 'a', 2: 'b'}

Given a dictionary like so:

my_map = {'a': 1, 'b': 2}

How can one invert this map to get:

inv_map = {1: 'a', 2: 'b'}

回答 0

对于Python 2.7.x

inv_map = {v: k for k, v in my_map.iteritems()}

对于Python 3+:

inv_map = {v: k for k, v in my_map.items()}

For Python 2.7.x

inv_map = {v: k for k, v in my_map.iteritems()}

For Python 3+:

inv_map = {v: k for k, v in my_map.items()}

回答 1

假设字典中的值是唯一的:

dict((v, k) for k, v in my_map.iteritems())

Assuming that the values in the dict are unique:

dict((v, k) for k, v in my_map.iteritems())

回答 2

如果中的值my_map不是唯一的:

inv_map = {}
for k, v in my_map.iteritems():
    inv_map[v] = inv_map.get(v, [])
    inv_map[v].append(k)

If the values in my_map aren’t unique:

inv_map = {}
for k, v in my_map.iteritems():
    inv_map[v] = inv_map.get(v, [])
    inv_map[v].append(k)

回答 3

为此,同时保留映射类型(假设它是a dictdict子类):

def inverse_mapping(f):
    return f.__class__(map(reversed, f.items()))

To do this while preserving the type of your mapping (assuming that it is a dict or a dict subclass):

def inverse_mapping(f):
    return f.__class__(map(reversed, f.items()))

回答 4

尝试这个:

inv_map = dict(zip(my_map.values(), my_map.keys()))

(请注意,字典视图上的Python文档明确地保证了这一点,.keys()并且.values()其元素具有相同的顺序,这使得上述方法可以工作。)

或者:

inv_map = dict((my_map[k], k) for k in my_map)

或使用python 3.0的dict理解

inv_map = {my_map[k] : k for k in my_map}

Try this:

inv_map = dict(zip(my_map.values(), my_map.keys()))

(Note that the Python docs on dictionary views explicitly guarantee that .keys() and .values() have their elements in the same order, which allows the approach above to work.)

Alternatively:

inv_map = dict((my_map[k], k) for k in my_map)

or using python 3.0’s dict comprehensions

inv_map = {my_map[k] : k for k in my_map}

回答 5

另一种更实用的方法:

my_map = { 'a': 1, 'b':2 }
dict(map(reversed, my_map.items()))

Another, more functional, way:

my_map = { 'a': 1, 'b':2 }
dict(map(reversed, my_map.items()))

回答 6

这扩展了Robert的答案,适用于字典中的值不是唯一的情况。

class ReversibleDict(dict):

    def reversed(self):
        """
        Return a reversed dict, with common values in the original dict
        grouped into a list in the returned dict.

        Example:
        >>> d = ReversibleDict({'a': 3, 'c': 2, 'b': 2, 'e': 3, 'd': 1, 'f': 2})
        >>> d.reversed()
        {1: ['d'], 2: ['c', 'b', 'f'], 3: ['a', 'e']}
        """

        revdict = {}
        for k, v in self.iteritems():
            revdict.setdefault(v, []).append(k)
        return revdict

实现受到限制,因为您不能使用reversed两次并取回原始文件。因此它不是对称的。已通过Python 2.6测试。是一个我用来打印结果字典的用例。

如果您宁愿使用a而set不是a list,并且可能存在对此有意义的无序应用程序,而不是setdefault(v, []).append(k)use setdefault(v, set()).add(k)

This expands upon the answer by Robert, applying to when the values in the dict aren’t unique.

class ReversibleDict(dict):

    def reversed(self):
        """
        Return a reversed dict, with common values in the original dict
        grouped into a list in the returned dict.

        Example:
        >>> d = ReversibleDict({'a': 3, 'c': 2, 'b': 2, 'e': 3, 'd': 1, 'f': 2})
        >>> d.reversed()
        {1: ['d'], 2: ['c', 'b', 'f'], 3: ['a', 'e']}
        """

        revdict = {}
        for k, v in self.iteritems():
            revdict.setdefault(v, []).append(k)
        return revdict

The implementation is limited in that you cannot use reversed twice and get the original back. It is not symmetric as such. It is tested with Python 2.6. Here is a use case of how I am using to print the resultant dict.

If you’d rather use a set than a list, and there could exist unordered applications for which this makes sense, instead of setdefault(v, []).append(k), use setdefault(v, set()).add(k).


回答 7

我们还可以使用重复键反转字典defaultdict

from collections import Counter, defaultdict

def invert_dict(d):
    d_inv = defaultdict(list)
    for k, v in d.items():
        d_inv[v].append(k)
    return d_inv

text = 'aaa bbb ccc ddd aaa bbb ccc aaa' 
c = Counter(text.split()) # Counter({'aaa': 3, 'bbb': 2, 'ccc': 2, 'ddd': 1})
dict(invert_dict(c)) # {1: ['ddd'], 2: ['bbb', 'ccc'], 3: ['aaa']}  

这里

与使用的等效技术相比,此技术更简单,更快dict.setdefault()

We can also reverse a dictionary with duplicate keys using defaultdict:

from collections import Counter, defaultdict

def invert_dict(d):
    d_inv = defaultdict(list)
    for k, v in d.items():
        d_inv[v].append(k)
    return d_inv

text = 'aaa bbb ccc ddd aaa bbb ccc aaa' 
c = Counter(text.split()) # Counter({'aaa': 3, 'bbb': 2, 'ccc': 2, 'ddd': 1})
dict(invert_dict(c)) # {1: ['ddd'], 2: ['bbb', 'ccc'], 3: ['aaa']}  

See here:

This technique is simpler and faster than an equivalent technique using dict.setdefault().


回答 8

例如,您有以下字典:

dict = {'a': 'fire', 'b': 'ice', 'c': 'fire', 'd': 'water'}

而且您想以相反的形式获取它:

inverted_dict = {'fire': ['a', 'c'], 'ice': ['b'], 'water': ['d']}

第一个解决方案。要在字典中反转键值对,请使用for-loop方法:

# Use this code to invert dictionaries that have non-unique values

inverted_dict = dict()
for key, value in dict.items():
    inverted_dict.setdefault(value, list()).append(key)

第二解决方案。使用字典理解方法进行反演:

# Use this code to invert dictionaries that have unique values

inverted_dict = {value: key for key, value in dict.items()}

第三解。使用还原反转方法(取决于第二种解决方案):

# Use this code to invert dictionaries that have lists of values

dict = {value: key for key in inverted_dict for value in my_map[key]}

For instance, you have the following dictionary:

dict = {'a': 'fire', 'b': 'ice', 'c': 'fire', 'd': 'water'}

And you wanna get it in such an inverted form:

inverted_dict = {'fire': ['a', 'c'], 'ice': ['b'], 'water': ['d']}

First Solution. For inverting key-value pairs in your dictionary use a for-loop approach:

# Use this code to invert dictionaries that have non-unique values

inverted_dict = dict()
for key, value in dict.items():
    inverted_dict.setdefault(value, list()).append(key)

Second Solution. Use a dictionary comprehension approach for inversion:

# Use this code to invert dictionaries that have unique values

inverted_dict = {value: key for key, value in dict.items()}

Third Solution. Use reverting the inversion approach (relies on second solution):

# Use this code to invert dictionaries that have lists of values

dict = {value: key for key in inverted_dict for value in my_map[key]}

回答 9

列表和字典理解的结合。可以处理重复的钥匙

{v:[i for i in d.keys() if d[i] == v ] for k,v in d.items()}

Combination of list and dictionary comprehension. Can handle duplicate keys

{v:[i for i in d.keys() if d[i] == v ] for k,v in d.items()}

回答 10

如果值不是唯一的,并且您有点硬核:

inv_map = dict(
    (v, [k for (k, xx) in filter(lambda (key, value): value == v, my_map.items())]) 
    for v in set(my_map.values())
)

特别是对于大字典,请注意,此解决方案的效率远不及Python反向/反转映射的答案,因为它会循环items()多次。

If the values aren’t unique, and you’re a little hardcore:

inv_map = dict(
    (v, [k for (k, xx) in filter(lambda (key, value): value == v, my_map.items())]) 
    for v in set(my_map.values())
)

Especially for a large dict, note that this solution is far less efficient than the answer Python reverse / invert a mapping because it loops over items() multiple times.


回答 11

除了上面建议的其他功能之外,如果您喜欢lambdas:

invert = lambda mydict: {v:k for k, v in mydict.items()}

或者,您也可以采用这种方式:

invert = lambda mydict: dict( zip(mydict.values(), mydict.keys()) )

In addition to the other functions suggested above, if you like lambdas:

invert = lambda mydict: {v:k for k, v in mydict.items()}

Or, you could do it this way too:

invert = lambda mydict: dict( zip(mydict.values(), mydict.keys()) )

回答 12

我认为最好的方法是定义一个类。这是“对称字典”的实现:

class SymDict:
    def __init__(self):
        self.aToB = {}
        self.bToA = {}

    def assocAB(self, a, b):
        # Stores and returns a tuple (a,b) of overwritten bindings
        currB = None
        if a in self.aToB: currB = self.bToA[a]
        currA = None
        if b in self.bToA: currA = self.aToB[b]

        self.aToB[a] = b
        self.bToA[b] = a
        return (currA, currB)

    def lookupA(self, a):
        if a in self.aToB:
            return self.aToB[a]
        return None

    def lookupB(self, b):
        if b in self.bToA:
            return self.bToA[b]
        return None

如果需要,删除和迭代方法很容易实现。

与反转整个字典(这似乎是此页面上最流行的解决方案)相比,这种实现方式效率更高。更不用说,您可以根据需要在自己的SymDict中添加或删除值,并且逆字典将始终保持有效-如果仅对整个字典进行一次反向操作,则情况并非如此。

I think the best way to do this is to define a class. Here is an implementation of a “symmetric dictionary”:

class SymDict:
    def __init__(self):
        self.aToB = {}
        self.bToA = {}

    def assocAB(self, a, b):
        # Stores and returns a tuple (a,b) of overwritten bindings
        currB = None
        if a in self.aToB: currB = self.bToA[a]
        currA = None
        if b in self.bToA: currA = self.aToB[b]

        self.aToB[a] = b
        self.bToA[b] = a
        return (currA, currB)

    def lookupA(self, a):
        if a in self.aToB:
            return self.aToB[a]
        return None

    def lookupB(self, b):
        if b in self.bToA:
            return self.bToA[b]
        return None

Deletion and iteration methods are easy enough to implement if they’re needed.

This implementation is way more efficient than inverting an entire dictionary (which seems to be the most popular solution on this page). Not to mention, you can add or remove values from your SymDict as much as you want, and your inverse-dictionary will always stay valid — this isn’t true if you simply reverse the entire dictionary once.


回答 13

这处理非唯一值,并保留了唯一情况的大部分外观。

inv_map = {v:[k for k in my_map if my_map[k] == v] for v in my_map.itervalues()}

对于Python 3.x,请替换itervaluesvalues

This handles non-unique values and retains much of the look of the unique case.

inv_map = {v:[k for k in my_map if my_map[k] == v] for v in my_map.itervalues()}

For Python 3.x, replace itervalues with values.


回答 14

函数对于类型列表的值是对称的;执行reverse_dict(reverse_dict(dictionary))时,元组被覆盖到列表中

def reverse_dict(dictionary):
    reverse_dict = {}
    for key, value in dictionary.iteritems():
        if not isinstance(value, (list, tuple)):
            value = [value]
        for val in value:
            reverse_dict[val] = reverse_dict.get(val, [])
            reverse_dict[val].append(key)
    for key, value in reverse_dict.iteritems():
        if len(value) == 1:
            reverse_dict[key] = value[0]
    return reverse_dict

Function is symmetric for values of type list; Tuples are coverted to lists when performing reverse_dict(reverse_dict(dictionary))

def reverse_dict(dictionary):
    reverse_dict = {}
    for key, value in dictionary.iteritems():
        if not isinstance(value, (list, tuple)):
            value = [value]
        for val in value:
            reverse_dict[val] = reverse_dict.get(val, [])
            reverse_dict[val].append(key)
    for key, value in reverse_dict.iteritems():
        if len(value) == 1:
            reverse_dict[key] = value[0]
    return reverse_dict

回答 15

由于字典在字典中需要一个与值不同的唯一键,因此我们必须将反转的值附加到要包含在新的特定键中的排序列表中。

def r_maping(dictionary):
    List_z=[]
    Map= {}
    for z, x in dictionary.iteritems(): #iterate through the keys and values
        Map.setdefault(x,List_z).append(z) #Setdefault is the same as dict[key]=default."The method returns the key value available in the dictionary and if given key is not available then it will return provided default value. Afterward, we will append into the default list our new values for the specific key.
    return Map

Since dictionaries require one unique key within the dictionary unlike values, we have to append the reversed values into a list of sort to be included within the new specific keys.

def r_maping(dictionary):
    List_z=[]
    Map= {}
    for z, x in dictionary.iteritems(): #iterate through the keys and values
        Map.setdefault(x,List_z).append(z) #Setdefault is the same as dict[key]=default."The method returns the key value available in the dictionary and if given key is not available then it will return provided default value. Afterward, we will append into the default list our new values for the specific key.
    return Map

回答 16

非双射映射的快速功能解决方案(值不是唯一的):

from itertools import imap, groupby

def fst(s):
    return s[0]

def snd(s):
    return s[1]

def inverseDict(d):
    """
    input d: a -> b
    output : b -> set(a)
    """
    return {
        v : set(imap(fst, kv_iter))
        for (v, kv_iter) in groupby(
            sorted(d.iteritems(),
                   key=snd),
            key=snd
        )
    }

从理论上讲,这应该比在命令式解决方案中一个接一个地添加到集合(或追加到列表中)要快。

不幸的是,值必须是可排序的,groupby要求排序。

Fast functional solution for non-bijective maps (values not unique):

from itertools import imap, groupby

def fst(s):
    return s[0]

def snd(s):
    return s[1]

def inverseDict(d):
    """
    input d: a -> b
    output : b -> set(a)
    """
    return {
        v : set(imap(fst, kv_iter))
        for (v, kv_iter) in groupby(
            sorted(d.iteritems(),
                   key=snd),
            key=snd
        )
    }

In theory this should be faster than adding to the set (or appending to the list) one by one like in the imperative solution.

Unfortunately the values have to be sortable, the sorting is required by groupby.


回答 17

尝试使用python 2.7 / 3.x

inv_map={};
for i in my_map:
    inv_map[my_map[i]]=i    
print inv_map

Try this for python 2.7/3.x

inv_map={};
for i in my_map:
    inv_map[my_map[i]]=i    
print inv_map

回答 18

我会在python 2中那样做。

inv_map = {my_map[x] : x for x in my_map}

I would do it that way in python 2.

inv_map = {my_map[x] : x for x in my_map}

回答 19

def invertDictionary(d):
    myDict = {}
  for i in d:
     value = d.get(i)
     myDict.setdefault(value,[]).append(i)   
 return myDict
 print invertDictionary({'a':1, 'b':2, 'c':3 , 'd' : 1})

这将提供以下输出:{1:[‘a’,’d’],2:[‘b’],3:[‘c’]}

def invertDictionary(d):
    myDict = {}
  for i in d:
     value = d.get(i)
     myDict.setdefault(value,[]).append(i)   
 return myDict
 print invertDictionary({'a':1, 'b':2, 'c':3 , 'd' : 1})

This will provide output as : {1: [‘a’, ‘d’], 2: [‘b’], 3: [‘c’]}


回答 20

  def reverse_dictionary(input_dict):
      out = {}
      for v in input_dict.values():  
          for value in v:
              if value not in out:
                  out[value.lower()] = []

      for i in input_dict:
          for j in out:
              if j in map (lambda x : x.lower(),input_dict[i]):
                  out[j].append(i.lower())
                  out[j].sort()
      return out

这段代码是这样的:

r = reverse_dictionary({'Accurate': ['exact', 'precise'], 'exact': ['precise'], 'astute': ['Smart', 'clever'], 'smart': ['clever', 'bright', 'talented']})

print(r)

{'precise': ['accurate', 'exact'], 'clever': ['astute', 'smart'], 'talented': ['smart'], 'bright': ['smart'], 'exact': ['accurate'], 'smart': ['astute']}
  def reverse_dictionary(input_dict):
      out = {}
      for v in input_dict.values():  
          for value in v:
              if value not in out:
                  out[value.lower()] = []

      for i in input_dict:
          for j in out:
              if j in map (lambda x : x.lower(),input_dict[i]):
                  out[j].append(i.lower())
                  out[j].sort()
      return out

this code do like this:

r = reverse_dictionary({'Accurate': ['exact', 'precise'], 'exact': ['precise'], 'astute': ['Smart', 'clever'], 'smart': ['clever', 'bright', 'talented']})

print(r)

{'precise': ['accurate', 'exact'], 'clever': ['astute', 'smart'], 'talented': ['smart'], 'bright': ['smart'], 'exact': ['accurate'], 'smart': ['astute']}

回答 21

没有什么完全不同,只是Cookbook重写了一些食谱。还通过保留setdefault方法进行了优化,而不是每次通过实例进行优化:

def inverse(mapping):
    '''
    A function to inverse mapping, collecting keys with simillar values
    in list. Careful to retain original type and to be fast.
    >> d = dict(a=1, b=2, c=1, d=3, e=2, f=1, g=5, h=2)
    >> inverse(d)
    {1: ['f', 'c', 'a'], 2: ['h', 'b', 'e'], 3: ['d'], 5: ['g']}
    '''
    res = {}
    setdef = res.setdefault
    for key, value in mapping.items():
        setdef(value, []).append(key)
    return res if mapping.__class__==dict else mapping.__class__(res)

设计下CPython的3.X中运行,为2.X替换mapping.items()mapping.iteritems()

在我的机器上,运行速度比这里的其他示例更快

Not something completely different, just a bit rewritten recipe from Cookbook. It’s futhermore optimized by retaining setdefault method, instead of each time getting it through the instance:

def inverse(mapping):
    '''
    A function to inverse mapping, collecting keys with simillar values
    in list. Careful to retain original type and to be fast.
    >> d = dict(a=1, b=2, c=1, d=3, e=2, f=1, g=5, h=2)
    >> inverse(d)
    {1: ['f', 'c', 'a'], 2: ['h', 'b', 'e'], 3: ['d'], 5: ['g']}
    '''
    res = {}
    setdef = res.setdefault
    for key, value in mapping.items():
        setdef(value, []).append(key)
    return res if mapping.__class__==dict else mapping.__class__(res)

Designed to be run under CPython 3.x, for 2.x replace mapping.items() with mapping.iteritems()

On my machine runs a bit faster, than other examples here


回答 22

我在循环“ for”和方法“ .get()”的帮助下编写了此代码,并将字典的名称“ map”更改为“ map1”,因为“ map”是一个函数。

def dict_invert(map1):
    inv_map = {} # new dictionary
    for key in map1.keys():
        inv_map[map1.get(key)] = key
    return inv_map

I wrote this with the help of cycle ‘for’ and method ‘.get()’ and I changed the name ‘map’ of the dictionary to ‘map1’ because ‘map’ is a function.

def dict_invert(map1):
    inv_map = {} # new dictionary
    for key in map1.keys():
        inv_map[map1.get(key)] = key
    return inv_map

回答 23

如果值不是唯一的,并且可能是哈希(一维):

for k, v in myDict.items():
    if len(v) > 1:
        for item in v:
            invDict[item] = invDict.get(item, [])
            invDict[item].append(k)
    else:
        invDict[v] = invDict.get(v, [])
        invDict[v].append(k)

如果需要更深层次地进行递归,则只需一个维度即可:

def digList(lst):
    temp = []
    for item in lst:
        if type(item) is list:
            temp.append(digList(item))
        else:
            temp.append(item)
    return set(temp)

for k, v in myDict.items():
    if type(v) is list:
        items = digList(v)
        for item in items:
            invDict[item] = invDict.get(item, [])
            invDict[item].append(k)
    else:
        invDict[v] = invDict.get(v, [])
        invDict[v].append(k)

If values aren’t unique AND may be a hash (one dimension):

for k, v in myDict.items():
    if len(v) > 1:
        for item in v:
            invDict[item] = invDict.get(item, [])
            invDict[item].append(k)
    else:
        invDict[v] = invDict.get(v, [])
        invDict[v].append(k)

And with a recursion if you need to dig deeper then just one dimension:

def digList(lst):
    temp = []
    for item in lst:
        if type(item) is list:
            temp.append(digList(item))
        else:
            temp.append(item)
    return set(temp)

for k, v in myDict.items():
    if type(v) is list:
        items = digList(v)
        for item in items:
            invDict[item] = invDict.get(item, [])
            invDict[item].append(k)
    else:
        invDict[v] = invDict.get(v, [])
        invDict[v].append(k)