Filename and line number of a Python script

Question: Filename and line number of a Python script

How can I get the file name and line number in a Python script?

Exactly the file information we get from an exception traceback, but in this case without raising an exception.


Answer 0

Thanks to mcandre, the answer is:

#python3
from inspect import currentframe, getframeinfo

frameinfo = getframeinfo(currentframe())

print(frameinfo.filename, frameinfo.lineno)

回答 1

是否使用currentframe().f_back取决于是否使用功能。

直接调用检查:

from inspect import currentframe, getframeinfo

cf = currentframe()
filename = getframeinfo(cf).filename

print "This is line 5, python says line ", cf.f_lineno 
print "The filename is ", filename

调用为您执行此操作的函数:

from inspect import currentframe

def get_linenumber():
    cf = currentframe()
    return cf.f_back.f_lineno

print "This is line 7, python says line ", get_linenumber()

Whether you use currentframe().f_back depends on whether you are using a function or not.

Calling inspect directly:

from inspect import currentframe, getframeinfo

cf = currentframe()
filename = getframeinfo(cf).filename
print("This is line 5, python says line ", cf.f_lineno)
print("The filename is ", filename)

Calling a function that does it for you:

from inspect import currentframe

def get_linenumber():
    cf = currentframe()
    return cf.f_back.f_lineno

print("This is line 7, python says line ", get_linenumber())

Answer 2

Handy if used in a common file – prints file name, line number and function of the caller:

import inspect

def getLineInfo():
    caller = inspect.stack()[1]   # one frame up: the caller
    print(caller[1], ":", caller[2], ":", caller[3])  # filename : lineno : function

Answer 3

Filename:

__file__
# or
sys.argv[0]

Line:

inspect.currentframe().f_lineno

(not inspect.currentframe().f_back.f_lineno as mentioned above)
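Putting the two together, a minimal sketch:

```python
import inspect

filename = __file__                        # path of the current script
lineno = inspect.currentframe().f_lineno   # the line this expression sits on

print(filename, lineno)
```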


Answer 4

Using sys also works (note: this snippet and its dir() output are from Python 2):

import sys

print dir(sys._getframe())
print dir(sys._getframe().f_lineno)
print sys._getframe().f_lineno

The output is:

['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'f_back', 'f_builtins', 'f_code', 'f_exc_traceback', 'f_exc_type', 'f_exc_value', 'f_globals', 'f_lasti', 'f_lineno', 'f_locals', 'f_restricted', 'f_trace']
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__format__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'imag', 'numerator', 'real']
14

Answer 5

Just to contribute:

there is a linecache module in Python; here are two links that can help.

linecache module documentation
linecache source code

In a sense, you can “dump” a whole file into its cache and read it back via the linecache.cache data.

import linecache as allLines
# fileName behaves like any other open() call: you need a path to the file
# if it is not in the same directory as the script
linesList = allLines.updatecache(fileName, None)
for i, x in enumerate(linesList):
    print(i, x)  # prints the line number and content
# or, for more info
print(allLines.cache)
# or, if you need a specific line
specLine = allLines.getline(fileName, numbOfLine)
# returns the text of that line number

Additionally, for error handling you can simply use:

from sys import exc_info
try:
    raise YourError  # or some other error
except Exception:
    print(exc_info())

Answer 6

import inspect

file_name = __file__
current_line_no = inspect.stack()[0][2]
current_function_name = inspect.stack()[0][3]

# Try printing inspect.stack(); you can see the current stack and pick whatever you want

Answer 7

In Python 3 you can use a variation on:

import sys

def Deb(msg=None):
    print(f"Debug {sys._getframe().f_back.f_lineno}: {msg if msg is not None else ''}")

In code, you can then use:

Deb("Some useful information")
Deb()

To produce:

Debug 123: Some useful information
Debug 124:

Where the 123 and 124 are the lines that the calls are made from.


Answer 8

Here’s what works for me to get the line number in Python 3.7.3 in VSCode 1.39.2 (dmsg is my mnemonic for debug message):

import inspect

def dmsg(text_s):
    print(str(inspect.currentframe().f_back.f_lineno) + '| ' + text_s)

To call showing a variable name_s and its value:

name_s = put_code_here
dmsg('name_s: ' + name_s)

Output looks like this:

37| name_s: value_of_variable_at_line_37

How to delete all the data in a table using Django

Question: How to delete all the data in a table using Django

I have two questions:

  1. How do I delete a table in Django?
  2. How do I remove all the data in the table?

This is my code, which is not successful:

Reporter.objects.delete()

Answer 0

Inside a manager:

def delete_everything(self):
    Reporter.objects.all().delete()

def drop_table(self):
    # requires: from django.db import connection
    cursor = connection.cursor()
    table_name = self.model._meta.db_table
    sql = "DROP TABLE %s;" % (table_name,)
    cursor.execute(sql)

Answer 1

As per the latest documentation, the correct method to call would be:

Reporter.objects.all().delete()

Answer 2

If you want to remove all the data from all your tables, you might want to try the command python manage.py flush. This will delete all of the data in your tables, but the tables themselves will still exist.

See more here: https://docs.djangoproject.com/en/1.8/ref/django-admin/


Answer 3

Using the shell:

1) To drop the table:

python manage.py dbshell
>> DROP TABLE {app_name}_{model_name}

2) To remove all data from the table:

python manage.py shell
>> from {app_name}.models import {model_name}
>> {model_name}.objects.all().delete()

Answer 4

Django 1.11: delete all objects from a database table:

Entry.objects.all().delete()  ## Entry being Model Name. 

Refer to the official Django documentation, quoted below: https://docs.djangoproject.com/en/1.11/topics/db/queries/#deleting-objects

Note that delete() is the only QuerySet method that is not exposed on a Manager itself. This is a safety mechanism to prevent you from accidentally requesting Entry.objects.delete(), and deleting all the entries. If you do want to delete all the objects, then you have to explicitly request a complete query set:

I tried the code snippet seen below myself, within my somefilename.py:

    # for deleting model objects
    from django.db import connection
    def del_model_4(self):
        with connection.schema_editor() as schema_editor:
            schema_editor.delete_model(model_4)

and within my views.py I have a view that simply renders an html page…

  def data_del_4(request):
      obj = calc_2() ## 
      obj.del_model_4()
      return render(request, 'dc_dash/data_del_4.html') ## 

It ended up deleting all entries from model == model_4, but now I see an error screen in the Admin console when I try to ascertain that all objects of model_4 have been deleted…

ProgrammingError at /admin/dc_dash/model_4/
relation "dc_dash_model_4" does not exist
LINE 1: SELECT COUNT(*) AS "__count" FROM "dc_dash_model_4" 

Do consider that if we do not go to the ADMIN console and try to see objects of the model which have already been deleted, the Django app works just as intended.

django admin screencapture


Answer 5

There are a couple of ways:

To delete it directly:

SomeModel.objects.filter(id=id).delete()

To delete it from an instance:

instance1 = SomeModel.objects.get(id=id)
instance1.delete()

(don’t use the same name)


Is there a way to delete created variables, functions, etc. from the interpreter’s memory?

Question: Is there a way to delete created variables, functions, etc. from the interpreter’s memory?

I’ve been searching for the accurate answer to this question for a couple of days now but haven’t got anything good. I’m not a complete beginner in programming, but not yet even on the intermediate level.

When I’m in the shell of Python, I type: dir() and I can see all the names of all the objects in the current scope (main block), there are 6 of them:

['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']

Then, when I declare a variable, for example x = 10, it is automatically added to that list of objects shown by dir(), and when I type dir() again, it shows:

['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'x']

The same goes for functions, classes and so on.

How do I delete all those new objects without erasing the standard 6 which were available at the beginning?

I’ve read here about “memory cleaning” and “cleaning of the console”, which erases all the text from the command prompt window:

>>> import os
>>> clear = lambda: os.system('cls')
>>> clear()

But all this has nothing to do with what I’m trying to achieve, it doesn’t clean out all used objects.


Answer 0

You can delete individual names with del:

del x

or you can remove them from the globals() object:

for name in dir():
    if not name.startswith('_'):
        del globals()[name]

This is just an example loop; it defensively only deletes names that do not start with an underscore, making a (not unreasoned) assumption that you only used names without an underscore at the start in your interpreter. You could use a hard-coded list of names to keep instead (whitelisting) if you really wanted to be thorough. There is no built-in function to do the clearing for you, other than just exit and restart the interpreter.

Modules you’ve imported (import os) are going to remain imported because they are referenced by sys.modules; subsequent imports will reuse the already imported module object. You just won’t have a reference to them in your current global namespace.
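A quick way to see this, as a small sketch: deleting the name does not unload the module, and a later import simply re-binds the cached module object.

```python
import sys
import os

del os                         # unbinds the name in this namespace...
assert 'os' in sys.modules     # ...but the module object stays cached

import os                      # a re-import is just a cache lookup
assert os is sys.modules['os']
print("os survived in sys.modules")
```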


Answer 1

Yes. There is a simple way to remove everything in iPython. In iPython console, just type:

%reset

Then system will ask you to confirm. Press y. If you don’t want to see this prompt, simply type:

%reset -f

This should work.


Answer 2

You can use the Python garbage collector:

import gc
gc.collect()
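Note that gc.collect() does not remove names from your namespace; it reclaims unreachable objects, which matters mainly for reference cycles. A small sketch of what it actually does:

```python
import gc

class Node:
    def __init__(self):
        self.ref = self    # a reference cycle: refcounting alone cannot free this

n = Node()
del n                      # the name is gone, but the cycle keeps the object alive
collected = gc.collect()   # the cycle collector finds and frees it
print(collected)           # at least 1 unreachable object found
```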

Answer 3

If you are in an interactive environment like Jupyter or ipython, you might be interested in clearing unwanted variables if they are getting heavy.

The magic commands reset and reset_selective are available in interactive Python sessions like ipython and Jupyter.

1) reset

reset resets the namespace by removing all names defined by the user, if called without arguments.

The in and out parameters specify whether you want to flush the input/output caches. The directory history is flushed with the dhist parameter.

reset in out

Another interesting option is array, which removes only numpy arrays:

reset array

2) reset_selective

Resets the namespace by removing names defined by the user. Input/output history is left around in case you need it.

Clean Array Example:

In [1]: import numpy as np
In [2]: littleArray = np.array([1,2,3,4,5])
In [3]: who_ls
Out[3]: ['littleArray', 'np']
In [4]: reset_selective -f littleArray
In [5]: who_ls
Out[5]: ['np']

Source: http://ipython.readthedocs.io/en/stable/interactive/magics.html


Answer 4

This worked for me.

You need to run it twice: once for globals, then once for locals.

for name in dir():
    if not name.startswith('_'):
        del globals()[name]

for name in dir():
    if not name.startswith('_'):
        del locals()[name]

Answer 5

Actually, Python reclaims memory that is no longer in use. This is called garbage collection, and it is an automatic process in Python. But if you still want to do it, you can delete a variable with del variable_name. You can also do it by assigning the variable to None:

a = 10
print(a)

del a
print(a)  # throws a NameError here because it's been deleted already

The only way to truly reclaim memory from unreferenced Python objects is via the garbage collector. The del keyword simply unbinds a name from an object, but the object still needs to be garbage collected. You can force the garbage collector to run using the gc module, but that is almost certainly a premature optimization, and it carries its own risks. In most code, using del has no real effect, since those names would have been deleted as they went out of scope anyway.
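To illustrate the "del only unbinds a name" point, a small CPython-specific sketch using reference counts:

```python
import sys

data = [1, 2, 3]
alias = data                        # two names bound to the same list object
refs_before = sys.getrefcount(data)

del alias                           # unbinds one name; the object survives
refs_after = sys.getrefcount(data)

print(refs_before, refs_after)      # in CPython the count drops by exactly one
```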


Tool to convert Python code to be PEP8 compliant

Question: Tool to convert Python code to be PEP8 compliant

I know there are tools which validate whether your Python code is compliant with PEP8, for example there is both an online service and a python module.

However, I cannot find a service or module which can convert my Python file to a self-contained, PEP8 valid Python file. Does anyone know if there are any?
I assume it’s feasible since PEP8 is all about the appearance of the code, right?


Answer 0

Unfortunately “pep8 storming” (the entire project) has several negative side-effects:

  • lots of merge conflicts
  • breaks git blame
  • makes code review difficult

As an alternative (and thanks to @y-p for the idea), I wrote a small package which autopep8s only those lines which you have been working on since the last commit/branch:

Basically leaving the project a little better than you found it:

pip install pep8radius

Suppose you’ve done your work off of master and are ready to commit:

# be somewhere in your project directory
# see the diff with pep, see the changes you've made since master
pep8radius master --diff
# make those changes
pep8radius master --diff --in-place

Or to clean the new lines you’ve commited since the last commit:

pep8radius --diff
pep8radius --diff --in-place

# the lines which changed since a specific commit `git diff 98f51f`
pep8radius 98f51f --diff

Basically pep8radius is applying autopep8 to lines in the output of git/hg diff (from the last shared commit).

This script currently works with git and hg; if you're using something else and want this to work, please post a comment/issue/PR!


Answer 1

You can use autopep8! Whilst you make yourself a cup of coffee this tool happily removes all those pesky PEP8 violations which don’t change the meaning of the code.

Install it via pip:

pip install autopep8

Apply this to a specific file:

autopep8 py_file --in-place

or to your project (recursively), the verbose option gives you some feedback of how it’s going:

autopep8 project_dir --recursive --in-place --pep8-passes 2000 --verbose

Note: Sometimes the default of 100 passes isn't enough; I set it to 2000 since it's reasonably high and will catch all but the most troublesome files (it stops passing once it finds no resolvable pep8 infractions)…

At this point I suggest retesting and doing a commit!

If you want “full” PEP8 compliance: one tactic I’ve used is to run autopep8 as above, then run PEP8, which prints the remaining violations (file, line number, and what):

pep8 project_dir --ignore=E501

and manually change these individually (e.g. E712s – comparison with boolean).

Note: autopep8 offers an --aggressive argument (to ruthlessly “fix” these meaning-changing violations), but beware if you do use aggressive you may have to debug… (e.g. in numpy/pandas True == np.bool_(True) but not True is np.bool_(True)!)

You can check how many violations of each type (before and after):

pep8 --quiet --statistics .

Note: I consider E501s (line too long) are a special case as there will probably be a lot of these in your code and sometimes these are not corrected by autopep8.

As an example, I applied this technique to the pandas code base.


Answer 2

@Andy Hayden gave a good overview of autopep8. In addition, there is one more package called pep8ify which does the same thing.

However, both packages can only remove lint errors; they cannot reformat code.

little = more[3:   5]

The code above remains the same even after pep8ifying, but it still doesn't look good. You can use a formatter like yapf, which will format the code even if it is already PEP8 compliant. The code above will be formatted to

little = more[3:5]

Sometimes this even destroys your manual formatting. For example,

BAZ = {
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
}

will be converted to

BAZ = {[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]}

But you can tell it to ignore some parts:

BAZ = {
   [1, 2, 3, 4],
   [5, 6, 7, 8],
   [9, 10, 11, 12]
}  # yapf: disable

Taken from my old blog post: Automatically PEP8 & Format Your Python Code!


Answer 3

If you’re using eclipse + PyDev you can simply activate autopep8 from PyDev’s settings: Windows -> Preferences -> type ‘autopep8’ in the search filter.

Check the ‘use autopep8.py for code formatting?’ -> OK

Now eclipse’s CTRL-SHIFT-F code formatting should format your code using autopep8 :)


Answer 4

I did some broad research into the different tools for Python code style. There are two types of tools: linters, which analyze your code, warn about badly styled code and show advice on how to fix it, and code formatters, which re-format your document in PEP8 style when you save the file.

Because re-formatting must be more accurate (if it reformats something you don't want, it becomes useless), formatters cover a smaller part of PEP8; linters report much more.

They also differ in how configurable they are: for example, pylint is configurable in all its rules (you can turn every type of warning on or off), while black is not configurable at all.

Here are some useful links and tutorials:

Documentation:

Linters (in order of popularity):

Code formatters (in order of popularity):


Answer 5

There are many.

IDEs usually have some formatting capability built in. IntelliJ Idea / PyCharm does, same goes for the Python plugin for Eclipse, and so on.

There are formatters/linters that can target multiple languages. https://coala.io is a good example of those.

Then there are the single purpose tools, of which many are mentioned in other answers.

One specific method of automatic reformatting is to parse the file into AST tree (without dropping comments) and then dump it back to text (meaning nothing of the original formatting is preserved). Example of that would be https://github.com/python/black.
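As a rough sketch of that parse-and-dump idea, using only the standard ast module (which, unlike black's own parser, does drop comments; ast.unparse requires Python 3.9+):

```python
import ast

messy = "x=1+  2\ny =  [ 1,2 ,3 ]"
tree = ast.parse(messy)       # the original formatting is discarded here
clean = ast.unparse(tree)     # text is regenerated purely from the AST
print(clean)
```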


Multiprocessing: How do I share a dict among multiple processes?

Question: Multiprocessing: How do I share a dict among multiple processes?

A program creates several processes that work on a joinable queue, Q, and may eventually manipulate a global dictionary, D, to store results (so each child process can use D to store its own result and also see what results the other child processes are producing).

If I print the dictionary D in a child process, I see the modifications that have been done on it (i.e. on D). But after the main process joins Q, if I print D, it’s an empty dict!

I understand it is a synchronization/lock issue. Can someone tell me what is happening here, and how I can synchronize access to D?


Answer 0

A general answer involves using a Manager object. Adapted from the docs:

from multiprocessing import Process, Manager

def f(d):
    d[1] += '1'
    d['2'] += 2

if __name__ == '__main__':
    manager = Manager()

    d = manager.dict()
    d[1] = '1'
    d['2'] = 2

    p1 = Process(target=f, args=(d,))
    p2 = Process(target=f, args=(d,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

    print(d)

Output:

$ python mul.py 
{1: '111', '2': 6}

Answer 1

multiprocessing is not like threading. Each child process will get a copy of the main process’s memory. Generally state is shared via communication (pipes/sockets), signals, or shared memory.

Multiprocessing makes some abstractions available for your use case – shared state that’s treated as local by use of proxies or shared memory: http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes



Answer 2

I’d like to share my own work: it is faster than Manager’s dict, and simpler and more stable than the pyshmht library (which uses tons of memory and doesn’t work on Mac OS). Though my dict only works for plain strings and is immutable for now. I use a linear-probing implementation and store the key/value pairs in a separate memory block after the table. (Note: the code below is Python 2.)

from mmap import mmap
import struct
from timeit import default_timer
from multiprocessing import Manager
from pyshmht import HashTable


class shared_immutable_dict:
    def __init__(self, a):
        self.hs = 1 << (len(a) * 3).bit_length()
        kvp = self.hs * 4
        ht = [0xffffffff] * self.hs
        kvl = []
        for k, v in a.iteritems():
            h = self.hash(k)
            while ht[h] != 0xffffffff:
                h = (h + 1) & (self.hs - 1)
            ht[h] = kvp
            kvp += self.kvlen(k) + self.kvlen(v)
            kvl.append(k)
            kvl.append(v)

        self.m = mmap(-1, kvp)
        for p in ht:
            self.m.write(uint_format.pack(p))
        for x in kvl:
            if len(x) <= 0x7f:
                self.m.write_byte(chr(len(x)))
            else:
                self.m.write(uint_format.pack(0x80000000 + len(x)))
            self.m.write(x)

    def hash(self, k):
        h = hash(k)
        h = (h + (h >> 3) + (h >> 13) + (h >> 23)) * 1749375391 & (self.hs - 1)
        return h

    def get(self, k, d=None):
        h = self.hash(k)
        while True:
            x = uint_format.unpack(self.m[h * 4:h * 4 + 4])[0]
            if x == 0xffffffff:
                return d
            self.m.seek(x)
            if k == self.read_kv():
                return self.read_kv()
            h = (h + 1) & (self.hs - 1)

    def read_kv(self):
        sz = ord(self.m.read_byte())
        if sz & 0x80:
            sz = uint_format.unpack(chr(sz) + self.m.read(3))[0] - 0x80000000
        return self.m.read(sz)

    def kvlen(self, k):
        return len(k) + (1 if len(k) <= 0x7f else 4)

    def __contains__(self, k):
        return self.get(k, None) is not None

    def close(self):
        self.m.close()

uint_format = struct.Struct('>I')


def uget(a, k, d=None):
    return to_unicode(a.get(to_str(k), d))


def uin(a, k):
    return to_str(k) in a


def to_unicode(s):
    return s.decode('utf-8') if isinstance(s, str) else s


def to_str(s):
    return s.encode('utf-8') if isinstance(s, unicode) else s


def mmap_test():
    n = 1000000
    d = shared_immutable_dict({str(i * 2): '1' for i in xrange(n)})
    start_time = default_timer()
    for i in xrange(n):
        if bool(d.get(str(i))) != (i % 2 == 0):
            raise Exception(i)
    print 'mmap speed: %d gets per sec' % (n / (default_timer() - start_time))


def manager_test():
    n = 100000
    d = Manager().dict({str(i * 2): '1' for i in xrange(n)})
    start_time = default_timer()
    for i in xrange(n):
        if bool(d.get(str(i))) != (i % 2 == 0):
            raise Exception(i)
    print 'manager speed: %d gets per sec' % (n / (default_timer() - start_time))


def shm_test():
    n = 1000000
    d = HashTable('tmp', n)
    d.update({str(i * 2): '1' for i in xrange(n)})
    start_time = default_timer()
    for i in xrange(n):
        if bool(d.get(str(i))) != (i % 2 == 0):
            raise Exception(i)
    print 'shm speed: %d gets per sec' % (n / (default_timer() - start_time))


if __name__ == '__main__':
    mmap_test()
    manager_test()
    shm_test()

在我的笔记本电脑上,性能结果是:

mmap speed: 247288 gets per sec
manager speed: 33792 gets per sec
shm speed: 691332 gets per sec

简单用法示例:

ht = shared_immutable_dict({'a': '1', 'b': '2'})
print ht.get('a')

I’d like to share my own work that is faster than Manager’s dict and is simpler and more stable than pyshmht library that uses tons of memory and doesn’t work for Mac OS. Though my dict only works for plain strings and is immutable currently. I use linear probing implementation and store keys and values pairs in a separate memory block after the table.

from mmap import mmap
import struct
from timeit import default_timer
from multiprocessing import Manager
from pyshmht import HashTable


class shared_immutable_dict:
    def __init__(self, a):
        self.hs = 1 << (len(a) * 3).bit_length()
        kvp = self.hs * 4
        ht = [0xffffffff] * self.hs
        kvl = []
        for k, v in a.iteritems():
            h = self.hash(k)
            while ht[h] != 0xffffffff:
                h = (h + 1) & (self.hs - 1)
            ht[h] = kvp
            kvp += self.kvlen(k) + self.kvlen(v)
            kvl.append(k)
            kvl.append(v)

        self.m = mmap(-1, kvp)
        for p in ht:
            self.m.write(uint_format.pack(p))
        for x in kvl:
            if len(x) <= 0x7f:
                self.m.write_byte(chr(len(x)))
            else:
                self.m.write(uint_format.pack(0x80000000 + len(x)))
            self.m.write(x)

    def hash(self, k):
        h = hash(k)
        h = (h + (h >> 3) + (h >> 13) + (h >> 23)) * 1749375391 & (self.hs - 1)
        return h

    def get(self, k, d=None):
        h = self.hash(k)
        while True:
            x = uint_format.unpack(self.m[h * 4:h * 4 + 4])[0]
            if x == 0xffffffff:
                return d
            self.m.seek(x)
            if k == self.read_kv():
                return self.read_kv()
            h = (h + 1) & (self.hs - 1)

    def read_kv(self):
        sz = ord(self.m.read_byte())
        if sz & 0x80:
            sz = uint_format.unpack(chr(sz) + self.m.read(3))[0] - 0x80000000
        return self.m.read(sz)

    def kvlen(self, k):
        return len(k) + (1 if len(k) <= 0x7f else 4)

    def __contains__(self, k):
        return self.get(k, None) is not None

    def close(self):
        self.m.close()

uint_format = struct.Struct('>I')


def uget(a, k, d=None):
    return to_unicode(a.get(to_str(k), d))


def uin(a, k):
    return to_str(k) in a


def to_unicode(s):
    return s.decode('utf-8') if isinstance(s, str) else s


def to_str(s):
    return s.encode('utf-8') if isinstance(s, unicode) else s


def mmap_test():
    n = 1000000
    d = shared_immutable_dict({str(i * 2): '1' for i in xrange(n)})
    start_time = default_timer()
    for i in xrange(n):
        if bool(d.get(str(i))) != (i % 2 == 0):
            raise Exception(i)
    print 'mmap speed: %d gets per sec' % (n / (default_timer() - start_time))


def manager_test():
    n = 100000
    d = Manager().dict({str(i * 2): '1' for i in xrange(n)})
    start_time = default_timer()
    for i in xrange(n):
        if bool(d.get(str(i))) != (i % 2 == 0):
            raise Exception(i)
    print 'manager speed: %d gets per sec' % (n / (default_timer() - start_time))


def shm_test():
    n = 1000000
    d = HashTable('tmp', n)
    d.update({str(i * 2): '1' for i in xrange(n)})
    start_time = default_timer()
    for i in xrange(n):
        if bool(d.get(str(i))) != (i % 2 == 0):
            raise Exception(i)
    print 'shm speed: %d gets per sec' % (n / (default_timer() - start_time))


if __name__ == '__main__':
    mmap_test()
    manager_test()
    shm_test()

On my laptop performance results are:

mmap speed: 247288 gets per sec
manager speed: 33792 gets per sec
shm speed: 691332 gets per sec

simple usage example:

ht = shared_immutable_dict({'a': '1', 'b': '2'})
print ht.get('a')

回答 3

除了这里@senderle的回答之外,有些人可能还想知道如何配合使用multiprocessing.Pool的功能。

令人高兴的是,manager实例上有一个.Pool()方法,它模拟了顶层multiprocessing所有熟悉的API。

from itertools import repeat
import multiprocessing as mp
import os
import pprint

def f(d: dict) -> None:
    pid = os.getpid()
    d[pid] = "Hi, I was written by process %d" % pid

if __name__ == '__main__':
    with mp.Manager() as manager:
        d = manager.dict()
        with manager.Pool() as pool:
            pool.map(f, repeat(d, 10))
        # `d` is a DictProxy object that can be converted to dict
        pprint.pprint(dict(d))

输出:

$ python3 mul.py 
{22562: 'Hi, I was written by process 22562',
 22563: 'Hi, I was written by process 22563',
 22564: 'Hi, I was written by process 22564',
 22565: 'Hi, I was written by process 22565',
 22566: 'Hi, I was written by process 22566',
 22567: 'Hi, I was written by process 22567',
 22568: 'Hi, I was written by process 22568',
 22569: 'Hi, I was written by process 22569',
 22570: 'Hi, I was written by process 22570',
 22571: 'Hi, I was written by process 22571'}

这是一个稍有不同的示例,其中每个进程仅将其进程ID记录到全局DictProxy对象d中。

In addition to @senderle’s here, some might also be wondering how to use the functionality of multiprocessing.Pool.

The nice thing is that there is a .Pool() method to the manager instance that mimics all the familiar API of the top-level multiprocessing.

from itertools import repeat
import multiprocessing as mp
import os
import pprint

def f(d: dict) -> None:
    pid = os.getpid()
    d[pid] = "Hi, I was written by process %d" % pid

if __name__ == '__main__':
    with mp.Manager() as manager:
        d = manager.dict()
        with manager.Pool() as pool:
            pool.map(f, repeat(d, 10))
        # `d` is a DictProxy object that can be converted to dict
        pprint.pprint(dict(d))

Output:

$ python3 mul.py 
{22562: 'Hi, I was written by process 22562',
 22563: 'Hi, I was written by process 22563',
 22564: 'Hi, I was written by process 22564',
 22565: 'Hi, I was written by process 22565',
 22566: 'Hi, I was written by process 22566',
 22567: 'Hi, I was written by process 22567',
 22568: 'Hi, I was written by process 22568',
 22569: 'Hi, I was written by process 22569',
 22570: 'Hi, I was written by process 22570',
 22571: 'Hi, I was written by process 22571'}

This is a slightly different example where each process just logs its process ID to the global DictProxy object d.


回答 4

也许您可以尝试pyshmht,一个为Python设计的基于共享内存的哈希表扩展。

注意

  1. 尚未经过全面测试,仅供参考。

  2. 当前,它缺乏用于多处理的锁/信号量机制。

Maybe you can try pyshmht, sharing memory based hash table extension for Python.

Notice

  1. It’s not fully tested, just for your reference.

  2. It currently lacks lock/sem mechanisms for multiprocessing.
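Since the extension lacks locking, a common workaround is to pair whatever shared structure you use with an explicit `multiprocessing` lock. A sketch using a `Manager` dict (the key name and process count are made up for illustration; with pyshmht itself you would guard its calls the same way):

```python
from multiprocessing import Manager, Process

def bump(d, lock, key):
    # guard the read-modify-write so concurrent updates don't race
    with lock:
        d[key] = d.get(key, 0) + 1

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        lock = manager.Lock()
        procs = [Process(target=bump, args=(d, lock, 'hits')) for _ in range(8)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(d))  # {'hits': 8}
```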


Python中是否有`string.split()`的生成器版本?

问题:Python中是否有`string.split()`的生成器版本?

string.split()返回一个列表实例。是否有返回生成器的版本?是否有任何理由不提供生成器版本?

string.split() returns a list instance. Is there a version that returns a generator instead? Are there any reasons against having a generator version?
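For reference, the eager behaviour the question describes is easy to observe: `str.split` builds the whole list before you consume anything, which is what the generator-based answers below avoid:

```python
s = "spam eggs ham"

parts = s.split()        # eager: the full list exists immediately
print(type(parts).__name__, parts)

lazy = (w for w in parts)  # a generator, by contrast, is consumed item by item
print(next(lazy))
```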


回答 0

re.finditer的内存开销很可能相当小。

def split_iter(string):
    return (x.group(0) for x in re.finditer(r"[A-Za-z']+", string))

演示:

>>> list( split_iter("A programmer's RegEx test.") )
['A', "programmer's", 'RegEx', 'test']

编辑:我刚刚确认,假设我的测试方法正确,这在python 3.2.1中占用恒定的内存。我创建了一个非常大的字符串(大约1GB),然后用for循环遍历该可迭代对象(不是列表推导式,那会产生额外的内存)。这没有导致内存的显著增长(也就是说,即便内存有所增长,也远远少于1GB的字符串)。

It is highly probable that re.finditer uses fairly minimal memory overhead.

def split_iter(string):
    return (x.group(0) for x in re.finditer(r"[A-Za-z']+", string))

Demo:

>>> list( split_iter("A programmer's RegEx test.") )
['A', "programmer's", 'RegEx', 'test']

edit: I have just confirmed that this takes constant memory in python 3.2.1, assuming my testing methodology was correct. I created a string of very large size (1GB or so), then iterated through the iterable with a for loop (NOT a list comprehension, which would have generated extra memory). This did not result in a noticeable growth of memory (that is, if there was a growth in memory, it was far far less than the 1GB string).


回答 1

我能想到的最高效的方法,是利用str.find()方法的offset参数来编写一个。这样可以避免大量内存占用,并在不需要时省去正则表达式的开销。

[编辑2016-8-2:已对此进行更新,可以选择支持正则表达式分隔符]

import re

def isplit(source, sep=None, regex=False):
    """
    generator version of str.split()

    :param source:
        source string (unicode or bytes)

    :param sep:
        separator to split on.

    :param regex:
        if True, will treat sep as regular expression.

    :returns:
        generator yielding elements of string.
    """
    if sep is None:
        # mimic default python behavior
        source = source.strip()
        sep = "\\s+"
        if isinstance(source, bytes):
            sep = sep.encode("ascii")
        regex = True
    if regex:
        # version using re.finditer()
        if not hasattr(sep, "finditer"):
            sep = re.compile(sep)
        start = 0
        for m in sep.finditer(source):
            idx = m.start()
            assert idx >= start
            yield source[start:idx]
            start = m.end()
        yield source[start:]
    else:
        # version using str.find(), less overhead than re.finditer()
        sepsize = len(sep)
        start = 0
        while True:
            idx = source.find(sep, start)
            if idx == -1:
                yield source[start:]
                return
            yield source[start:idx]
            start = idx + sepsize

可以根据需要使用…

>>> print list(isplit("abcb","b"))
['a','c','']

每次执行find()或切片时,在字符串中都需要花费一点成本,但这应该是最小的,因为字符串被表示为内存中的连续数组。

The most efficient way I can think of is to write one using the offset parameter of the str.find() method. This avoids lots of memory use, and relying on the overhead of a regexp when it’s not needed.

[edit 2016-8-2: updated this to optionally support regex separators]

import re

def isplit(source, sep=None, regex=False):
    """
    generator version of str.split()

    :param source:
        source string (unicode or bytes)

    :param sep:
        separator to split on.

    :param regex:
        if True, will treat sep as regular expression.

    :returns:
        generator yielding elements of string.
    """
    if sep is None:
        # mimic default python behavior
        source = source.strip()
        sep = "\\s+"
        if isinstance(source, bytes):
            sep = sep.encode("ascii")
        regex = True
    if regex:
        # version using re.finditer()
        if not hasattr(sep, "finditer"):
            sep = re.compile(sep)
        start = 0
        for m in sep.finditer(source):
            idx = m.start()
            assert idx >= start
            yield source[start:idx]
            start = m.end()
        yield source[start:]
    else:
        # version using str.find(), less overhead than re.finditer()
        sepsize = len(sep)
        start = 0
        while True:
            idx = source.find(sep, start)
            if idx == -1:
                yield source[start:]
                return
            yield source[start:idx]
            start = idx + sepsize

This can be used like you want…

>>> print list(isplit("abcb","b"))
['a','c','']

While there is a little bit of cost seeking within the string each time find() or slicing is performed, this should be minimal since strings are represented as contiguous arrays in memory.


回答 2

这是通过re.search()实现的split()生成器版本,它不存在分配太多子字符串的问题。

import re

def itersplit(s, sep=None):
    exp = re.compile(r'\s+' if sep is None else re.escape(sep))
    pos = 0
    while True:
        m = exp.search(s, pos)
        if not m:
            if pos < len(s) or sep is not None:
                yield s[pos:]
            break
        if pos < m.start() or sep is not None:
            yield s[pos:m.start()]
        pos = m.end()


sample1 = "Good evening, world!"
sample2 = " Good evening, world! "
sample3 = "brackets][all][][over][here"
sample4 = "][brackets][all][][over][here]["

assert list(itersplit(sample1)) == sample1.split()
assert list(itersplit(sample2)) == sample2.split()
assert list(itersplit(sample3, '][')) == sample3.split('][')
assert list(itersplit(sample4, '][')) == sample4.split('][')

编辑:纠正了在未给出分隔符时对首尾空白的处理。

This is generator version of split() implemented via re.search() that does not have the problem of allocating too many substrings.

import re

def itersplit(s, sep=None):
    exp = re.compile(r'\s+' if sep is None else re.escape(sep))
    pos = 0
    while True:
        m = exp.search(s, pos)
        if not m:
            if pos < len(s) or sep is not None:
                yield s[pos:]
            break
        if pos < m.start() or sep is not None:
            yield s[pos:m.start()]
        pos = m.end()


sample1 = "Good evening, world!"
sample2 = " Good evening, world! "
sample3 = "brackets][all][][over][here"
sample4 = "][brackets][all][][over][here]["

assert list(itersplit(sample1)) == sample1.split()
assert list(itersplit(sample2)) == sample2.split()
assert list(itersplit(sample3, '][')) == sample3.split('][')
assert list(itersplit(sample4, '][')) == sample4.split('][')

EDIT: Corrected handling of surrounding whitespace if no separator chars are given.


回答 3

对提出的各种方法进行了性能测试(我在这里不再重复)。一些结果:

  • str.split (默认)= 0.3461570239996945
  • 手动搜索(按字符)(Dave Webb的答案之一)= 0.8260340550004912
  • re.finditer (ninjagecko的答案)= 0.698872097000276
  • str.find (Eli Collins的答案之一)= 0.7230395330007013
  • itertools.takewhile (伊格纳西奥·巴斯克斯(Ignacio Vazquez-Abrams)的答案)= 2.023023967998597
  • str.split(..., maxsplit=1) 递归 = N/A†

† 递归答案(带maxsplit=1的string.split)未能在合理的时间内完成;鉴于string.split的速度,它们在较短的字符串上可能表现更好,但对于内存不成问题的短字符串,我本来也看不出有什么用例。

使用timeit在以下代码上测试:

the_text = "100 " * 9999 + "100"

def test_function( method ):
    def fn( ):
        total = 0

        for x in method( the_text ):
            total += int( x )

        return total

    return fn

这就提出了另一个问题:为什么string.split尽管占用更多内存,速度却如此之快。

Did some performance testing on the various methods proposed (I won’t repeat them here). Some results:

  • str.split (default) = 0.3461570239996945
  • manual search (by character) (one of Dave Webb’s answer’s) = 0.8260340550004912
  • re.finditer (ninjagecko’s answer) = 0.698872097000276
  • str.find (one of Eli Collins’s answers) = 0.7230395330007013
  • itertools.takewhile (Ignacio Vazquez-Abrams’s answer) = 2.023023967998597
  • str.split(..., maxsplit=1) recursion = N/A†

†The recursion answers (string.split with maxsplit=1) fail to complete in a reasonable time. Given string.split's speed they may work better on shorter strings, but then I can’t see the use-case for short strings where memory isn’t an issue anyway.

Tested using timeit on:

the_text = "100 " * 9999 + "100"

def test_function( method ):
    def fn( ):
        total = 0

        for x in method( the_text ):
            total += int( x )

        return total

    return fn

This raises another question as to why string.split is so much faster despite its memory usage.


回答 4

这是我的实现,它比这里的其他答案快得多,也更完整。它针对不同情况提供了4个单独的子函数。

我只在此复制主函数str_split的文档字符串:


str_split(s, *delims, empty=None)

用其余的参数分割字符串s,并可省略空的部分(由empty关键字参数控制)。这是一个生成器函数。

如果仅提供一个定界符,则字符串将直接按它分割。此时empty默认为True。

str_split('[]aaa[][]bb[c', '[]')
    -> '', 'aaa', '', 'bb[c'
str_split('[]aaa[][]bb[c', '[]', empty=False)
    -> 'aaa', 'bb[c'

如果提供了多个定界符,则默认情况下,字符串将按这些定界符的最长可能序列进行拆分;或者,如果将empty设置为True,则定界符之间的空字符串也会包含在内。注意,在这种情况下,定界符只能是单个字符。

str_split('aaa, bb : c;', ' ', ',', ':', ';')
    -> 'aaa', 'bb', 'c'
str_split('aaa, bb : c;', *' ,:;', empty=True)
    -> 'aaa', '', 'bb', '', '', 'c', ''

如果未提供定界符,则使用string.whitespace,因此效果与str.split()相同,区别在于此函数是一个生成器。

str_split('aaa\\t  bb c \\n')
    -> 'aaa', 'bb', 'c'

import string

def _str_split_chars(s, delims):
    "Split the string `s` by characters contained in `delims`, including the \
    empty parts between two consecutive delimiters"
    start = 0
    for i, c in enumerate(s):
        if c in delims:
            yield s[start:i]
            start = i+1
    yield s[start:]

def _str_split_chars_ne(s, delims):
    "Split the string `s` by longest possible sequences of characters \
    contained in `delims`"
    start = 0
    in_s = False
    for i, c in enumerate(s):
        if c in delims:
            if in_s:
                yield s[start:i]
                in_s = False
        else:
            if not in_s:
                in_s = True
                start = i
    if in_s:
        yield s[start:]


def _str_split_word(s, delim):
    "Split the string `s` by the string `delim`"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    yield s[start:]

def _str_split_word_ne(s, delim):
    "Split the string `s` by the string `delim`, not including empty parts \
    between two consecutive delimiters"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            if start!=i:
                yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    if start<len(s):
        yield s[start:]


def str_split(s, *delims, empty=None):
    """\
Split the string `s` by the rest of the arguments, possibly omitting
empty parts (`empty` keyword argument is responsible for that).
This is a generator function.

When only one delimiter is supplied, the string is simply split by it.
`empty` is then `True` by default.
    str_split('[]aaa[][]bb[c', '[]')
        -> '', 'aaa', '', 'bb[c'
    str_split('[]aaa[][]bb[c', '[]', empty=False)
        -> 'aaa', 'bb[c'

When multiple delimiters are supplied, the string is split by longest
possible sequences of those delimiters by default, or, if `empty` is set to
`True`, empty strings between the delimiters are also included. Note that
the delimiters in this case may only be single characters.
    str_split('aaa, bb : c;', ' ', ',', ':', ';')
        -> 'aaa', 'bb', 'c'
    str_split('aaa, bb : c;', *' ,:;', empty=True)
        -> 'aaa', '', 'bb', '', '', 'c', ''

When no delimiters are supplied, `string.whitespace` is used, so the effect
is the same as `str.split()`, except this function is a generator.
    str_split('aaa\\t  bb c \\n')
        -> 'aaa', 'bb', 'c'
"""
    if len(delims)==1:
        f = _str_split_word if empty is None or empty else _str_split_word_ne
        return f(s, delims[0])
    if len(delims)==0:
        delims = string.whitespace
    delims = set(delims) if len(delims)>=4 else ''.join(delims)
    if any(len(d)>1 for d in delims):
        raise ValueError("Only 1-character multiple delimiters are supported")
    f = _str_split_chars if empty else _str_split_chars_ne
    return f(s, delims)

该函数可在Python 3中使用;通过一个简单(虽然相当难看)的修改,即可使其同时兼容Python 2和3。函数的开头几行应改为:

def str_split(s, *delims, **kwargs):
    """...docstring..."""
    empty = kwargs.get('empty')

Here is my implementation, which is much, much faster and more complete than the other answers here. It has 4 separate subfunctions for different cases.

I’ll just copy the docstring of the main str_split function:


str_split(s, *delims, empty=None)

Split the string s by the rest of the arguments, possibly omitting empty parts (empty keyword argument is responsible for that). This is a generator function.

When only one delimiter is supplied, the string is simply split by it. empty is then True by default.

str_split('[]aaa[][]bb[c', '[]')
    -> '', 'aaa', '', 'bb[c'
str_split('[]aaa[][]bb[c', '[]', empty=False)
    -> 'aaa', 'bb[c'

When multiple delimiters are supplied, the string is split by longest possible sequences of those delimiters by default, or, if empty is set to True, empty strings between the delimiters are also included. Note that the delimiters in this case may only be single characters.

str_split('aaa, bb : c;', ' ', ',', ':', ';')
    -> 'aaa', 'bb', 'c'
str_split('aaa, bb : c;', *' ,:;', empty=True)
    -> 'aaa', '', 'bb', '', '', 'c', ''

When no delimiters are supplied, string.whitespace is used, so the effect is the same as str.split(), except this function is a generator.

str_split('aaa\\t  bb c \\n')
    -> 'aaa', 'bb', 'c'

import string

def _str_split_chars(s, delims):
    "Split the string `s` by characters contained in `delims`, including the \
    empty parts between two consecutive delimiters"
    start = 0
    for i, c in enumerate(s):
        if c in delims:
            yield s[start:i]
            start = i+1
    yield s[start:]

def _str_split_chars_ne(s, delims):
    "Split the string `s` by longest possible sequences of characters \
    contained in `delims`"
    start = 0
    in_s = False
    for i, c in enumerate(s):
        if c in delims:
            if in_s:
                yield s[start:i]
                in_s = False
        else:
            if not in_s:
                in_s = True
                start = i
    if in_s:
        yield s[start:]


def _str_split_word(s, delim):
    "Split the string `s` by the string `delim`"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    yield s[start:]

def _str_split_word_ne(s, delim):
    "Split the string `s` by the string `delim`, not including empty parts \
    between two consecutive delimiters"
    dlen = len(delim)
    start = 0
    try:
        while True:
            i = s.index(delim, start)
            if start!=i:
                yield s[start:i]
            start = i+dlen
    except ValueError:
        pass
    if start<len(s):
        yield s[start:]


def str_split(s, *delims, empty=None):
    """\
Split the string `s` by the rest of the arguments, possibly omitting
empty parts (`empty` keyword argument is responsible for that).
This is a generator function.

When only one delimiter is supplied, the string is simply split by it.
`empty` is then `True` by default.
    str_split('[]aaa[][]bb[c', '[]')
        -> '', 'aaa', '', 'bb[c'
    str_split('[]aaa[][]bb[c', '[]', empty=False)
        -> 'aaa', 'bb[c'

When multiple delimiters are supplied, the string is split by longest
possible sequences of those delimiters by default, or, if `empty` is set to
`True`, empty strings between the delimiters are also included. Note that
the delimiters in this case may only be single characters.
    str_split('aaa, bb : c;', ' ', ',', ':', ';')
        -> 'aaa', 'bb', 'c'
    str_split('aaa, bb : c;', *' ,:;', empty=True)
        -> 'aaa', '', 'bb', '', '', 'c', ''

When no delimiters are supplied, `string.whitespace` is used, so the effect
is the same as `str.split()`, except this function is a generator.
    str_split('aaa\\t  bb c \\n')
        -> 'aaa', 'bb', 'c'
"""
    if len(delims)==1:
        f = _str_split_word if empty is None or empty else _str_split_word_ne
        return f(s, delims[0])
    if len(delims)==0:
        delims = string.whitespace
    delims = set(delims) if len(delims)>=4 else ''.join(delims)
    if any(len(d)>1 for d in delims):
        raise ValueError("Only 1-character multiple delimiters are supported")
    f = _str_split_chars if empty else _str_split_chars_ne
    return f(s, delims)

This function works in Python 3, and an easy, though quite ugly, fix can be applied to make it work in both 2 and 3 versions. The first lines of the function should be changed to:

def str_split(s, *delims, **kwargs):
    """...docstring..."""
    empty = kwargs.get('empty')

回答 5

没有,但是用itertools.takewhile()编写一个应该足够容易。

编辑:

非常简单、不完善的实现:

import itertools
import string

def isplitwords(s):
  i = iter(s)
  while True:
    r = []
    for c in itertools.takewhile(lambda x: not x in string.whitespace, i):
      r.append(c)
    else:
      if r:
        yield ''.join(r)
        continue
      else:
        raise StopIteration()

No, but it should be easy enough to write one using itertools.takewhile().

EDIT:

Very simple, half-broken implementation:

import itertools
import string

def isplitwords(s):
  i = iter(s)
  while True:
    r = []
    for c in itertools.takewhile(lambda x: not x in string.whitespace, i):
      r.append(c)
    else:
      if r:
        yield ''.join(r)
        continue
      else:
        raise StopIteration()

回答 6

我看不出split()的生成器版本有任何明显的好处。生成器对象必须持有整个字符串才能迭代,因此使用生成器并不能节省任何内存。

如果您想编写一个,那将很容易:

import string

def gsplit(s,sep=string.whitespace):
    word = []

    for c in s:
        if c in sep:
            if word:
                yield "".join(word)
                word = []
        else:
            word.append(c)

    if word:
        yield "".join(word)

I don’t see any obvious benefit to a generator version of split(). The generator object is going to have to contain the whole string to iterate over so you’re not going to save any memory by having a generator.

If you wanted to write one it would be fairly easy though:

import string

def gsplit(s,sep=string.whitespace):
    word = []

    for c in s:
        if c in sep:
            if word:
                yield "".join(word)
                word = []
        else:
            word.append(c)

    if word:
        yield "".join(word)

回答 7

我写了一个@ninjagecko答案的版本,其行为更类似于string.split(即默认情况下用空格定界,您可以指定定界符)。

import re

def isplit(string, delimiter = None):
    """Like string.split but returns an iterator (lazy)

    Multiple character delimiters are not handled.
    """

    if delimiter is None:
        # Whitespace delimited by default
        delim = r"\s"

    elif len(delimiter) != 1:
        raise ValueError("Can only handle single character delimiters",
                        delimiter)

    else:
        # Escape, in case it's "\", "*" etc.
        delim = re.escape(delimiter)

    return (x.group(0) for x in re.finditer(r"[^{}]+".format(delim), string))

这是我使用的测试(在python 3和python 2中):

# Wrapper to make it a list
def helper(*args,  **kwargs):
    return list(isplit(*args, **kwargs))

# Normal delimiters
assert helper("1,2,3", ",") == ["1", "2", "3"]
assert helper("1;2;3,", ";") == ["1", "2", "3,"]
assert helper("1;2 ;3,  ", ";") == ["1", "2 ", "3,  "]

# Whitespace
assert helper("1 2 3") == ["1", "2", "3"]
assert helper("1\t2\t3") == ["1", "2", "3"]
assert helper("1\t2 \t3") == ["1", "2", "3"]
assert helper("1\n2\n3") == ["1", "2", "3"]

# Surrounding whitespace dropped
assert helper(" 1 2  3  ") == ["1", "2", "3"]

# Regex special characters
assert helper(r"1\2\3", "\\") == ["1", "2", "3"]
assert helper(r"1*2*3", "*") == ["1", "2", "3"]

# No multi-char delimiters allowed
try:
    helper(r"1,.2,.3", ",.")
    assert False
except ValueError:
    pass

python的regex模块声称对unicode空格做了“正确的事”,但我实际上尚未测试过。

也可以作为gist获取。

I wrote a version of @ninjagecko’s answer that behaves more like string.split (i.e. whitespace delimited by default and you can specify a delimiter).

import re

def isplit(string, delimiter = None):
    """Like string.split but returns an iterator (lazy)

    Multiple character delimiters are not handled.
    """

    if delimiter is None:
        # Whitespace delimited by default
        delim = r"\s"

    elif len(delimiter) != 1:
        raise ValueError("Can only handle single character delimiters",
                        delimiter)

    else:
        # Escape, in case it's "\", "*" etc.
        delim = re.escape(delimiter)

    return (x.group(0) for x in re.finditer(r"[^{}]+".format(delim), string))

Here are the tests I used (in both python 3 and python 2):

# Wrapper to make it a list
def helper(*args,  **kwargs):
    return list(isplit(*args, **kwargs))

# Normal delimiters
assert helper("1,2,3", ",") == ["1", "2", "3"]
assert helper("1;2;3,", ";") == ["1", "2", "3,"]
assert helper("1;2 ;3,  ", ";") == ["1", "2 ", "3,  "]

# Whitespace
assert helper("1 2 3") == ["1", "2", "3"]
assert helper("1\t2\t3") == ["1", "2", "3"]
assert helper("1\t2 \t3") == ["1", "2", "3"]
assert helper("1\n2\n3") == ["1", "2", "3"]

# Surrounding whitespace dropped
assert helper(" 1 2  3  ") == ["1", "2", "3"]

# Regex special characters
assert helper(r"1\2\3", "\\") == ["1", "2", "3"]
assert helper(r"1*2*3", "*") == ["1", "2", "3"]

# No multi-char delimiters allowed
try:
    helper(r"1,.2,.3", ",.")
    assert False
except ValueError:
    pass

python’s regex module says that it does “the right thing” for unicode whitespace, but I haven’t actually tested it.

Also available as a gist.


回答 8

如果您还希望能够读取迭代器(以及返回一个迭代器),请尝试以下操作:

import itertools as it

def iter_split(string, sep=None):
    sep = sep or ' '
    groups = it.groupby(string, lambda s: s != sep)
    return (''.join(g) for k, g in groups if k)

用法

>>> list(iter_split(iter("Good evening, world!")))
['Good', 'evening,', 'world!']

If you would also like to be able to read an iterator (as well as return one) try this:

import itertools as it

def iter_split(string, sep=None):
    sep = sep or ' '
    groups = it.groupby(string, lambda s: s != sep)
    return (''.join(g) for k, g in groups if k)

Usage

>>> list(iter_split(iter("Good evening, world!")))
['Good', 'evening,', 'world!']
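Because itertools.groupby consumes any iterable of characters, the same function also works on a character stream that never existed as a single string, for example chained chunks (the chunking below is illustrative):

```python
import itertools as it

def iter_split(string, sep=None):
    # group consecutive non-separator characters and join each run
    sep = sep or ' '
    groups = it.groupby(string, lambda s: s != sep)
    return (''.join(g) for k, g in groups if k)

# feed the splitter a lazy stream of characters built from chunks
chunks = ["Good eve", "ning, wo", "rld!"]
stream = it.chain.from_iterable(chunks)
print(list(iter_split(stream)))  # ['Good', 'evening,', 'world!']
```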

回答 9

more_itertools.split_at为迭代器提供了str.split的模拟。

>>> import more_itertools as mit


>>> list(mit.split_at("abcdcba", lambda x: x == "b"))
[['a'], ['c', 'd', 'c'], ['a']]

>>> "abcdcba".split("b")
['a', 'cdc', 'a']

more_itertools 是第三方软件包。

more_itertools.split_at offers an analog to str.split for iterators.

>>> import more_itertools as mit


>>> list(mit.split_at("abcdcba", lambda x: x == "b"))
[['a'], ['c', 'd', 'c'], ['a']]

>>> "abcdcba".split("b")
['a', 'cdc', 'a']

more_itertools is a third-party package.


回答 10

我想展示如何使用finditer方案为给定的定界符返回一个匹配生成器,然后使用itertools中的pairwise配方构建“前一个/当前”的成对迭代,从而得到与原始split方法相同的单词。


from more_itertools import pairwise
import re

string = "dasdha hasud hasuid hsuia dhsuai dhasiu dhaui d"
delimiter = " "
# split according to the given delimiter including segments beginning at the beginning and ending at the end
for prev, curr in pairwise(re.finditer("^|[{0}]+|$".format(delimiter), string)):
    print(string[prev.end(): curr.start()])

注意:

  1. 我使用prev&curr而不是prev&next,因为在python中覆盖next是一个非常糟糕的主意
  2. 这相当高效

I wanted to show how to use the finditer solution to return a generator for given delimiters and then use the pairwise recipe from itertools to build a previous/next iteration which will get the actual words as in the original split method.


from more_itertools import pairwise
import re

string = "dasdha hasud hasuid hsuia dhsuai dhasiu dhaui d"
delimiter = " "
# split according to the given delimiter including segments beginning at the beginning and ending at the end
for prev, curr in pairwise(re.finditer("^|[{0}]+|$".format(delimiter), string)):
    print(string[prev.end(): curr.start()])

note:

  1. I use prev & curr instead of prev & next because overriding next in python is a very bad idea
  2. This is quite efficient

回答 11

最笨的方法,不使用正则表达式/ itertools:

def isplit(text, split='\n'):
    while text != '':
        end = text.find(split)

        if end == -1:
            yield text
            text = ''
        else:
            yield text[:end]
            text = text[end + len(split):]  # advance past the whole separator

Dumbest method, without regex / itertools:

def isplit(text, split='\n'):
    while text != '':
        end = text.find(split)

        if end == -1:
            yield text
            text = ''
        else:
            yield text[:end]
            text = text[end + len(split):]  # advance past the whole separator

回答 12

def split_generator(f, s):
    """
    f is a string, s is the single-character separator we split on.
    This produces a generator rather than a possibly
    memory intensive list.
    """
    i = 0
    j = 0
    while j < len(f):
        if i >= len(f):
            yield f[j:]
            j = i
        elif f[i] != s:
            i = i + 1
        else:
            yield f[j:i]  # yield the segment itself, not a one-element list
            j = i + 1
            i = i + 1

def split_generator(f, s):
    """
    f is a string, s is the single-character separator we split on.
    This produces a generator rather than a possibly
    memory intensive list.
    """
    i = 0
    j = 0
    while j < len(f):
        if i >= len(f):
            yield f[j:]
            j = i
        elif f[i] != s:
            i = i + 1
        else:
            yield f[j:i]  # yield the segment itself, not a one-element list
            j = i + 1
            i = i + 1

回答 13

这是一个简单的回应

def gen_str(some_string, sep):
    j = 0
    guard = len(some_string) - 1
    for i, s in enumerate(some_string):
        if s == sep:
            yield some_string[j:i]
            j = i + 1
        elif i != guard:
            continue
        else:
            yield some_string[j:]

here is a simple response

def gen_str(some_string, sep):
    j = 0
    guard = len(some_string) - 1
    for i, s in enumerate(some_string):
        if s == sep:
            yield some_string[j:i]
            j = i + 1
        elif i != guard:
            continue
        else:
            yield some_string[j:]
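A self-contained check of gen_str; note that, unlike str.split, a trailing separator drops the final empty field:

```python
def gen_str(some_string, sep):
    # restated here so the example runs standalone
    j = 0
    guard = len(some_string) - 1
    for i, s in enumerate(some_string):
        if s == sep:
            yield some_string[j:i]
            j = i + 1
        elif i != guard:
            continue
        else:
            yield some_string[j:]

print(list(gen_str("a,b,c", ",")))  # ['a', 'b', 'c']
print(list(gen_str("a,b,", ",")))   # ['a', 'b'] -- str.split would give ['a', 'b', '']
```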

如何在python中获取当前日期时间的字符串格式?

问题:如何在python中获取当前日期时间的字符串格式?

例如,在2010年7月5日,我想计算字符串

 July 5, 2010

应该怎么做?

For example, on July 5, 2010, I would like to calculate the string

 July 5, 2010

How should this be done?


回答 0

您可以使用该datetime模块在Python中处理日期和时间。该strftime方法允许您使用指定的格式生成日期和时间的字符串表示形式。

>>> import datetime
>>> datetime.date.today().strftime("%B %d, %Y")
'July 23, 2010'
>>> datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y")
'10:36AM on July 23, 2010'

You can use the datetime module for working with dates and times in Python. The strftime method allows you to produce string representation of dates and times with a format you specify.

>>> import datetime
>>> datetime.date.today().strftime("%B %d, %Y")
'July 23, 2010'
>>> datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y")
'10:36AM on July 23, 2010'
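As a quick sketch (assuming an English/C locale, since %B is locale-dependent): strptime is the inverse of strftime, and the day can be interpolated without zero-padding via an f-string, which gets exactly the "July 5, 2010" form the question asks for:

```python
import datetime

d = datetime.date(2010, 7, 5)
s = d.strftime("%B %d, %Y")   # note: %d is zero-padded
print(s)                      # July 05, 2010

# strptime parses the string back into a date
assert datetime.datetime.strptime(s, "%B %d, %Y").date() == d

# for "July 5, 2010" without the leading zero, portably:
print(f"{d:%B} {d.day}, {d:%Y}")  # July 5, 2010
```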

回答 1

#python3

import datetime
print(
    '1: test-{date:%Y-%m-%d_%H:%M:%S}.txt'.format( date=datetime.datetime.now() )
    )

d = datetime.datetime.now()
print( "2a: {:%B %d, %Y}".format(d))

# the f" prefix marks this as an f-string; no .format needed
print(f"2b: {d:%B %d, %Y}")

print(f"3: Today is {datetime.datetime.now():%Y-%m-%d} yay")

1: test-2018-02-14_16:40:52.txt

2a: March 04, 2018

2b: March 04, 2018

3: Today is 2018-11-11 yay


描述:

使用新式字符串格式化,把值注入到字符串的占位符 {} 中,这里的值是当前时间。

然后,不只是把原始值按 {} 原样显示,而是使用格式说明符来获得正确的日期格式。

https://docs.python.org/3/library/string.html#formatexamples

#python3

import datetime
print(
    '1: test-{date:%Y-%m-%d_%H:%M:%S}.txt'.format( date=datetime.datetime.now() )
    )

d = datetime.datetime.now()
print( "2a: {:%B %d, %Y}".format(d))

# the f" prefix marks this as an f-string; no .format needed
print(f"2b: {d:%B %d, %Y}")

print(f"3: Today is {datetime.datetime.now():%Y-%m-%d} yay")

1: test-2018-02-14_16:40:52.txt

2a: March 04, 2018

2b: March 04, 2018

3: Today is 2018-11-11 yay


Description:

Using the new string format to inject value into a string at placeholder {}, value is the current time.

Then rather than just displaying the raw value as {}, use formatting to obtain the correct date format.

https://docs.python.org/3/library/string.html#formatexamples


回答 2

>>> import datetime
>>> now = datetime.datetime.now()
>>> now.strftime("%B %d, %Y")
'July 23, 2010'
>>> import datetime
>>> now = datetime.datetime.now()
>>> now.strftime("%B %d, %Y")
'July 23, 2010'

回答 3

如果您不关心格式,只需要一些快速日期,则可以使用以下方法:

import time
print(time.ctime())

If you don’t care about formatting and you just need some quick date, you can use this:

import time
print(time.ctime())

在python中处理list.index(可能不存在)的最佳方法?

问题:在python中处理list.index(可能不存在)的最佳方法?

我有看起来像这样的代码:

thing_index = thing_list.index(thing)
otherfunction(thing_list, thing_index)

好的,这是简化过的,但你明白意思。现在 thing 可能实际上并不在列表中,这种情况下我想把 -1 作为 thing_index 传过去。在其他语言中,如果找不到元素,这正是你期望 index() 返回的值。而实际上,它抛出一个 ValueError。

我可以这样做:

try:
    thing_index = thing_list.index(thing)
except ValueError:
    thing_index = -1
otherfunction(thing_list, thing_index)

但这感觉很脏,而且我不知道 ValueError 是否会因为其他原因被抛出。我想出了下面这个基于生成器的方案,但似乎有点复杂:

thing_index = ([i for i in xrange(len(thing_list)) if thing_list[i] == thing] or [-1])[0]

有没有一种更清洁的方法来实现同一目标?假设列表未排序。

I have code which looks something like this:

thing_index = thing_list.index(thing)
otherfunction(thing_list, thing_index)

ok so that’s simplified but you get the idea. Now thing might not actually be in the list, in which case I want to pass -1 as thing_index. In other languages this is what you’d expect index() to return if it couldn’t find the element. In fact it throws a ValueError.

I could do this:

try:
    thing_index = thing_list.index(thing)
except ValueError:
    thing_index = -1
otherfunction(thing_list, thing_index)

But this feels dirty, plus I don’t know if ValueError could be raised for some other reason. I came up with the following solution based on generator functions, but it seems a little complex:

thing_index = ([i for i in xrange(len(thing_list)) if thing_list[i] == thing] or [-1])[0]

Is there a cleaner way to achieve the same thing? Let’s assume the list isn’t sorted.


回答 0

使用 try-except 子句没有任何“肮脏”之处,这正是 Python 的方式。ValueError 只会由 .index 方法引发,因为那是你那里仅有的代码!

要回答评论:
在 Python 中,“请求原谅比获得许可更容易”(EAFP)的理念早已深入人心;index 不会因为任何其他问题抛出这种类型的错误,至少我想不出来。

There is nothing “dirty” about using a try-except clause. This is the pythonic way. ValueError will be raised by the .index method only, because it’s the only code you have there!

To answer the comment:
In Python, the “easier to ask forgiveness than permission” (EAFP) philosophy is well established, and no, index will not raise this type of error for any other issue. Not that I can think of any.


回答 1

thing_index = thing_list.index(elem) if elem in thing_list else -1

一行代码。简单。没有异常。

thing_index = thing_list.index(elem) if elem in thing_list else -1

One line. Simple. No exceptions.
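One caveat worth noting: this form scans the list twice in the worst case, once for the `in` test and once more for `.index`. A sketch, wrapped in a hypothetical helper:

```python
thing_list = ["a", "b", "c"]

def index_or_default(lst, elem, default=-1):
    # `in` does one linear scan, .index() then does a second one
    return lst.index(elem) if elem in lst else default

print(index_or_default(thing_list, "b"))  # 1
print(index_or_default(thing_list, "z"))  # -1
```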


回答 2

dict 类型有一个 get 方法:如果键不存在于字典中,get 的第二个参数就是它应返回的值。类似地还有 setdefault:如果键存在,则返回字典中的值;否则按你给的默认参数设置值,然后返回该默认参数。

你可以扩展 list 类型,为它添加一个 getindexdefault 方法。

class SuperDuperList(list):
    def getindexdefault(self, elem, default):
        try:
            thing_index = self.index(elem)
            return thing_index
        except ValueError:
            return default

然后可以这样使用:

mylist = SuperDuperList([0,1,2])
index = mylist.getindexdefault( 'asdf', -1 )

The dict type has a get function, where if the key doesn’t exist in the dictionary, the 2nd argument to get is the value that it should return. Similarly there is setdefault, which returns the value in the dict if the key exists, otherwise it sets the value according to your default parameter and then returns your default parameter.

You could extend the list type to have a getindexdefault method.

class SuperDuperList(list):
    def getindexdefault(self, elem, default):
        try:
            thing_index = self.index(elem)
            return thing_index
        except ValueError:
            return default

Which could then be used like:

mylist = SuperDuperList([0,1,2])
index = mylist.getindexdefault( 'asdf', -1 )

回答 3

你那段用到 ValueError 的代码没有任何问题。如果你想避免异常,这里还有另一个一行式写法:

thing_index = next((i for i, x in enumerate(thing_list) if x == thing), -1)

There is nothing wrong with your code that uses ValueError. Here’s yet another one-liner if you’d like to avoid exceptions:

thing_index = next((i for i, x in enumerate(thing_list) if x == thing), -1)
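With hypothetical sample data, this returns the first matching index in a single pass, and because it only uses enumerate it works on any iterable, not just lists:

```python
thing_list = [10, 20, 30]
thing = 20

# single pass; -1 is the default when nothing matches
thing_index = next((i for i, x in enumerate(thing_list) if x == thing), -1)
print(thing_index)  # 1
print(next((i for i, x in enumerate(thing_list) if x == 99), -1))  # -1
```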

回答 4

这个问题属于语言哲学层面。例如在 Java 中,一直有一种传统:异常只应在发生错误的“异常情况”下使用,而不是用于流程控制。最初这是出于性能原因,因为 Java 的异常很慢,但现在这已成为公认的风格。

相反,Python 一直使用异常来表达正常的程序流,比如抛出我们这里讨论的 ValueError。Python 风格对此没有任何“不洁”可言,类似的例子还有很多。一个更常见的例子是 StopIteration 异常,它由迭代器的 next() 方法引发,用来表示没有更多的值了。

This issue is one of language philosophy. In Java for example there has always been a tradition that exceptions should really only be used in “exceptional circumstances” that is when errors have happened, rather than for flow control. In the beginning this was for performance reasons as Java exceptions were slow but now this has become the accepted style.

In contrast, Python has always used exceptions to indicate normal program flow, like raising the ValueError we are discussing here. There is nothing “dirty” about this in Python style, and there are many more where that came from. An even more common example is the StopIteration exception, which is raised by an iterator’s next() method to signal that there are no further values.


回答 5

如果你经常这样做,那最好把它封装到一个辅助函数里:

def index_of(val, in_list):
    try:
        return in_list.index(val)
    except ValueError:
        return -1 

If you are doing this often then it is better to stow it away in a helper function:

def index_of(val, in_list):
    try:
        return in_list.index(val)
    except ValueError:
        return -1 

回答 6

这样如何 😃 :

li = [1,2,3,4,5] # create list 

li = dict(zip(li,range(len(li)))) # convert List To Dict 
print( li ) # {1: 0, 2: 1, 3: 2, 4: 3, 5: 4}
li.get(20) # None 
li.get(1)  # 0 

What about this 😃 :

li = [1,2,3,4,5] # create list 

li = dict(zip(li,range(len(li)))) # convert List To Dict 
print( li ) # {1: 0, 2: 1, 3: 2, 4: 3, 5: 4}
li.get(20) # None 
li.get(1)  # 0 
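The same lookup table can be built a bit more directly with a dict comprehension over enumerate; a sketch, not part of the original answer. Note that with duplicate values the last occurrence wins, and `.get` takes a default, so `-1` can be returned instead of `None`:

```python
li = [1, 2, 3, 4, 5]

# value -> index; for duplicate values, the last occurrence wins
pos = {v: i for i, v in enumerate(li)}

print(pos.get(20, -1))  # -1 instead of None
print(pos.get(1, -1))   # 0
```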

回答 7

那这个呢:

otherfunction(thing_collection, thing)

与其在函数接口中暴露像列表索引这样依赖实现的东西,不如把集合和对象一起传进去,让 otherfunction 去处理“成员资格测试”的问题。如果 otherfunction 写得与集合类型无关,那么它大概会以下面的代码开头:

if thing in thing_collection:
    ... proceed with operation on thing

如果thing_collection是一个列表,元组,集合或字典,它将起作用。

这可能比以下更清楚:

if thing_index != MAGIC_VALUE_INDICATING_NOT_A_MEMBER:

这就是你在 otherfunction 中已有的代码。

What about this:

otherfunction(thing_collection, thing)

Rather than expose something so implementation-dependent like a list index in a function interface, pass the collection and the thing and let otherfunction deal with the “test for membership” issues. If otherfunction is written to be collection-type-agnostic, then it would probably start with:

if thing in thing_collection:
    ... proceed with operation on thing

which will work if thing_collection is a list, tuple, set, or dict.

This is possibly clearer than:

if thing_index != MAGIC_VALUE_INDICATING_NOT_A_MEMBER:

which is the code you already have in otherfunction.


回答 8

像这样:

temp_inx = (L + [x]).index(x) 
inx = temp_inx if temp_inx < len(L) else -1

What about like this:

temp_inx = (L + [x]).index(x) 
inx = temp_inx if temp_inx < len(L) else -1
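Spelled out as a hypothetical helper: appending x guarantees that .index() succeeds, and a result at or past the original length means "not found". Note that it copies the list, so it costs O(n) extra memory per lookup.

```python
def index_or_neg1(L, x):
    temp_inx = (L + [x]).index(x)  # cannot raise: x is certainly present
    return temp_inx if temp_inx < len(L) else -1

print(index_or_neg1([1, 2, 3], 2))  # 1
print(index_or_neg1([1, 2, 3], 9))  # -1
```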

回答 9

列表的 .index() 方法也让我遇到了同样的问题。我对它抛出异常这一点没有意见,但我强烈不认同它抛出的是一个缺乏描述性的 ValueError;如果是 IndexError,我倒还能理解。

我也明白为什么返回 -1 会有问题,因为它在 Python 中是一个有效的索引。但实际上,我从不期望 .index() 方法返回负数。

下面是一行代码(好吧,这行相当长……),它只遍历列表一次,找不到该项时返回 None。如果你愿意,把它改成返回 -1 也很容易。

from functools import reduce  # needed on Python 3

indexOf = lambda lst, thing: \
            reduce(lambda acc, pair:
                   pair[0] if acc is None and pair[1] == thing else acc,
                   enumerate(lst), None)

如何使用:

>>> indexOf([1,2,3], 4)
>>>
>>> indexOf([1,2,3], 1)
0
>>>

I have the same issue with the “.index()” method on lists. I have no issue with the fact that it throws an exception but I strongly disagree with the fact that it’s a non-descriptive ValueError. I could understand if it would’ve been an IndexError, though.

I can see why returning “-1” would be an issue too because it’s a valid index in Python. But realistically, I never expect a “.index()” method to return a negative number.

Here goes a one liner (ok, it’s a rather long line …), goes through the list exactly once and returns “None” if the item isn’t found. It would be trivial to rewrite it to return -1, should you so desire.

from functools import reduce  # needed on Python 3

indexOf = lambda lst, thing: \
            reduce(lambda acc, pair:
                   pair[0] if acc is None and pair[1] == thing else acc,
                   enumerate(lst), None)

How to use:

>>> indexOf([1,2,3], 4)
>>>
>>> indexOf([1,2,3], 1)
0
>>>

回答 10

我不知道你为什么觉得它脏……是因为异常吗?如果你想要一行代码,这就是:

thing_index = thing_list.index(elem) if thing_list.count(elem) else -1

但是我建议不要使用它。我认为 Ross Rogers 的解决方案是最好的:用一个对象封装你想要的行为,不要为了把语言推向极限而牺牲可读性。

I don’t know why you should think it is dirty… because of the exception? if you want a oneliner, here it is:

thing_index = thing_list.index(elem) if thing_list.count(elem) else -1

But I would advise against using it; I think Ross Rogers’ solution is the best: use an object to encapsulate your desired behaviour, and don’t try pushing the language to its limits at the cost of readability.


回答 11

我建议:

if thing in thing_list:
  list_index = thing_list.index(thing)
else:
  list_index = -1

I’d suggest:

if thing in thing_list:
  list_index = thing_list.index(thing)
else:
  list_index = -1

从Python中的打开文件获取路径

问题:从Python中的打开文件获取路径

如果我有一个打开的文件,是否有os调用以字符串的形式获取完整路径?

f = open('/Users/Desktop/febROSTER2012.xls')

f,我将如何获得"/Users/Desktop/febROSTER2012.xls"

If I have an opened file, is there an os call to get the complete path as a string?

f = open('/Users/Desktop/febROSTER2012.xls')

From f, how would I get "/Users/Desktop/febROSTER2012.xls" ?


回答 0

此处的关键是表示已打开文件的 f 对象的 name 属性。你可以这样获取它:

>>> f = open('/Users/Desktop/febROSTER2012.xls')
>>> f.name
'/Users/Desktop/febROSTER2012.xls'

有帮助吗?

The key here is the name attribute of the f object representing the opened file. You get it like that:

>>> f = open('/Users/Desktop/febROSTER2012.xls')
>>> f.name
'/Users/Desktop/febROSTER2012.xls'

Does it help?


回答 1

我遇到过完全相同的问题。如果你用相对路径打开文件,os.path.dirname(path) 只会返回相对路径。os.path.realpath 可以解决这个问题:

>>> import os
>>> f = open('file.txt')
>>> os.path.realpath(f.name)

I had the exact same issue. If you are using a relative path os.path.dirname(path) will only return the relative path. os.path.realpath does the trick:

>>> import os
>>> f = open('file.txt')
>>> os.path.realpath(f.name)

回答 2

而且,如果你只想获得目录名而不需要随之而来的文件名,可以用 Python 的 os 模块按下面的常规方式来做。

>>> import os
>>> f = open('/Users/Desktop/febROSTER2012.xls')
>>> os.path.dirname(f.name)
'/Users/Desktop'

这样,您就可以掌握目录结构。

And if you just want to get the directory name and no need for the filename coming with it, then you can do that in the following conventional way using os Python module.

>>> import os
>>> f = open('/Users/Desktop/febROSTER2012.xls')
>>> os.path.dirname(f.name)
'/Users/Desktop'

This way you can get hold of the directory structure.


回答 3

您也可以这样获得它。

filepath = os.path.abspath(f.name)

You can get it like this also.

filepath = os.path.abspath(f.name)

pandas 可以使用列作为索引吗?

问题:pandas 可以使用列作为索引吗?

我有一个像这样的电子表格:

Locality    2005    2006    2007    2008    2009

ABBOTSFORD  427000  448000  602500  600000  638500
ABERFELDIE  534000  600000  735000  710000  775000
AIREYS INLET 459000 440000  430000  517500  512500

我不想手动把列与行交换。能否像这样用 pandas 把数据读取到列表中:

data['ABBOTSFORD']=[427000,448000,602500,600000,638500]
data['ABERFELDIE']=[534000,600000,735000,710000,775000]
data['AIREYS INLET']=[459000,440000,430000,517500,512500]

I have a spreadsheet like this:

Locality    2005    2006    2007    2008    2009

ABBOTSFORD  427000  448000  602500  600000  638500
ABERFELDIE  534000  600000  735000  710000  775000
AIREYS INLET 459000 440000  430000  517500  512500

I don’t want to manually swap the column with the row. Could it be possible to use pandas reading data to a list as this:

data['ABBOTSFORD']=[427000,448000,602500,600000,638500]
data['ABERFELDIE']=[534000,600000,735000,710000,775000]
data['AIREYS INLET']=[459000,440000,430000,517500,512500]

回答 0

是的,使用 set_index 可以把 Locality 变成行索引。

data.set_index('Locality', inplace=True)

如果未提供 inplace=True,set_index 会把修改后的数据帧作为结果返回。

例:

> import pandas as pd
> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                     ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])

> df
     Locality    2005    2006
0  ABBOTSFORD  427000  448000
1  ABERFELDIE  534000  600000

> df.set_index('Locality', inplace=True)
> df
              2005    2006
Locality                  
ABBOTSFORD  427000  448000
ABERFELDIE  534000  600000

> df.loc['ABBOTSFORD']
2005    427000
2006    448000
Name: ABBOTSFORD, dtype: int64

> df.loc['ABBOTSFORD'][2005]
427000

> df.loc['ABBOTSFORD'].values
array([427000, 448000])

> df.loc['ABBOTSFORD'].tolist()
[427000, 448000]

Yes, with set_index you can make Locality your row index.

data.set_index('Locality', inplace=True)

If inplace=True is not provided, set_index returns the modified dataframe as a result.

Example:

> import pandas as pd
> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                     ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])

> df
     Locality    2005    2006
0  ABBOTSFORD  427000  448000
1  ABERFELDIE  534000  600000

> df.set_index('Locality', inplace=True)
> df
              2005    2006
Locality                  
ABBOTSFORD  427000  448000
ABERFELDIE  534000  600000

> df.loc['ABBOTSFORD']
2005    427000
2006    448000
Name: ABBOTSFORD, dtype: int64

> df.loc['ABBOTSFORD'][2005]
427000

> df.loc['ABBOTSFORD'].values
array([427000, 448000])

> df.loc['ABBOTSFORD'].tolist()
[427000, 448000]

回答 1

如前所述,你可以用 set_index 更改索引。你也无需手动交换行与列:pandas 中有一个 transpose 方法(data.T)可以为你完成:

> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                    ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])

> newdf = df.set_index('Locality').T
> newdf

Locality    ABBOTSFORD  ABERFELDIE
2005        427000      534000
2006        448000      600000

然后您可以获取数据框列值并将其转换为列表:

> newdf['ABBOTSFORD'].values.tolist()

[427000, 448000]

You can change the index as explained already using set_index. You don’t need to manually swap rows with columns, there is a transpose (data.T) method in pandas that does it for you:

> df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                    ['ABERFELDIE', 534000, 600000]],
                    columns=['Locality', 2005, 2006])

> newdf = df.set_index('Locality').T
> newdf

Locality    ABBOTSFORD  ABERFELDIE
2005        427000      534000
2006        448000      600000

then you can fetch the dataframe column values and transform them to a list:

> newdf['ABBOTSFORD'].values.tolist()

[427000, 448000]
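To get exactly the dict-of-lists the question asks for, the transposed frame can be converted in one step with to_dict('list'); a sketch based on the same sample data:

```python
import pandas as pd

df = pd.DataFrame([['ABBOTSFORD', 427000, 448000],
                   ['ABERFELDIE', 534000, 600000]],
                  columns=['Locality', 2005, 2006])

# columns of the transposed frame become the dict keys
data = df.set_index('Locality').T.to_dict('list')
print(data['ABBOTSFORD'])  # [427000, 448000]
print(data['ABERFELDIE'])  # [534000, 600000]
```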

回答 2

在 Pandas 中从电子表格读取数据时,你可以用 index_col 参数来设置索引列。

这是我的解决方案:

  1. 首先,导入 pandas 并命名为 pd: import pandas as pd

  2. 使用 pd.read_excel() 读入文件(如果你的数据在电子表格中),并通过指定 index_col 参数把索引设为 “Locality”。

    df = pd.read_excel('testexcel.xlsx', index_col=0)

    在此阶段,如果出现“没有名为 xlrd 的模块”的错误,请用 pip install xlrd 安装它。

  3. 为了直观检查,用 df.head() 查看数据框,它会打印前几行数据

  4. 现在,您可以获取数据框所需列的值并进行打印

You can set the column index using index_col parameter available while reading from spreadsheet in Pandas.

Here is my solution:

  1. Firstly, import pandas as pd: import pandas as pd

  2. Read in filename using pd.read_excel() (if you have your data in a spreadsheet) and set the index to ‘Locality’ by specifying the index_col parameter.

    df = pd.read_excel('testexcel.xlsx', index_col=0)

    At this stage if you get a ‘no module named xlrd’ error, install it using pip install xlrd.

  3. For visual inspection, read the dataframe using df.head(), which will print the first few rows

  4. Now you can fetch the values of the desired columns of the dataframe and print it
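The same index_col idea works with pd.read_csv as well; here is a self-contained sketch using an in-memory CSV instead of the Excel file (note that with read_csv the year headers come back as strings):

```python
import io
import pandas as pd

csv = "Locality,2005,2006\nABBOTSFORD,427000,448000\nABERFELDIE,534000,600000\n"
df = pd.read_csv(io.StringIO(csv), index_col=0)  # 'Locality' becomes the index

print(df.loc["ABBOTSFORD", "2005"])  # 427000
print(list(df.index))                # ['ABBOTSFORD', 'ABERFELDIE']
```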

