How to save a list as a numpy array in python?

Question: How to save a list as a numpy array in python?

Is it possible to construct a NumPy array from a Python list?


Answer 0

First of all, I’d recommend going through NumPy’s Quickstart tutorial, which will probably help with basic questions like this one.

You can create an array directly from a list:

import numpy as np
a = np.array([2, 3, 4])

Or from a nested list in the same way:

import numpy as np
a = np.array([[2, 3, 4], [3, 4, 5]])

Answer 1

Do you mean something like this?

from numpy import array
your_list = [1, 2, 3]  # any Python list
a = array(your_list)

Answer 2

Yes, it is:

import numpy
a = numpy.array([1, 2, 3])

Answer 3

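You want to save it as a file?

import numpy as np

myList = [1, 2, 3]

# np.save writes the .npy format; ndarray.dump() would write a pickle,
# which np.load() refuses by default (allow_pickle=False)
np.save('array.npy', np.array(myList))

… and then read it back:

myArray = np.load('array.npy')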

Answer 4

You can use numpy.asarray, for example, to convert a list into an array:

>>> import numpy as np
>>> a = [1, 2]
>>> np.asarray(a)
array([1, 2])
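np.array and np.asarray give the same result for a plain list; the difference only shows up when the input is already an ndarray. A small sketch (the variable names are illustrative):

```python
import numpy as np

values = [1, 2, 3]

# From a plain list, both functions build a brand-new ndarray.
from_array = np.array(values)
from_asarray = np.asarray(values)

# From an existing ndarray with a compatible dtype, np.asarray returns the
# very same object, while np.array makes a copy by default.
same_object = np.asarray(from_array) is from_array
copied = np.array(from_array) is not from_array

print(from_asarray, same_object, copied)
```

So np.asarray is the cheaper choice when you only need "something array-like as an ndarray" and do not intend to modify it independently.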

Answer 5

I suppose you mean converting a list into a numpy array? Then,

import numpy as np

b = [1, 2, 3, 4, 5, 6]         # b is some list (example values)
lengthDim0, lengthDim1 = 2, 3  # example target shape
a = np.array(b).reshape(lengthDim0, lengthDim1)

gives you a as an array built from list b, in the shape given to reshape.
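reshape only succeeds when the requested shape holds exactly as many elements as the list, and one dimension may be given as -1 so NumPy infers it. A short sketch (the list b here is example data):

```python
import numpy as np

b = list(range(6))  # example list with 6 elements

# One dimension may be -1; NumPy infers it from the total element count.
a = np.array(b).reshape(2, -1)
print(a.shape)  # (2, 3)

# The element count must match the requested shape exactly (8 != 6 here).
try:
    np.array(b).reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```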


Answer 6

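Here is a more complete example:

import csv
import numpy as np

# Python 3: open CSV files in text mode with newline=''
with open('filename', newline='') as csvfile:
    cdl = list(csv.reader(csvfile, delimiter='\t'))
    print("Number of records = " + str(len(cdl)))

# then later

npcdl = np.array(cdl)  # elements are strings; use .astype(float) for numeric data

Hope this helps!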
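If the file holds only numbers, NumPy can also parse delimited text directly instead of going through the csv module. A sketch using np.loadtxt, which is a swapped-in alternative to the csv approach above; the io.StringIO sample data is hypothetical and stands in for a file on disk:

```python
import io

import numpy as np

# Stand-in for a tab-separated file on disk (hypothetical sample data).
tsv = io.StringIO("1\t2\t3\n4\t5\t6\n")

# loadtxt splits on the delimiter and converts every field to float.
data = np.loadtxt(tsv, delimiter="\t")
print(data.shape)  # (2, 3)
```

With a real file you would pass the filename instead of the StringIO object.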


Answer 7

import numpy as np

...  # other code

Some list comprehension:

t = [nodel[nodenext[i][j]] for j in idx]
# for each link, find the node labels;
# t is the list of node labels

Convert the list to a numpy array using numpy’s array function:

t = np.array(t)

This may be helpful: https://numpy.org/devdocs/user/basics.creation.html


Answer 8

Maybe:

import numpy as np
a = [[1, 1], [2, 2]]
b = np.asarray(a)
print(type(b))

Output:

<class 'numpy.ndarray'>

Stopping python using ctrl+c

Question: Stopping python using ctrl+c

I have a python script that uses threads and makes lots of HTTP requests. I think what’s happening is that while an HTTP request (using urllib2) is reading, it’s blocking and not responding to Ctrl+C to stop the program. Is there any way around this?


Answer 0

On Windows, the only sure way is to use Ctrl+Break. It stops every python script instantly!

(Note that on some keyboards, “Break” is labeled as “Pause”.)


Answer 1

Pressing Ctrl+C while a python program is running will cause python to raise a KeyboardInterrupt exception. It’s likely that a program that makes lots of HTTP requests will have lots of exception-handling code. If the except part of the try/except block doesn’t specify which exceptions it should catch, it will catch all exceptions, including the KeyboardInterrupt that you just caused. A properly coded python program will make use of the python exception hierarchy and only catch exceptions that are derived from Exception.

# This is the wrong way to do things
try:
    ...  # some stuff that might raise an IO exception
except:
    pass  # code that ignores errors (this also swallows KeyboardInterrupt)

# This is the right way to do things
try:
    ...  # some stuff that might raise an IO exception
except Exception:
    pass  # this won't catch KeyboardInterrupt

If you can’t change the code (or need to kill the program so that your changes will take effect), then you can try pressing Ctrl+C rapidly. The first KeyboardInterrupt will knock your program out of the try block, and hopefully one of the later ones will be raised while the program is outside of a try block.
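The difference between the two except clauses comes from the exception hierarchy: KeyboardInterrupt derives from BaseException, not from Exception. A small self-contained sketch that simulates Ctrl+C by raising KeyboardInterrupt directly (the function names are illustrative):

```python
def simulate_ctrl_c(handler):
    """Run handler() and report whether a KeyboardInterrupt escaped it."""
    try:
        handler()
    except KeyboardInterrupt:
        return "propagated"
    return "swallowed"


def bare_except():
    try:
        raise KeyboardInterrupt  # stands in for Ctrl+C arriving mid-request
    except:  # noqa: E722 - a bare except also catches BaseException subclasses
        pass


def except_exception():
    try:
        raise KeyboardInterrupt
    except Exception:  # does not match: KeyboardInterrupt is not an Exception
        pass


print(issubclass(KeyboardInterrupt, Exception))  # False
print(simulate_ctrl_c(bare_except))              # swallowed
print(simulate_ctrl_c(except_exception))         # propagated
```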


Answer 2

If it is running in the Python shell, use Ctrl+Z; otherwise, locate the python process and kill it.


Answer 3

The interrupt process is hardware and OS dependent. So you will have very different behavior depending on where you run your python script. For example, on Windows machines we have Ctrl+C (SIGINT) and Ctrl+Break (SIGBREAK).

So while SIGINT is present on all systems and can be handled and caught, the SIGBREAK signal is Windows-specific (and can be disabled in CONFIG.SYS) and is really handled by the BIOS as interrupt vector INT 1Bh, which is why this key is much more powerful than any other. So if you’re using some *nix-flavored OS, you will get different results depending on the implementation, since that signal is not present there, but others are. On Linux, you can check which signals are available to you with:

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGEMT       8) SIGFPE       9) SIGKILL     10) SIGBUS
11) SIGSEGV     12) SIGSYS      13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGURG      17) SIGSTOP     18) SIGTSTP     19) SIGCONT     20) SIGCHLD
21) SIGTTIN     22) SIGTTOU     23) SIGIO       24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGPWR      30) SIGUSR1
31) SIGUSR2     32) SIGRTMAX

So if you want to catch the Ctrl+Break signal on a Linux system, you’ll have to check which POSIX signal that key has been mapped to. Popular mappings are:

CTRL+\     = SIGQUIT
CTRL+D     = EOF (end of input, not a signal)
CTRL+C     = SIGINT
CTRL+Z     = SIGTSTP
CTRL+BREAK = SIGKILL or SIGTERM or SIGSTOP

In fact, many more functions are available under Linux, where the SysRq (System Request) key can take on a life of its own.
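Python’s signal module can register a handler for whichever of these signals exist on the current platform. A minimal sketch, assuming Python 3.8+ (for signal.raise_signal, used here only to simulate the key press); on_signal and received are illustrative names:

```python
import signal

received = []  # names of signals seen so far (illustrative bookkeeping)


def on_signal(signum, frame):
    """Record the signal; a real script would clean up and exit here."""
    received.append(signal.Signals(signum).name)


# SIGINT (Ctrl+C) is available on every platform Python supports.
signal.signal(signal.SIGINT, on_signal)

# SIGBREAK (Ctrl+Break) exists only on Windows; SIGQUIT (Ctrl+\) only on POSIX.
for name in ("SIGBREAK", "SIGQUIT"):
    if hasattr(signal, name):
        signal.signal(getattr(signal, name), on_signal)

# Simulate pressing Ctrl+C by sending SIGINT to this process.
signal.raise_signal(signal.SIGINT)
print(received)  # ['SIGINT']
```

Note that SIGKILL and SIGSTOP can never be caught, so no Python handler can intercept a key mapped to them.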


Answer 4

This post is old but I recently ran into the same problem of Ctrl+C not terminating Python scripts on Linux. I used Ctrl+\ (SIGQUIT).


Answer 5

Ctrl+C Difference for Windows and Linux

It turns out that, as of Python 3.6, the Python interpreter handles Ctrl+C differently on Linux and Windows. On Linux, Ctrl+C works mostly as expected; on Windows, however, Ctrl+C mostly doesn’t work, especially if Python is running a blocking call such as thread.join or waiting on a web response. It does work for time.sleep, however. Here’s a nice explanation of what is going on in the Python interpreter. Note that Ctrl+C generates SIGINT.

Solution 1: Use Ctrl+Break or Equivalent

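Use the keyboard shortcuts below in the terminal/console window. On Windows they generate SIGBREAK, and on Mac OS and Linux Ctrl+\ generates SIGQUIT; either way the signal comes from a lower level in the OS and terminates the Python interpreter.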

Mac OS and Linux

Ctrl+Shift+\ or Ctrl+\

Windows:

  • General: Ctrl+Break
  • Dell: Ctrl+Fn+F6 or Ctrl+Fn+S
  • Lenovo: Ctrl+Fn+F11 or Ctrl+Fn+B
  • HP: Ctrl+Fn+Shift
  • Samsung: Fn+Esc

Solution 2: Use Windows API

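Below are handy functions that detect Windows and install a custom handler for Ctrl+C in the console (win32api comes from the third-party pywin32 package):

# win_ctrl_c.py

import sys

def handler(a, b=None):
    # called for console control events such as Ctrl+C; exit with status 1
    sys.exit(1)

def install_handler():
    if sys.platform == "win32":
        import win32api  # part of pywin32
        win32api.SetConsoleCtrlHandler(handler, True)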

You can use it like this:

import threading
import time
from win_ctrl_c import install_handler

# do something that will block
def work():
    time.sleep(10000)

t = threading.Thread(target=work)
t.daemon = True
t.start()

# install handler
install_handler()

# now block
t.join()

# Ctrl+C works now!

Solution 3: Polling method

I don’t like or recommend this method, because it needlessly consumes CPU time and power, hurting performance.

import threading
import time

def work():
    time.sleep(10000)

t = threading.Thread(target=work)
t.daemon = True
t.start()
while t.is_alive():
    t.join(0.1)  # 100 ms ~ typical human response time
# Ctrl+C now raises a KeyboardInterrupt exception between the short joins

Answer 6

On Mac press Ctrl+\ to quit a python process attached to a terminal.


Answer 7

On a Mac, in Terminal:

  1. Show the Inspector (right-click within the terminal window, or choose Shell > Show Inspector).
  2. Click the Settings icon above “running processes”.
  3. Choose from the list of options under “Signal Process Group” (Kill, Terminate, Interrupt, etc.).

Answer 8

  • Forcing the program to close with Alt+F4 (shuts down the current program)
  • Repeatedly clicking the X button on the CMD window, for example
  • Task Manager (press Windows+R, then run “taskmgr”), and then end the task.

Those may help.


Answer 9

You can open Task Manager (Ctrl+Alt+Delete, then choose Task Manager) and look through it for python; in this example the server is called _go_app (the naming convention is _language_app).

If I end the _go_app task, it ends the server, so visiting it in the browser says it “unexpectedly ended”. I also use git bash: when I start a server, I cannot break out of it in bash’s shell with Ctrl+C or Ctrl+Pause, but once I end the python task (the one using 63.7 MB), it breaks out of the server script in bash and lets me use the git bash shell again.


Answer 10

For the record, what killed the process on my Raspberry Pi 3B+ (running Raspbian) was Ctrl+'. On my French AZERTY keyboard, the ' key is also the 4 key.


Change a Django form field to a hidden field

Question: Change a Django form field to a hidden field

I have a Django form with a RegexField, which is very similar to a normal text input field.

In my view, under certain conditions I want to hide it from the user, while keeping the form as similar as possible. What’s the best way to turn this field into a HiddenInput field?

I know I can set attributes on the field with:

form['fieldname'].field.widget.attrs['readonly'] = 'readonly'

And I can set the desired initial value with:

form.initial['fieldname'] = 'mydesiredvalue'

However, that won’t change the form of the widget.

What’s the best / most “django-y” / least “hacky” way to make this field a <input type="hidden"> field?


Answer 0

If you have a custom template and view, you may exclude the field and use {{ modelform.instance.field }} to get the value.

Alternatively, you may prefer to use this in the view:

from django import forms

form.fields['field_name'].widget = forms.HiddenInput()

but I’m not sure it will protect the save method on POST.

Hope it helps.


Answer 1

This may also be useful: {{ form.field.as_hidden }}


Answer 2

An option that worked for me: define the field in the original form as:

forms.CharField(widget=forms.HiddenInput(), required=False)

Then, when you override it in the new class, it will keep its place.


Answer 3

Firstly, if you don’t want the user to modify the data, then it seems cleaner to simply exclude the field. Including it as a hidden field just adds more data to send over the wire and invites a malicious user to modify it when you don’t want them to. If you do have a good reason to include the field but hide it, you can pass a keyword arg to the modelform’s constructor. Something like this perhaps:

from django import forms
from django.forms.widgets import HiddenInput

class MyModelForm(forms.ModelForm):
    class Meta:
        model = MyModel
        fields = '__all__'  # Django 1.8+ requires fields or exclude on a ModelForm

    def __init__(self, *args, **kwargs):
        hide_condition = kwargs.pop('hide_condition', None)
        super().__init__(*args, **kwargs)
        if hide_condition:
            self.fields['fieldname'].widget = HiddenInput()
            # or alternately:  del self.fields['fieldname']  to remove it from the form altogether.

Then in your view:

form = MyModelForm(hide_condition=True)

I prefer this approach to modifying the modelform’s internals in the view, but it’s a matter of taste.


Answer 4

For a normal form, you can do:

class MyModelForm(forms.ModelForm):
    slug = forms.CharField(widget=forms.HiddenInput())

If you have a model form, you can do the following:

class MyModelForm(forms.ModelForm):
    class Meta:
        model = TagStatus
        fields = ('slug', 'ext')
        widgets = {'slug': forms.HiddenInput()}

You can also override the __init__ method:

class Myform(forms.Form):
    def __init__(self, *args, **kwargs):
        super(Myform, self).__init__(*args, **kwargs)
        self.fields['slug'].widget = forms.HiddenInput()

Answer 5

If you want the field to always be hidden, use the following:

class MyForm(forms.Form):
    hidden_input = forms.CharField(widget=forms.HiddenInput(), initial="value")

Answer 6

You can just use CSS:

#id_fieldname, label[for="id_fieldname"] {
  position: absolute;
  display: none
}

This will make the field and its label invisible.


How to list all installed packages and their versions in Python?

Question: How to list all installed packages and their versions in Python?

Is there a way in Python to list all installed packages and their versions?

I know I can go inside python/Lib/site-packages and see what files and directories exist, but I find this very awkward. What I’m looking for is something similar to npm list, i.e. npm-ls.


Answer 0

If you have pip installed and you want to see what packages have been installed with your installer tools, you can simply call this:

pip freeze

It will also include version numbers for the installed packages.

Update

pip has since been updated so that you can also get this list by calling:

pip list

Note

The output from pip list is formatted differently, so if you have some shell script that parses the output (maybe to grab the version number) of freeze and want to change your script to call list, you’ll need to change your parsing code.


Answer 1

help('modules') should do it for you.

In IPython:

In [1]: import                      #import press-TAB
Display all 631 possibilities? (y or n)
ANSI                   audiodev               markupbase
AptUrl                 audioop                markupsafe
ArgImagePlugin         avahi                  marshal
BaseHTTPServer         axi                    math
Bastion                base64                 md5
BdfFontFile            bdb                    mhlib
BmpImagePlugin         binascii               mimetools
BufrStubImagePlugin    binhex                 mimetypes
CDDB                   bisect                 mimify
CDROM                  bonobo                 mmap
CGIHTTPServer          brlapi                 mmkeys
Canvas                 bsddb                  modulefinder
CommandNotFound        butterfly              multifile
ConfigParser           bz2                    multiprocessing
ContainerIO            cPickle                musicbrainz2
Cookie                 cProfile               mutagen
Crypto                 cStringIO              mutex
CurImagePlugin         cairo                  mx
DLFCN                  calendar               netrc
DcxImagePlugin         cdrom                  new
Dialog                 cgi                    nis
DiscID                 cgitb                  nntplib
DistUpgrade            checkbox               ntpath

Answer 2

If you want to get information about your installed python distributions and don’t want to use your cmd console or terminal for it, but rather through python code, you can use the following code (tested with python 3.4):

import pip  # needed to use the pip functions
# note: pip.get_installed_distributions() is no longer available as of pip 10
for i in pip.get_installed_distributions(local_only=True):
    print(i)

The pip.get_installed_distributions(local_only=True) call returns an iterable, and because of the for loop and the print function, its elements are printed separated by newline characters (\n). The result (depending on your installed distributions) will look something like this:

cycler 0.9.0
decorator 4.0.4
ipykernel 4.1.0
ipython 4.0.0
ipython-genutils 0.1.0
ipywidgets 4.0.3
Jinja2 2.8
jsonschema 2.5.1
jupyter 1.0.0
jupyter-client 4.1.1
#... and so on...
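Because pip’s internals are not a public API, a sketch that avoids pip entirely may be more robust. It assumes Python 3.8+, where importlib.metadata is in the standard library; installed_packages is an illustrative name. Its output may differ slightly from pip freeze (for example, pip freeze hides pip itself by default):

```python
from importlib.metadata import distributions  # standard library since Python 3.8


def installed_packages():
    """Return sorted (name, version) pairs for distributions visible on sys.path."""
    pairs = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with missing metadata
            pairs.add((name, version))
    return sorted(pairs, key=lambda pair: pair[0].lower())


for name, version in installed_packages():
    print(f"{name}=={version}")
```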

Answer 3

You can try Yolk.

To install yolk, try:

easy_install yolk

Yolk is a Python tool for obtaining information about installed Python packages and querying packages available on PyPI (Python Package Index).

You can see which packages are active, non-active, or in development mode, and it can show you which have newer versions available by querying PyPI.


Answer 4

To run this in later versions of pip (tested on pip==10.0.1) use the following:

from pip._internal.operations.freeze import freeze
for requirement in freeze(local_only=True):
    print(requirement)

Answer 5

From the command line,

python -c help('modules')

can be used to view all modules, and for a specific module:

python -c help('os')

On Linux, the quoted form below works (the quotes keep the shell from interpreting the parentheses):

python -c "help('os')"

Answer 6

Yes! You should be using pip as your Python package manager (http://pypi.python.org/pypi/pip).

With pip-installed packages, you can do a

pip freeze

and it will list all installed packages. You should probably also be using virtualenv and virtualenvwrapper. When you start a new project, you can do

mkvirtualenv my_new_project

and then (inside that virtualenv), do

pip install all_your_stuff

This way, you can workon my_new_project and then pip freeze to see which packages are installed for that virtualenv/project.

For example:

➜  ~  mkvirtualenv yo_dude
New python executable in yo_dude/bin/python
Installing setuptools............done.
Installing pip...............done.
virtualenvwrapper.user_scripts creating /Users/aaylward/dev/virtualenvs/yo_dude/bin/predeactivate
virtualenvwrapper.user_scripts creating /Users/aaylward/dev/virtualenvs/yo_dude/bin/postdeactivate
virtualenvwrapper.user_scripts creating /Users/aaylward/dev/virtualenvs/yo_dude/bin/preactivate
virtualenvwrapper.user_scripts creating /Users/aaylward/dev/virtualenvs/yo_dude/bin/postactivate
virtualenvwrapper.user_scripts creating /Users/aaylward/dev/virtualenvs/yo_dude/bin/get_env_details

(yo_dude)➜  ~  pip install django
Downloading/unpacking django
  Downloading Django-1.4.1.tar.gz (7.7Mb): 7.7Mb downloaded
  Running setup.py egg_info for package django

Installing collected packages: django
  Running setup.py install for django
    changing mode of build/scripts-2.7/django-admin.py from 644 to 755

    changing mode of /Users/aaylward/dev/virtualenvs/yo_dude/bin/django-admin.py to 755
Successfully installed django
Cleaning up...

(yo_dude)➜  ~  pip freeze
Django==1.4.1
wsgiref==0.1.2

(yo_dude)➜  ~  

Or, if you have a Python package with a requirements.pip file,

mkvirtualenv my_awesome_project
pip install -r requirements.pip
pip freeze

will do the trick


Answer 7

My take:

#!/usr/bin/env python3

import pkg_resources

dists = [str(d).replace(" ","==") for d in pkg_resources.working_set]
for i in dists:
    print(i)

Answer 8

Here’s a way to do it using PYTHONPATH instead of the absolute path of your python libs dir:

for d in `echo "${PYTHONPATH}" | tr ':' '\n'`; do ls "${d}"; done

[ 10:43 Jonathan@MacBookPro-2 ~/xCode/Projects/Python for iOS/trunk/Python for iOS/Python for iOS ]$ for d in `echo "$PYTHONPATH" | tr ':' '\n'`; do ls "${d}"; done
libpython2.7.dylib pkgconfig          python2.7
BaseHTTPServer.py      _pyio.pyc              cgitb.pyo              doctest.pyo            htmlentitydefs.pyc     mimetools.pyc          plat-mac               runpy.py               stringold.pyc          traceback.pyo
BaseHTTPServer.pyc     _pyio.pyo              chunk.py               dumbdbm.py             htmlentitydefs.pyo     mimetools.pyo          platform.py            runpy.pyc              stringold.pyo          tty.py
BaseHTTPServer.pyo     _strptime.py           chunk.pyc              dumbdbm.pyc            htmllib.py             mimetypes.py           platform.pyc           runpy.pyo              stringprep.py          tty.pyc
Bastion.py             _strptime.pyc          chunk.pyo              dumbdbm.pyo            htmllib.pyc            mimetypes.pyc          platform.pyo           sched.py               stringprep.pyc         tty.pyo
Bastion.pyc            _strptime.pyo          cmd.py
....

Answer 9

If you’re using anaconda:

conda list

will do it! See: https://conda.io/docs/_downloads/conda-cheatsheet.pdf


Answer 10

If this needs to run from within Python, you can invoke pip through subprocess:

import sys
from subprocess import PIPE, Popen

# Run pip as a module of the current interpreter, so the listing matches this environment
pip_process = Popen([sys.executable, "-m", "pip", "freeze"], stdout=PIPE, stderr=PIPE)
stdout, stderr = pip_process.communicate()
print(stdout.decode("utf-8"))
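If you only need names and versions and would rather not start a separate process, the standard library's importlib.metadata (Python 3.8+) can list installed distributions directly. This is a minimal sketch. The helper name installed_packages is ours, and the output only approximates pip freeze: editable and VCS installs are not distinguished, and pip's own tooling packages are not filtered out.

```python
from importlib.metadata import distributions

def installed_packages():
    """Return sorted 'name==version' strings for distributions visible to this interpreter."""
    pkgs = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            pkgs.add(f"{name}=={dist.version}")
    return sorted(pkgs, key=str.lower)

if __name__ == "__main__":
    for line in installed_packages():
        print(line)
```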

Python: Is it bad form to raise exceptions within __init__?

Question: Python: Is it bad form to raise exceptions within __init__?

Is it considered bad form to raise exceptions within __init__? If so, then what is the accepted method of throwing an error when certain class variables are initialized as None or of an incorrect type?


Answer 0

Raising exceptions within __init__() is absolutely fine. There’s no other good way to indicate an error condition within a constructor, and there are many hundreds of examples in the standard library where building an object can raise an exception.

The error class to raise, of course, is up to you. ValueError is best if the constructor was passed an invalid parameter.
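For example, a class can validate its arguments in __init__ and raise before the object is ever used. This is a minimal sketch. Temperature is a made-up class that raises TypeError for a wrong type and ValueError for a missing or out-of-range value.

```python
class Temperature:
    """Hypothetical example class that validates its argument in __init__."""

    def __init__(self, kelvin):
        if kelvin is None:
            raise ValueError("kelvin must not be None")
        if not isinstance(kelvin, (int, float)):
            raise TypeError(f"kelvin must be a number, got {type(kelvin).__name__}")
        if kelvin < 0:
            raise ValueError(f"kelvin must be >= 0, got {kelvin!r}")
        self.kelvin = kelvin


try:
    Temperature(-5)
except ValueError as exc:
    print(exc)  # kelvin must be >= 0, got -5
```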


Answer 1

It's true that the only proper way to indicate an error in a constructor is raising an exception. That is why in C++ and in other object-oriented languages that have been designed with exception safety in mind, the destructor is not called if an exception is thrown in the constructor of an object (meaning that the initialization of the object is incomplete). This is often not the case in scripting languages such as Python. For example, the following code hits an AttributeError inside __del__ if socket.connect() fails (Python reports it as an ignored exception, because errors raised in __del__ are not propagated):

import socket

class NetworkInterface:
    def __init__(self, address):
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.connect(address)  # if this raises, self.stream is never set
        self.stream = self.socket.makefile()

    def __del__(self):
        self.stream.close()  # AttributeError here when connect() failed
        self.socket.close()

The reason is that the destructor of the incomplete object is still called after the connection attempt has failed, before the stream attribute has been initialized. I'm not saying you should avoid raising exceptions from constructors; I'm just saying that it's difficult to write fully exception-safe code in Python. Some Python developers avoid using destructors altogether, but that's a matter for another debate.
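One way to avoid depending on __del__ is to clean up inside __init__ itself when a later step fails, and to give callers an explicit close() plus context-manager support. This is a sketch under those assumptions. SafeNetworkInterface is a made-up name, and socket.create_server in the usage part requires Python 3.8+.

```python
import socket

class SafeNetworkInterface:
    """Sketch: release partially acquired resources inside __init__ before re-raising."""

    def __init__(self, address):
        self.stream = None
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.socket.connect(address)
            self.stream = self.socket.makefile()
        except OSError:
            self.socket.close()  # clean up now instead of relying on a finaliser
            raise

    def close(self):
        if self.stream is not None:
            self.stream.close()
        self.socket.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never swallow exceptions raised in the with-block


if __name__ == "__main__":
    # Connect to a throwaway listener on localhost just to exercise the class.
    with socket.create_server(("127.0.0.1", 0)) as server:
        with SafeNetworkInterface(server.getsockname()) as iface:
            print("connected:", iface.stream is not None)
```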


Answer 2

I don’t see any reason that it should be bad form.

On the contrary, one of the things exceptions are known for doing well, compared with returning error codes, is reporting failures from constructors, which usually can't return error codes. So at least in languages like C++, raising an exception is the only way to signal an error from a constructor.


Answer 3

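The standard library says (this is Python 2's built-in file(); in Python 3, open() raises FileNotFoundError, a subclass of OSError, instead):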

>>> f = file("notexisting.txt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'notexisting.txt'

Also I don’t really see any reason why it should be considered bad form.
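In Python 3 the same pattern looks like this. The constructor call either returns a usable file object or raises, so callers handle the failure at the point of construction. This is a sketch, and read_or_none is an illustrative name.

```python
def read_or_none(path):
    """Return the file's text, or None if the file does not exist."""
    try:
        f = open(path)  # raises instead of returning a half-built file object
    except FileNotFoundError as exc:
        print(f"cannot open {exc.filename!r}: {exc.strerror}")
        return None
    with f:
        return f.read()

if __name__ == "__main__":
    read_or_none("notexisting.txt")
```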


Answer 4

I should think it is the perfect case for the built-in ValueError exception.


Answer 5

I concur with all of the above.

There’s really no other way to signal that something went wrong in the initialisation of an object other than raising an exception.

In most programs, for classes whose state is wholly dependent on their inputs, we might expect some kind of ValueError or TypeError to be raised.

Classes with side effects (e.g. one which does networking or graphics) might raise an error in __init__ if (for example) the network device is unavailable or the canvas object cannot be written to. This sounds sensible to me because you often want to know about failure conditions as soon as possible.


Answer 6

Raising errors from __init__ is unavoidable in some cases, but doing too much work in __init__ is bad style. You should consider writing a factory or a pseudo-factory: a simple classmethod that returns a fully set-up object.
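Here is a sketch of the classmethod-factory idea. Config and from_json are made-up names. __init__ only validates and stores data it is handed, while the parsing work (which can fail with json.JSONDecodeError) lives in the factory.

```python
import json

class Config:
    """Hypothetical class: __init__ stores validated data; parsing lives in a factory."""

    def __init__(self, settings):
        if not isinstance(settings, dict):
            raise TypeError("settings must be a dict")
        self.settings = settings

    @classmethod
    def from_json(cls, text):
        # The expensive / failure-prone step happens here, not in __init__
        return cls(json.loads(text))


cfg = Config.from_json('{"debug": true}')
print(cfg.settings)  # {'debug': True}
```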


Releasing memory in Python

Question: Releasing memory in Python

I have a few related questions regarding memory usage in the following example.

  1. If I run in the interpreter,

    foo = ['bar' for _ in xrange(10000000)]
    

    the real memory used on my machine goes up to 80.9mb. I then,

    del foo
    

    real memory goes down, but only to 30.4mb. The interpreter uses 4.4mb baseline so what is the advantage in not releasing 26mb of memory to the OS? Is it because Python is “planning ahead”, thinking that you may use that much memory again?

  2. Why does it release 50.5mb in particular – what is the amount that is released based on?

  3. Is there a way to force Python to release all the memory that was used (if you know you won’t be using that much memory again)?

NOTE This question is different from How can I explicitly free memory in Python? because this question primarily deals with the increase of memory usage from baseline even after the interpreter has freed objects via garbage collection (with use of gc.collect or not).


Answer 0

Memory allocated on the heap can be subject to high-water marks. This is complicated by Python’s internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes — up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.

Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.

Try it like this, and tell me what you get. (See the psutil documentation for psutil.Process.memory_info.)

# Python 2 code: relies on range() building a list and on the
# list-comprehension variable x leaking into the enclosing scope
import os
import gc
import psutil

proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.memory_info().rss

# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.memory_info().rss

# unreference, including x == 9999999
del foo, x
mem2 = proc.memory_info().rss

# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.memory_info().rss

pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print("Allocation: %0.2f%%" % pd(mem1, mem0))
print("Unreference: %0.2f%%" % pd(mem2, mem1))
print("Collect: %0.2f%%" % pd(mem3, mem2))
print("Overall: %0.2f%%" % pd(mem3, mem0))

Output:

Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%

Edit:

I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.

The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn’t surprising if the heap shrinks by more — even a lot more — than the block that you free.

In 3.x range doesn’t create a list, so the test above won’t create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn’t implement a freelist.
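The mallopt(M_TRIM_THRESHOLD) tuning mentioned above can be reached from Python through ctypes. This is a minimal sketch that assumes Linux with glibc, where M_TRIM_THRESHOLD is -1 in <malloc.h> and mallopt returns 1 on success. The helper name set_trim_threshold is ours. On other platforms, or C libraries without a usable mallopt, it just reports that nothing was done.

```python
import ctypes
import ctypes.util
import sys

M_TRIM_THRESHOLD = -1  # constant from glibc's <malloc.h>

def set_trim_threshold(nbytes):
    """Ask malloc to give heap-top memory back to the OS once `nbytes` of it is free.

    Returns True if mallopt accepted the setting, False if it rejected it,
    and None when no usable mallopt is available (non-Linux, no libc found).
    """
    if not sys.platform.startswith("linux"):
        return None
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        mallopt = libc.mallopt
    except (OSError, AttributeError):
        return None
    mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    mallopt.restype = ctypes.c_int
    return mallopt(M_TRIM_THRESHOLD, nbytes) == 1

if __name__ == "__main__":
    print(set_trim_threshold(128 * 1024))
```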


Answer 1

I’m guessing the question you really care about here is:

Is there a way to force Python to release all the memory that was used (if you know you won’t be using that much memory again)?

No, there is not. But there is an easy workaround: child processes.

If you need 500MB of temporary storage for 5 minutes, but after that you need to run for another 2 hours and won’t touch that much memory ever again, spawn a child process to do the memory-intensive work. When the child process goes away, the memory gets released.

This isn’t completely trivial and free, but it’s pretty easy and cheap, which is usually good enough for the trade to be worthwhile.

First, the easiest way to create a child process is with concurrent.futures (or, for 3.1 and earlier, the futures backport on PyPI):

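import concurrent.futures

# func, args and kwargs stand for your memory-hungry function and its arguments
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    result = executor.submit(func, *args, **kwargs).result()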

If you need a little more control, use the multiprocessing module.

The costs are:

  • Process startup is kind of slow on some platforms, notably Windows. We’re talking milliseconds here, not minutes, and if you’re spinning up one child to do 300 seconds’ worth of work, you won’t even notice it. But it’s not free.
  • If the large amount of temporary memory you use really is large, doing this can cause your main program to get swapped out. Of course you're saving time in the long run, because if that memory hung around forever it would have to lead to swapping at some point. But in some use cases this can turn gradual slowness into very noticeable all-at-once (and early) delays.
  • Sending large amounts of data between processes can be slow. Again, if you’re talking about sending over 2K of arguments and getting back 64K of results, you won’t even notice it, but if you’re sending and receiving large amounts of data, you’ll want to use some other mechanism (a file, mmapped or otherwise; the shared-memory APIs in multiprocessing; etc.).
  • Sending large amounts of data between processes means the data have to be pickleable (or, if you stick them in a file or shared memory, struct-able or ideally ctypes-able).
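The pieces above fit together into a minimal runnable sketch of the child-process workaround. build_and_summarise is a made-up stand-in for your own memory-hungry function. The important parts are these: it is defined at module level so it can be pickled, only a small result travels back to the parent, and the `if __name__ == "__main__":` guard is present, which multiprocessing needs on platforms that spawn rather than fork.

```python
import concurrent.futures

def build_and_summarise(n):
    """Hypothetical memory-hungry task: big temporary data, small result."""
    data = ["bar" for _ in range(n)]  # the large allocation lives only in the child
    return len(data)                  # only this small int is pickled back

def run_in_child(func, *args, **kwargs):
    """Run func in one short-lived worker process and return its result."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        return executor.submit(func, *args, **kwargs).result()

if __name__ == "__main__":
    print(run_in_child(build_and_summarise, 10_000_000))  # 10000000
```

When the with block exits, the worker process shuts down and the OS reclaims everything it allocated.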

Answer 2

eryksun has answered question #1, and I’ve answered question #3 (the original #4), but now let’s answer question #2:

Why does it release 50.5mb in particular – what is the amount that is released based on?

What it’s based on is, ultimately, a whole series of coincidences inside Python and malloc that are very hard to predict.

First, depending on how you’re measuring memory, you may only be measuring pages actually mapped into memory. In that case, any time a page gets swapped out by the pager, memory will show up as “freed”, even though it hasn’t been freed.

Or you may be measuring in-use pages, which may or may not count allocated-but-never-touched pages (on systems that optimistically over-allocate, like linux), pages that are allocated but tagged MADV_FREE, etc.

If you really are measuring allocated pages (which is actually not a very useful thing to do, but it seems to be what you're asking about), and pages have really been deallocated, there are two circumstances in which this can happen: either you've used brk or equivalent to shrink the data segment (very rare nowadays), or you've used munmap or similar to release a mapped segment. (In theory there's also a minor variant of the latter: there are ways to release part of a mapped segment, e.g., steal it with MAP_FIXED for a MADV_FREE segment that you immediately unmap.)

But most programs don’t directly allocate things out of memory pages; they use a malloc-style allocator. When you call free, the allocator can only release pages to the OS if you just happen to be freeing the last live object in a mapping (or in the last N pages of the data segment). There’s no way your application can reasonably predict this, or even detect that it happened in advance.

CPython makes this even more complicated—it has a custom 2-level object allocator on top of a custom memory allocator on top of malloc. (See the source comments for a more detailed explanation.) And on top of that, even at the C API level, much less Python, you don’t even directly control when the top-level objects are deallocated.

So, when you release an object, how do you know whether it’s going to release memory to the OS? Well, first you have to know that you’ve released the last reference (including any internal references you didn’t know about), allowing the GC to deallocate it. (Unlike other implementations, at least CPython will deallocate an object as soon as it’s allowed to.) This usually deallocates at least two things at the next level down (e.g., for a string, you’re releasing the PyString object, and the string buffer).

If you do deallocate an object, to know whether this causes the next level down to deallocate a block of object storage, you have to know the internal state of the object allocator, as well as how it’s implemented. (It obviously can’t happen unless you’re deallocating the last thing in the block, and even then, it may not happen.)

If you do deallocate a block of object storage, to know whether this causes a free call, you have to know the internal state of the PyMem allocator, as well as how it’s implemented. (Again, you have to be deallocating the last in-use block within a malloced region, and even then, it may not happen.)

If you do free a malloced region, to know whether this causes an munmap or equivalent (or brk), you have to know the internal state of malloc, as well as how it's implemented. And this one, unlike the others, is highly platform-specific. (And again, you generally have to be deallocating the last in-use malloc block within an mmap segment, and even then, it may not happen.)

So, if you want to understand why it happened to release exactly 50.5mb, you’re going to have to trace it from the bottom up. Why did malloc unmap 50.5mb worth of pages when you did those one or more free calls (for probably a bit more than 50.5mb)? You’d have to read your platform’s malloc, and then walk the various tables and lists to see its current state. (On some platforms, it may even make use of system-level information, which is pretty much impossible to capture without making a snapshot of the system to inspect offline, but luckily this isn’t usually a problem.) And then you have to do the same thing at the 3 levels above that.

So, the only useful answer to the question is “Because.”

Unless you’re doing resource-limited (e.g., embedded) development, you have no reason to care about these details.

And if you are doing resource-limited development, knowing these details is useless; you pretty much have to do an end-run around all those levels and specifically mmap the memory you need at the application level (possibly with one simple, well-understood, application-specific zone allocator in between).
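To see the gap between "Python freed it" and "the OS took it back", compare Python's own accounting with what the OS reports. The standard-library tracemalloc module counts only memory obtained through Python's allocators. Its numbers drop as soon as objects are deallocated, whether or not any pages actually went back to the OS. This is a small sketch: the list size is arbitrary, and range replaces the question's Python 2 xrange.

```python
import tracemalloc

def traced_before_and_after(n=10**6):
    """Return (bytes traced while the list is alive, bytes traced after del)."""
    tracemalloc.start()
    try:
        foo = ['bar' for _ in range(n)]
        alive, _peak = tracemalloc.get_traced_memory()
        del foo
        freed, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return alive, freed

if __name__ == "__main__":
    alive, freed = traced_before_and_after()
    print(f"traced while alive: {alive:,} bytes; after del: {freed:,} bytes")
```

If you run this while watching the process in a system monitor, the traced number falls right after del. The resident size the OS reports may not fall by the same amount.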


Answer 3

First, you may want to install glances:

sudo apt-get install python-pip build-essential python-dev lm-sensors 
sudo pip install psutil logutils bottle batinfo https://bitbucket.org/gleb_zhulik/py3sensors/get/tip.tar.gz zeroconf netifaces pymdstat influxdb elasticsearch potsdb statsd pystache docker-py pysnmp pika py-cpuinfo bernhard
sudo pip install glances

Then run it in the terminal!

glances

In your Python code, add the following at the beginning of the file:

import gc  # garbage collector interface

After you are done with the "big" variable whose memory you want to release (for example, myBigVar), write the following in your Python code:

del myBigVar
gc.collect()

In another terminal, run your python code and observe in the “glances” terminal, how the memory is managed in your system!

Good luck!

P.S. I assume you are working on a Debian or Ubuntu system
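If you'd rather watch the numbers from inside the script than in glances, Linux (including Debian/Ubuntu, as assumed above) reports the process's current resident set size on the VmRSS line of /proc/self/status. This is a minimal sketch. The helper name current_rss_kib is ours, and it returns None where that file or line isn't available.

```python
import gc

def current_rss_kib():
    """Return this process's resident set size in KiB, or None if unavailable (Linux only)."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # line looks like "VmRSS:   12345 kB"
    except OSError:
        pass
    return None

if __name__ == "__main__":
    print("baseline:", current_rss_kib(), "KiB")
    myBigVar = ["bar" for _ in range(10**7)]
    print("allocated:", current_rss_kib(), "KiB")
    del myBigVar
    gc.collect()
    print("after del + gc.collect():", current_rss_kib(), "KiB")
```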


Open, read, and close a file in one line of code

Question: Open, read, and close a file in one line of code

Now I use:

pageHeadSectionFile = open('pagehead.section.htm','r')
output = pageHeadSectionFile.read()
pageHeadSectionFile.close()

But to make the code look better, I can do:

output = open('pagehead.section.htm','r').read()

When using the above syntax, how do I close the file to free up system resources?


Answer 0

You don’t really have to close it – Python will do it automatically either during garbage collection or at program exit. But as @delnan noted, it’s better practice to explicitly close it for various reasons.

So, what you can do to keep it short, simple and explicit:

with open('pagehead.section.htm','r') as f:
    output = f.read()

Now it’s just two lines and pretty readable, I think.


Answer 1

The pathlib module in the Python standard library does what you're looking for (Path.read_text() requires Python 3.5+):

Path('pagehead.section.htm').read_text()

Don’t forget to import Path:

jsk@dev1:~$ python3
Python 3.5.2 (default, Sep 10 2016, 08:21:44)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pathlib import Path
>>> (Path("/etc") / "hostname").read_text()
'dev1.example\n'

On Python 2.7, install the backported pathlib or pathlib2 package.
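Here is a self-contained version you can run anywhere. It uses a temporary directory instead of the real pagehead.section.htm, and the file contents are made up. Path.write_text() and Path.read_text() each open, use, and close the file in a single call.

```python
import tempfile
from pathlib import Path

def round_trip(text):
    """Write text to a throwaway file and read it back with pathlib one-liners."""
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "pagehead.section.htm"
        page.write_text(text, encoding="utf-8")
        return page.read_text(encoding="utf-8")

print(repr(round_trip("<head></head>\n")))  # '<head></head>\n'
```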


Answer 2

Using CPython, your file will be closed immediately after the line is executed, because the file object is immediately garbage collected. There are two drawbacks, though:

  1. In Python implementations different from CPython, the file often isn’t immediately closed, but rather at a later time, beyond your control.

  2. In Python 3.2 or above, this will emit a ResourceWarning, if enabled.

Better to invest one additional line:

with open('pagehead.section.htm','r') as f:
    output = f.read()

This will ensure that the file is correctly closed under all circumstances.
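You can watch the warning from point 2 appear by enabling ResourceWarning temporarily, since it is ignored by default. This sketch uses only the standard library, and leaked_file_warnings is an illustrative name. On CPython the returned list should contain ResourceWarning; other implementations may report it at a different time.

```python
import gc
import os
import tempfile
import warnings

def leaked_file_warnings():
    """Open a file without closing it and return the names of any warnings emitted."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always", ResourceWarning)  # ignored by default; enable it here
            open(path).read()  # the one-liner from the question, never closed explicitly
            gc.collect()       # nudge implementations that don't use reference counting
    finally:
        os.remove(path)
    return [w.category.__name__ for w in caught]

if __name__ == "__main__":
    print(leaked_file_warnings())
```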


Answer 3

No need to import any special libraries to do this.

Use the normal with syntax; it opens the file for reading, then closes it.

with open("/etc/hostname","r") as f: print(f.read())

or

with open("/etc/hosts","r") as f: x = f.read().splitlines()

which gives you a list x containing the lines, which can be printed like so:

for line in x: print(line)

These one-liners are very helpful for maintenance – basically self-documenting.


Answer 4

What you can do is to use the with statement, and write the two steps on one line:

>>> with open('pagehead.section.htm', 'r') as fin: output = fin.read();
>>> print(output)
some content

The with statement takes care of calling the __exit__ method of the given object even if something goes wrong in your code; it's close to the try... finally syntax. For the object returned by open, __exit__ closes the file.

The with statement became standard in Python 2.6 (in Python 2.5 it required from __future__ import with_statement).
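To see that __exit__ really runs when the body raises, here is a toy context manager that just records whether it was called. Tracked is a made-up name for illustration.

```python
class Tracked:
    """Toy context manager that records whether __exit__ ran."""

    def __init__(self):
        self.exited = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.exited = True
        return False  # returning False lets the exception propagate


t = Tracked()
try:
    with t:
        raise RuntimeError("something bad happened")
except RuntimeError:
    pass
print(t.exited)  # True
```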


Answer 5

Use ilio (inline io), a third-party package:

just one function call instead of open(), read(), close().

from ilio import read

content = read('filename')

Answer 6

with open('pagehead.section.htm') as f: contents = f.read()

Answer 7

I think the most natural way for achieving this is to define a function.

def read(filename):
    # the with block closes the file even if read() raises
    with open(filename, 'r') as f:
        return f.read()

Then you can do the following:

output = read('pagehead.section.htm')

Answer 8

I frequently do something like this when I need to get a few lines surrounding something I’ve grepped in a log file:

$ grep -n "xlrd" requirements.txt | awk -F ":" '{print $1}'
54

$ python -c "with open('requirements.txt') as file: print(''.join(file.readlines()[52:55]), end='')"
wsgiref==0.1.2
xlrd==0.9.2
xlwt==0.7.5
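If you'd rather do the whole grep-then-slice step in Python, a small helper can find the matching lines and return their neighbours in one pass. This is a sketch. lines_around and its context parameter are hypothetical names, and matching is a plain substring test rather than grep's regular expressions.

```python
def lines_around(path, needle, context=1):
    """Return lines within `context` lines of any line containing `needle`, in file order."""
    with open(path) as f:
        lines = f.read().splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if needle in line:
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    return [lines[j] for j in sorted(keep)]

if __name__ == "__main__":
    print("\n".join(lines_around("requirements.txt", "xlrd")))
```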

Answer 9

Using more_itertools.with_iter, it is possible to open, read, close and assign an equivalent output in one line (excluding the import statement):

import more_itertools as mit


output = "".join(line for line in mit.with_iter(open("pagehead.section.htm", "r")))

Although this is possible, I would look for an approach other than assigning the contents of a file to a variable, i.e. lazy iteration. This can be done with a traditional with block, or in the example above by removing join() and iterating over output.
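Here is a sketch of the lazy alternative with a plain with block. Iterating over the file object yields one line at a time, so the whole file is never held in a single string. count_matching_lines is an illustrative name.

```python
def count_matching_lines(path, needle):
    """Count lines containing `needle`, reading the file lazily line by line."""
    with open(path) as f:  # closed automatically when the block ends
        return sum(1 for line in f if needle in line)
```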


Answer 10

If you want that warm and fuzzy feeling, just go with with.

For Python 3.6, I ran these two programs under a fresh start of IDLE, giving runtimes of:

0.002000093460083008  Test A
0.0020003318786621094 Test B: with guaranteed close

So not much of a difference.

#--------*---------*---------*---------*---------*---------*---------*---------*
# Desc: Test A for reading a text file line-by-line into a list
#--------*---------*---------*---------*---------*---------*---------*---------*

import sys
import time

#                                  # MAINLINE
if __name__ == '__main__':
    print("OK, starting program...")

    inTextFile = '/Users/Mike/Desktop/garbage.txt'

#                                  # Test: A: no 'with'
    c=[]
    start_time = time.time()
    c = open(inTextFile).read().splitlines()
    print("--- %s seconds ---" % (time.time() - start_time))

    print("OK, program execution has ended.")
    sys.exit()                     # END MAINLINE

OUTPUT:

OK, starting program...
--- 0.002000093460083008 seconds ---
OK, program execution has ended.

#--------*---------*---------*---------*---------*---------*---------*---------*
# Desc: Test B for reading a text file line-by-line into a list
#--------*---------*---------*---------*---------*---------*---------*---------*

import sys
import time

#                                  # MAINLINE
if __name__ == '__main__':
    print("OK, starting program...")

    inTextFile = '/Users/Mike/Desktop/garbage.txt'

#                                  # Test: B: using 'with'
    c=[]
    start_time = time.time()
    with open(inTextFile) as D: c = D.read().splitlines()
    print("--- %s seconds ---" % (time.time() - start_time))

    print("OK, program execution has ended.")
    sys.exit()                     # END MAINLINE

OUTPUT:

OK, starting program...
--- 0.0020003318786621094 seconds ---
OK, program execution has ended.
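A single time.time() measurement of an operation that takes a couple of milliseconds can easily be noise, so a fairer comparison repeats each variant many times and keeps the best run. This sketch uses the standard timeit module. The function names and the throwaway test file are ours, not the original garbage.txt.

```python
import os
import tempfile
import timeit

def read_without_with(path):
    # CPython closes the file when its reference count drops to zero, right after this line
    return open(path).read().splitlines()

def read_with_with(path):
    with open(path) as f:  # closed deterministically at the end of the block
        return f.read().splitlines()

def compare(path, number=200, repeat=5):
    """Return the best per-call time in seconds for each variant."""
    results = {}
    for func in (read_without_with, read_with_with):
        timings = timeit.repeat(lambda: func(path), number=number, repeat=repeat)
        results[func.__name__] = min(timings) / number
    return results

if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("some line of text\n" * 10_000)
    try:
        for name, seconds in compare(path).items():
            print(f"{name}: {seconds * 1e6:.1f} us per call")
    finally:
        os.remove(path)
```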

Redirecting stdout to "nothing" in Python

Question: Redirecting stdout to "nothing" in Python

I have a large project consisting of a sufficiently large number of modules, each printing something to the standard output. As the project has grown, there are now a large number of print statements writing a lot to stdout, and this has made the program considerably slower.

So, I now want to decide at runtime whether or not to print anything to the stdout. I cannot make changes in the modules as there are plenty of them. (I know I can redirect the stdout to a file but even this is considerably slow.)

So my question is: how do I redirect stdout to nothing, i.e., how do I make the print statement do nothing?

# I want to do something like this.
sys.stdout = None         # this obviously will give an error as Nonetype object does not have any write method.

Currently the only idea I have is to make a class which has a write method (which does nothing) and redirect the stdout to an instance of this class.

class DontPrint(object):
    def write(*args): pass

dp = DontPrint()
sys.stdout = dp

Is there a built-in mechanism in Python for this? Or is there something better than this?


Answer 0

Cross-platform:

import os
import sys
f = open(os.devnull, 'w')
sys.stdout = f

On Windows:

f = open('nul', 'w')
sys.stdout = f

On Linux:

f = open('/dev/null', 'w')
sys.stdout = f
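Assigning sys.stdout this way lasts until you undo it. The minimal sketch below assumes you only want to silence a single call. The hypothetical helper call_silenced keeps a reference to the current stream and points sys.stdout at os.devnull while the call runs. It restores the old stream in a finally block, so this happens even if the call raises:

```python
import os
import sys


def call_silenced(func, *args, **kwargs):
    """Call func with stdout sent to os.devnull, then restore the old stream."""
    saved = sys.stdout
    with open(os.devnull, "w") as devnull:
        sys.stdout = devnull
        try:
            return func(*args, **kwargs)
        finally:
            sys.stdout = saved  # restore even if func raises


if __name__ == "__main__":
    call_silenced(print, "this goes to the null device")
    print("this is printed normally")
```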

Answer 1

A nice way to do this is to create a small context manager that you wrap your prints in. You then just use it in a with-statement to silence all output.

Python 2:

import os
import sys
from contextlib import contextmanager

@contextmanager
def silence_stdout():
    old_target = sys.stdout
    try:
        with open(os.devnull, "w") as new_target:
            sys.stdout = new_target
            yield new_target
    finally:
        sys.stdout = old_target

with silence_stdout():
    print("will not print")

print("this will print")

Python 3.4+:

Python 3.4 has a context manager like this built in, so you can simply use contextlib like this:

import contextlib

with contextlib.redirect_stdout(None):
    print("will not print")

print("this will print")

Running this code only prints the second line of output, not the first:

$ python test.py
this will print

This works cross-platform (Windows + Linux + Mac OS X) and, IMHO, is cleaner than the other answers.
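The try/finally in silence_stdout() is what makes it safe: if the body raises, the original stream is still put back. Here is a small check of that behavior, reusing the manager from the Python 2 example above (it also runs on Python 3). The demo function is hypothetical:

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def silence_stdout():
    old_target = sys.stdout
    try:
        with open(os.devnull, "w") as new_target:
            sys.stdout = new_target
            yield new_target
    finally:
        sys.stdout = old_target


def stdout_restored_after_error():
    """Raise inside the with-block and report whether stdout was restored."""
    original = sys.stdout
    try:
        with silence_stdout():
            print("will not print")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    return sys.stdout is original


if __name__ == "__main__":
    print(stdout_restored_after_error())  # True
```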


Answer 2

If you're on Python 3.4 or higher, there's a simple and safe solution using the standard library:

import contextlib

with contextlib.redirect_stdout(None):
  print("This won't print!")
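redirect_stdout() simply assigns whatever target you give it to sys.stdout, so any object with a write() method works, not just None. You may later want to inspect the suppressed output instead of discarding it. This sketch captures it in an io.StringIO buffer; the helper name is hypothetical:

```python
import contextlib
import io


def capture_output(func, *args, **kwargs):
    """Run func while collecting everything it prints; return (result, text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, text = capture_output(print, "This won't print!")
    print(repr(text))
```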

Answer 3

At least on my system, writing to os.devnull appears to be about 5x faster than writing to a DontPrint-style class. Benchmark (Python 2 code):

#!/usr/bin/python
import os
import sys
import datetime

ITER = 10000000
def printlots(out, it, st="abcdefghijklmnopqrstuvwxyz1234567890"):
   temp = sys.stdout
   sys.stdout = out
   i = 0
   start_t = datetime.datetime.now()
   while i < it:
      print st
      i = i+1
   end_t = datetime.datetime.now()
   sys.stdout = temp
   print out, "\n   took", end_t - start_t, "for", it, "iterations"

class devnull():
   def write(*args):
      pass


printlots(open(os.devnull, 'wb'), ITER)
printlots(devnull(), ITER)

gave the following output:

<open file '/dev/null', mode 'wb' at 0x7f2b747044b0> 
   took 0:00:02.074853 for 10000000 iterations
<__main__.devnull instance at 0x7f2b746bae18> 
   took 0:00:09.933056 for 10000000 iterations
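The benchmark above uses the Python 2 print statement. Below is a rough Python 3 port. It assumes time.perf_counter() for timing and a smaller iteration count, and the names are hypothetical. No numbers are claimed here, because results vary by machine:

```python
import os
import sys
import time


class DevNull:
    """Pure-Python sink: write() discards whatever it is given."""

    def write(self, *args):
        pass

    def flush(self):
        pass


def time_prints(target, iterations, text="abcdefghijklmnopqrstuvwxyz1234567890"):
    """Print text `iterations` times into target; return the elapsed seconds."""
    saved = sys.stdout
    sys.stdout = target
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            print(text)
        return time.perf_counter() - start
    finally:
        sys.stdout = saved


if __name__ == "__main__":
    n = 200_000  # smaller than the original 10,000,000 to keep runs short
    with open(os.devnull, "w") as devnull:
        print("os.devnull:   ", time_prints(devnull, n))
    print("DevNull class:", time_prints(DevNull(), n))
```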

Answer 4

If you’re in a Unix environment (Linux included), you can redirect output to /dev/null:

python myprogram.py > /dev/null

And for Windows:

python myprogram.py > nul

Answer 5

How about this:

from contextlib import ExitStack, redirect_stdout
import os

with ExitStack() as stack:
    if should_hide_output():
        null_stream = open(os.devnull, "w")
        stack.enter_context(null_stream)
        stack.enter_context(redirect_stdout(null_stream))
    noisy_function()

This uses the features in the contextlib module to hide the output of whatever command you are trying to run, depending on the result of should_hide_output(), and then restores the output behavior after that function is done running.

If you want to hide standard error output, then import redirect_stderr from contextlib and add a line saying stack.enter_context(redirect_stderr(null_stream)).

The main downside is that this only works in Python 3.4 and later versions.
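should_hide_output() and noisy_function() above are placeholders. The runnable variant below takes the decision as a boolean argument instead. Its noisy_function is a hypothetical stand-in. ExitStack.enter_context() returns the opened file, so the stream can be registered for cleanup and obtained in one step:

```python
import os
from contextlib import ExitStack, redirect_stdout


def noisy_function():
    """Hypothetical stand-in for code that prints a lot."""
    print("lots of output")
    return 42


def run(hide_output):
    with ExitStack() as stack:
        if hide_output:
            # enter_context() returns the file object and closes it on exit.
            null_stream = stack.enter_context(open(os.devnull, "w"))
            stack.enter_context(redirect_stdout(null_stream))
        return noisy_function()


if __name__ == "__main__":
    run(hide_output=True)   # prints nothing
    run(hide_output=False)  # prints "lots of output"
```

ExitStack unwinds in reverse order, so stdout is restored before the null stream is closed.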


Answer 6

Your class will work just fine, as long as the method is named write() in lowercase. Just make sure you save a copy of sys.stdout in another variable so you can restore it later.

If you're on *NIX, you can do sys.stdout = open('/dev/null', 'w') (note the 'w': the default read mode is not writable), but this is less portable than rolling your own class.


Answer 7

You can just mock it.

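import sys
from unittest import mock  # Python 3.3+; on Python 2, install the third-party "mock" package and use "import mock"

sys.stdout = mock.MagicMock()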
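A MagicMock accepts any write() call and records it. So besides silencing output, you can check what would have been printed. The sketch below is for Python 3 and uses unittest.mock.patch, so sys.stdout is restored automatically afterwards. silent_call is a hypothetical helper:

```python
from unittest import mock


def silent_call(func, *args, **kwargs):
    """Run func with sys.stdout replaced by a MagicMock; return (result, writes)."""
    fake_stdout = mock.MagicMock()
    with mock.patch("sys.stdout", fake_stdout):
        result = func(*args, **kwargs)
    # Each recorded call is (args, kwargs); keep the first positional argument.
    writes = [c[0][0] for c in fake_stdout.write.call_args_list]
    return result, writes


if __name__ == "__main__":
    _, writes = silent_call(print, "hello")
    print(writes)
```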

Answer 8

Why don’t you try this?

sys.stdout.close()
sys.stderr.close()

Answer 9

sys.stdout = None

This is fine for print(). But it can cause an error if you call any method of sys.stdout, e.g. sys.stdout.write().

There is a note in the docs:

Under some conditions stdin, stdout and stderr as well as the original values __stdin__, __stdout__ and __stderr__ can be None. It is usually the case for Windows GUI apps that aren't connected to a console and Python apps started with pythonw.
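Here is a quick Python 3 sketch of the difference described above. With sys.stdout set to None, print() silently returns, while calling a method on sys.stdout raises AttributeError. The helper name is hypothetical, and the real stream is restored in a finally block:

```python
import sys


def demo_stdout_none():
    """Return True if sys.stdout.write() fails while sys.stdout is None."""
    saved = sys.stdout
    sys.stdout = None
    try:
        print("silently dropped")  # print() returns without writing anything
        try:
            sys.stdout.write("x")  # None has no write() method
        except AttributeError:
            return True
        return False
    finally:
        sys.stdout = saved  # always restore the real stream


if __name__ == "__main__":
    print(demo_stdout_none())
```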


Answer 10

Supplement to iFreilicht's answer: this works for both Python 2 and 3.

import sys

class NonWritable:
    def write(self, *args, **kwargs):
        pass

class StdoutIgnore:
    def __enter__(self):
        self.stdout_saved = sys.stdout
        sys.stdout = NonWritable()
        return self

    def __exit__(self, *args):
        sys.stdout = self.stdout_saved

with StdoutIgnore():
    print("This won't print!")

If list index exists, do X

Question: If list index exists, do X

In my program, user inputs number n, and then inputs n number of strings, which get stored in a list.

I need to code such that if a certain list index exists, then run a function.

This is made more complicated by the fact that I have nested if statements about len(my_list).

Here’s a simplified version of what I have now, which isn’t working:

n = input ("Define number of actors: ")

count = 0

nams = []

while count < n:
    count = count + 1
    print "Define name for actor ", count, ":"
    name = raw_input ()
    nams.append(name)

if nams[2]: #I am trying to say 'if nams[2] exists, do something depending on len(nams)
    if len(nams) > 3:
        do_something
    if len(nams) > 4:
        do_something_else

if nams[3]: #etc.

Answer 0

Could it be more useful for you to use the length of the list len(n) to inform your decision rather than checking n[i] for each possible length?


Answer 1

I need to code such that if a certain list index exists, then run a function.

This is the perfect use for a try block:

ar=[1,2,3]

try:
    t=ar[5]
except IndexError:
    print('sorry, no 5')   

# Note: this is only a valid existence test for an absolute
# (i.e., non-negative) index. A negative (relative) index only shows
# that a value can be returned counting back from the end of the list...

However, by definition, all items in a Python list between 0 and len(the_list)-1 exist (i.e., there is no need for a try/except if you know 0 <= index < len(the_list)).

You can use enumerate if you want the indexes between 0 and the last element:

names=['barney','fred','dino']

for i, name in enumerate(names):
    print(str(i) + ' ' + name)  # i is an int, so convert it before concatenating
    if i in (3, 4):
        # do your thing with the index 'i' or value 'name' for each item...
        pass

If you are looking for some defined 'index', though, I think you are asking the wrong question. Perhaps you should consider using a mapping container (such as a dict) instead of a sequence container (such as a list). You could rewrite your code like this:

def do_something(name):
    print('some thing 1 done with ' + name)

def do_something_else(name):
    print('something 2 done with ' + name)        

def default(name):
    print('nothing done with ' + name)     

something_to_do={  
    3: do_something,        
    4: do_something_else
    }        

# Python 3: input() returns a string, so convert the count to int.
n = int(input("Define number of actors: "))
names = []

for count in range(n):
    name = input("Define name for actor {}: ".format(count + 1))
    names.append(name)

for name in names:
    try:
        something_to_do[len(name)](name)
    except KeyError:
        default(name)

Runs like this:

Define number of actors: 3
Define name for actor 1: bob
Define name for actor 2: tony
Define name for actor 3: alice
some thing 1 done with bob
something 2 done with tony
nothing done with alice

You can also use the .get method rather than try/except for a shorter version:

>>> something_to_do.get(3, default)('bob')
some thing 1 done with bob
>>> something_to_do.get(22, default)('alice')
nothing done with alice

Answer 2

len(nams) should be equal to n in your code. All indexes 0 <= i < n “exist”.


Answer 3

It can be done simply using the following code:

if index < len(my_list):
    print(index, 'exists in the list')
else:
    print(index, "doesn't exist in the list")

Answer 4

I need to code such that if a certain list index exists, then run a function.

You already know how to test for this and in fact are already performing such tests in your code.

The valid indices for a list of length n are 0 through n-1 inclusive.

Thus, a list has an index i if and only if the length of the list is at least i + 1.
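Applied to the question's code, that rule turns the nams[2] check into len(nams) > 2, which never raises IndexError. Here is a sketch in Python 3. actions_for and the action labels are hypothetical stand-ins for do_something and do_something_else:

```python
def actions_for(nams):
    """Return the (hypothetical) actions the question's nested checks would trigger."""
    actions = []
    if len(nams) > 2:          # index 2 exists
        if len(nams) > 3:      # index 3 exists
            actions.append("do_something")
        if len(nams) > 4:      # index 4 exists
            actions.append("do_something_else")
    return actions


if __name__ == "__main__":
    print(actions_for(["a", "b"]))
    print(actions_for(["a", "b", "c", "d", "e"]))
```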


Answer 5

Using the length of the list would be the fastest solution to check if an index exists:

def index_exists(ls, i):
    return (0 <= i < len(ls)) or (-len(ls) <= i < 0)

This also handles negative indices, and works for most sequence types that have a length (like range and str).

If you need to access the item at that index afterwards anyway, it is easier to ask forgiveness than permission, and it is also faster and more Pythonic. Use try/except:

try:
    item = ls[i]
    # Do something with item
except IndexError:
    # Do something without the item
    pass

This would be as opposed to:

if index_exists(ls, i):
    item = ls[i]
    # Do something with item
else:
    # Do something without the item
    pass
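Here are both approaches side by side in a small self-contained sketch. index_exists is copied from above. safe_get is a hypothetical EAFP helper that returns a default instead of raising. Both accept negative indices, which count from the end:

```python
def index_exists(ls, i):
    return (0 <= i < len(ls)) or (-len(ls) <= i < 0)


def safe_get(ls, i, default=None):
    """EAFP: try the lookup and fall back to default on IndexError."""
    try:
        return ls[i]
    except IndexError:
        return default


if __name__ == "__main__":
    names = ["barney", "fred", "dino"]
    for i in (0, 2, 3, -1, -3, -4):
        print(i, index_exists(names, i), safe_get(names, i, "<missing>"))
```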

Answer 6

If you want to iterate the inserted actors data:

for i in range(n):
    if len(nams[i]) > 3:
        do_something
    if len(nams[i]) > 4:
        do_something_else

Answer 7

OK, so I think it's actually possible (for the sake of argument; in Python 3, zip() returns an iterator, so wrap it in list() before indexing):

>>> your_list = [5,6,7]
>>> 2 in list(zip(*enumerate(your_list)))[0]
True
>>> 3 in list(zip(*enumerate(your_list)))[0]
False
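For comparison, in Python 3 the same membership test can be written without building any list, because range objects answer `in` for integers arithmetically instead of scanning. This swaps in range(len(...)) for the zip/enumerate trick. Like that trick, it only covers non-negative indices:

```python
def has_index(seq, i):
    # range() membership for ints is computed arithmetically, not by scanning.
    return i in range(len(seq))


if __name__ == "__main__":
    your_list = [5, 6, 7]
    print(has_index(your_list, 2))  # True
    print(has_index(your_list, 3))  # False
```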

Answer 8

You can try something like this:

items = ["a", "b", "C", "d", "e", "f", "r"]  # avoid shadowing the built-in list

for i in range(0, len(items), 2):
    print(items[i])
    if len(items) % 2 == 1 and i == len(items) - 1:
        break
    print(items[i + 1])

Answer 9

One-liner:

do_X() if len(your_list) > your_index else do_something_else()

Full example:

In [10]: def do_X():
    ...:     print(1)
    ...:

In [11]: def do_something_else():
    ...:     print(2)
    ...:

In [12]: your_index = 2

In [13]: your_list = [1,2,3]

In [14]: do_X() if len(your_list) > your_index else do_something_else()
1

Just for info. IMHO, try ... except IndexError is the better solution.


Answer 10

Do not leave any space before your brackets.

Example:

n = input ()
         ^

Tip: put your comments above and/or below your code, not after it on the same line.


Have a nice day.


Answer 11

A lot of answers, but not the simple one.

To check whether a key such as 'id' exists in a dictionary:

dic = {}
dic['name'] = "joao"
dic['age']  = "39"

'age' in dic

evaluates to True if the key 'age' exists.


How to read a file in reverse order?

Question: How to read a file in reverse order?

How do I read a file in reverse order using Python? I want to read a file from the last line to the first line.


Answer 0

for line in reversed(open("filename").readlines()):
    print line.rstrip()

And in Python 3:

for line in reversed(list(open("filename"))):
    print(line.rstrip())
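Both variants above read the whole file into memory and leave closing the file to the garbage collector. The minimal Python 3 sketch below does the same thing inside a with-block, so the file is closed as soon as the lines are read. The function name is hypothetical. Unlike rstrip() with no argument, it strips only the trailing newline:

```python
def read_lines_reversed(path):
    """Return the file's lines from last to first, without trailing newlines."""
    with open(path) as f:
        lines = f.readlines()
    return [line.rstrip("\n") for line in reversed(lines)]
```

Usage: for line in read_lines_reversed("filename"): print(line)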

Answer 1

A correct, efficient answer written as a generator.

import os

def reverse_readline(filename, buf_size=8192):
    """A generator that returns the lines of a file in reverse order"""
    with open(filename) as fh:
        segment = None
        offset = 0
        fh.seek(0, os.SEEK_END)
        file_size = remaining_size = fh.tell()
        while remaining_size > 0:
            offset = min(file_size, offset + buf_size)
            fh.seek(file_size - offset)
            buffer = fh.read(min(remaining_size, buf_size))
            remaining_size -= buf_size
            lines = buffer.split('\n')
            # The first line of the buffer is probably not a complete line so
            # we'll save it and append it to the last line of the next buffer
            # we read
            if segment is not None:
                # If the previous chunk starts right from the beginning of line
                # do not concat the segment to the last line of new chunk.
                # Instead, yield the segment first 
                if buffer[-1] != '\n':
                    lines[-1] += segment
                else:
                    yield segment
            segment = lines[0]
            for index in range(len(lines) - 1, 0, -1):
                if lines[index]:
                    yield lines[index]
        # Don't yield None if the file was empty
        if segment is not None:
            yield segment

Answer 2

How about something like this:

import os


def readlines_reverse(filename):
    with open(filename) as qfile:
        qfile.seek(0, os.SEEK_END)
        position = qfile.tell()
        line = ''
        while position >= 0:
            qfile.seek(position)
            next_char = qfile.read(1)
            if next_char == "\n":
                yield line[::-1]
                line = ''
            else:
                line += next_char
            position -= 1
        yield line[::-1]


if __name__ == '__main__':
    for qline in readlines_reverse(input()):  # Python 3: input() replaces raw_input()
        print(qline)

Since the file is read character by character in reverse order, it will work even on very large files, as long as individual lines fit into memory.
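In Python 3, arbitrary seeking is only reliable for files opened in binary mode. In text mode, seek() is documented to accept only zero or offsets returned by tell(), and a multi-byte character could be split. The sketch below applies the same character-by-character idea to bytes and decodes each completed line. It assumes '\n' line endings and UTF-8, or another ASCII-compatible encoding in which the newline byte never occurs inside a multi-byte character. The function name is hypothetical:

```python
import os


def readlines_reverse_bytes(filename, encoding="utf-8"):
    """Yield lines from last to first, scanning the file backwards byte by byte."""
    with open(filename, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        line = bytearray()  # bytes of the current line, collected in reverse
        while position > 0:
            position -= 1
            f.seek(position)
            byte = f.read(1)
            if byte == b"\n":
                yield bytes(reversed(line)).decode(encoding)
                line.clear()
            else:
                line += byte
        yield bytes(reversed(line)).decode(encoding)
```

Usage: for line in readlines_reverse_bytes("filename"): print(line). Like the original, a trailing newline at the end of the file produces an empty first line.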


Answer 3

You can also use the Python module file_read_backwards.

After installing it, via pip install file_read_backwards (v1.2.1), you can read the entire file backwards (line-wise) in a memory efficient manner via:

#!/usr/bin/env python2.7

from file_read_backwards import FileReadBackwards

with FileReadBackwards("/path/to/file", encoding="utf-8") as frb:
    for l in frb:
        print(l)

It supports "utf-8", "latin-1", and "ascii" encodings.

Python 3 is also supported. Further documentation can be found at http://file-read-backwards.readthedocs.io/en/latest/readme.html


Answer 4

for line in reversed(open("file").readlines()):
    print line.rstrip()

If you are on Linux, you can use the tac command.

$ tac file

You can find two recipes for this on ActiveState here and here.


Answer 5

import os
import re
import sys

def filerev(somefile, buffer=0x20000):
  somefile.seek(0, os.SEEK_END)
  size = somefile.tell()
  lines = ['']
  rem = size % buffer
  pos = max(0, (size // buffer - 1) * buffer)
  while pos >= 0:
    somefile.seek(pos, os.SEEK_SET)
    data = somefile.read(rem + buffer) + lines[0]
    rem = 0
    lines = re.findall('[^\n]*\n?', data)
    ix = len(lines) - 2
    while ix > 0:
      yield lines[ix]
      ix -= 1
    pos -= buffer
  else:
    yield lines[0]

with open(sys.argv[1], 'r') as f:
  for line in filerev(f):
    sys.stdout.write(line)

Answer 6

The accepted answer won't work for large files that don't fit in memory (which is not a rare case).

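As others have noted, @srohde's answer looks good, but it has the following issues:

  • opening the file looks redundant when we could pass a file object and leave it to the user to decide which encoding it should be read with,
  • even if we refactor it to accept a file object, it won't work for all encodings: we can choose a file with utf-8 encoding and non-ASCII contents like

    й

    pass buf_size equal to 1, and we will get

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 0: invalid start byte

    of course the text may be larger, but buf_size may be picked so that it leads to an obfuscated error like the one above,

  • we can't specify a custom line separator,
  • we can't choose to keep the line separator.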

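So, considering all these concerns, I've written separate functions:

  • one which works with byte streams,
  • a second one which works with text streams, delegating its underlying byte stream to the first one and decoding the resulting lines.

First, let's define the following utility functions:

ceil_division for division with ceiling (in contrast with the standard // division, which floors; more info can be found in this thread):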

def ceil_division(left_number, right_number):
    """
    Divides given numbers with ceiling.
    """
    return -(-left_number // right_number)

split for splitting a string by the given separator from the right end, with the ability to keep the separator:

def split(string, separator, keep_separator):
    """
    Splits given string by given separator.
    """
    parts = string.split(separator)
    if keep_separator:
        *parts, last_part = parts
        parts = [part + separator for part in parts]
        if last_part:
            return parts + [last_part]
    return parts

read_batch_from_end to read a batch from the right end of a binary stream:

def read_batch_from_end(byte_stream, size, end_position):
    """
    Reads batch from the end of given byte stream.
    """
    if end_position > size:
        offset = end_position - size
    else:
        offset = 0
        size = end_position
    byte_stream.seek(offset)
    return byte_stream.read(size)
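The three helpers can be checked in isolation before they are combined. Below is a quick sanity check on an in-memory io.BytesIO stream. The helpers are copied from above; the example values are chosen here for illustration:

```python
import io


def ceil_division(left_number, right_number):
    return -(-left_number // right_number)


def split(string, separator, keep_separator):
    parts = string.split(separator)
    if keep_separator:
        *parts, last_part = parts
        parts = [part + separator for part in parts]
        if last_part:
            return parts + [last_part]
    return parts


def read_batch_from_end(byte_stream, size, end_position):
    if end_position > size:
        offset = end_position - size
    else:
        offset = 0
        size = end_position
    byte_stream.seek(offset)
    return byte_stream.read(size)


if __name__ == "__main__":
    print(ceil_division(7, 2), 7 // 2)                  # 4 3
    print(split("a\nb\n", "\n", keep_separator=True))   # ['a\n', 'b\n']
    print(split("a\nb\n", "\n", keep_separator=False))  # ['a', 'b', '']
    stream = io.BytesIO(b"hello world")
    print(read_batch_from_end(stream, 5, 11))           # b'world'
    print(read_batch_from_end(stream, 5, 3))            # b'hel'
```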

After that, we can define a function for reading a byte stream in reverse order, like:

import functools
import itertools
import os
from operator import methodcaller, sub


def reverse_binary_stream(byte_stream, batch_size=None,
                          lines_separator=None,
                          keep_lines_separator=True):
    if lines_separator is None:
        lines_separator = (b'\r', b'\n', b'\r\n')
        lines_splitter = methodcaller(str.splitlines.__name__,
                                      keep_lines_separator)
    else:
        lines_splitter = functools.partial(split,
                                           separator=lines_separator,
                                           keep_separator=keep_lines_separator)
    stream_size = byte_stream.seek(0, os.SEEK_END)
    if batch_size is None:
        batch_size = stream_size or 1
    batches_count = ceil_division(stream_size, batch_size)
    remaining_bytes_indicator = itertools.islice(
            itertools.accumulate(itertools.chain([stream_size],
                                                 itertools.repeat(batch_size)),
                                 sub),
            batches_count)
    try:
        remaining_bytes_count = next(remaining_bytes_indicator)
    except StopIteration:
        return

    def read_batch(position):
        result = read_batch_from_end(byte_stream,
                                     size=batch_size,
                                     end_position=position)
        while result.startswith(lines_separator):
            try:
                position = next(remaining_bytes_indicator)
            except StopIteration:
                break
            result = (read_batch_from_end(byte_stream,
                                          size=batch_size,
                                          end_position=position)
                      + result)
        return result

    batch = read_batch(remaining_bytes_count)
    segment, *lines = lines_splitter(batch)
    yield from reversed(lines)
    for remaining_bytes_count in remaining_bytes_indicator:
        batch = read_batch(remaining_bytes_count)
        lines = lines_splitter(batch)
        if batch.endswith(lines_separator):
            yield segment
        else:
            lines[-1] += segment
        segment, *lines = lines
        yield from reversed(lines)
    yield segment

Finally, a function for reversing a text file can be defined as follows:

import codecs


def reverse_file(file, batch_size=None, 
                 lines_separator=None,
                 keep_lines_separator=True):
    encoding = file.encoding
    if lines_separator is not None:
        lines_separator = lines_separator.encode(encoding)
    yield from map(functools.partial(codecs.decode,
                                     encoding=encoding),
                   reverse_binary_stream(
                           file.buffer,
                           batch_size=batch_size,
                           lines_separator=lines_separator,
                           keep_lines_separator=keep_lines_separator))

测验

准备工作

我已经使用fsutilcommand生成了4个文件:

  1. 没有内容的empty.txt,大小为0MB
  2. tiny.txt,大小为1MB
  3. small.txt,大小为10MB
  4. large.txt,大小为50MB

我也重构了@srohde解决方案以使用文件对象而不是文件路径。

测试脚本

from timeit import Timer

repeats_count = 7
number = 1
create_setup = ('from collections import deque\n'
                'from __main__ import reverse_file, reverse_readline\n'
                'file = open("{}")').format
srohde_solution = ('with file:\n'
                   '    deque(reverse_readline(file,\n'
                   '                           buf_size=8192),'
                   '          maxlen=0)')
azat_ibrakov_solution = ('with file:\n'
                         '    deque(reverse_file(file,\n'
                         '                       lines_separator="\\n",\n'
                         '                       keep_lines_separator=False,\n'
                         '                       batch_size=8192), maxlen=0)')
print('reversing empty file by "srohde"',
      min(Timer(srohde_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing empty file by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))

注意:我已经使用了collections.deque类来耗尽生成器。

产出

对于Windows 10上的PyPy 3.5:

reversing empty file by "srohde" 8.31e-05
reversing empty file by "Azat Ibrakov" 0.00016090000000000028
reversing tiny file (1MB) by "srohde" 0.160081
reversing tiny file (1MB) by "Azat Ibrakov" 0.09594989999999998
reversing small file (10MB) by "srohde" 8.8891863
reversing small file (10MB) by "Azat Ibrakov" 5.323388100000001
reversing large file (50MB) by "srohde" 186.5338368
reversing large file (50MB) by "Azat Ibrakov" 99.07450229999998

对于Windows 10上的CPython 3.5:

reversing empty file by "srohde" 3.600000000000001e-05
reversing empty file by "Azat Ibrakov" 4.519999999999958e-05
reversing tiny file (1MB) by "srohde" 0.01965560000000001
reversing tiny file (1MB) by "Azat Ibrakov" 0.019207699999999994
reversing small file (10MB) by "srohde" 3.1341862999999996
reversing small file (10MB) by "Azat Ibrakov" 3.0872588000000007
reversing large file (50MB) by "srohde" 82.01206720000002
reversing large file (50MB) by "Azat Ibrakov" 82.16775059999998

因此,我们可以看到它的性能像原始解决方案一样,但它更具通用性,并且没有上面列出的缺点。


广告

我已经将此添加到具有许多经过良好测试的功能/迭代实用程序0.3.0lz软件包版本(需要Python 3.5 +)中。

可以像

 import io
 from lz.iterating import reverse
 ...
 with open('path/to/file') as file:
     for line in reverse(file, batch_size=io.DEFAULT_BUFFER_SIZE):
         print(line)

它支持所有标准编码(可能utf-7是因为我很难定义一种策略来生成可编码的字符串)。

The accepted answer won’t work for large files that don’t fit in memory (which is not a rare case).

As others have noted, @srohde’s answer looks good, but it has the following issues:

  • opening the file inside the function looks redundant when we can pass a file object and let the user decide which encoding it should be read with,
  • even if we refactor it to accept a file object, it won’t work for all encodings: we can choose a file with utf-8 encoding and non-ASCII contents like

    й
    

    pass buf_size equal to 1, and we will get

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 0: invalid start byte
    

    of course, the text may be larger, but buf_size may be chosen in a way that leads to an obscure error like the one above,

  • we can’t specify a custom line separator,
  • we can’t choose to keep the line separator.
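To see why a small buf_size breaks decoding, here is a minimal sketch (not part of the original answer): “й” takes two bytes in UTF-8, so reading backwards one byte at a time hands the decoder only the second byte, which is not a valid start of a character.

```python
data = "й".encode("utf-8")
print(data)  # b'\xd0\xb9': one character, two bytes

# With buf_size=1, reading from the end yields only the last byte first
last_byte = data[-1:]
try:
    last_byte.decode("utf-8")
except UnicodeDecodeError as error:
    print(error)  # 'utf-8' codec can't decode byte 0xb9 in position 0: invalid start byte
```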

So, considering all these concerns, I’ve written two separate functions:

  • one that works with byte streams,
  • a second one that works with text streams: it delegates its underlying byte stream to the first one and decodes the resulting lines.

First of all, let’s define the following utility functions:

ceil_division, for division rounded up (in contrast with the standard // division, which rounds down; more info can be found in this thread):

def ceil_division(left_number, right_number):
    """
    Divides given numbers with ceiling.
    """
    return -(-left_number // right_number)
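A quick check of how ceil_division behaves (the definition is repeated so the snippet runs on its own). reverse_binary_stream uses it to count how many batches cover the stream; the comparison with math.ceil is only an illustration of why the integer-only trick is preferable for large sizes.

```python
import math


def ceil_division(left_number, right_number):
    return -(-left_number // right_number)


print(7 // 2, ceil_division(7, 2))  # 3 4
print(ceil_division(0, 8192))       # 0: an empty stream needs no batches
print(ceil_division(8193, 8192))    # 2: one full batch plus one partial batch

# It never goes through a float, so it stays exact for large integers
big = 10**20 + 1
print(ceil_division(big, 10))  # 10000000000000000001
print(math.ceil(big / 10))     # 10000000000000000000 (float rounding)
```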

split, for splitting a string by a given separator, with the ability to keep the separator at the right end of each part:

def split(string, separator, keep_separator):
    """
    Splits given string by given separator.
    """
    parts = string.split(separator)
    if keep_separator:
        *parts, last_part = parts
        parts = [part + separator for part in parts]
        if last_part:
            return parts + [last_part]
    return parts
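For reference, these are the built-in behaviours the helper builds on: split() leaves an empty last part after a trailing separator (which is why split above drops an empty last_part when keeping separators), while splitlines(True), used by reverse_binary_stream when no custom separator is given, keeps the line endings and adds no empty part.

```python
with_trailing = b"a\nb\n"
without_trailing = b"a\nb"

# split() leaves an empty last part after a trailing separator ...
print(with_trailing.split(b"\n"))     # [b'a', b'b', b'']
print(without_trailing.split(b"\n"))  # [b'a', b'b']

# ... while splitlines(True) keeps the line endings and adds no empty part
print(with_trailing.splitlines(True))     # [b'a\n', b'b\n']
print(without_trailing.splitlines(True))  # [b'a\n', b'b']
```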

read_batch_from_end, to read a batch from the right end of a binary stream:

def read_batch_from_end(byte_stream, size, end_position):
    """
    Reads batch from the end of given byte stream.
    """
    if end_position > size:
        offset = end_position - size
    else:
        offset = 0
        size = end_position
    byte_stream.seek(offset)
    return byte_stream.read(size)

After that, we can define a function for reading a byte stream in reverse order:

import functools
import itertools
import os
from operator import methodcaller, sub


def reverse_binary_stream(byte_stream, batch_size=None,
                          lines_separator=None,
                          keep_lines_separator=True):
    if lines_separator is None:
        lines_separator = (b'\r', b'\n', b'\r\n')
        lines_splitter = methodcaller(str.splitlines.__name__,
                                      keep_lines_separator)
    else:
        lines_splitter = functools.partial(split,
                                           separator=lines_separator,
                                           keep_separator=keep_lines_separator)
    stream_size = byte_stream.seek(0, os.SEEK_END)
    if batch_size is None:
        batch_size = stream_size or 1
    batches_count = ceil_division(stream_size, batch_size)
    remaining_bytes_indicator = itertools.islice(
            itertools.accumulate(itertools.chain([stream_size],
                                                 itertools.repeat(batch_size)),
                                 sub),
            batches_count)
    try:
        remaining_bytes_count = next(remaining_bytes_indicator)
    except StopIteration:
        return

    def read_batch(position):
        result = read_batch_from_end(byte_stream,
                                     size=batch_size,
                                     end_position=position)
        while result.startswith(lines_separator):
            try:
                position = next(remaining_bytes_indicator)
            except StopIteration:
                break
            result = (read_batch_from_end(byte_stream,
                                          size=batch_size,
                                          end_position=position)
                      + result)
        return result

    batch = read_batch(remaining_bytes_count)
    segment, *lines = lines_splitter(batch)
    yield from reversed(lines)
    for remaining_bytes_count in remaining_bytes_indicator:
        batch = read_batch(remaining_bytes_count)
        lines = lines_splitter(batch)
        if batch.endswith(lines_separator):
            yield segment
        else:
            lines[-1] += segment
        segment, *lines = lines
        yield from reversed(lines)
    yield segment
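To make the position bookkeeping in reverse_binary_stream more concrete, here is a small standalone sketch with an in-memory stream (io.BytesIO) and a batch size of 8, both chosen only for illustration: the accumulate/sub chain produces the end offset of every batch, and clamping the start at 0, as read_batch_from_end does, makes the final batch shorter.

```python
import io
import itertools
from operator import sub

stream = io.BytesIO(b"line one\nline two\nline three\n")  # 29 bytes
stream_size = stream.seek(0, io.SEEK_END)
batch_size = 8
batches_count = -(-stream_size // batch_size)  # ceiling division -> 4

# Running subtraction 29, 29 - 8, 29 - 16, ... gives the end offset of each batch
ends = list(itertools.islice(
    itertools.accumulate(itertools.chain([stream_size], itertools.repeat(batch_size)), sub),
    batches_count))
print(ends)  # [29, 21, 13, 5]

batches = []
for end in ends:
    start = max(0, end - batch_size)  # the last batch is clamped at offset 0
    stream.seek(start)
    batches.append(stream.read(end - start))
print(batches[-1])  # b'line '

# Read back to front, the batches reassemble the original stream
assert b"".join(reversed(batches)) == stream.getvalue()
```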

And finally, a function for reversing a text file can be defined like this:

import codecs


def reverse_file(file, batch_size=None, 
                 lines_separator=None,
                 keep_lines_separator=True):
    encoding = file.encoding
    if lines_separator is not None:
        lines_separator = lines_separator.encode(encoding)
    yield from map(functools.partial(codecs.decode,
                                     encoding=encoding),
                   reverse_binary_stream(
                           file.buffer,
                           batch_size=batch_size,
                           lines_separator=lines_separator,
                           keep_lines_separator=keep_lines_separator))
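reverse_file relies on two attributes of text file objects: encoding and buffer (the underlying binary stream). A minimal sketch of that delegation, using io.TextIOWrapper over an in-memory buffer as a stand-in for a real file opened in text mode: the separator is encoded, the work happens on bytes, and only complete lines are decoded.

```python
import codecs
import io

# An in-memory stand-in for a file opened in text mode
text_file = io.TextIOWrapper(io.BytesIO("привет\nмир\n".encode("utf-8")), encoding="utf-8")

encoding = text_file.encoding                          # 'utf-8'
separator = "\n".encode(encoding)                      # b'\n', encoded as reverse_file does
raw_lines = text_file.buffer.read().split(separator)   # split the raw bytes ...
decoded = [codecs.decode(line, encoding=encoding) for line in raw_lines]  # ... then decode
print(decoded)  # ['привет', 'мир', '']
```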

Tests

Preparations

I’ve generated 4 files using the fsutil command:

  1. empty.txt with no contents, size 0MB
  2. tiny.txt with size of 1MB
  3. small.txt with size of 10MB
  4. large.txt with size of 50MB

I’ve also refactored @srohde’s solution to work with a file object instead of a file path.

Test script

from timeit import Timer

repeats_count = 7
number = 1
create_setup = ('from collections import deque\n'
                'from __main__ import reverse_file, reverse_readline\n'
                'file = open("{}")').format
srohde_solution = ('with file:\n'
                   '    deque(reverse_readline(file,\n'
                   '                           buf_size=8192),'
                   '          maxlen=0)')
azat_ibrakov_solution = ('with file:\n'
                         '    deque(reverse_file(file,\n'
                         '                       lines_separator="\\n",\n'
                         '                       keep_lines_separator=False,\n'
                         '                       batch_size=8192), maxlen=0)')
print('reversing empty file by "srohde"',
      min(Timer(srohde_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing empty file by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))

Note: I’ve used the collections.deque class to exhaust the generators.
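deque(iterable, maxlen=0) consumes the whole iterable without storing any items, which makes it a cheap way to run a generator to completion in a benchmark. A small illustration (the numbers generator is just an example):

```python
from collections import deque

consumed = []


def numbers():
    for number in range(3):
        consumed.append(number)  # record that the generator body actually ran
        yield number


remaining = deque(numbers(), maxlen=0)  # iterates to the end, keeps nothing
print(consumed, len(remaining))         # [0, 1, 2] 0
```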

Outputs

For PyPy 3.5 on Windows 10:

reversing empty file by "srohde" 8.31e-05
reversing empty file by "Azat Ibrakov" 0.00016090000000000028
reversing tiny file (1MB) by "srohde" 0.160081
reversing tiny file (1MB) by "Azat Ibrakov" 0.09594989999999998
reversing small file (10MB) by "srohde" 8.8891863
reversing small file (10MB) by "Azat Ibrakov" 5.323388100000001
reversing large file (50MB) by "srohde" 186.5338368
reversing large file (50MB) by "Azat Ibrakov" 99.07450229999998

For CPython 3.5 on Windows 10:

reversing empty file by "srohde" 3.600000000000001e-05
reversing empty file by "Azat Ibrakov" 4.519999999999958e-05
reversing tiny file (1MB) by "srohde" 0.01965560000000001
reversing tiny file (1MB) by "Azat Ibrakov" 0.019207699999999994
reversing small file (10MB) by "srohde" 3.1341862999999996
reversing small file (10MB) by "Azat Ibrakov" 3.0872588000000007
reversing large file (50MB) by "srohde" 82.01206720000002
reversing large file (50MB) by "Azat Ibrakov" 82.16775059999998

So, as we can see, it performs like the original solution (faster on PyPy for the non-empty files, on par on CPython), but it is more general and free of the disadvantages listed above.


Advertisement

I’ve added this to version 0.3.0 of the lz package (requires Python 3.5+), which has many well-tested functional/iterating utilities.

It can be used like this:

 import io
 from lz.iterating import reverse
 ...
 with open('path/to/file') as file:
     for line in reverse(file, batch_size=io.DEFAULT_BUFFER_SIZE):
         print(line)

It supports all standard encodings (except maybe utf-7, since it is hard for me to define a strategy for generating strings encodable with it).


Answer 7

Here you can find my implementation. You can limit the RAM usage by changing the “buffer” variable. There is a bug: the program prints an empty line at the beginning.

RAM usage may also increase if there are no newlines for more than buffer bytes: the “leak” variable will grow until a newline (“\n”) is seen.

This also works for 16 GB files, which are bigger than my total memory.

import os
import sys

buffer = 1024*1024  # 1MB

# Binary mode: Python 3 does not allow nonzero end-relative seeks on text files
with open(sys.argv[1], 'rb') as f:
    f.seek(0, os.SEEK_END)
    filesize = f.tell()

    division, remainder = divmod(filesize, buffer)
    line_leak = b''

    for chunk_counter in range(1, division + 2):
        if division - chunk_counter < 0:
            f.seek(0, os.SEEK_SET)
            chunk = f.read(remainder)
        else:
            f.seek(-(buffer*chunk_counter), os.SEEK_END)
            chunk = f.read(buffer)

        chunk_lines_reversed = list(reversed(chunk.split(b'\n')))
        if line_leak:  # add line_leak from previous chunk to beginning
            chunk_lines_reversed[0] += line_leak

        # after reversing, save the leaked (possibly partial) line for the next chunk iteration
        line_leak = chunk_lines_reversed.pop()

        if chunk_lines_reversed:
            # these lines are complete; decode assumes the file is UTF-8
            print(b'\n'.join(chunk_lines_reversed).decode())
        # print the last leaked line
        if division - chunk_counter < 0:
            print(line_leak.decode())
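The same chunk-and-leak idea can also be packaged as a generator. The sketch below is a hypothetical variant (the name reverse_lines_chunked is not from the answer): it reads a binary file object backwards in chunks of buffer bytes and does not produce the leading empty line caused by a trailing newline. It yields raw bytes, assuming callers decode each complete line themselves; for UTF-8 that is safe because the byte b'\n' never occurs inside a multi-byte character.

```python
import io
import os


def reverse_lines_chunked(f, buffer=1024 * 1024):
    """Yield the lines of the binary file object `f` from last to first,
    as bytes without the trailing b'\\n'."""
    f.seek(0, os.SEEK_END)
    position = f.tell()
    if position == 0:
        return  # empty file: no lines at all
    line_leak = b''
    first_chunk = True
    while position > 0:
        size = min(buffer, position)
        position -= size
        f.seek(position)
        parts = f.read(size).split(b'\n')
        if first_chunk and parts[-1] == b'':
            parts.pop()  # a final b'\n' ends the last line rather than starting an empty one
        first_chunk = False
        parts[-1] += line_leak    # complete the line carried over from the later chunk
        line_leak = parts.pop(0)  # may continue in the earlier chunk, so carry it over
        yield from reversed(parts)
    yield line_leak  # the first line of the file


sample = io.BytesIO(b"one\ntwo\nthree\n")
print(list(reverse_lines_chunked(sample, buffer=4)))  # [b'three', b'two', b'one']
```

Memory use is bounded by the chunk size plus the longest line, which matches the caveat about the “leak” variable above.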

Answer 8

Thanks for the answer, @srohde. It has a small bug in checking for the newline character with the ‘is’ operator, and I could not comment on the answer with only 1 reputation. Also, I’d like to manage opening the file outside the function, because that enables me to embed my ramblings in luigi tasks.

What I needed to change has the form:

with open(filename) as fp:
    for line in fp:
        # print(line, end='')  # line already contains the newline
        print('>{}<'.format(line))

I’d love to change to:

with open(filename) as fp:
    for line in reversed_fp_iter(fp, 4):
        # print(line, end='')  # line already contains the newline
        print('>{}<'.format(line))

Here is a modified answer that takes a file handle and keeps newlines:

import os


def reversed_fp_iter(fp, buf_size=8192):
    """a generator that returns the lines of a file in reverse order
    ref: https://stackoverflow.com/a/23646049/8776239
    """
    segment = None  # holds possible incomplete segment at the beginning of the buffer
    offset = 0
    fp.seek(0, os.SEEK_END)
    file_size = remaining_size = fp.tell()
    while remaining_size > 0:
        offset = min(file_size, offset + buf_size)
        # Python documents seeking to arbitrary offsets in text mode as undefined;
        # in practice this works for single-byte encodings such as ASCII.
        fp.seek(file_size - offset)
        buffer = fp.read(min(remaining_size, buf_size))
        remaining_size -= buf_size
        lines = buffer.splitlines(True)
        # the first line of the buffer is probably not a complete line so
        # we'll save it and append it to the last line of the next buffer
        # we read
        if segment is not None:
            # if the previous chunk starts right from the beginning of line
            # do not concat the segment to the last line of new chunk
            # instead, yield the segment first
            if buffer[-1] == '\n':
                #print 'buffer ends with newline'
                yield segment
            else:
                lines[-1] += segment
                #print 'enlarged last line to >{}<, len {}'.format(lines[-1], len(lines))
        segment = lines[0]
        for index in range(len(lines) - 1, 0, -1):
            if len(lines[index]):
                yield lines[index]
    # Don't yield None if the file was empty
    if segment is not None:
        yield segment

Answer 9

A simple function that creates a second, reversed file (Linux only, since it relies on the tac command):

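import subprocess


def tac(file1, file2):
    # Passing the arguments as a list avoids the shell, so file names
    # with spaces or special characters are handled safely
    with open(file2, 'wb') as output:
        subprocess.run(['tac', file1], stdout=output, check=True)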

How to use:

tac('ordered.csv', 'reversed.csv')
f = open('reversed.csv')

Answer 10

If you are concerned about file size / memory usage, memory-mapping the file and scanning backwards for newlines is a solution:

How to search for a string in text files?
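A minimal sketch of the memory-mapping approach, assuming lines end with b'\n' (the function name reverse_lines_mmap is hypothetical, not from the linked answer): mmap.rfind scans backwards for each newline, so the operating system only needs to page in the parts of the file that are actually read.

```python
import mmap
import os


def reverse_lines_mmap(path):
    """Yield the lines of the file at `path` from last to first,
    as bytes without the trailing b'\\n'."""
    if os.path.getsize(path) == 0:
        return  # mmap cannot map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = len(mm)
        if mm[end - 1:end] == b"\n":
            end -= 1  # a final newline ends the last line rather than starting an empty one
        while True:
            start = mm.rfind(b"\n", 0, end)  # scan backwards for the previous newline
            yield mm[start + 1:end]          # slicing copies the line out of the map
            if start == -1:
                break
            end = start


# Usage: for line in reverse_lines_mmap("path/to/file"): print(line.decode())
```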


Answer 11

with open("filename") as f:
    # note: this reverses every character, not just the order of the lines
    print(f.read()[::-1])
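Note that [::-1] on the whole text reverses characters rather than lines. A short comparison on an in-memory string (the sample text is made up for illustration):

```python
text = "first line\nsecond line\n"

# Slicing with [::-1] reverses characters, so each line is also spelled backwards
print(repr(text[::-1]))  # '\nenil dnoces\nenil tsrif'

# Reversing the list of lines keeps each line intact and only changes their order
print(repr("".join(reversed(text.splitlines(keepends=True)))))  # 'second line\nfirst line\n'
```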

Answer 12

def reverse_lines(filename):
    with open(filename) as f:
        y = f.readlines()
    return y[::-1]

Answer 13

Always use with when working with files, as it closes the file for you, even if an error occurs:

with open('filename', 'r') as f:
    for line in reversed(f.readlines()):
        print line

Or in Python 3:

with open('filename', 'r') as f:
    for line in reversed(f.readlines()):
        print(line, end='')  # each line already ends with a newline

Answer 14

You would need to first open your file in read mode and save its contents to a variable, then open the second file in write mode, where you would write or append the variable using the [::-1] slice, completely reversing the file (character by character). You can also use readlines() to turn it into a list of lines, which you can manipulate.

def copy_and_reverse(filename, newfile):
    with open(filename) as file:
        text = file.read()
    with open(newfile, "w") as file2:
        file2.write(text[::-1])
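Following the readlines() suggestion, here is a sketch that reverses the order of the lines instead of the characters (copy_and_reverse_lines is a hypothetical name). Like the original, it assumes the whole file fits in memory.

```python
def copy_and_reverse_lines(filename, newfile):
    with open(filename) as file:
        lines = file.readlines()  # list of lines, each keeping its "\n"
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"  # the old last line moves to the front, so it needs a separator
    with open(newfile, "w") as file2:
        file2.writelines(reversed(lines))
```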

Answer 15

Most of the answers need to read the whole file before doing anything. This sample reads increasingly large blocks from the end.

I only saw Murat Yükselen’s answer while writing this one. It’s nearly the same, which I suppose is a good thing. The sample below also deals with \r and increases its buffer size at each step. I also have some unit tests to back this code up.

def readlines_reversed(f):
    """ Iterate over the lines in a file in reverse. The file must be
    open in 'rb' mode. Yields the lines unencoded (as bytes), including the
    newline character. Produces the same result as readlines, but reversed.
    If this is used to reverse the line in a file twice, the result is
    exactly the same.
    """
    head = b""
    f.seek(0, 2)
    t = f.tell()
    buffersize, maxbuffersize = 64, 4096
    while True:
        if t <= 0:
            break
        # Read next block
        buffersize = min(buffersize * 2, maxbuffersize)
        tprev = t
        t = max(0, t - buffersize)
        f.seek(t)
        lines = f.read(tprev - t).splitlines(True)
        # Align to line breaks
        if not lines[-1].endswith((b"\n", b"\r")):
            lines[-1] += head  # current tail is previous head
        elif head == b"\n" and lines[-1].endswith(b"\r"):
            lines[-1] += head  # Keep \r\n together
        elif head:
            lines.append(head)
        head = lines.pop(0)  # can be '\n' (ok)
        # Iterate over current block in reverse
        for line in reversed(lines):
            yield line
    if head:
        yield head
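The “Keep \r\n together” branch is needed because a block boundary can fall between \r and \n, and splitlines(True) then treats the lone \r as a complete line ending. Reading backwards, the later block comes first, so head becomes b'\n' and has to be glued back onto b'one\r'. A small check of that split (the byte strings are made up for illustration):

```python
whole = b"one\r\ntwo\n"
print(whole.splitlines(True))  # [b'one\r\n', b'two\n']

# If a block boundary falls between b'\r' and b'\n', each half splits on its own
first_block, second_block = whole[:4], whole[4:]
print(first_block.splitlines(True))   # [b'one\r']
print(second_block.splitlines(True))  # [b'\n', b'two\n']
```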

Answer 16

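Read the file line by line and insert each line at the front of a list, so the list ends up in reverse order.

Here is an example of code:

reverse = []
with open("file.txt", "r") as file:
    for line in file:
        line = line.strip()
        reverse[0:0] = [line]  # a one-element list, so the whole line is inserted, not its characters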
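Inserting at the front of a list moves every existing element, so this becomes quadratic for long files. A sketch using collections.deque.appendleft, which adds at the front in constant time (reversed_lines is a hypothetical name; it strips only the newline, whereas the code above uses strip()):

```python
from collections import deque


def reversed_lines(path):
    lines = deque()
    with open(path) as file:
        for line in file:
            lines.appendleft(line.rstrip("\n"))  # O(1) insertion at the front
    return list(lines)
```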

Answer 17

import sys

with open(sys.argv[1], 'r') as f:
    for line in f.readlines()[::-1]:
        print(line, end='')

Answer 18

import os

NEW_LINE = b'\n'
EMPTY_BYTE = b''


def previous_line(opened_file):
    """Yield the lines of a file opened in "rb" mode from last to first,
    reading one byte at a time, then yield None when done."""
    opened_file.seek(0, os.SEEK_END)
    position = opened_file.tell()
    buffer = bytearray()
    while position >= 0:
        opened_file.seek(position)
        position -= 1
        new_byte = opened_file.read(1)
        if new_byte == NEW_LINE:
            parsed_string = buffer.decode()
            yield parsed_string
            buffer = bytearray()
        elif new_byte == EMPTY_BYTE:
            continue  # reading at the end of the file returns b''
        else:
            new_byte_array = bytearray(new_byte)
            new_byte_array.extend(buffer)  # prepend the byte to the current line
            buffer = new_byte_array
    yield buffer.decode()  # the first line of the file has no newline before it
    yield None

To use it:

with open(filepath, "rb") as opened_file:
    iterator = previous_line(opened_file)
    line = next(iterator)  # one step

Answer 19

I had to do this some time ago and used the code below. It pipes to the shell. I am afraid I no longer have the complete script. If you are on a Unix-like operating system, you can use “tac”; however, on Mac OS X, for example, the tac command does not work, so use tail -r instead. The code snippet below tests which platform you’re on and adjusts the command accordingly:

import sys

# We need a command to reverse the line order of the file. On Linux this
# is 'tac', on OSX it is 'tail -r'
# 'tac' is not supported on osx, 'tail -r' is not supported on linux.
# (`command` is the shell command built earlier in the original script.)

if sys.platform == "darwin":
    command += "|tail -r"
elif sys.platform.startswith("linux"):  # "linux2" on Python 2, "linux" on Python 3.3+
    command += "|tac"
else:
    raise EnvironmentError('Platform %s not supported' % sys.platform)
