Tag archive: io

What exactly is Python's file.flush() doing?

Question: What exactly is Python's file.flush() doing?

I found this in the Python documentation for File Objects:

flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.

So my question is: what exactly is Python's flush doing? I thought it forced the data to be written to disk, but now I see that it doesn't. Why?


Answer 0

There are typically two levels of buffering involved:

  1. Internal buffers
  2. Operating system buffers

The internal buffers are created by the runtime/library/language that you're programming against, and they are meant to speed things up by avoiding a system call for every write. Instead, when you write to a file object, you write into its buffer, and whenever the buffer fills up, the data is written to the actual file using system calls.

However, due to the operating system buffers, this might not mean that the data is written to disk. It may just mean that the data is copied from the buffers maintained by your runtime into the buffers maintained by the operating system.

If you write something, and it ends up in the buffer (only), and the power is cut to your machine, that data is not on disk when the machine turns off.

So, to help with that, you have flush(), a method on the file object, and os.fsync(), which takes the file's descriptor (for example f.fileno()).

The first, flush, will simply write out any data that lingers in a program buffer to the actual file. Typically this means that the data will be copied from the program buffer to the operating system buffer.

Specifically what this means is that if another process has that same file open for reading, it will be able to access the data you just flushed to the file. However, it does not necessarily mean it has been “permanently” stored on disk.

To do that, you need to call os.fsync(), which asks the operating system to write its buffers for that file out to the storage device; in other words, it copies the data from the operating system buffers to the disk.

Typically you don’t need to bother with either method, but if you’re in a scenario where paranoia about what actually ends up on disk is a good thing, you should make both calls as instructed.
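
The two steps can be sketched as follows. This is a minimal illustration, assuming CPython with default buffering for a regular text file; the helper name flush_demo is hypothetical, and the second open() stands in for "another process" reading the same file. The effect of os.fsync() cannot be observed from inside the program, so the sketch only shows where the call belongs.

```python
import os
import tempfile


def flush_demo(path):
    """Return what a separate reader sees before and after flush()."""
    with open(path, "w") as writer:
        writer.write("hello")       # small write: stays in Python's internal buffer
        with open(path) as reader:
            before = reader.read()  # "": nothing has reached the OS yet
        writer.flush()              # copy Python's buffer into the OS buffer
        with open(path) as reader:
            after = reader.read()   # visible now, though not necessarily on disk
        os.fsync(writer.fileno())   # ask the OS to write this file's buffers to the device
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(flush_demo(os.path.join(tmp, "demo.txt")))
```

The order matters: os.fsync() only pushes out what the OS already has, so calling it without flush() first can leave data sitting in Python's buffer.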


Addendum in 2018.

Note that disks with cache mechanisms are now much more common than back in 2013, so there are even more levels of caching and buffering involved. I assume these buffers will be handled by the sync/flush calls as well, but I don't really know.


Answer 1

Because the operating system may not do so. The flush operation forces the file data into the file cache in RAM, and from there it’s the OS’s job to actually send it to the disk.


Answer 2

It flushes the internal buffer, which is supposed to cause the OS to write out the buffer to the file.[1] Python uses the OS's default buffering unless you configure it to do otherwise.

But sometimes the OS still chooses not to cooperate. Especially with wonderful things like write-delays in Windows/NTFS. Basically the internal buffer is flushed, but the OS buffer is still holding on to it. So you have to tell the OS to write it to disk with os.fsync() in those cases.

[1] http://docs.python.org/library/stdtypes.html
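
Configuring the buffering is done through the buffering argument of open(). A minimal sketch, assuming a text-mode file: buffering=1 selects line buffering, so a write containing a newline is flushed out of Python's buffer immediately and a second reader sees it without an explicit flush(). This still only moves the data into the OS buffer; os.fsync() is needed for the disk. The helper name is hypothetical.

```python
import os
import tempfile


def line_buffered_demo(path):
    """Return what a second reader sees after one line-buffered write."""
    # buffering=1 means line buffering; it is only valid in text mode
    with open(path, "w", buffering=1) as f:
        f.write("first line\n")  # the newline triggers a flush of Python's buffer
        with open(path) as reader:
            return reader.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(repr(line_buffered_demo(os.path.join(tmp, "lines.txt"))))
```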


Answer 3

Basically, flush() empties your RAM buffer. Its real purpose is that it lets you continue to write afterwards, but it shouldn't be thought of as the best or safest way to write to a file: it flushes your RAM buffer to make room for more data, that is all. If you want to ensure the data gets written to the file safely, use close() instead. (Note that close() also just flushes the buffer before releasing the file; like flush(), it does not force the operating system to write the data to disk.)


ValueError: I/O operation on closed file

Question: ValueError: I/O operation on closed file

import csv    

with open('v.csv', 'w') as csvfile:
    cwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)

for w, c in p.items():
    cwriter.writerow(w + c)

Here, p is a dictionary, and w and c are both strings.

When I try to write to the file it reports the error:

ValueError: I/O operation on closed file.

Answer 0

Indent correctly; your for statement should be inside the with block:

import csv    

with open('v.csv', 'w') as csvfile:
    cwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)

    for w, c in p.items():
        cwriter.writerow(w + c)

Outside the with block, the file is closed.

>>> with open('/tmp/1', 'w') as f:
...     print(f.closed)
... 
False
>>> print(f.closed)
True
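
A runnable sketch of the fixed code follows; the sample dict and file name are hypothetical. It adds two details beyond the indentation fix: the csv documentation recommends opening the file with newline="" in Python 3, and writerow() expects a sequence of fields, so passing the string w + c would write every character as a separate field. [w, c] is probably what was intended.

```python
import csv
import os
import tempfile


def write_pairs(path, p):
    """Write each key/value pair of the dict p as one space-delimited row."""
    with open(path, "w", newline="") as csvfile:
        cwriter = csv.writer(csvfile, delimiter=" ", quotechar="|",
                             quoting=csv.QUOTE_MINIMAL)
        for w, c in p.items():        # inside the with block, so the file is open
            cwriter.writerow([w, c])  # one field per item, not one per character
    # the file is closed here; calling cwriter.writerow() now raises ValueError


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "v.csv")
        write_pairs(path, {"apple": "red", "kiwi": "green"})
        with open(path, newline="") as f:
            print(f.read())
```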

Answer 1

The same error can be raised by mixing tabs and spaces (Python 3 rejects inconsistent mixing outright with a TabError):

with open('/foo', 'w') as f:
 (spaces OR  tab) print f       <-- success
 (spaces AND tab) print f       <-- fail

Python: write to CSV line by line

Question: Python: write to CSV line by line

I have data that is accessed via an HTTP request and sent back by the server in comma-separated format. I have the following code:

site= 'www.example.com'
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
soup = soup.get_text()
text=str(soup)

The content of text is as follows:

april,2,5,7
may,3,5,8
june,4,7,3
july,5,6,9

How can I save this data into a CSV file? I know I can do something along the lines of the following to iterate line by line:

import StringIO
s = StringIO.StringIO(text)
for line in s:

But I'm unsure how to now properly write each line to CSV.

EDIT: Thanks for the feedback. As suggested, the solution was rather simple and can be seen below.

Solution:

import StringIO
s = StringIO.StringIO(text)
with open('fileName.csv', 'w') as f:
    for line in s:
        f.write(line)

Answer 0

General way:

# text = list of strings to be written to the file
with open('csvfile.csv', 'w') as file:  # text mode: 'wb' would require bytes in Python 3
    for line in text:
        file.write(line)
        file.write('\n')

OR

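Using the csv writer:

import csv
with open(output_csv_path, "w", newline="") as csv_file:  # output_csv_path: path to your output CSV
    writer = csv.writer(csv_file, delimiter=',')
    for line in data:  # data: an iterable of rows, each a sequence of fields
        writer.writerow(line)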

OR

Simplest way:

f = open('csvfile.csv','w')
f.write('hi there\n') #Give your csv text here.
## Python will convert \n to os.linesep
f.close()
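
In the question, text is one string rather than a list, so iterating over it directly would yield single characters. A minimal sketch, assuming text holds CSV lines separated by "\n" (the sample is the data from the question): splitlines() produces the list of lines the snippets above expect, and csv.reader parses each line into fields so they can go through csv.writer. The function name is hypothetical.

```python
import csv
import os
import tempfile


def save_csv_text(text, path):
    """Parse CSV-formatted text and write it out row by row with csv.writer."""
    rows = csv.reader(text.splitlines())  # each line -> list of fields
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)


if __name__ == "__main__":
    sample = "april,2,5,7\nmay,3,5,8\njune,4,7,3\njuly,5,6,9\n"
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "fileName.csv")
        save_csv_text(sample, out)
        with open(out, newline="") as f:
            print(f.read())
```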

Answer 1

You could just write to the file as you would write any normal file.

with open('csvfile.csv', 'w') as file:  # text is assumed to be a list of strings
    for l in text:
        file.write(l)
        file.write('\n')

If it happens to be a list of lists, you can use the built-in csv module directly:

import csv

with open("csvfile.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(text)
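
One reason to prefer the csv module over writing strings yourself is quoting. A small sketch with hypothetical data: a field that contains the delimiter is quoted automatically by writerows(), so it reads back as a single field. It writes to an in-memory io.StringIO instead of a file to stay self-contained.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of lists to CSV text using csv.writer.writerows()."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)  # one call writes every row
    return buf.getvalue()


if __name__ == "__main__":
    text = rows_to_csv([["april", 2, 5], ["Smith, John", 3, 8]])
    print(repr(text))  # the field "Smith, John" is wrapped in double quotes
```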

Answer 2

I would simply write each line to a file, since it’s already in a CSV format:

write_file = "output.csv"
with open(write_file, "w") as output:
    for line in text:
        output.write(line + '\n')

I can’t recall how to write lines with line-breaks at the moment, though :p

Also, you might like to take a look at this answer about write(), writelines(), and '\n'.


Answer 3

To complement the previous answers, I whipped up a quick class to write to CSV files. It makes it easier to manage and close open files and achieve consistency and cleaner code if you have to deal with multiple files.

import csv
import os

class CSVWriter():

    filename = None
    fp = None
    writer = None

    def __init__(self, filename):
        self.filename = filename
        self.fp = open(self.filename, 'w', encoding='utf8')
        self.writer = csv.writer(self.fp, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')

    def close(self):
        self.fp.close()

    def write(self, elems):
        self.writer.writerow(elems)

    def size(self):
        return os.path.getsize(self.filename)

    def fname(self):
        return self.filename

Example usage:

mycsv = CSVWriter('/tmp/test.csv')
mycsv.write((12,'green','apples'))
mycsv.write((7,'yellow','bananas'))
mycsv.close()
print("Written %d bytes to %s" % (mycsv.size(), mycsv.fname()))

Have fun
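
A variation on the same idea, sketched under the assumption that you would rather not call close() yourself: giving the class __enter__ and __exit__ methods lets it be used in a with statement, so the file is closed even if an exception occurs. The class name ManagedCSVWriter is hypothetical and not part of the answer above.

```python
import csv
import os
import tempfile


class ManagedCSVWriter:
    """Hypothetical context-manager variant of CSVWriter."""

    def __init__(self, filename):
        self.filename = filename
        self.fp = None
        self.writer = None

    def __enter__(self):
        self.fp = open(self.filename, "w", encoding="utf8", newline="")
        self.writer = csv.writer(self.fp, delimiter=";", quotechar='"',
                                 quoting=csv.QUOTE_ALL, lineterminator="\n")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.fp.close()  # runs whether or not the with block raised
        return False     # do not swallow exceptions

    def write(self, elems):
        self.writer.writerow(elems)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.csv")
        with ManagedCSVWriter(path) as mycsv:
            mycsv.write((12, "green", "apples"))
            mycsv.write((7, "yellow", "bananas"))
        print("Written %d bytes to %s" % (os.path.getsize(path), path))
```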


Answer 4

What about this:

with open("your_csv_file.csv", "w") as f:
    f.write("\n".join(text))

str.join() returns a string which is the concatenation of the strings in the iterable. The separator between elements is the string on which the method is called.
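
Because join() only places the separator between elements, the result has no trailing newline. A short sketch, assuming text is a list of strings (the rows below are hypothetical); join_lines is an illustrative helper that adds the newline after every line, including the last.

```python
def join_lines(lines):
    """Return the lines joined by newlines, including a trailing newline."""
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    rows = ["april,2,5,7", "may,3,5,8"]
    print(repr("\n".join(rows)))   # 'april,2,5,7\nmay,3,5,8' (no final newline)
    print(repr(join_lines(rows)))  # 'april,2,5,7\nmay,3,5,8\n'
```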


Why can't I call read() twice on an open file?

Question: Why can't I call read() twice on an open file?

For an exercise I’m doing, I’m trying to read the contents of a given file twice using the read() method. Strangely, when I call it the second time, it doesn’t seem to return the file content as a string?

Here’s the code

f = f.open()

# get the year
match = re.search(r'Popularity in (\d+)', f.read())

if match:
  print match.group(1)

# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', f.read())

if matches:
  # matches is always None

Of course I know that this is not the most efficient or best way, this is not the point here. The point is, why can’t I call read() twice? Do I have to reset the file handle? Or close / reopen the file in order to do that?


Answer 0

Calling read() reads through the entire file and leaves the read cursor at the end of the file (with nothing more to read). If you are looking to read a certain number of lines at a time you could use readline(), readlines() or iterate through lines with for line in handle:.

To answer your question directly: once a file has been read with read(), you can use seek(0) to return the read cursor to the start of the file (see the documentation for seek()). If you know the file isn't going to be too large, you can also save the read() output to a variable and use it in your findall expressions.

P.S. Don't forget to close the file after you are done with it ;)
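
A minimal sketch of the seek(0) approach, using a temporary file with hypothetical contents in place of the exercise's input file:

```python
import os
import tempfile


def read_twice(path):
    """Read a file, rewind with seek(0), and read it again."""
    with open(path) as f:
        first = f.read()
        empty = f.read()  # cursor is already at the end: returns ""
        f.seek(0)         # rewind to the start of the file
        second = f.read()
    return first, empty, second


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "names.txt")
        with open(path, "w") as f:
            f.write("Popularity in 1990\n")
        print(read_twice(path))
```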


Answer 1

yeah, as above…

I'll just write an example:

>>> a = open('file.txt')
>>> a.read()
#output
>>> a.seek(0)
>>> a.read()
#same output

Answer 2

Everyone who has answered this question so far is absolutely right – read() moves through the file, so after you’ve called it, you can’t call it again.

What I’ll add is that in your particular case, you don’t need to seek back to the start or reopen the file, you can just store the text that you’ve read in a local variable, and use it twice, or as many times as you like, in your program:

import re

with open(filename) as f:  # filename: path to your input file
    text = f.read()  # read the file into a local variable
# get the year
match = re.search(r'Popularity in (\d+)', text)
if match:
    print(match.group(1))
# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
if matches:
    # findall returns a list; it will no longer always be empty
    print(matches)

Answer 3

The read pointer moves to after the last read byte/character. Use the seek() method to rewind the read pointer to the beginning.


Answer 4

Every open file has an associated position.
When you read() you read from that position. For example read(10) reads the first 10 bytes from a newly opened file, then another read(10) reads the next 10 bytes. read() without arguments reads all of the contents of the file, leaving the file position at the end of the file. Next time you call read() there is nothing to read.

You can use seek to move the file position. Or probably better in your case would be to do one read() and keep the result for both searches.
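
The file position described above can be inspected with tell(). This sketch assumes a binary-mode file, since in text mode read(n) counts characters and tell() returns an opaque value; the sample bytes and helper name are hypothetical.

```python
import os
import tempfile


def positions(path):
    """Return (data, position) pairs for read(10), read(10), then read()."""
    results = []
    with open(path, "rb") as f:
        for size in (10, 10, -1):  # -1 means "read to the end", like read()
            data = f.read(size)
            results.append((data, f.tell()))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"0123456789abcdefghijKLMNO")
        for data, pos in positions(path):
            print(data, pos)
```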


Answer 5

read() consumes. So you could reset the file, or seek to the start before re-reading. Or, if it suits your task, you can use read(n) to consume only n bytes.


Answer 6

I always find the read method something of a walk down a dark alley. You go down a bit and stop, but if you are not counting your steps, you are not sure how far along you are. seek() gives the solution by repositioning; the other option is tell(), which returns the current position in the file. Maybe the Python file API could combine read and seek into a read_from(position, bytes) to make it simpler; until that happens, you should read this page.
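
The read_from(position, bytes) idea can be approximated with a small helper. The name read_from is hypothetical (it is not part of the file API); it simply combines seek() and read(). On Unix, os.pread(fd, n, offset) offers something similar at the file-descriptor level without moving the file position.

```python
import os
import tempfile


def read_from(f, position, size):
    """Hypothetical helper: read `size` bytes starting at `position`."""
    f.seek(position)  # move the cursor first
    return f.read(size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"hello world")
        with open(path, "rb") as f:
            print(read_from(f, 6, 5))  # b'world'
            print(read_from(f, 0, 5))  # b'hello'
```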


Loading data from a txt file with pandas

Question: Loading data from a txt file with pandas

I am loading a txt file containing a mix of float and string data. I want to store them in an array where I can access each element. Right now I am just doing

import pandas as pd

data = pd.read_csv('output_list.txt', header = None)
print data

This is the structure of the input file: 1 0 2000.0 70.2836942112 1347.28369421 /file_address.txt.

Right now the data are imported as a single column. How can I split it so that the different elements are stored separately (so I can call data[i,j])? And how can I define a header?


Answer 0

You can use:

data = pd.read_csv('output_list.txt', sep=" ", header=None)
data.columns = ["a", "b", "c", "etc."]

Add sep=" " to your code, with a single space between the quotes, so that pandas splits each line on spaces into columns. Assigning to data.columns names your columns.


Answer 1

I’d like to add to the above answers, you could directly use

df = pd.read_fwf('output_list.txt')

fwf stands for fixed width formatted lines.


Answer 2

@Pietrovismara’s solution is correct but I’d just like to add: rather than having a separate line to add column names, it’s possible to do this from pd.read_csv.

df = pd.read_csv('output_list.txt', sep=" ", header=None, names=["a", "b", "c"])
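
Applied to the sample line from the question, a sketch might look like this. It assumes the values are separated by one or more spaces (hence sep=r"\s+") and uses made-up column names; io.StringIO stands in for the file so the example is self-contained. Individual elements are then accessed with df.iloc[i, j] rather than data[i, j].

```python
import io

import pandas as pd

# One line in the format shown in the question; a real file would be read by path.
SAMPLE = "1 0 2000.0 70.2836942112 1347.28369421 /file_address.txt\n"

# Hypothetical column names, passed via names= instead of a separate assignment.
COLUMNS = ["a", "b", "c", "d", "e", "path"]


def load(source):
    """Parse whitespace-separated columns into a DataFrame with named columns."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)


if __name__ == "__main__":
    df = load(io.StringIO(SAMPLE))
    print(df.dtypes)
    print(df.iloc[0, 2])  # row 0, column 2
```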

Answer 3

you can use this

import pandas as pd
dataset=pd.read_csv("filepath.txt",delimiter="\t")

Answer 4

If you don't have an index assigned to the data and you are not sure what the spacing is, you can use index_col=False to let pandas assign a default index, and the regular expression \s+ as the delimiter to match one or more whitespace characters.

df = pd.read_csv('filename.txt', delimiter=r'\s+', index_col=False)

Answer 5

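You can do it like this:

import pandas as pd
df = pd.read_csv(r'file_location\filename.txt', delimiter="\t")

(For example, df = pd.read_csv(r'F:\Desktop\ds\text.txt', delimiter="\t"). The r prefix makes this a raw string; without it, sequences such as \f and \t in a Windows path are treated as escape characters.)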


Answer 6

You can also use read_csv with sep="\t", which behaves like read_table. (read_table was deprecated in pandas 0.24, but that deprecation was reverted in 0.25, so both functions work.)

import pandas as pd
pd.read_csv("file.txt", sep = "\t")

Answer 7

You can import the text file using the read_table command as so:

import pandas as pd
df=pd.read_table('output_list.txt',header=None)

Preprocessing will need to be done after loading


Answer 8

I usually take a look at the data first or just try to import it and do data.head(), if you see that the columns are separated with \t then you should specify sep="\t" otherwise, sep = " ".

import pandas as pd     
data = pd.read_csv('data.txt', sep=" ", header=None)

Confused by Python file mode "w+"

Question: Confused by Python file mode "w+"

From the doc,

Modes ‘r+’, ‘w+’ and ‘a+’ open the file for updating (note that ‘w+’ truncates the file). Append ‘b’ to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the ‘b’ has no effect.

and here

w+ : Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

But, how to read a file open with w+?


Answer 0

Let’s say you’re opening the file with a with statement like you should be. Then you’d do something like this to read from your file:

with open('somefile.txt', 'w+') as f:
    # Note that f has now been truncated to 0 bytes, so you'll only
    # be able to read data that you write after this point
    f.write('somedata\n')
    f.seek(0)  # Important: return to the top of the file before reading, otherwise you'll just read an empty string
    data = f.read() # Returns 'somedata\n'

Note the f.seek(0) — if you forget this, the f.read() call will try to read from the end of the file, and will return an empty string.


Answer 1

Here is a list of the different modes of opening a file:

  • r

    Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.

  • rb

    Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.

  • r+

    Opens a file for both reading and writing. The file pointer will be at the beginning of the file.

  • rb+

    Opens a file for both reading and writing in binary format. The file pointer will be at the beginning of the file.

  • w

    Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.

  • wb

    Opens a file for writing only in binary format. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.

  • w+

    Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

  • wb+

    Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

  • a

    Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.

  • ab

    Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.

  • a+

    Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.

  • ab+

    Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.


Answer 2

All file modes in Python

  • r for reading
  • r+ opens for reading and writing (does not truncate the file on opening)
  • w for writing
  • w+ for writing and reading (truncates the file on opening)
  • rb for reading a binary file. The file pointer is placed at the beginning of the file.
  • rb+ for reading and writing a binary file
  • wb+ for writing and reading a binary file (truncates the file on opening)
  • a+ opens for appending and reading
  • ab+ opens a file for both appending and reading in binary mode. The file pointer is at the end of the file if the file exists. The file opens in the append mode.
  • x open for exclusive creation, failing if the file already exists (Python 3)
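
The x mode can be sketched like this; the file name and helper are hypothetical. The first open creates the file, and a second attempt on the same path raises FileExistsError instead of truncating it the way w would.

```python
import os
import tempfile


def create_once(path, text):
    """Create path containing text; return False if it already exists."""
    try:
        with open(path, "x") as f:  # exclusive creation
            f.write(text)
    except FileExistsError:
        return False  # the existing file is left untouched
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.txt")
        print(create_once(path, "first"))   # True
        print(create_once(path, "second"))  # False
        with open(path) as f:
            print(f.read())                 # first
```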

Answer 3

r for read

w for write

r+ to read/write without deleting the original content if the file exists; otherwise an exception is raised

w+ to delete the original content and then read/write if the file exists; otherwise the file is created

For example,

>>> with open("file1.txt", "w") as f:
...   f.write("ab\n")
... 
>>> with open("file1.txt", "w+") as f:
...   f.write("c")
... 

$ cat file1.txt 
c$
>>> with open("file2.txt", "r+") as f:
...   f.write("ab\n")
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'file2.txt'

>>> with open("file2.txt", "w") as f:
...   f.write("ab\n")
... 
>>> with open("file2.txt", "r+") as f:
...   f.write("c")
... 

$ cat file2.txt 
cb
$

Answer 4

The file is truncated, so you can call read() (no exceptions raised, unlike when opened using ‘w’) but you’ll get an empty string.


Answer 5

I suspect there are two ways to handle what I think you're trying to achieve.

1) The obvious one: open the file for reading only, read it into memory, then open the file with 'w' and write your changes.

2) Use the low-level file handling routines:

import os

# Open file in RW mode, create if it doesn't exist. *Don't* pass O_TRUNC
fd = os.open(filename, os.O_RDWR | os.O_CREAT)

Hope this helps..
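
A sketch of the second option, assuming you want a regular file object rather than a raw descriptor: os.fdopen() wraps the descriptor returned by os.open(), giving read/write access like r+ but creating the file if it is missing and, since O_TRUNC is not passed, never truncating it. The helper name is hypothetical.

```python
import os
import tempfile


def open_rw_create(path):
    """Open path for reading and writing, creating it if needed, without truncating."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)  # no O_TRUNC: existing data survives
    return os.fdopen(fd, "r+")  # wrap the descriptor in a normal text file object


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")
        with open_rw_create(path) as f:  # file did not exist: created empty
            print(repr(f.read()))
            f.write("kept\n")
        with open_rw_create(path) as f:  # reopening keeps the content
            print(repr(f.read()))
```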


Answer 6

Actually, there’s something wrong about all the other answers about r+ mode.

test.in file’s content:

hello1
ok2
byebye3

And the Python script:

with open("test.in", 'r+') as f:
    f.readline()
    f.write("addition")

Execute it, and the content of test.in changes to:

hello1
ok2
byebye3
addition

However, when we modify the script to:

with open("test.in", 'r+') as f:
    f.write("addition")

test.in changes to:

additionk2
byebye3

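So r+ mode lets us overwrite the content from the beginning if we haven't done a read operation, and if we have done a read, f.write() appears to just append to the file. This is a side effect of buffering rather than a documented rule: readline() reads ahead a buffer-sized chunk, so for a small file the underlying position is already at the end when write() is called. For predictable results, call f.seek() before switching from reading to writing.

By the way, if we call f.seek(0, 0) before f.write(write_content), write_content will overwrite the existing content starting from position 0.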
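
To avoid depending on read-ahead buffering, you can record the position with tell() and seek() back to it before writing. A sketch using the test.in contents from this answer, written to a temporary file; newline="" keeps line endings untranslated on every platform, and the helper name is hypothetical.

```python
import os
import tempfile


def overwrite_after_first_line(path, text):
    """Overwrite characters right after the first line, in place."""
    with open(path, "r+", newline="") as f:
        f.readline()
        pos = f.tell()  # logical position just after the first line
        f.seek(pos)     # reposition explicitly before switching to writing
        f.write(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.in")
        with open(path, "w", newline="") as f:
            f.write("hello1\nok2\nbyebye3\n")
        overwrite_after_first_line(path, "X")
        with open(path, newline="") as f:
            print(f.read())
```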


Answer 7

As mentioned by h4z3, for practical use: sometimes your data is too big to load all at once, or you have a generator or real-time incoming data. In those cases you could use w+ to store it in a file and read it back later.
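
A sketch of that pattern with tempfile.TemporaryFile, whose default mode is w+b; here it is opened as "w+" for text. The generator is a hypothetical stand-in for data arriving over time. Note the seek(0) before reading back; without it, the read would start at the end of the file.

```python
import tempfile


def numbers():
    """Hypothetical data source that yields one value at a time."""
    for i in range(5):
        yield i * i


def spool_and_read(source):
    """Write items from source to a w+ temporary file, then read them back."""
    with tempfile.TemporaryFile("w+") as f:
        for item in source:
            f.write(f"{item}\n")  # store each item as it arrives
        f.seek(0)                 # rewind before reading
        return [int(line) for line in f]


if __name__ == "__main__":
    print(spool_and_read(numbers()))  # [0, 1, 4, 9, 16]
```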


How to open files using the open with statement

Question: How to open files using the open with statement

I’m looking at how to do file input and output in Python. I’ve written the following code to read a list of names (one per line) from a file into another file while checking a name against the names in the file and appending text to the occurrences in the file. The code works. Could it be done better?

I’d wanted to use the with open(... statement for both input and output files but can’t see how they could be in the same block meaning I’d need to store the names in a temporary location.

def filter(txt, oldfile, newfile):
    '''\
    Read a list of names from a file line by line into an output file.
    If a line begins with a particular name, insert a string of text
    after the name before appending the line to the output file.
    '''

    outfile = open(newfile, 'w')
    with open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)

    outfile.close()
    return # Do I gain anything by including this?

# input the name you want to check against
text = input('Please enter the name of a great person: ')    
letsgo = filter(text,'Spanish', 'Spanish2')

Answer 0

Python allows putting multiple open() statements in a single with. You comma-separate them. Your code would then be:

def filter(txt, oldfile, newfile):
    '''\
    Read a list of names from a file line by line into an output file.
    If a line begins with a particular name, insert a string of text
    after the name before appending the line to the output file.
    '''

    with open(newfile, 'w') as outfile, open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)

# input the name you want to check against
text = input('Please enter the name of a great person: ')    
letsgo = filter(text,'Spanish', 'Spanish2')

And no, you don’t gain anything by putting an explicit return at the end of your function. You can use return to exit early, but you had it at the end, and the function will exit without it. (Of course with functions that return a value, you use the return to specify the value to return.)

Using multiple open() items with with was not supported in Python 2.5 when the with statement was introduced, or in Python 2.6, but it is supported in Python 2.7 and Python 3.1 or newer.

http://docs.python.org/reference/compound_stmts.html#the-with-statement http://docs.python.org/release/3.1/reference/compound_stmts.html#the-with-statement

If you are writing code that must run in Python 2.5, 2.6 or 3.0, nest the with statements as the other answers suggested or use contextlib.nested.


Answer 1

Use nested blocks like this:

with open(newfile, 'w') as outfile:
    with open(oldfile, 'r', encoding='utf-8') as infile:
        pass  # your logic goes right here (a block needs at least one statement)

Answer 2

You can nest your with blocks. Like this:

with open(newfile, 'w') as outfile:
    with open(oldfile, 'r', encoding='utf-8') as infile:
        for line in infile:
            if line.startswith(txt):
                line = line[0:len(txt)] + ' - Truly a great person!\n'
            outfile.write(line)

This is better than your version because you guarantee that outfile will be closed even if your code encounters exceptions. Obviously you could do that with try/finally, but with is the right way to do this.
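For comparison, here is a sketch of the try/finally version, assuming Python 3. add_praise is a hypothetical name for the question's filter(), renamed so it does not shadow the built-in filter. The UTF-8 encoding on the output file and the sample names are also assumptions made for the demo.

```python
import os
import tempfile

def add_praise(txt, oldfile, newfile):
    # Same logic as the question's filter(); the nested try/finally blocks
    # close each file even if an exception is raised while copying,
    # which is the bookkeeping a with statement does for you.
    outfile = open(newfile, 'w', encoding='utf-8')
    try:
        infile = open(oldfile, 'r', encoding='utf-8')
        try:
            for line in infile:
                if line.startswith(txt):
                    line = line[0:len(txt)] + ' - Truly a great person!\n'
                outfile.write(line)
        finally:
            infile.close()
    finally:
        outfile.close()

# Demo with made-up file contents.
tmpdir = tempfile.mkdtemp()
old_path = os.path.join(tmpdir, 'Spanish')
new_path = os.path.join(tmpdir, 'Spanish2')
with open(old_path, 'w', encoding='utf-8') as f:
    f.write('Ada\nBob\n')
add_praise('Ada', old_path, new_path)
with open(new_path, encoding='utf-8') as f:
    result = f.read()
```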

Or, as I have just learnt, you can have multiple context managers in a with statement as described by @steveha. That seems to me to be a better option than nesting.

And for your final minor question, the return serves no real purpose. I would remove it.


Answer 3

Sometimes you might want to open a variable number of files and treat each one the same way. You can do this with contextlib.ExitStack:

from contextlib import ExitStack

filenames = ['file1.txt', 'file2.txt', 'file3.txt']

with open('outfile.txt', 'a') as outfile:
    with ExitStack() as stack:
        # ExitStack closes every file registered with enter_context() when the block exits
        file_pointers = [stack.enter_context(open(file, 'r')) for file in filenames]
        for fp in file_pointers:
            outfile.write(fp.read())
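Below is a self-contained variant of the ExitStack snippet that creates its own input files. It assumes Python 3, and the file names and contents are made up for the demo. Putting open() and ExitStack() in a single with statement is a choice of this sketch, not part of the original answer.

```python
import os
import tempfile
from contextlib import ExitStack

# Throwaway input files for the demo; names and contents are made up.
tmpdir = tempfile.mkdtemp()
filenames = []
for i, text in enumerate(['one\n', 'two\n', 'three\n'], start=1):
    name = os.path.join(tmpdir, f'file{i}.txt')
    with open(name, 'w') as f:
        f.write(text)
    filenames.append(name)

out_path = os.path.join(tmpdir, 'outfile.txt')
with open(out_path, 'a') as outfile, ExitStack() as stack:
    # Every file opened through enter_context() is closed when the block ends,
    # even if opening a later file raises an exception.
    files = [stack.enter_context(open(name)) for name in filenames]
    for fp in files:
        outfile.write(fp.read())

with open(out_path) as f:
    combined = f.read()
```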

Unicode (UTF-8) reading and writing to files in Python

Question: Unicode (UTF-8) reading and writing to files in Python

I’m having some brain failure in understanding reading and writing text to a file (Python 2.4).

# The string, which has an a-acute in it.
ss = u'Capit\xe1n'
ss8 = ss.encode('utf8')
repr(ss), repr(ss8)

("u'Capit\xe1n'", "'Capit\xc3\xa1n'")

print ss, ss8
print >> open('f1','w'), ss8

>>> file('f1').read()
'Capit\xc3\xa1n\n'

So I type in Capit\xc3\xa1n into my favorite editor, in file f2.

Then:

>>> open('f1').read()
'Capit\xc3\xa1n\n'
>>> open('f2').read()
'Capit\\xc3\\xa1n\n'
>>> open('f1').read().decode('utf8')
u'Capit\xe1n\n'
>>> open('f2').read().decode('utf8')
u'Capit\\xc3\\xa1n\n'

What am I not understanding here? Clearly there is some vital bit of magic (or good sense) that I’m missing. What does one type into text files to get proper conversions?

What I’m truly failing to grok here, is what the point of the UTF-8 representation is, if you can’t actually get Python to recognize it, when it comes from outside. Maybe I should just JSON dump the string, and use that instead, since that has an asciiable representation! More to the point, is there an ASCII representation of this Unicode object that Python will recognize and decode, when coming in from a file? If so, how do I get it?

>>> print simplejson.dumps(ss)
'"Capit\u00e1n"'
>>> print >> file('f3','w'), simplejson.dumps(ss)
>>> simplejson.load(open('f3'))
u'Capit\xe1n'

Answer 0

In the notation

u'Capit\xe1n\n'

the “\xe1” represents just one byte. “\x” tells you that “e1” is in hexadecimal. When you write

Capit\xc3\xa1n

into your file you have “\xc3” in it. Those are 4 bytes and in your code you read them all. You can see this when you display them:

>>> open('f2').read()
'Capit\\xc3\\xa1n\n'

You can see that the backslash is escaped by a backslash. So you have four bytes in your string: “\”, “x”, “c” and “3”.

Edit:

As others pointed out in their answers you should just enter the characters in the editor and your editor should then handle the conversion to UTF-8 and save it.

If you actually have a string in this format you can use the string_escape codec to decode it into a normal string:

In [15]: print 'Capit\\xc3\\xa1n\n'.decode('string_escape')
Capitán

The result is a string that is encoded in UTF-8 where the accented character is represented by the two bytes that were written \\xc3\\xa1 in the original string. If you want to have a unicode string you have to decode again with UTF-8.
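Python 3 has no string_escape codec. A rough equivalent goes through latin-1 and unicode_escape. This sketch assumes the escaped text contains only ASCII characters.

```python
# Text with literal backslash escapes, as typed into the editor in file f2.
escaped = 'Capit\\xc3\\xa1n'

# latin-1 maps characters 0-255 to bytes one-to-one, so it moves between
# str and bytes without changing any values:
#   1. str -> bytes (still the literal characters \ x c 3 ...)
#   2. unicode_escape turns each \xNN sequence into the single character U+00NN
#   3. back to bytes: b'Capit\xc3\xa1n', i.e. the raw UTF-8 bytes
raw_utf8 = escaped.encode('latin-1').decode('unicode_escape').encode('latin-1')

# 4. Decode the UTF-8 bytes to get the real text ("Capitan" with an a-acute).
text = raw_utf8.decode('utf-8')
```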

To your edit: you don’t have UTF-8 in your file. To see what it would actually look like:

s = u'Capit\xe1n\n'
sutf8 = s.encode('UTF-8')
open('utf-8.out', 'w').write(sutf8)

Compare the content of the file utf-8.out to the content of the file you saved with your editor.
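Here is the same comparison in Python 3, as a sketch with made-up temporary paths. The UTF-8 file holds two bytes for the accented character, while the typed escape sequence holds eight.

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
s = 'Capit\xe1n\n'  # "Capitan" with an a-acute, as a Python 3 str

# What a real UTF-8 file contains: the a-acute becomes the two bytes C3 A1.
utf8_path = os.path.join(tmpdir, 'utf-8.out')
with open(utf8_path, 'w', encoding='utf-8', newline='\n') as f:
    f.write(s)

# What typing the escape sequence into an editor produces: literal backslashes.
typed_path = os.path.join(tmpdir, 'f2')
with open(typed_path, 'w', encoding='ascii', newline='\n') as f:
    f.write('Capit\\xc3\\xa1n\n')

with open(utf8_path, 'rb') as f:
    utf8_bytes = f.read()
with open(typed_path, 'rb') as f:
    typed_bytes = f.read()
```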


Answer 1

Rather than mess with the encode and decode methods I find it easier to specify the encoding when opening the file. The io module (added in Python 2.6) provides an io.open function, which has an encoding parameter.

Use the open function from the io module.

>>>import io
>>>f = io.open("test", mode="r", encoding="utf-8")

Then, after calling f’s read() function, a Unicode object (already decoded from UTF-8) is returned.

>>>f.read()
u'Capit\xe1l\n\n'

Note that in Python 3, the io.open function is an alias for the built-in open function. The built-in open function only supports the encoding argument in Python 3, not Python 2.

Edit: Previously this answer recommended the codecs module. The codecs module can cause problems when mixing read() and readline(), so this answer now recommends the io module instead.

Use the open function from the codecs module.

>>>import codecs
>>>f = codecs.open("test", "r", "utf-8")

Then, after calling f’s read() function, a Unicode object (already decoded from UTF-8) is returned.

>>>f.read()
u'Capit\xe1l\n\n'

If you know the encoding of a file, using the codecs package is going to be much less confusing.

See http://docs.python.org/library/codecs.html#codecs.open


Answer 2

Now all you need in Python3 is open(Filename, 'r', encoding='utf-8')

[Edit on 2016-02-10 for requested clarification]

Python3 added the encoding parameter to its open function. The following information about the open function is gathered from here: https://docs.python.org/3/library/functions.html#open

open(file, mode='r', buffering=-1, 
      encoding=None, errors=None, newline=None, 
      closefd=True, opener=None)

Encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

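So by adding encoding='utf-8' as a parameter to the open function, all reading from and writing to the file is done as UTF-8. UTF-8 is also the default encoding for Python 3 source files and for str.encode()/bytes.decode(). However, open() itself still uses the platform-dependent default quoted above unless you pass encoding.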
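Here is a small Python 3 sketch of why the encoding argument matters. The path is a temporary file made up for the demo. The same bytes, read back with a different codec, produce different text.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'capitan.txt')

# Write with an explicit encoding instead of relying on the platform default.
with open(path, 'w', encoding='utf-8') as f:
    f.write('Capit\xe1n')

with open(path, 'r', encoding='utf-8') as f:
    right = f.read()  # decoded with the encoding it was written in

with open(path, 'r', encoding='latin-1') as f:
    wrong = f.read()  # same bytes, wrong codec: each UTF-8 byte becomes its own character
```

Here right is the original text, while wrong contains two characters (U+00C3 and U+00A1) where the a-acute should be.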


Answer 3

So, I’ve found a solution for what I’m looking for, which is:

print open('f2').read().decode('string-escape').decode("utf-8")

There are some unusual codecs that are useful here. This particular reading allows one to take UTF-8 representations from within Python, copy them into an ASCII file, and have them be read in to Unicode. Under the “string-escape” decode, the slashes won’t be doubled.

This allows for the sort of round trip that I was imagining.


Answer 4

# -*- encoding: utf-8 -*-

# convert a file of unknown encoding to UTF-8 (Python 2; relies on the Unix `file` utility)

import codecs
import commands

file_location = "jumper.sub"
file_encoding = commands.getoutput('file -b --mime-encoding %s' % file_location)

file_stream = codecs.open(file_location, 'r', file_encoding)
file_output = codecs.open(file_location+"b", 'w', 'utf-8')

for l in file_stream:
    file_output.write(l)

file_stream.close()
file_output.close()
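Below is a Python 3 sketch of the same conversion. It assumes the Unix file utility is installed for detect_encoding(); both helper names are hypothetical. subprocess.run() replaces the Python 2-only commands module, and passing the arguments as a list avoids shell-quoting problems with unusual file names.

```python
import os
import subprocess
import tempfile

def detect_encoding(path):
    # Ask the Unix `file` utility, like the answer above does. It can report
    # names Python does not recognise (e.g. "binary"), so treat the result as a guess.
    result = subprocess.run(['file', '-b', '--mime-encoding', path],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def convert_to_utf8(src, dst, src_encoding):
    # newline='' leaves the original line endings untouched; copying line by
    # line means large files are never loaded into memory all at once.
    with open(src, 'r', encoding=src_encoding, newline='') as fin, \
         open(dst, 'w', encoding='utf-8', newline='') as fout:
        for line in fin:
            fout.write(line)

# Demo with a known source encoding (detect_encoding() is not called here).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'jumper.sub')
with open(src, 'wb') as f:
    f.write('Capit\xe1n\n'.encode('latin-1'))
convert_to_utf8(src, src + 'b', 'latin-1')
with open(src + 'b', 'rb') as f:
    converted = f.read()
```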

Answer 5

Actually, this worked for me for reading a file with UTF-8 encoding in Python 3.2:

import codecs
with codecs.open('file_name.txt', 'r', 'UTF-8') as f:  # the file is closed when the block ends
    for line in f:
        print(line)

Answer 6

To read in a Unicode string and then send it to HTML, I did this:

fileline.decode("utf-8").encode('ascii', 'xmlcharrefreplace')

Useful for Python-powered HTTP servers.
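In Python 3 the same chain works on bytes read from a file opened in binary mode. Here is a small sketch with a made-up input line:

```python
fileline = b'Capit\xc3\xa1n'  # made-up line, as read from a file opened with 'rb'

# Decode the UTF-8 bytes to text, then re-encode as ASCII; every character that
# ASCII cannot represent becomes a numeric character reference such as &#225;.
html_bytes = fileline.decode('utf-8').encode('ascii', 'xmlcharrefreplace')
html_text = html_bytes.decode('ascii')
```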


Answer 7

You have stumbled over the general problem with encodings: how can I tell which encoding a file is in?

Answer: You can’t unless the file format provides for this. XML, for example, begins with:

<?xml version="1.0" encoding="utf-8"?>

This header was carefully chosen so that it can be read no matter the encoding. In your case, there is no such hint, hence neither your editor nor Python has any idea what is going on. Therefore, you must use the codecs module and use codecs.open(path,mode,encoding) which provides the missing bit in Python.

As for your editor, you must check if it offers some way to set the encoding of a file.

The point of UTF-8 is to be able to encode 21-bit characters (Unicode) as an 8-bit data stream (because that’s the only thing all computers in the world can handle). But since most OSs predate the Unicode era, they don’t have suitable tools to attach the encoding information to files on the hard disk.

The next issue is the representation in Python. This is explained perfectly in the comment by heikogerlach. You must understand that what the interactive prompt shows is repr(), which in Python 2 uses only ASCII. To display Unicode or anything with a character code >= 128, it must use some means of escaping. In your editor, you must not type the escaped display string but what the string means (in this case, you must enter the accented á and save the file).

That said, you can use the Python function eval() to turn an escaped string into a string:

>>> x = eval("'Capit\\xc3\\xa1n\\n'")
>>> x
'Capit\xc3\xa1n\n'
>>> x[5]
'\xc3'
>>> len(x[5])
1

As you can see, the string “\xc3” has been turned into a single character. This is now an 8-bit string, UTF-8 encoded. To get Unicode:

>>> x.decode('utf-8')
u'Capit\xe1n\n'
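If you would rather not call eval() on file contents, ast.literal_eval() accepts only Python literals. Here is a Python 3 sketch. It assumes the escaped text contains no quote characters of its own, because the sketch wraps it in b'...'.

```python
import ast

# The escaped text exactly as it sits in file f2 (the backslashes are real characters).
escaped = r"Capit\xc3\xa1n\n"

# literal_eval only evaluates literals, so unlike eval() it cannot run arbitrary code.
# Wrapping the text in b'...' makes each \xNN escape a single byte.
raw = ast.literal_eval("b'" + escaped + "'")
text = raw.decode('utf-8')
```

As in the eval() example, each \xNN escape becomes one byte (raw[5] is the byte 0xC3). Decoding as UTF-8 then yields the accented character.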

Gregg Lind asked: I think there are some pieces missing here: the file f2 contains: hex:

0000000: 4361 7069 745c 7863 335c 7861 316e  Capit\xc3\xa1n

codecs.open('f2','rb', 'utf-8'), for example, reads them all in as separate characters (expected). Is there any way to write to a file in ASCII that would work?

Answer: That depends on what you mean. ASCII can’t represent characters > 127. So you need some way to say “the next few characters mean something special” which is what the sequence “\x” does. It says: The next two characters are the code of a single character. “\u” does the same using four characters to encode Unicode up to 0xFFFF (65535).

So you can’t directly write Unicode to ASCII (because ASCII simply doesn’t contain the same characters). You can write it as string escapes (as in f2); in this case, the file can be represented as ASCII. Or you can write it as UTF-8, in which case, you need an 8-bit safe stream.
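Here is a Python 3 sketch of the string-escape option. The unicode_escape codec turns non-ASCII characters into backslash escapes, so the result is pure ASCII, and decoding with the same codec restores the text.

```python
text = 'Capit\xe1n'  # Unicode text with an a-acute

# Every non-ASCII character is written as a backslash escape, so the result
# can be stored in an ASCII-only file.
ascii_form = text.encode('unicode_escape')

# Decoding with the same codec turns the escapes back into characters.
restored = ascii_form.decode('unicode_escape')
```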

Your solution using decode('string-escape') does work, but be aware of how much memory it uses: about three times as much as using codecs.open().

Remember that a file is just a sequence of 8-bit bytes. Neither the bits nor the bytes have a meaning; it’s you who says “65 means 'A'”. Since \xc3\xa1 should become “á” but the computer has no way of knowing that, you must tell it by specifying the encoding that was used when writing the file.


Answer 8

Besides codecs.open(), you can use io.open(), which works in both Python 2 and Python 3, to read and write Unicode files. For example:

import io

text = u'á'
encoding = 'utf8'

with io.open('data.txt', 'w', encoding=encoding, newline='\n') as fout:
    fout.write(text)

with io.open('data.txt', 'r', encoding=encoding, newline='\n') as fin:
    text2 = fin.read()

assert text == text2

Answer 9

Well, your favorite text editor does not realize that \xc3\xa1 are supposed to be character literals; it interprets them as text. That’s why you get the double backslashes in the last line: it’s now a real backslash + xc3, etc. in your file.

If you want to read and write encoded files in Python, it’s best to use the codecs module.

Pasting text between the terminal and applications is difficult, because you don’t know which program will interpret your text using which encoding. You could try the following:

>>> s = file("f1").read()
>>> print unicode(s, "Latin-1")
Capitán

Then paste this string into your editor and make sure that it stores it using Latin-1. Under the assumption that the clipboard does not garble the string, the round trip should work.


Answer 10

The \x.. sequence is something that’s specific to Python. It’s not a universal byte escape sequence.

How you actually enter UTF-8-encoded non-ASCII text depends on your OS and/or your editor. Here’s how you do it in Windows. On OS X, to enter an a with an acute accent (á) you can just press Option+E, then A. Almost all text editors on OS X support UTF-8.


Answer 11

You can also improve the original open() function to work with Unicode files by replacing it in place, using the partial function. The beauty of this solution is you don’t need to change any old code. It’s transparent.

import codecs
import functools
open = functools.partial(codecs.open, encoding='utf-8')
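In Python 3 the built-in open() already takes an encoding argument, so codecs is not needed for this trick. This sketch binds the partial to a new, hypothetical name, utf8_open, instead of replacing open(). It also shows one side effect: the preset encoding makes binary mode fail.

```python
import functools
import os
import tempfile

# Hypothetical name; binding a new name instead of shadowing open()
# keeps the built-in available for binary files.
utf8_open = functools.partial(open, encoding='utf-8')

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with utf8_open(path, 'w') as f:
    f.write('Capit\xe1n')
with utf8_open(path) as f:
    text = f.read()

# The preset encoding breaks binary mode: open() rejects an encoding with 'rb'.
try:
    utf8_open(path, 'rb')
except ValueError:
    binary_rejected = True
else:
    binary_rejected = False
```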

Answer 12

I was trying to parse iCal using Python 2.7.9:

from icalendar import Calendar

But I was getting:

Traceback (most recent call last):
  File "ical.py", line 92, in parse
    print "{}".format(e[attr])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 7: ordinal not in range(128)

and it was fixed with just:

print "{}".format(e[attr].encode("utf-8"))

(Now it can print liké á böss.)


Answer 13

I found the simplest approach is to change the default encoding of the whole script to ‘UTF-8’:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

Any open, print or other statement will then just use utf8.

Works at least for Python 2.7.9.

Thanks to https://markhneedham.com/blog/2015/05/21/python-unicodeencodeerror-ascii-codec-cant-encode-character-uxfc-in-position-11-ordinal-not-in-range128/ (look at the end).


Non-blocking read on a subprocess in Python

Question: Non-blocking read on a subprocess in Python

I’m using the subprocess module to start a subprocess and connect to its output stream (stdout). I want to be able to execute non-blocking reads on its stdout. Is there a way to make .readline non-blocking, or to check whether there is data on the stream before I invoke .readline? I’d like this to be portable, or at least to work under Windows and Linux.

Here is how I do it for now (it blocks on .readline if no data is available):

p = subprocess.Popen('myprogram.exe', stdout = subprocess.PIPE)
output_str = p.stdout.readline()

Answer 0

fcntl, select, asyncproc won’t help in this case.

A reliable way to read a stream without blocking regardless of operating system is to use Queue.get_nowait():

import sys
from subprocess import PIPE, Popen
from threading import Thread

try:
    from queue import Queue, Empty
except ImportError:
    from Queue import Queue, Empty  # python 2.x

ON_POSIX = 'posix' in sys.builtin_module_names

def enqueue_output(out, queue):
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

p = Popen(['myprogram.exe'], stdout=PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True # thread dies with the program
t.start()

# ... do other things here

# read line without blocking
try:  line = q.get_nowait() # or q.get(timeout=.1)
except Empty:
    print('no output yet')
else: # got line
    print(line)  # replace with whatever you need to do with line
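Below is a self-contained, runnable variant of the queue-based code, assuming Python 3. The child process is a stand-in that prints two lines. The sketch drops bufsize=1, because Python 3.8+ reports line buffering in binary mode with a RuntimeWarning. It waits with q.get(timeout=...) rather than get_nowait() so that the demo terminates once both lines arrive.

```python
import sys
from queue import Queue, Empty
from subprocess import PIPE, Popen
from threading import Thread

def enqueue_output(out, queue):
    # Runs in a background thread: the blocking readline() happens here,
    # never in the main thread.
    for line in iter(out.readline, b''):
        queue.put(line)
    out.close()

# Stand-in child process that prints two lines and exits.
child = [sys.executable, '-c', 'print("hello"); print("world")']
p = Popen(child, stdout=PIPE)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q), daemon=True)
t.start()

lines = []
while len(lines) < 2:
    try:
        lines.append(q.get(timeout=5))  # wait at most 5 seconds for each line
    except Empty:
        break  # no output within the timeout
p.wait()
```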

Answer 1

I have often had a similar problem; Python programs I write frequently need to have the ability to execute some primary functionality while simultaneously accepting user input from the command line (stdin). Simply putting the user input handling functionality in another thread doesn’t solve the problem because readline() blocks and has no timeout. If the primary functionality is complete and there is no longer any need to wait for further user input I typically want my program to exit, but it can’t because readline() is still blocking in the other thread waiting for a line. A solution I have found to this problem is to make stdin a non-blocking file using the fcntl module:

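import fcntl
import os
import sys
import time

# make stdin a non-blocking file
fd = sys.stdin.fileno()
fl = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)

# user input handling thread
# (mainThreadIsRunning and handleInput are defined elsewhere in the program)
while mainThreadIsRunning:
    try:
        line = sys.stdin.readline()
    except Exception:  # no input available right now
        time.sleep(0.1)  # avoid spinning at 100% CPU while waiting
        continue
    handleInput(line)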

In my opinion this is a bit cleaner than using the select or signal modules to solve this problem but then again it only works on UNIX…
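On Python 3.5+, os.set_blocking() wraps the F_GETFL/F_SETFL calls. Here is a sketch that uses a pipe as a stand-in for stdin. As far as I know, Windows pipes only gained support for this in Python 3.12.

```python
import os

# A pipe stands in for stdin (or a child's stdout) in this demo.
read_fd, write_fd = os.pipe()

# Equivalent to adding O_NONBLOCK with fcntl F_GETFL / F_SETFL.
os.set_blocking(read_fd, False)

try:
    os.read(read_fd, 1024)  # nothing has been written yet
    got_data_early = True
except BlockingIOError:
    got_data_early = False  # a non-blocking read fails instead of waiting

os.write(write_fd, b'hello\n')
data = os.read(read_fd, 1024)  # data is available now, so this returns at once

os.close(read_fd)
os.close(write_fd)
```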


Answer 2

Python 3.4 introduced a new provisional API for asynchronous I/O: the asyncio module.

The approach is similar to the Twisted-based answer by @Bryan Ward: you define a protocol, and its methods are called as soon as data is ready:

#!/usr/bin/env python3
import asyncio
import os

class SubprocessProtocol(asyncio.SubprocessProtocol):
    def pipe_data_received(self, fd, data):
        if fd == 1: # got stdout data (bytes)
            print(data)

    def connection_lost(self, exc):
        loop.stop() # end loop.run_forever()

if os.name == 'nt':
    loop = asyncio.ProactorEventLoop() # for subprocess' pipes on Windows
    asyncio.set_event_loop(loop)
else:
    loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(loop.subprocess_exec(SubprocessProtocol, 
        "myprogram.exe", "arg1", "arg2"))
    loop.run_forever()
finally:
    loop.close()

See “Subprocess” in the docs.

There is a high-level interface, asyncio.create_subprocess_exec(), that returns a Process object. It lets you read a line asynchronously using the StreamReader.readline() coroutine (with the async/await syntax of Python 3.5+):

#!/usr/bin/env python3.5
import asyncio
import locale
import sys
from asyncio.subprocess import PIPE
from contextlib import closing

async def readline_and_kill(*args):
    # start child process
    process = await asyncio.create_subprocess_exec(*args, stdout=PIPE)

    # read line (sequence of bytes ending with b'\n') asynchronously
    async for line in process.stdout:
        print("got line:", line.decode(locale.getpreferredencoding(False)))
        break
    process.kill()
    return await process.wait() # wait for the child process to exit


if sys.platform == "win32":
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)
else:
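    loop = asyncio.new_event_loop()  # get_event_loop() without a running loop is deprecated in recent Pythons
    asyncio.set_event_loop(loop)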

with closing(loop):
    sys.exit(loop.run_until_complete(readline_and_kill(
        "myprogram.exe", "arg1", "arg2")))

readline_and_kill() performs the following tasks:

  • start subprocess, redirect its stdout to a pipe
  • read a line from subprocess’ stdout asynchronously
  • kill subprocess
  • wait for it to exit

Each step can be limited with a timeout if necessary.
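As a sketch of that timeout remark, assuming Python 3.7+ for asyncio.run(): asyncio.wait_for() bounds the readline() step, and the child is killed and reaped either way. The child command and the readline_with_timeout() name are illustrative, not part of the answer above.

```python
import asyncio
import sys

# Hypothetical child process: prints one line, then sleeps long enough to need killing.
CHILD = "import time; print('ready', flush=True); time.sleep(30)"


async def readline_with_timeout(*args, timeout=5.0):
    """Start a child, read one stdout line with a timeout, then kill and reap it."""
    process = await asyncio.create_subprocess_exec(
        *args, stdout=asyncio.subprocess.PIPE)
    try:
        # wait_for() cancels readline() if no complete line arrives in time
        line = await asyncio.wait_for(process.stdout.readline(), timeout)
    except asyncio.TimeoutError:
        line = None
    finally:
        try:
            process.kill()
        except ProcessLookupError:
            pass  # the child had already exited
        await process.wait()  # this step could be bounded with wait_for() as well
    return line


if __name__ == "__main__":
    print(asyncio.run(readline_with_timeout(sys.executable, "-c", CHILD)))
```

Returning None on timeout keeps the caller in control; raising instead would work just as well.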


Answer 3

Try the asyncproc module. For example:

import os
from asyncproc import Process
myProc = Process("myprogram.app")

while True:
    # check to see if process has ended
    poll = myProc.wait(os.WNOHANG)
    if poll is not None:
        break
    # print any new output
    out = myProc.read()
    if out != "":
        print(out)

The module takes care of all the threading as suggested by S.Lott.


Answer 4

You can do this really easily in Twisted. Depending on your existing code base, this might not be that easy to use, but if you are building a Twisted application, then things like this become almost trivial. You create a ProcessProtocol class and override the outReceived() method. Twisted (depending on the reactor used) is usually just a big select() loop with callbacks installed to handle data from different file descriptors (often network sockets). So overriding outReceived() simply installs a callback for handling data coming from STDOUT. A simple example demonstrating this behavior is as follows:

from twisted.internet import protocol, reactor

class MyProcessProtocol(protocol.ProcessProtocol):

    def outReceived(self, data):
        print(data)

proc = MyProcessProtocol()
reactor.spawnProcess(proc, './myprogram', ['./myprogram', 'arg1', 'arg2', 'arg3'])
reactor.run()

The Twisted documentation has some good information on this.

If you build your entire application around Twisted, it makes asynchronous communication with other processes, local or remote, really elegant like this. On the other hand, if your program isn’t built on top of Twisted, this isn’t really going to be that helpful. Hopefully this can be helpful to other readers, even if it isn’t applicable for your particular application.
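The reactor described above, a big select() loop with callbacks per file descriptor, can be sketched with the standard selectors module. This is a simplified illustration, not how Twisted is actually implemented; run_callback_loop() is a hypothetical name, and the sketch is POSIX-only because select()-based waiting on Windows works only with sockets, not pipes.

```python
import os
import selectors
import subprocess
import sys


def run_callback_loop(proc, on_data):
    """Call on_data(chunk) whenever the child's stdout has data, until EOF.

    POSIX only: on Windows, select()-based selectors work with sockets, not pipes.
    """
    sel = selectors.DefaultSelector()
    # Register the pipe's descriptor and keep the callback in the key's data slot
    sel.register(proc.stdout.fileno(), selectors.EVENT_READ, data=on_data)
    while sel.get_map():  # keep looping while any descriptor is still registered
        for key, _events in sel.select():
            chunk = os.read(key.fd, 4096)  # unbuffered read, so select() stays accurate
            if chunk:
                key.data(chunk)  # dispatch to the registered callback
            else:
                sel.unregister(key.fd)  # EOF: the child closed its end of the pipe
    sel.close()
    return proc.wait()


if __name__ == "__main__":
    # Hypothetical child that prints two lines
    child = [sys.executable, "-c", "print('hello'); print('world')"]
    p = subprocess.Popen(child, stdout=subprocess.PIPE)
    run_callback_loop(p, lambda chunk: print("received:", chunk))
```

Registering more descriptors (stderr, sockets) with their own callbacks is what turns this into a small reactor.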


Answer 5

Use select & read(1).

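import os
import select
import subprocess     # stdlib only, no new requirements

def readAllSoFar(proc, retVal=b''):
    # read one byte at a time while select() reports data waiting (POSIX only)
    fd = proc.stdout.fileno()
    while select.select([fd], [], [], 0)[0]:
        ch = os.read(fd, 1)  # unbuffered read, so select() stays accurate
        if not ch:  # EOF: the child closed its stdout
            break
        retVal += ch
    return retVal

p = subprocess.Popen(['/bin/ls'], stdout=subprocess.PIPE)
while p.poll() is None:  # poll() returns 0 (falsy) once the child exits successfully
    print(readAllSoFar(p).decode(), end='')
print(readAllSoFar(p).decode(), end='')  # drain what is left after the child exits

For readline()-like behavior:

lines = [b'']
while p.poll() is None:
    lines = readAllSoFar(p, lines[-1]).split(b'\n')
    for line in lines[:-1]:
        print(line.decode())
lines = readAllSoFar(p, lines[-1]).split(b'\n')
for line in lines[:-1]:
    print(line.decode())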

Answer 6

One solution is to have another process do the reading, or to do the read in a thread with a timeout.

Here’s the threaded version of a timeout function:

http://code.activestate.com/recipes/473878/
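The threaded-timeout idea can be sketched with only the standard library, assuming Python 3: a daemon thread does the blocking readline(), and the caller waits on a queue with a timeout. This is a generic pattern, not the linked recipe; start_reader() is a hypothetical helper.

```python
import queue
import subprocess
import sys
import threading


def start_reader(stream):
    """Read lines from a blocking pipe in a daemon thread and hand them over via a queue."""
    q = queue.Queue()

    def pump():
        for line in iter(stream.readline, b""):
            q.put(line)
        q.put(None)  # sentinel: EOF reached

    threading.Thread(target=pump, daemon=True).start()
    return q


if __name__ == "__main__":
    # Hypothetical child: prints one line, then stays silent for a few seconds
    child = "import time; print('first', flush=True); time.sleep(3)"
    p = subprocess.Popen([sys.executable, "-c", child], stdout=subprocess.PIPE)
    lines = start_reader(p.stdout)
    print(lines.get(timeout=5))  # the first line arrives well within 5 s
    try:
        lines.get(timeout=0.5)
    except queue.Empty:
        print("no output within 0.5 s")  # the caller regains control after the timeout
    p.kill()
    p.wait()
```

Only the reader thread ever blocks; killing the child closes the pipe, so the thread sees EOF and finishes.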

However, do you need to read the stdout as it’s coming in? Another solution may be to dump the output to a file and wait for the process to finish using p.wait().

import subprocess

with open('myprogram_output.txt', 'w') as f:
    p = subprocess.Popen('myprogram.exe', stdout=f)
    p.wait()

with open('myprogram_output.txt', 'r') as f:
    output = f.read()

Answer 7

Disclaimer: this works only with Tornado.

You can do this by setting the fd to be nonblocking and then use ioloop to register callbacks. I have packaged this in an egg called tornado_subprocess and you can install it via PyPI:

easy_install tornado_subprocess

Now you can do something like this:

import tornado_subprocess
import tornado.ioloop

def print_res(status, stdout, stderr):
    print(status, stdout, stderr)
    if status == 0:
        print("OK:")
        print(stdout)
    else:
        print("ERROR:")
        print(stderr)

t = tornado_subprocess.Subprocess(print_res, timeout=30, args=["cat", "/etc/passwd"])
t.start()
tornado.ioloop.IOLoop.instance().start()

You can also use it with a RequestHandler:

import tornado.web
import tornado_subprocess

class MyHandler(tornado.web.RequestHandler):
    def on_done(self, status, stdout, stderr):
        self.write(stdout)
        self.finish()

    @tornado.web.asynchronous  # removed in Tornado 6.0, so this needs an older Tornado
    def get(self):
        t = tornado_subprocess.Subprocess(self.on_done, timeout=30, args=["cat", "/etc/passwd"])
        t.start()

Answer 8

Existing solutions did not work for me (details below). What finally worked was to implement readline using read(1) (based on this answer). The latter does not block:

from subprocess import Popen, PIPE
from threading import Thread

def process_output(myprocess): #output-consuming thread
    nextline = None
    buf = ''
    while True:
        #--- extract line using read(1)
        out = myprocess.stdout.read(1)
        if out == '' and myprocess.poll() is not None: break
        if out != '':
            buf += out
            if out == '\n':
                nextline = buf
                buf = ''
        if not nextline: continue
        line = nextline
        nextline = None

        #--- do whatever you want with line here
        print('Line is:', line)
    myprocess.stdout.close()

# text mode, so read(1) returns str and '' signals EOF
myprocess = Popen('myprogram.exe', stdout=PIPE, universal_newlines=True) #output-producing process
p1 = Thread(target=process_output, args=(myprocess,)) #output-consuming thread
p1.daemon = True
p1.start()

#--- do whatever here and then kill process and thread if needed
if myprocess.poll() is None: #kill process; will automatically stop thread
    myprocess.kill()
    myprocess.wait()
if p1 and p1.is_alive(): #wait for thread to finish
    p1.join()

Why existing solutions did not work:

  1. Solutions that require readline (including the Queue based ones) always block. It is difficult (impossible?) to kill the thread that executes readline. It only gets killed when the process that created it finishes, but not when the output-producing process is killed.
  2. Mixing low-level fcntl with high-level readline calls may not work properly as anonnn has pointed out.
  3. Using select.poll() is neat, but it doesn’t work on Windows according to the Python docs.
  4. Using third-party libraries seems overkill for this task and adds additional dependencies.

Answer 9

Here is my code, which catches all output from a subprocess as soon as possible, including partial lines. It pumps stdout and stderr at the same time, in almost the correct order.

Tested and works correctly with Python 2.7 on Linux and Windows.

#!/usr/bin/python
#
# Runner with stdout/stderr catcher
#
from sys import argv
from subprocess import Popen, PIPE
import os, io
from threading import Thread
import Queue
def __main__():
    if (len(argv) > 1) and (argv[-1] == "-sub-"):
        import time, sys
        print "Application runned!"
        time.sleep(2)
        print "Slept 2 second"
        time.sleep(1)
        print "Slept 1 additional second",
        time.sleep(2)
        sys.stderr.write("Stderr output after 5 seconds")
        print "Eol on stdin"
        sys.stderr.write("Eol on stderr\n")
        time.sleep(1)
        print "Wow, we have end of work!",
    else:
        os.environ["PYTHONUNBUFFERED"]="1"
        try:
            p = Popen( argv + ["-sub-"],
                       bufsize=0, # unbuffered
                       stdin=PIPE, stdout=PIPE, stderr=PIPE )
        except WindowsError, W:
            if W.winerror==193:
                p = Popen( argv + ["-sub-"],
                           shell=True, # Try to run via shell
                           bufsize=0, # unbuffered
                           stdin=PIPE, stdout=PIPE, stderr=PIPE )
            else:
                raise
        inp = Queue.Queue()
        sout = io.open(p.stdout.fileno(), 'rb', closefd=False)
        serr = io.open(p.stderr.fileno(), 'rb', closefd=False)
        def Pump(stream, category):
            queue = Queue.Queue()
            def rdr():
                while True:
                    buf = stream.read1(8192)
                    if len(buf)>0:
                        queue.put( buf )
                    else:
                        queue.put( None )
                        return
            def clct():
                active = True
                while active:
                    r = queue.get()
                    try:
                        while True:
                            r1 = queue.get(timeout=0.005)
                            if r1 is None:
                                active = False
                                break
                            else:
                                r += r1
                    except Queue.Empty:
                        pass
                    inp.put( (category, r) )
            for tgt in [rdr, clct]:
                th = Thread(target=tgt)
                th.setDaemon(True)
                th.start()
        Pump(sout, 'stdout')
        Pump(serr, 'stderr')

        while p.poll() is None:
            # App still working
            try:
                chan,line = inp.get(timeout = 1.0)
                if chan=='stdout':
                    print "STDOUT>>", line, "<?<"
                elif chan=='stderr':
                    print " ERROR==", line, "=?="
            except Queue.Empty:
                pass
        print "Finish"

if __name__ == '__main__':
    __main__()
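The script above is Python 2. A rough Python 3 sketch of its core idea might look like the following: one reader thread per stream calls read1() on Popen's default BufferedReader pipes, so partial lines come through immediately. pump() and run_and_capture() are hypothetical names, the original's 5 ms chunk-coalescing step is left out, and stdout/stderr remain only approximately ordered.

```python
import queue
import subprocess
import sys
import threading


def pump(stream, category, out_queue):
    """Forward chunks as soon as they arrive; read1() does not wait for a newline."""
    while True:
        chunk = stream.read1(8192)  # at most one underlying read, so partial lines come through
        if not chunk:
            break  # EOF
        out_queue.put((category, chunk))
    out_queue.put((category, None))  # tell the consumer this stream is finished


def run_and_capture(args):
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    q = queue.Queue()
    for stream, name in ((p.stdout, "stdout"), (p.stderr, "stderr")):
        threading.Thread(target=pump, args=(stream, name, q), daemon=True).start()
    collected = {"stdout": b"", "stderr": b""}
    open_streams = 2
    while open_streams:
        category, chunk = q.get()
        if chunk is None:
            open_streams -= 1
        else:
            collected[category] += chunk  # a live UI would display the chunk here instead
    p.wait()
    return collected


if __name__ == "__main__":
    # Hypothetical child writing partial lines to both streams
    child = "import sys; sys.stdout.write('no newline'); sys.stderr.write('oops')"
    print(run_and_capture([sys.executable, "-c", child]))
```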

Answer 10

I had this problem when reading the stdout of a subprocess.Popen. Here is my non-blocking read solution:

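import fcntl
import os

def non_block_read(output):
    fd = output.fileno()
    fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
    try:
        # on a non-blocking pipe, read() returns what is available, or None if nothing is
        return output.read() or b""
    except OSError:  # e.g. BlockingIOError, depending on the stream type
        return b""

# Use example
import time
from subprocess import Popen, PIPE
sb = Popen("echo test && sleep 1000", shell=True, stdout=PIPE)
time.sleep(0.5)  # give the shell a moment to run echo before killing it
sb.kill()

# sb.stdout.read() # <-- This will block
non_block_read(sb.stdout)
b'test\n'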
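On Python 3.5+ (Unix), os.set_blocking() sets the same O_NONBLOCK flag without the explicit fcntl calls. The sketch below uses a plain os.pipe() instead of a subprocess so it runs on its own; read_available() is a hypothetical helper and, like the answer, it returns b'' both when no data is waiting and at EOF.

```python
import os


def read_available(fd, size=1024):
    """Return whatever bytes are waiting on a non-blocking fd, or b'' if none are."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return b""  # nothing written yet; a blocking fd would have waited here


r, w = os.pipe()
os.set_blocking(r, False)  # equivalent to OR-ing os.O_NONBLOCK into the F_SETFL flags
print(read_available(r))   # b''
os.write(w, b"test\n")
print(read_available(r))   # b'test\n'
os.close(r)
os.close(w)
```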

Answer 11

This version of non-blocking read doesn’t require any special modules and works out of the box on most Linux distributions.

import os
import sys
import time
import fcntl
import subprocess

def async_read(fd):
    # set non-blocking flag while preserving old flags
    fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
    # read char until EOF hit
    while True:
        try:
            ch = os.read(fd.fileno(), 1)
            # EOF
            if not ch: break
            sys.stdout.buffer.write(ch)  # ch is bytes, so write to the binary layer
            sys.stdout.flush()
        except BlockingIOError:
            # no data available on fd yet; back off briefly instead of busy-spinning
            time.sleep(0.01)

def shell(args, read_async=True):  # 'async' is a reserved word since Python 3.7
    # merge stderr and stdout
    proc = subprocess.Popen(args, shell=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    if read_async: async_read(proc.stdout)
    sout, serr = proc.communicate()
    return (sout, serr)

if __name__ == '__main__':
    cmd = 'ping 8.8.8.8'
    sout, serr = shell(cmd.split())

Answer 12

Here is a simple solution based on threads which:

  • works on both Linux and Windows (not relying on select).
  • reads both stdout and stderr asynchronously.
  • doesn’t rely on active polling with arbitrary waiting time (CPU friendly).
  • doesn’t use asyncio (which may conflict with other libraries).
  • runs until the child process terminates.

printer.py

import time
import sys

sys.stdout.write("Hello\n")
sys.stdout.flush()
time.sleep(1)
sys.stdout.write("World!\n")
sys.stdout.flush()
time.sleep(1)
sys.stderr.write("That's an error\n")
sys.stderr.flush()
time.sleep(2)
sys.stdout.write("Actually, I'm fine\n")
sys.stdout.flush()
time.sleep(1)

reader.py

import queue
import subprocess
import sys
import threading


def enqueue_stream(stream, queue, type):
    for line in iter(stream.readline, b''):
        queue.put(str(type) + line.decode('utf-8'))
    stream.close()


def enqueue_process(process, queue, readers):
    process.wait()
    for reader in readers:
        reader.join()  # make sure every line has been queued before the sentinel
    queue.put('x')


p = subprocess.Popen([sys.executable, 'printer.py'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
q = queue.Queue()
to = threading.Thread(target=enqueue_stream, args=(p.stdout, q, 1))
te = threading.Thread(target=enqueue_stream, args=(p.stderr, q, 2))
tp = threading.Thread(target=enqueue_process, args=(p, q, (to, te)))
te.start()
to.start()
tp.start()

while True:
    line = q.get()
    if line[0] == 'x':
        break
    if line[0] == '2':  # stderr
        sys.stdout.write("\033[0;31m")  # ANSI red color
    sys.stdout.write(line[1:])
    if line[0] == '2':
        sys.stdout.write("\033[0m")  # reset ANSI code
    sys.stdout.flush()

tp.join()
to.join()
te.join()

Answer 13

Adding this answer here since it provides the ability to set non-blocking pipes on both Windows and Unix.

All the ctypes details are thanks to @techtonik’s answer.

This is a slightly modified version that can be used on both Unix and Windows systems.

  • Python 3 compatible (only a minor change was needed).
  • Includes a POSIX version, and defines an exception to use for either.

This way you can use the same function and exception for Unix and Windows code.

# pipe_non_blocking.py (module)
"""
Example use:

    p = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            )

    pipe_non_blocking_set(p.stdout.fileno())

    try:
        data = os.read(p.stdout.fileno(), 1)
    except PortableBlockingIOError as ex:
        if not pipe_non_blocking_is_error_blocking(ex):
            raise ex
"""


__all__ = (
    "pipe_non_blocking_set",
    "pipe_non_blocking_is_error_blocking",
    "PortableBlockingIOError",
    )

import os


if os.name == "nt":
    def pipe_non_blocking_set(fd):
        # Constant could define globally but avoid polluting the name-space
        # thanks to: https://stackoverflow.com/questions/34504970
        import msvcrt

        from ctypes import windll, byref, wintypes, WinError, POINTER
        from ctypes.wintypes import HANDLE, DWORD, BOOL

        LPDWORD = POINTER(DWORD)

        PIPE_NOWAIT = wintypes.DWORD(0x00000001)

        def pipe_no_wait(pipefd):
            SetNamedPipeHandleState = windll.kernel32.SetNamedPipeHandleState
            SetNamedPipeHandleState.argtypes = [HANDLE, LPDWORD, LPDWORD, LPDWORD]
            SetNamedPipeHandleState.restype = BOOL

            h = msvcrt.get_osfhandle(pipefd)

            res = windll.kernel32.SetNamedPipeHandleState(h, byref(PIPE_NOWAIT), None, None)
            if res == 0:
                print(WinError())
                return False
            return True

        return pipe_no_wait(fd)

    def pipe_non_blocking_is_error_blocking(ex):
        if not isinstance(ex, PortableBlockingIOError):
            return False
        from ctypes import GetLastError
        ERROR_NO_DATA = 232

        return (GetLastError() == ERROR_NO_DATA)

    PortableBlockingIOError = OSError
else:
    def pipe_non_blocking_set(fd):
        import fcntl
        fl = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
        return True

    def pipe_non_blocking_is_error_blocking(ex):
        if not isinstance(ex, PortableBlockingIOError):
            return False
        return True

    PortableBlockingIOError = BlockingIOError

To avoid reading incomplete data, I ended up writing my own readline generator (which returns the byte string for each line).

It’s a generator, so you can for example…

def non_blocking_readlines(f, chunk=1024):
    """
    Iterate over lines, yielding b'' when nothing is left
    or when new data is not yet available.

    stdout_iter = iter(non_blocking_readlines(process.stdout))

    line = next(stdout_iter)  # will be a line or b''.
    """
    import os

    from .pipe_non_blocking import (
            pipe_non_blocking_set,
            pipe_non_blocking_is_error_blocking,
            PortableBlockingIOError,
            )

    fd = f.fileno()
    pipe_non_blocking_set(fd)

    blocks = []

    while True:
        try:
            data = os.read(fd, chunk)
            if not data:
                # case where reading finishes with no trailing newline
                yield b''.join(blocks)
                blocks.clear()
        except PortableBlockingIOError as ex:
            if not pipe_non_blocking_is_error_blocking(ex):
                raise ex

            yield b''
            continue

        while True:
            n = data.find(b'\n')
            if n == -1:
                break

            yield b''.join(blocks) + data[:n + 1]
            data = data[n + 1:]
            blocks.clear()
        blocks.append(data)
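The line-assembly part of non_blocking_readlines() can be looked at on its own: each new chunk is joined to the leftover partial line, and only complete lines are emitted. The hypothetical split_complete_lines() below is a simplified restatement of that idea, not a function used by the module above.

```python
def split_complete_lines(pending, data):
    """Combine leftover bytes with a new chunk and split off the complete lines.

    Returns (complete_lines, new_pending): each complete line keeps its trailing
    newline, and new_pending holds the trailing partial line (possibly empty).
    """
    *lines, rest = (pending + data).split(b"\n")
    return [line + b"\n" for line in lines], rest


pending = b""
for chunk in (b"hel", b"lo\nwor", b"ld\n"):  # chunks as a pipe might deliver them
    lines, pending = split_complete_lines(pending, chunk)
    print(lines, pending)
```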

Answer 14

I have the original questioner’s problem, but did not wish to invoke threads. I mixed Jesse’s solution with a direct read() from the pipe, and my own buffer-handler for line reads (however, my sub-process – ping – always wrote full lines < a system page size). I avoid busy-waiting by only reading in a gobject-registered io watch. These days I usually run code within a gobject MainLoop to avoid threads.

import fcntl
import os
import subprocess

import gobject

def set_up_ping(ip, w):
    # run the sub-process
    # watch the resultant pipe
    p = subprocess.Popen(['/bin/ping', ip], stdout=subprocess.PIPE)
    # make stdout a non-blocking file
    fl = fcntl.fcntl(p.stdout, fcntl.F_GETFL)
    fcntl.fcntl(p.stdout, fcntl.F_SETFL, fl | os.O_NONBLOCK)
    stdout_gid = gobject.io_add_watch(p.stdout, gobject.IO_IN, w)
    return stdout_gid # for shutting down

The watcher is

def watch(f, *other):
    print 'reading',f.read()
    return True  # returning True keeps the watch installed

And the main program sets up a ping and then calls the gobject main loop.

def main():
    set_up_ping('192.168.1.8', watch)
    # discard gid as unused here
    gobject.MainLoop().run()

Any other work is attached to callbacks in gobject.


Answer 15

Things are a lot better in modern Python.

Here’s a simple child program, “hello.py”:

#!/usr/bin/env python3

while True:
    i = input()
    if i == "quit":
        break
    print(f"hello {i}")

And a program to interact with it:

import asyncio


async def main():
    proc = await asyncio.subprocess.create_subprocess_exec(
        "./hello.py", stdin=asyncio.subprocess.PIPE, stdout=asyncio.subprocess.PIPE
    )
    proc.stdin.write(b"bob\n")
    print(await proc.stdout.read(1024))
    proc.stdin.write(b"alice\n")
    print(await proc.stdout.read(1024))
    proc.stdin.write(b"quit\n")
    await proc.wait()


asyncio.run(main())

That prints out:

b'hello bob\n'
b'hello alice\n'

Note that the underlying pattern, which is also used by almost all of the previous answers, both here and in related questions, is to set the child's stdout file descriptor to non-blocking and then poll it in some sort of select loop. These days, of course, that loop is provided by asyncio.
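Here is a sketch of a line-oriented variant of the same asyncio approach. It awaits `proc.stdin.drain()` after each write and reads with `readline()` instead of `read(1024)`, so each reply is one complete line. The `talk` helper and the inline `CHILD` script are assumptions for illustration. `CHILD` stands in for hello.py and is run via `sys.executable -c`. The child flushes explicitly because a Python process's stdout is block-buffered when it is a pipe.

```python
import asyncio
import sys

# Hypothetical child equivalent to hello.py, passed with -c so the example
# does not depend on an executable file on disk.
CHILD = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline().strip()\n"
    "    if not line or line == 'quit':\n"
    "        break\n"
    "    print('hello ' + line, flush=True)\n"
)


async def talk(names):
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD,
        stdin=asyncio.subprocess.PIPE, stdout=asyncio.subprocess.PIPE,
    )
    replies = []
    for name in names:
        proc.stdin.write(name.encode() + b"\n")
        await proc.stdin.drain()  # wait until the write buffer is handed to the pipe
        # readline() suspends this coroutine, not the event loop, until a full line arrives
        line = await proc.stdout.readline()
        replies.append(line.decode().strip())
    proc.stdin.write(b"quit\n")
    await proc.stdin.drain()
    await proc.wait()
    return replies
```

Calling `asyncio.run(talk(["bob", "alice"]))` should give `["hello bob", "hello alice"]`.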


Answer 16

The select module helps you determine where the next useful input is.

However, you're almost always better off with separate threads: one does a blocking read of stdin, and another does whatever work you don't want blocked.
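Below is a minimal sketch of the two-thread split described above. It is applied to a subprocess's stdout rather than stdin, which is an assumption. A daemon thread makes the blocking `readline()` calls and hands lines over through a `queue.Queue`, so the main thread never blocks for long. `start_reader` and `collect` are hypothetical helper names.

```python
import queue
import subprocess
import sys
import threading


def start_reader(stream, q):
    """Daemon thread: blocking readline() calls, each line pushed onto q."""
    def pump():
        for line in iter(stream.readline, b""):
            q.put(line)
        q.put(None)  # sentinel: EOF reached
    t = threading.Thread(target=pump, daemon=True)
    t.start()
    return t


def collect(cmd):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    q = queue.Queue()
    start_reader(proc.stdout, q)
    lines = []
    while True:
        try:
            item = q.get(timeout=0.1)  # the main thread waits at most 0.1 s
        except queue.Empty:
            continue  # free to do other, non-blocking work here
        if item is None:
            break
        lines.append(item.decode().strip())
    proc.wait()
    return lines
```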


Answer 17

Why bother with threads and a queue? Unlike readline(), BufferedReader.read1() won't block waiting for \r\n; it returns as soon as any output is available.

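#!/usr/bin/env python3
import io
import sys
from subprocess import Popen, PIPE, STDOUT

def main():
    try:
        # "-n 3" is the Windows count flag; on Unix ping uses "-c 3"
        p = Popen(["ping", "-n", "3", "127.0.0.1"], stdin=PIPE, stdout=PIPE, stderr=STDOUT)
    except OSError:
        print("Popen failed")
        sys.exit(1)
    sout = io.open(p.stdout.fileno(), 'rb', closefd=False)
    while True:
        buf = sout.read1(1024)  # returns as soon as any bytes are available
        if len(buf) == 0:       # EOF: the child closed its stdout
            break
        sys.stdout.write(buf.decode(errors="replace"))
        sys.stdout.flush()
    p.wait()

if __name__ == '__main__':
    main()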
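In Python 3, with the default `bufsize`, `Popen` already opens `p.stdout` as an `io.BufferedReader`. As far as I know, that makes the extra `io.open` wrapper unnecessary, and you can call `read1()` on the pipe directly. Here is a small sketch, with `read_in_chunks` as a hypothetical helper name:

```python
import subprocess
import sys


def read_in_chunks(cmd, chunk_size=1024):
    """Collect a child's output using read1(), which returns as soon as any bytes arrive."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    chunks = []
    while True:
        # With the default bufsize, proc.stdout is already an io.BufferedReader
        buf = proc.stdout.read1(chunk_size)
        if not buf:  # b"" means EOF
            break
        chunks.append(buf)
    proc.wait()
    return b"".join(chunks)
```

Note that `read1()` still waits until at least one byte is available. It avoids waiting for a full line, not waiting altogether.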

Answer 18

In my case I needed a logging module that catches the output from background applications and augments it (adding timestamps, colors, etc.).

I ended up with a background thread that does the actual I/O. The following code is for POSIX platforms only. I stripped non-essential parts.

If someone is going to use this beast for long runs, consider managing open descriptors. In my case it was not a big problem.

# -*- python -*-
import fcntl
import threading
import sys, os, errno
import subprocess

class Logger(threading.Thread):
    def __init__(self, *modules):
        threading.Thread.__init__(self)
        try:
            from select import epoll, EPOLLIN
            self.__poll = epoll()
            self.__evt = EPOLLIN
            self.__to = -1
        except:
            from select import poll, POLLIN
            print 'epoll is not available'
            self.__poll = poll()
            self.__evt = POLLIN
            self.__to = 100
        self.__fds = {}
        self.daemon = True
        self.start()

    def run(self):
        while True:
            events = self.__poll.poll(self.__to)
            for fd, ev in events:
                if (ev&self.__evt) != self.__evt:
                    continue
                try:
                    self.__fds[fd].run()
                except Exception, e:
                    print e

    def add(self, fd, log):
        assert not self.__fds.has_key(fd)
        self.__fds[fd] = log
        self.__poll.register(fd, self.__evt)

class log:
    logger = Logger()

    def __init__(self, name):
        self.__name = name
        self.__piped = False

    def fileno(self):
        if self.__piped:
            return self.write
        self.read, self.write = os.pipe()
        fl = fcntl.fcntl(self.read, fcntl.F_GETFL)
        fcntl.fcntl(self.read, fcntl.F_SETFL, fl | os.O_NONBLOCK)
        self.fdRead = os.fdopen(self.read)
        self.logger.add(self.read, self)
        self.__piped = True
        return self.write

    def __run(self, line):
        self.chat(line, nl=False)

    def run(self):
        while True:
            try: line = self.fdRead.readline()
            except IOError, exc:
                if exc.errno == errno.EAGAIN:
                    return
                raise
            self.__run(line)

    def chat(self, line, nl=True):
        if nl: nl = '\n'
        else: nl = ''
        sys.stdout.write('[%s] %s%s' % (self.__name, line, nl))

def system(command, param=[], cwd=None, env=None, input=None, output=None):
    args = [command] + param
    p = subprocess.Popen(args, cwd=cwd, stdout=output, stderr=output, stdin=input, env=env, bufsize=0)
    p.wait()

ls = log('ls')
ls.chat('go')
system("ls", ['-l', '/'], output=ls)

date = log('date')
date.chat('go')
system("date", output=date)
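The core trick in the logger above is that `subprocess.Popen` accepts any object with a `fileno()` method as `stdout`/`stderr`. Below is a stripped-down Python 3 sketch of just that idea. It uses one reader thread per log instead of a shared epoll thread. `PrefixLog` and its `sink` list are hypothetical, and timestamps and colors are left out.

```python
import os
import subprocess
import sys
import threading


class PrefixLog:
    """Stand-in for the answer's `log` class: Popen only needs fileno()."""

    def __init__(self, name, sink):
        self.name = name
        self.sink = sink  # hypothetical: a list that receives augmented lines
        self.read_fd, self.write_fd = os.pipe()
        self.thread = threading.Thread(target=self._pump, daemon=True)
        self.thread.start()

    def fileno(self):
        # subprocess calls this and hands the write end to the child
        return self.write_fd

    def _pump(self):
        with os.fdopen(self.read_fd, "r") as reader:
            for line in reader:  # blocking reads happen in this background thread
                self.sink.append("[%s] %s" % (self.name, line.rstrip("\n")))

    def close(self):
        os.close(self.write_fd)  # once the child has exited too, the reader sees EOF
        self.thread.join()
```

Usage mirrors the answer: create the log, pass it as `stdout=` to `subprocess.run`, then call `close()` so the reader thread can finish.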

Answer 19

My problem is a bit different, as I wanted to collect both stdout and stderr from a running process, but ultimately the same, since I wanted to render the output in a widget as it is generated.

I did not want to resort to the many proposed workarounds using queues or additional threads, as they should not be necessary for such a common task as running another script and collecting its output.

After reading the proposed solutions and the Python docs, I resolved my issue with the implementation below. Yes, it only works on POSIX, as I'm using the select function call.

I agree that the docs are confusing and the implementation is awkward for such a common scripting task. I believe older versions of Python had different Popen defaults and documentation, which created a lot of confusion. This seems to work well with both Python 2.7.12 and 3.5.2.

The key was to set bufsize=1 for line buffering together with universal_newlines=True, so the pipes are opened in text mode rather than the default binary mode (in Python 3, line buffering is only usable in text mode).

import select
import shlex
import subprocess

from PyQt4.QtCore import QThread, SIGNAL  # old-style signals, as used below

class workerThread(QThread):
   def __init__(self, cmd):
      QThread.__init__(self)
      self.cmd = cmd
      self.result = None           ## return code
      self.error = None            ## flag indicates an error
      self.errorstr = ""           ## info message about the error

   def __del__(self):
      self.wait()
      DEBUG("Thread removed")

   def run(self):
      cmd_list = shlex.split(self.cmd)   # handles quoted arguments and repeated spaces
      try:
         cmd = subprocess.Popen(cmd_list, bufsize=1, stdin=None,
                                universal_newlines=True,
                                stderr=subprocess.PIPE,
                                stdout=subprocess.PIPE)
      except OSError:
         self.error = 1
         self.errorstr = "Failed to execute " + self.cmd
         ERROR(self.errorstr)
         return   # no process was started, so there is nothing to read
      VERBOSE("task started...")
      while True:
         try:
            r, w, x = select.select([cmd.stdout, cmd.stderr], [], [])
            if cmd.stderr in r:
               line = cmd.stderr.readline()
               if line != "":
                  line = line.strip()
                  self.emit(SIGNAL("update_error(QString)"), line)
            if cmd.stdout in r:
               line = cmd.stdout.readline()
               if line == "":   # EOF on stdout: the child is done writing
                  break
               line = line.strip()
               self.emit(SIGNAL("update_output(QString)"), line)
         except IOError:
            pass
      cmd.wait()
      self.result = cmd.returncode
      if self.result < 0:
         self.error = 1
         self.errorstr = "Task terminated by signal " + str(self.result)
         ERROR(self.errorstr)
         return
      if self.result:
         self.error = 1
         self.errorstr = "exit code " + str(self.result)
         ERROR(self.errorstr)

ERROR, DEBUG and VERBOSE are simply helper functions that print output to the terminal.

IMHO this solution is 99.99% effective: it still uses the blocking readline function, so it assumes the subprocess behaves nicely and outputs complete lines.

I welcome feedback to improve the solution as I am still new to Python.
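Here is a thread-free, Qt-free sketch of the same select-based idea using the `selectors` module. It reads with `os.read()` on the raw descriptors instead of `readline()`. That sidesteps the partial-line caveat above, because data can no longer sit unseen in a file object's buffer while `select()` waits. `run_capture` is a hypothetical name, and like the original it is POSIX only.

```python
import os
import selectors
import subprocess
import sys


def run_capture(cmd):
    """Collect stdout and stderr as they are produced, without threads (POSIX only)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ, "stdout")
    sel.register(proc.stderr, selectors.EVENT_READ, "stderr")
    collected = {"stdout": b"", "stderr": b""}
    while sel.get_map():
        for key, _ in sel.select():
            # os.read() bypasses Python-level buffering, so select() never
            # misses data that is already sitting in a file object's buffer
            data = os.read(key.fd, 4096)
            if not data:
                sel.unregister(key.fileobj)  # EOF on this stream
                continue
            collected[key.data] += data  # a GUI would emit a signal here instead
    sel.close()
    proc.stdout.close()
    proc.stderr.close()
    return collected["stdout"].decode(), collected["stderr"].decode(), proc.wait()
```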


Answer 20

I have created a library based on J.F. Sebastian's solution. You can use it.

https://github.com/cenkalti/what


Answer 21

Working from J.F. Sebastian's answer and several other sources, I've put together a simple subprocess manager. It provides the requested non-blocking reading, as well as running several processes in parallel. It doesn't use any OS-specific calls (that I'm aware of) and thus should work anywhere.

It's available from PyPI, so just pip install shelljob. Refer to the project page for examples and full docs.


Answer 22

EDIT: This implementation still blocks. Use J.F.Sebastian’s answer instead.

I tried the top answer, but the additional risk and maintenance of thread code was worrisome.

Looking through the io module (and being limited to 2.6), I found BufferedReader. This is my threadless, non-blocking solution.

import io
import time
from subprocess import PIPE, Popen

p = Popen(['myprogram.exe'], stdout=PIPE)

SLEEP_DELAY = 0.001

# Create an io.BufferedReader on the file descriptor for stdout
with io.open(p.stdout.fileno(), 'rb', closefd=False) as buffer:
    while p.poll() is None:
        time.sleep(SLEEP_DELAY)
        # peek() does a blocking raw read when its buffer is empty,
        # which is why this implementation still blocks
        while b'\n' in buffer.peek(io.DEFAULT_BUFFER_SIZE):
            line = buffer.readline()
            # do stuff with the line

    # Handle any remaining output after the process has ended
    while buffer.peek():
        line = buffer.readline()
        # do stuff with the line

Answer 23

I recently stumbled upon the same problem: I needed to read one line at a time from a stream (tail running in a subprocess) in non-blocking mode, and I wanted to avoid the following problems: burning CPU, reading the stream one byte at a time (as readline did), etc.

Here is my implementation: https://gist.github.com/grubberr/5501e1a9760c3eab5e0a. It doesn't support Windows (it uses poll) and doesn't handle EOF, but it works well for me.
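The gist itself isn't reproduced here. The approach it describes can be sketched as follows: wait in poll instead of spinning, read in chunks rather than byte by byte, then split into lines. `iter_lines` is a hypothetical helper, it is not the gist's code, and unlike the gist it stops cleanly at EOF. It works on Unix only.

```python
import os
import select
import subprocess
import sys


def iter_lines(fd, timeout_ms=1000):
    """Yield complete lines from fd, reading in chunks (not byte by byte).

    poll() sleeps until data arrives, so the loop does not burn CPU.
    Unix only: select.poll is not available on Windows.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    pending = b""
    while True:
        if not poller.poll(timeout_ms):
            continue  # timed out with no data; keep waiting
        chunk = os.read(fd, 4096)
        if not chunk:  # EOF: the writer closed its end
            break
        pending += chunk
        *lines, pending = pending.split(b"\n")  # keep the incomplete tail
        yield from lines
    if pending:
        yield pending  # last line without a trailing newline
```

With tail, you would pass `proc.stdout.fileno()` from a `Popen(['tail', '-f', path], stdout=PIPE)` and iterate over the generator.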


Answer 24

This is an example of running an interactive command in a subprocess, using a pseudo-terminal so that stdout is interactive. You can refer to: https://stackoverflow.com/a/43012138/3555925

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import sys
import select
import termios
import tty
import pty
from subprocess import Popen

command = 'bash'
# command = 'docker run -it --rm centos /bin/bash'.split()

# save original tty setting then set it to raw mode
old_tty = termios.tcgetattr(sys.stdin)
tty.setraw(sys.stdin.fileno())

try:
    # open pseudo-terminal to interact with subprocess
    master_fd, slave_fd = pty.openpty()

    # use os.setsid() to run it in a new session, otherwise bash job control will not be enabled
    p = Popen(command,
              preexec_fn=os.setsid,
              stdin=slave_fd,
              stdout=slave_fd,
              stderr=slave_fd,
              universal_newlines=True)

    while p.poll() is None:
        r, w, e = select.select([sys.stdin, master_fd], [], [])
        if sys.stdin in r:
            d = os.read(sys.stdin.fileno(), 10240)
            os.write(master_fd, d)
        elif master_fd in r:
            o = os.read(master_fd, 10240)
            if o:
                os.write(sys.stdout.fileno(), o)
finally:
    # restore tty settings, even if an exception was raised above
    termios.tcsetattr(sys.stdin, termios.TCSADRAIN, old_tty)
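Sometimes you only need the child to believe it is writing to a terminal, for example so it line-buffers its output, and no interaction is required. For that non-interactive case, here is a shorter sketch under the same Unix-only assumption. `run_in_pty` is a hypothetical name.

```python
import os
import pty
import subprocess
import sys


def run_in_pty(cmd):
    """Run cmd with a pseudo-terminal as stdin/stdout/stderr and return its output."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(cmd, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the parent only talks to the master side
    output = b""
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:  # Linux reports EIO once the child side is closed
            break
        if not data:  # other platforms may report EOF instead
            break
        output += data
    os.close(master_fd)
    proc.wait()
    return output
```

The terminal translates "\n" to "\r\n" by default, so expect `\r\n` line endings in the returned bytes.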

Answer 25

This solution uses the select module to “read any available data” from an IO stream. This function blocks initially until data is available, but then reads only the data that is available and doesn’t block further.

Since it relies on select(), which only accepts sockets on Windows, this only works for pipes and files on Unix.

The code is fully PEP8-compliant.

import select


def read_available(input_stream, max_bytes=None):
    """
    Blocks until any data is available, then reads and returns all available data.
    This function returns an empty string when end of stream is reached.

    Args:
        input_stream: The stream to read from.
        max_bytes (int|None): The maximum number of bytes to read. This function may return fewer bytes than this.

    Returns:
        str
    """
    # Prepare local variables
    input_streams = [input_stream]
    empty_list = []
    read_buffer = ""

    # Initially block for input using 'select'
    if len(select.select(input_streams, empty_list, empty_list)[0]) > 0:

        # Poll read-readiness using 'select'
        def select_func():
            return len(select.select(input_streams, empty_list, empty_list, 0)[0]) > 0

        # Create while function based on parameters
        if max_bytes is not None:
            def while_func():
                return (len(read_buffer) < max_bytes) and select_func()
        else:
            while_func = select_func

        while True:
            # Read single byte at a time
            read_data = input_stream.read(1)
            if len(read_data) == 0:
                # End of stream
                break
            # Append byte to string buffer
            read_buffer += read_data
            # Check if more data is available
            if not while_func():
                break

    # Return read buffer
    return read_buffer
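One caveat with `read_available` is that `select()` watches the file descriptor, while `input_stream.read(1)` goes through Python's own buffer. As a result, data already pulled into that buffer can go unnoticed by `select()`. The sketch below avoids this by working on the raw descriptor with `os.read()`. `read_available_fd` is a hypothetical name, and it assumes a pipe-like descriptor on Unix.

```python
import os
import select


def read_available_fd(fd, max_bytes=65536):
    """Block until fd is readable, then return what is available (b"" at EOF).

    Works on the raw descriptor with os.read(), so a single call typically
    returns everything currently in the pipe (up to max_bytes) instead of
    reading one byte at a time, and no data can hide in a Python-level buffer.
    """
    select.select([fd], [], [])  # initial blocking wait, as in read_available()
    return os.read(fd, max_bytes)  # fd is known to be readable, so this returns promptly
```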

Answer 26

I also faced the problem described by Jesse and solved it by using select, as Bradley, Andy and others did, but in blocking mode to avoid a busy loop. It uses a dummy pipe as a fake stdin. select blocks and waits for either stdin or the pipe to be ready. When a key is pressed, stdin unblocks select and the key value can be retrieved with read(1). When a different thread writes to the pipe, the pipe unblocks select, which can be taken as an indication that stdin is no longer needed. Here is some reference code:

import sys
import os
from select import select

# -------------------------------------------------------------------------    
# Set the pipe (fake stdin) to simulate a final key stroke
# which will unblock the select statement
readEnd, writeEnd = os.pipe()
readFile = os.fdopen(readEnd)
writeFile = os.fdopen(writeEnd, "w")

# -------------------------------------------------------------------------
def getKey():

    # Wait for stdin or pipe (fake stdin) to be ready
    dr,dw,de = select([sys.__stdin__, readFile], [], [])

    # If stdin is the one ready then read it and return value
    if sys.__stdin__ in dr:
        return sys.__stdin__.read(1)   # For Windows use ----> getch() from module msvcrt

    # Must finish
    else:
        return None

# -------------------------------------------------------------------------
def breakStdinRead():
    writeFile.write(' ')
    writeFile.flush()

# -------------------------------------------------------------------------
# MAIN CODE

# Get key stroke
key = getKey()

# Keyboard input
if key:
    # ... do your stuff with the key value
    pass

# Faked keystroke
else:
    # ... use of stdin finished
    pass

# -------------------------------------------------------------------------
# OTHER THREAD CODE

breakStdinRead()
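Here is the same self-pipe trick in a form that can be exercised without a keyboard. Both descriptors are plain pipes here, which is an assumption: in the answer the input side is `sys.__stdin__`. `wake_later` stands in for the other thread calling `breakStdinRead()`. `wait_for_key` and `wake_later` are hypothetical names. This is Unix only, since `select()` on Windows does not accept pipes.

```python
import os
import select
import threading


def wait_for_key(input_fd, wake_fd):
    """Block until input_fd has data (return one byte) or wake_fd is written (return None)."""
    ready, _, _ = select.select([input_fd, wake_fd], [], [])
    if input_fd in ready:
        return os.read(input_fd, 1)
    os.read(wake_fd, 1)  # drain the wake-up byte so the next select() blocks again
    return None


def wake_later(wake_w, delay=0.1):
    """Simulate the 'OTHER THREAD CODE': write one byte to the wake pipe after delay seconds."""
    timer = threading.Timer(delay, os.write, args=(wake_w, b" "))
    timer.start()
    return timer
```

With a real terminal you would pass `sys.stdin.fileno()` as `input_fd`.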

Answer 27

Try wexpect, which is the Windows alternative to pexpect.

import wexpect

p = wexpect.spawn('myprogram.exe')
p.expect('.')          # regex pattern matching any single character
output_str = p.after   # text matched by the last expect() call (pexpect-style attribute)

Answer 28

On Unix-like systems and Python 3.5+ there’s os.set_blocking which does exactly what it says.

import os
import time
import subprocess

cmd = 'python3', '-c', 'import time; [(print(i), time.sleep(1)) for i in range(5)]'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
os.set_blocking(p.stdout.fileno(), False)
start = time.time()
while True:
    # first iteration always produces empty byte string in non-blocking mode
    for i in range(2):    
        line = p.stdout.readline()
        print(i, line)
        time.sleep(0.5)
    if time.time() > start + 5:
        break
p.terminate()

This outputs:

0 b''
1 b'0\n'
0 b''
1 b'1\n'
0 b''
1 b'2\n'
0 b''
1 b'3\n'
0 b''
1 b'4\n'

With os.set_blocking commented out, the output is:

0 b'0\n'
1 b'1\n'
0 b'2\n'
1 b'3\n'
0 b'4\n'
1 b''
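At the descriptor level, a non-blocking pipe lets you tell "nothing yet" apart from EOF. `os.read()` raises `BlockingIOError` when no data has arrived, and returns `b''` only once the writer has closed its end. Here is a small sketch for Unix-like systems. `try_read` is a hypothetical helper.

```python
import os


def try_read(fd, n=1024):
    """Return bytes that are ready now, b"" at EOF, or None if nothing has arrived yet."""
    try:
        return os.read(fd, n)
    except BlockingIOError:  # raised instead of waiting when fd is non-blocking
        return None
```

Combine it with `os.set_blocking(fd, False)` as in the answer above.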

Answer 29

Here is a module that supports non-blocking reads and background writes in Python:

https://pypi.python.org/pypi/python-nonblock

It provides a function, nonblock_read, which reads data from the stream if available and otherwise returns an empty string (or None if the stream is closed on the other side and all possible data has been read).

You may also consider the python-subprocess2 module,

https://pypi.python.org/pypi/python-subprocess2

which extends the subprocess module: the object returned from subprocess.Popen gets an additional method, runInBackground. This starts a thread and returns an object that is populated automatically as output is written to stdout/stderr, without blocking your main thread.

Enjoy!


StringIO in Python3

Question: StringIO in Python3

I am using Python 3.2.1 and I can't import the StringIO module. I use io.StringIO and it works, but I can't use it with numpy's genfromtxt like this:

x="1 3\n 4.5 8"        
numpy.genfromtxt(io.StringIO(x))

I get the following error:

TypeError: Can't convert 'bytes' object to str implicitly  

and when I write import StringIO it says

ImportError: No module named 'StringIO'

Answer 0

When I write import StringIO, it says there is no such module.

From What’s New In Python 3.0:

The StringIO and cStringIO modules are gone. Instead, import the io module and use io.StringIO or io.BytesIO for text and data respectively.



A possibly useful method of fixing some Python 2 code to also work in Python 3 (caveat emptor):

try:
    from StringIO import StringIO ## for Python 2
except ImportError:
    from io import StringIO ## for Python 3

Note: This example may be tangential to the main issue of the question and is included only as something to consider when generically addressing the missing StringIO module. For a more direct solution to the message TypeError: Can't convert 'bytes' object to str implicitly, see this answer.
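Here is a quick sketch that makes the text/data split concrete. `io.StringIO` only accepts `str`, `io.BytesIO` only accepts `bytes`, and mixing them raises `TypeError`. The `rejects` helper is hypothetical and exists only for the demonstration.

```python
import io

# Text goes into io.StringIO, raw bytes into io.BytesIO
text_buf = io.StringIO()
text_buf.write("1 3\n4.5 8")
byte_buf = io.BytesIO()
byte_buf.write("1 3\n4.5 8".encode("utf-8"))


def rejects(buf, value):
    """Return True if buf refuses value with a TypeError (the Python 3 str/bytes split)."""
    try:
        buf.write(value)
    except TypeError:
        return True
    return False
```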


Answer 1

In my case I have used:

from io import StringIO

Answer 2

On Python 3, numpy.genfromtxt (at least in older NumPy versions) expects a bytes stream. Use the following:

numpy.genfromtxt(io.BytesIO(x.encode()))
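The conversion works in both directions with only the standard library. If a consumer wants bytes, encode the `str`, as the answer does. If a consumer wants text but you only have bytes, `io.TextIOWrapper` decodes a bytes stream. The sketch assumes UTF-8 data. I believe newer NumPy releases also accept text streams in `genfromtxt`, so check which one your version expects.

```python
import io

x = "1 3\n 4.5 8"

# A consumer that wants bytes: encode the str first (what the answer above does)
as_bytes = io.BytesIO(x.encode("utf-8"))

# A consumer that wants text but you only have bytes: decode via TextIOWrapper
as_text = io.TextIOWrapper(io.BytesIO(x.encode("utf-8")), encoding="utf-8")
```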

Answer 3

Thank you OP for your question, and Roman for your answer. I had to search a bit to find this; I hope the following helps others.

Python 2.7

See: https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

import numpy as np
from StringIO import StringIO

data = "1, abc , 2\n 3, xxx, 4"

print type(data)
"""
<type 'str'>
"""

print '\n', np.genfromtxt(StringIO(data), delimiter=",", dtype="|S3", autostrip=True)
"""
[['1' 'abc' '2']
 ['3' 'xxx' '4']]
"""

print '\n', type(data)
"""
<type 'str'>
"""

print '\n', np.genfromtxt(StringIO(data), delimiter=",", autostrip=True)
"""
[[  1.  nan   2.]
 [  3.  nan   4.]]
"""

Python 3.5:

import numpy as np
from io import StringIO
import io

data = "1, abc , 2\n 3, xxx, 4"
#print(data)
"""
1, abc , 2
 3, xxx, 4
"""

#print(type(data))
"""
<class 'str'>
"""

#np.genfromtxt(StringIO(data), delimiter=",", autostrip=True)
# TypeError: Can't convert 'bytes' object to str implicitly

print('\n')
print(np.genfromtxt(io.BytesIO(data.encode()), delimiter=",", dtype="|S3", autostrip=True))
"""
[[b'1' b'abc' b'2']
 [b'3' b'xxx' b'4']]
"""

print('\n')
print(np.genfromtxt(io.BytesIO(data.encode()), delimiter=",", autostrip=True))
"""
[[  1.  nan   2.]
 [  3.  nan   4.]]
"""

Aside:

dtype="|Sx", where x is any of {1, 2, 3, …}:

See: dtypes. Difference between S1 and S2 in Python

“The |S1 and |S2 strings are data type descriptors; the first means the array holds strings of length 1, the second of length 2. …”
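To make the |Sx descriptor concrete, here is a small sketch. It assumes NumPy is installed, and the sample values are made up. The number after S is the fixed width of each element in bytes, and longer values are cut off to fit that width.

```python
import numpy as np

s1 = np.dtype("|S1")
s3 = np.dtype("|S3")

# The number after "S" is the fixed size of each element, in bytes
widths = (s1.itemsize, s3.itemsize)

# Values longer than the declared width are truncated when stored
arr = np.array([b"abcd", b"xy"], dtype="|S3")
```

This is why the dtype="|S3" output above holds byte strings of at most three characters, such as b'abc'.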


Answer 4

You can use the StringIO from the six module:

import six
import numpy

x = "1 3\n 4.5 8"
numpy.genfromtxt(six.StringIO(x))

Answer 5

Roman Shapovalov’s code should work in Python 3.x as well as Python 2.6/2.7. Here it is again with the complete example:

import io
import numpy
x = "1 3\n 4.5 8"
numpy.genfromtxt(io.BytesIO(x.encode()))

Output:

array([[ 1. ,  3. ],
       [ 4.5,  8. ]])

Explanation for Python 3.x:

  • numpy.genfromtxt takes a byte stream (a file-like object interpreted as bytes instead of Unicode).
  • io.BytesIO takes a byte string and returns a byte stream. io.StringIO, on the other hand, would take a Unicode string and return a Unicode stream.
  • x gets assigned a string literal, which in Python 3.x is a Unicode string.
  • encode() takes the Unicode string x and makes a byte string out of it, thus giving io.BytesIO a valid argument.
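The chain of types in the list above can be checked step by step using only the standard library. The sketch below deliberately stops short of calling numpy.genfromtxt. It only confirms what each step produces, using the same sample string x as the example.

```python
import io

x = "1 3\n 4.5 8"          # a string literal is str (Unicode) in Python 3
encoded = x.encode()       # str.encode() defaults to UTF-8 and returns bytes

byte_stream = io.BytesIO(encoded)   # byte stream built from bytes
text_stream = io.StringIO(x)        # Unicode stream built from str

# Reading back yields the same type that went in
from_bytes = byte_stream.read()
from_text = text_stream.read()
```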

The only difference for Python 2.6/2.7 is that x is a byte string (assuming from __future__ import unicode_literals is not used), and then encode() takes the byte string x and still makes the same byte string out of it. So the result is the same.


Since this is one of SO’s most popular questions regarding StringIO, here’s some more explanation on the import statements and different Python versions.

Here are the classes which take a string and return a stream:

  • io.BytesIO (Python 2.6, 2.7, and 3.x) – Takes a byte string. Returns a byte stream.
  • io.StringIO (Python 2.6, 2.7, and 3.x) – Takes a Unicode string. Returns a Unicode stream.
  • StringIO.StringIO (Python 2.x) – Takes a byte string or Unicode string. If byte string, returns a byte stream. If Unicode string, returns a Unicode stream.
  • cStringIO.StringIO (Python 2.x) – Faster version of StringIO.StringIO, but can’t take Unicode strings which contain non-ASCII characters.
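In Python 3, the two io classes in this list are strict about their argument type. The following small sketch shows that passing the wrong kind of string raises TypeError instead of being silently accepted, unlike StringIO.StringIO. The helper try_construct is hypothetical and exists only for this illustration.

```python
import io

def try_construct(cls, value):
    """Return "ok" if cls(value) succeeds, or "TypeError" if the argument type is rejected."""
    try:
        cls(value)
    except TypeError:
        return "TypeError"
    return "ok"

results = {
    "BytesIO(bytes)": try_construct(io.BytesIO, b"abc"),
    "BytesIO(str)": try_construct(io.BytesIO, "abc"),
    "StringIO(str)": try_construct(io.StringIO, "abc"),
    "StringIO(bytes)": try_construct(io.StringIO, b"abc"),
}
```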

Note that StringIO.StringIO is imported as from StringIO import StringIO, then used as StringIO(...). Either that, or you do import StringIO and then use StringIO.StringIO(...). The module name and class name just happen to be the same. It's the same situation as with datetime, where both the module and the class are named datetime.

What to use, depending on your supported Python versions:

  • If you only support Python 3.x: Just use io.BytesIO or io.StringIO depending on what kind of data you’re working with.

  • If you support both Python 2.6/2.7 and 3.x, or are trying to transition your code from 2.6/2.7 to 3.x: The easiest option is still to use io.BytesIO or io.StringIO. Although StringIO.StringIO is flexible and thus seems preferred for 2.6/2.7, that flexibility could mask bugs that will manifest in 3.x. For example, I had some code which used StringIO.StringIO or io.StringIO depending on Python version, but I was actually passing a byte string, so when I got around to testing it in Python 3.x it failed and had to be fixed.

    Another advantage of using io.StringIO is the support for universal newlines. If you pass the keyword argument newline='' into io.StringIO, it will be able to split lines on any of \n, \r\n, or \r. I found that StringIO.StringIO would trip up on \r in particular.

    Note that if you import BytesIO or StringIO from six, you get StringIO.StringIO in Python 2.x and the appropriate class from io in Python 3.x. If you agree with my previous paragraphs’ assessment, this is actually one case where you should avoid six and just import from io instead.

  • If you support Python 2.5 or lower and 3.x: You’ll need StringIO.StringIO for 2.5 or lower, so you might as well use six. But realize that it’s generally very difficult to support both 2.5 and 3.x, so you should consider bumping your lowest supported version to 2.6 if at all possible.
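The universal-newline point from the list above can be demonstrated with a short Python 3 sketch. The mixed line endings in sample are made up for the demonstration.

```python
import io

sample = "a\nb\r\nc\rd"

# Default io.StringIO (newline="\n"): only "\n" ends a line, so a bare "\r" stays inside a line
default_lines = io.StringIO(sample).readlines()

# newline="": "\n", "\r\n" and "\r" all end a line, and the endings are kept untranslated
universal_lines = io.StringIO(sample, newline="").readlines()
```

With the default setting, "c\rd" comes back as a single line. With newline='', it is split at the bare \r.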


Answer 6

In order to make examples from here work with Python 3.5.2, you can rewrite them as follows:

import io
import numpy

data = io.BytesIO(b"1, 2, 3\n4, 5, 6")
numpy.genfromtxt(data, delimiter=",")

The reason for the change may be that a file's content is raw data (bytes), which does not become text until it is decoded somehow. genfrombytes might be a better name than genfromtxt.
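If a text stream is needed instead, the bytes can be decoded on the fly. This minimal standard-library sketch assumes the data is UTF-8 and wraps a BytesIO object like the one above in io.TextIOWrapper, so reads return str instead of bytes.

```python
import io

raw = io.BytesIO(b"1, 2, 3\n4, 5, 6")

# TextIOWrapper decodes the underlying bytes with the given encoding (UTF-8 is an assumption here)
text = io.TextIOWrapper(raw, encoding="utf-8")

# Iterating the wrapper yields str lines; strip the trailing newline for comparison
lines = [line.rstrip("\n") for line in text]
```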


Answer 7

Try this (Python 2 only, since Python 3 removed the StringIO module):

import numpy
from StringIO import StringIO

x = "1 3\n 4.5 8"
numpy.genfromtxt(StringIO(x))