标签归档:file

如何修改文本文件?

问题:如何修改文本文件?

我正在使用Python,并且想在不删除或复制文件的情况下将字符串插入文本文件。我怎样才能做到这一点?

I’m using Python, and would like to insert a string into a text file without deleting or copying the file. How can I do that?


回答 0

不幸的是,没有重写的方法就无法插入文件的中间。如先前的张贴者所指出的,您可以将文件追加到文件中或使用“搜索”覆盖文件的一部分,但是如果要在文件的开头或中间添加内容,则必须重写它。

这是操作系统,而不是Python。所有语言均相同。

我通常要做的是从文件中读取,进行修改并将其写到名为myfile.txt.tmp或类似名称的新文件中。这比将整个文件读入内存更好,因为文件可能太大了。临时文件完成后,我将其重命名为原始文件。

这是一种很好的安全方法,因为如果文件写入由于任何原因而崩溃或中止,您仍然可以拥有原始文件。

Unfortunately there is no way to insert into the middle of a file without re-writing it. As previous posters have indicated, you can append to a file or overwrite part of it using seek but if you want to add stuff at the beginning or the middle, you’ll have to rewrite it.

This is an operating system thing, not a Python thing. It is the same in all languages.

What I usually do is read from the file, make the modifications and write it out to a new file called myfile.txt.tmp or something like that. This is better than reading the whole file into memory because the file may be too large for that. Once the temporary file is completed, I rename it the same as the original file.

This is a good, safe way to do it because if the file write crashes or aborts for any reason, you still have your untouched original file.


Python-何时使用文件vs打开

问题:Python-何时使用文件vs打开

fileopenPython 和有什么不一样?我什么时候应该使用哪个?(假设我处于2.5级)

What’s the difference between file and open in Python? When should I use which one? (Say I’m in 2.5)


回答 0

您应该始终使用open()

文档所述:

打开文件时,最好使用open()而不是直接调用此构造函数。文件更适合类型测试(例如,编写“ isinstance(f,file)”)。

另外,自Python 3.0起file() 已被删除

You should always use open().

As the documentation states:

When opening a file, it’s preferable to use open() instead of invoking this constructor directly. file is more suited to type testing (for example, writing “isinstance(f, file)”).

Also, file() has been removed since Python 3.0.


回答 1

原因有两个:python哲学“应该有一种实现方法”并且file正在消失。

file是实际类型(使用例如file('myfile.txt')调用其构造函数)。open是工厂函数,它将返回文件对象。

在python 3.0 file中,它将从内置变为由io库中的多个类实现(有点类似于带有缓冲读取器的Java等)。

Two reasons: The python philosophy of “There ought to be one way to do it” and file is going away.

file is the actual type (using e.g. file('myfile.txt') is calling its constructor). open is a factory function that will return a file object.

In python 3.0 file is going to move from being a built-in to being implemented by multiple classes in the io library (somewhat similar to Java with buffered readers, etc.)


回答 2

file()是一种类型,例如int或列表。open()是用于打开文件的函数,它将返回一个file对象。

这是何时应使用open的示例:

f = open(filename, 'r')
for line in f:
    process(line)
f.close()

这是何时应使用文件的示例:

class LoggingFile(file):
    def write(self, data):
        sys.stderr.write("Wrote %d bytes\n" % len(data))
        super(LoggingFile, self).write(data)

如您所见,存在两者的充分理由和明确的用例。

file() is a type, like an int or a list. open() is a function for opening files, and will return a file object.

This is an example of when you should use open:

f = open(filename, 'r')
for line in f:
    process(line)
f.close()

This is an example of when you should use file:

class LoggingFile(file):
    def write(self, data):
        sys.stderr.write("Wrote %d bytes\n" % len(data))
        super(LoggingFile, self).write(data)

As you can see, there’s a good reason for both to exist, and a clear use-case for both.


回答 3

在功能上,两者是相同的;无论如何open都会调用file,因此当前的区别在于样式。在Python文档建议使用open

打开文件时,最好使用open()而不是直接调用文件构造函数。

原因是在将来的版本中不能保证它们是相同的(open将成为工厂函数,根据其打开的路径返回不同类型的对象)。

Functionally, the two are the same; open will call file anyway, so currently the difference is a matter of style. The Python docs recommend using open.

When opening a file, it’s preferable to use open() instead of invoking the file constructor directly.

The reason is that in future versions they is not guaranteed to be the same (open will become a factory function, which returns objects of different types depending on the path it’s opening).


回答 4

仅使用open()打开文件。file()实际上在3.0中已被删除,目前不推荐使用。他们之间有一种奇怪的关系,但是file()现在正在进行中,因此不再需要担心。

以下来自Python 2.6文档。[括号内的内容]由我添加。

打开文件时,最好使用open()而不是直接调用此[file()]构造函数。文件更适合类型测试(例如,编写isinstance(f,file)

Only ever use open() for opening files. file() is actually being removed in 3.0, and it’s deprecated at the moment. They’ve had a sort of strange relationship, but file() is going now, so there’s no need to worry anymore.

The following is from the Python 2.6 docs. [bracket stuff] added by me.

When opening a file, it’s preferable to use open() instead of invoking this [file()] constructor directly. file is more suited to type testing (for example, writing isinstance(f, file)


回答 5

Van Rossum先生说,尽管open()当前是file()的别名,但您应该使用open(),因为将来可能会改变。

According to Mr Van Rossum, although open() is currently an alias for file() you should use open() because this might change in the future.


这是从哪里来的:-*-编码:utf-8-*-

问题:这是从哪里来的:-*-编码:utf-8-*-

Python将以下内容识别为定义文件编码的指令:

# -*- coding: utf-8 -*-

我确实在(-*- var: value -*-)之前看到过这种说明。它从何而来?完整规范是什么,例如,值可以包含空格,特殊符号,换行符,甚至-*-本身吗?

我的程序将编写纯文本文件,我想使用这种格式在其中包含一些元数据。

Python recognizes the following as instruction which defines file’s encoding:

# -*- coding: utf-8 -*-

I definitely saw this kind of instructions before (-*- var: value -*-). Where does it come from? What is the full specification, e.g. can the value include spaces, special symbols, newlines, even -*- itself?

My program will be writing plain text files and I’d like to include some metadata in them using this format.


回答 0

这种指定Python文件编码的方式来自PEP 0263-定义Python源代码编码

GNU Emacs也可以识别它(请参阅Python语言参考,2.1.4编码声明),尽管我不知道它是否是第一个使用该语法的程序。

This way of specifying the encoding of a Python file comes from PEP 0263 – Defining Python Source Code Encodings.

It is also recognized by GNU Emacs (see Python Language Reference, 2.1.4 Encoding declarations), though I don’t know if it was the first program to use that syntax.


回答 1

# -*- coding: utf-8 -*-是Python 2的东西。在Python 3+中,源文件默认编码已经是UTF-8,并且该行是无用的。

请参阅:我应该在Python 3中使用编码声明吗?

pyupgrade是一个可以在代码上运行的工具,用于从Python 2中删除这些注释和其他不再有用的遗留物,例如让所有类都继承自object

# -*- coding: utf-8 -*- is a Python 2 thing. In Python 3+, the default encoding of source files is already UTF-8 and that line is useless.

See: Should I use encoding declaration in Python 3?

pyupgrade is a tool you can run on your code to remove those comments and other no-longer-useful leftovers from Python 2, like having all your classes inherit from object.


回答 2

这就是所谓的文件局部变量,Emacs可以理解并相应地进行设置。请参阅Emacs手册中的相应部分 -您可以在文件的页眉或页脚中定义它们

This is so called file local variables, that are understood by Emacs and set correspondingly. See corresponding section in Emacs manual – you can define them either in header or in footer of file


回答 3

在PyCharm中,我将其省略。它将关闭底部的UTF-8指示器,并警告该编码为硬编码。不要以为您需要上面提到的PyCharm评论。

In PyCharm, I’d leave it out. It turns off the UTF-8 indicator at the bottom with a warning that the encoding is hard-coded. Don’t think you need the PyCharm comment mentioned above.


从另一个文件导入变量?

问题:从另一个文件导入变量?

如何将变量从一个文件导入到另一个文件?

示例:file1具有变量x1以及x2如何将其传递给file2

如何将所有变量从一个导入到另一个?

How can I import variables from one file to another?

example: file1 has the variables x1 and x2 how to pass them to file2?

How can I import all of the variables from one to another?


回答 0

from file1 import *  

将导入file1中的所有对象和方法

from file1 import *  

will import all objects and methods in file1


回答 1

导入file1内部file2

要从文件1导入所有变量而不泛洪文件2的命名空间,请使用:

import file1

#now use file1.x1, file2.x2, ... to access those variables

要将所有变量从file1导入到file2的命名空间(不推荐):

from file1 import *
#now use x1, x2..

文档

虽然from module import *在模块级别使用是有效的,但通常不是一个好主意。首先,它失去了Python否则具有的重要属性-您可以知道每个顶级名称在您喜欢的编辑器中通过简单的“搜索”功能定义的位置。如果某些模块增加了其他功能或类,将来还会给自己带来麻烦。

Import file1 inside file2:

To import all variables from file1 without flooding file2’s namespace, use:

import file1

#now use file1.x1, file2.x2, ... to access those variables

To import all variables from file1 to file2’s namespace( not recommended):

from file1 import *
#now use x1, x2..

From the docs:

While it is valid to use from module import * at module level it is usually a bad idea. For one, this loses an important property Python otherwise has — you can know where each toplevel name is defined by a simple “search” function in your favourite editor. You also open yourself to trouble in the future, if some module grows additional functions or classes.


回答 2

最好显式导入x1x2

from file1 import x1, x2

这样可以避免file1在中工作时与变量和函数发生不必要的命名空间冲突file2

但是,如果您确实需要,可以导入所有变量:

from file1 import * 

Best to import x1 and x2 explicitly:

from file1 import x1, x2

This allows you to avoid unnecessary namespace conflicts with variables and functions from file1 while working in file2.

But if you really want, you can import all the variables:

from file1 import * 

回答 3

实际上,使用以下命令导入变量并不完全相同:

from file1 import x1
print(x1)

import file1
print(file1.x1)

在导入时,x1和file1.x1的值完全相同,但它们不是相同的变量。例如,在file1中调用一个修改x1的函数,然后尝试从主文件中打印该变量:您将看不到修改后的值。

Actually this is not really the same to import a variable with:

from file1 import x1
print(x1)

and

import file1
print(file1.x1)

Altough at import time x1 and file1.x1 have the same value, they are not the same variables. For instance, call a function in file1 that modifies x1 and then try to print the variable from the main file: you will not see the modified value.


回答 4

马克的回答是正确的。实际上,您可以打印变量的内存地址,print(hex(id(libvar))并且可以看到地址是不同的。

# mylib.py
libvar = None
def lib_method():
    global libvar
    print(hex(id(libvar)))

# myapp.py
from mylib import libvar, lib_method
import mylib

lib_method()
print(hex(id(libvar)))
print(hex(id(mylib.libvar)))

Marc response is correct. Actually, you can print the memory address for the variables print(hex(id(libvar)) and you can see the addresses are different.

# mylib.py
libvar = None
def lib_method():
    global libvar
    print(hex(id(libvar)))

# myapp.py
from mylib import libvar, lib_method
import mylib

lib_method()
print(hex(id(libvar)))
print(hex(id(mylib.libvar)))

回答 5

script1.py

title="Hello world"

script2.py是我们使用script1变量的地方

方法1:

import script1
print(script1.title)

方法2:

from script1 import title
print(title)

script1.py

title="Hello world"

script2.py is where we using script1 variable

Method 1:

import script1
print(script1.title)

Method 2:

from script1 import title
print(title)

回答 6

Python您可以访问的其他文件的内容像,就好像它们
是某种库,比起像Java或任何OOP语言基础等语言,这是真的很酷;

这使得可以访问文件的内容或将其导入以对其进行处理或对其进行任何处理;这就是为什么Python高度首选数据科学和机器学习等语言的主要原因;

这是 project structure

我从.env file哪里访问API links秘密密钥和秘密密钥所在的位置。

总体结构:

from <File-Name> import *

In Python you can access the contents of other files like as if they
are some kind of a library, compared to other languages like java or any oop base languages , This is really cool ;

This makes accessing the contents of the file or import it to to process it or to do anything with it ; And that is the Main reason why Python is highly preferred Language for Data Science and Machine Learning etc. ;

And this is the picture of project structure

Where I am accessing variables from .env file where the API links and Secret keys reside .

General Structure:

from <File-Name> import *

回答 7

first.py:

a=5

second.py:

import first
print(first.a)

结果将是5。

first.py:

a=5

second.py:

import first
print(first.a)

The result will be 5.


在Python的相对位置打开文件

问题:在Python的相对位置打开文件

假设python代码在以前的Windows目录“ main”中未知的位置执行,并且在运行时将代码安装在任何地方,都需要访问目录“ main / 2091 / data.txt”。

我应该如何使用open(location)函数?应该在什么位置?

编辑:

我发现下面的简单代码可以工作..它有什么缺点吗?

    file="\2091\sample.txt"
    path=os.getcwd()+file
    fp=open(path,'r+');

Suppose python code is executed in not known by prior windows directory say ‘main’ , and wherever code is installed when it runs it needs to access to directory ‘main/2091/data.txt’ .

how should I use open(location) function? what should be location ?

Edit :

I found that below simple code will work..does it have any disadvantages ?

    file="\2091\sample.txt"
    path=os.getcwd()+file
    fp=open(path,'r+');

回答 0

使用这种类型的东西时,您需要注意实际的工作目录是什么。例如,您可能无法从文件所在的目录中运行脚本。在这种情况下,您不能仅使用相对路径本身。

如果确定所需文件位于脚本实际所在的子目录中,则可以__file__在此处使用帮助。 __file__是您正在运行的脚本所在的完整路径。

因此,您可以摆弄这样的东西:

import os
script_dir = os.path.dirname(__file__) #<-- absolute dir the script is in
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)

With this type of thing you need to be careful what your actual working directory is. For example, you may not run the script from the directory the file is in. In this case, you can’t just use a relative path by itself.

If you are sure the file you want is in a subdirectory beneath where the script is actually located, you can use __file__ to help you out here. __file__ is the full path to where the script you are running is located.

So you can fiddle with something like this:

import os
script_dir = os.path.dirname(__file__) #<-- absolute dir the script is in
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)

回答 1

这段代码可以正常工作:

import os


def readFile(filename):
    filehandle = open(filename)
    print filehandle.read()
    filehandle.close()



fileDir = os.path.dirname(os.path.realpath('__file__'))
print fileDir

#For accessing the file in the same folder
filename = "same.txt"
readFile(filename)

#For accessing the file in a folder contained in the current folder
filename = os.path.join(fileDir, 'Folder1.1/same.txt')
readFile(filename)

#For accessing the file in the parent folder of the current folder
filename = os.path.join(fileDir, '../same.txt')
readFile(filename)

#For accessing the file inside a sibling folder.
filename = os.path.join(fileDir, '../Folder2/same.txt')
filename = os.path.abspath(os.path.realpath(filename))
print filename
readFile(filename)

This code works fine:

import os


def readFile(filename):
    filehandle = open(filename)
    print filehandle.read()
    filehandle.close()



fileDir = os.path.dirname(os.path.realpath('__file__'))
print fileDir

#For accessing the file in the same folder
filename = "same.txt"
readFile(filename)

#For accessing the file in a folder contained in the current folder
filename = os.path.join(fileDir, 'Folder1.1/same.txt')
readFile(filename)

#For accessing the file in the parent folder of the current folder
filename = os.path.join(fileDir, '../same.txt')
readFile(filename)

#For accessing the file inside a sibling folder.
filename = os.path.join(fileDir, '../Folder2/same.txt')
filename = os.path.abspath(os.path.realpath(filename))
print filename
readFile(filename)

回答 2

我创建一个帐户只是为了澄清我认为在Russ原始回复中发现的差异。

作为参考,他的原始答案是:

import os
script_dir = os.path.dirname(__file__)
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)

这是一个很好的答案,因为它试图动态创建所需文件的绝对系统路径。

Cory Mawhorter注意到这__file__是相对路径(在我的系统上也是这样),建议使用os.path.abspath(__file__)os.path.abspath,但是,返回当前脚本的绝对路径(即/path/to/dir/foobar.py

要使用此方法(以及我最终如何使用它),必须从路径末尾删除脚本名称:

import os
script_path = os.path.abspath(__file__) # i.e. /path/to/dir/foobar.py
script_dir = os.path.split(script_path)[0] #i.e. /path/to/dir/
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)

所得的abs_file_path(在此示例中)变为: /path/to/dir/2091/data.txt

I created an account just so I could clarify a discrepancy I think I found in Russ’s original response.

For reference, his original answer was:

import os
script_dir = os.path.dirname(__file__)
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)

This is a great answer because it is trying to dynamically creates an absolute system path to the desired file.

Cory Mawhorter noticed that __file__ is a relative path (it is as well on my system) and suggested using os.path.abspath(__file__). os.path.abspath, however, returns the absolute path of your current script (i.e. /path/to/dir/foobar.py)

To use this method (and how I eventually got it working) you have to remove the script name from the end of the path:

import os
script_path = os.path.abspath(__file__) # i.e. /path/to/dir/foobar.py
script_dir = os.path.split(script_path)[0] #i.e. /path/to/dir/
rel_path = "2091/data.txt"
abs_file_path = os.path.join(script_dir, rel_path)

The resulting abs_file_path (in this example) becomes: /path/to/dir/2091/data.txt


回答 3

这取决于您使用的操作系统。如果您想要一个与Windows和* nix兼容的解决方案,例如:

from os import path

file_path = path.relpath("2091/data.txt")
with open(file_path) as f:
    <do stuff>

应该工作正常。

path模块能够格式化其正在运行的任何操作系统的路径。另外,只要您具有正确的权限,python就能很好地处理相对路径。

编辑

正如kindall在评论中提到的那样,python仍然可以在unix样式和Windows样式路径之间进行转换,因此,即使是更简单的代码也可以使用:

with open("2091/data/txt") as f:
    <do stuff>

话虽如此,该path模块仍然具有一些有用的功能。

It depends on what operating system you’re using. If you want a solution that is compatible with both Windows and *nix something like:

from os import path

file_path = path.relpath("2091/data.txt")
with open(file_path) as f:
    <do stuff>

should work fine.

The path module is able to format a path for whatever operating system it’s running on. Also, python handles relative paths just fine, so long as you have correct permissions.

Edit:

As mentioned by kindall in the comments, python can convert between unix-style and windows-style paths anyway, so even simpler code will work:

with open("2091/data/txt") as f:
    <do stuff>

That being said, the path module still has some useful functions.


回答 4

我花了很多时间发现为什么我的代码找不到在Windows系统上运行Python 3的文件。所以我加了。之前/一切正常:

import os

script_dir = os.path.dirname(__file__)
file_path = os.path.join(script_dir, './output03.txt')
print(file_path)
fptr = open(file_path, 'w')

I spend a lot time to discover why my code could not find my file running Python 3 on the Windows system. So I added . before / and everything worked fine:

import os

script_dir = os.path.dirname(__file__)
file_path = os.path.join(script_dir, './output03.txt')
print(file_path)
fptr = open(file_path, 'w')

回答 5

码:

import os
script_path = os.path.abspath(__file__) 
path_list = script_path.split(os.sep)
script_directory = path_list[0:len(path_list)-1]
rel_path = "main/2091/data.txt"
path = "/".join(script_directory) + "/" + rel_path

说明:

导入库:

import os

用于__file__获取当前脚本的路径:

script_path = os.path.abspath(__file__)

将脚本路径分为多个项目:

path_list = script_path.split(os.sep)

删除列表中的最后一项(实际的脚本文件):

script_directory = path_list[0:len(path_list)-1]

添加相对文件的路径:

rel_path = "main/2091/data.txt

加入列表项,并添加相对路径的文件:

path = "/".join(script_directory) + "/" + rel_path

现在,您可以设置要对文件执行的任何操作,例如:

file = open(path)

Code:

import os
script_path = os.path.abspath(__file__) 
path_list = script_path.split(os.sep)
script_directory = path_list[0:len(path_list)-1]
rel_path = "main/2091/data.txt"
path = "/".join(script_directory) + "/" + rel_path

Explanation:

Import library:

import os

Use __file__ to attain the current script’s path:

script_path = os.path.abspath(__file__)

Separates the script path into multiple items:

path_list = script_path.split(os.sep)

Remove the last item in the list (the actual script file):

script_directory = path_list[0:len(path_list)-1]

Add the relative file’s path:

rel_path = "main/2091/data.txt

Join the list items, and addition the relative path’s file:

path = "/".join(script_directory) + "/" + rel_path

Now you are set to do whatever you want with the file, such as, for example:

file = open(path)

回答 6

如果文件在您的父文件夹中,例如。follower.txt,您可以简单地使用open('../follower.txt', 'r').read()

If the file is in your parent folder, eg. follower.txt, you can simply use open('../follower.txt', 'r').read()


回答 7

试试这个:

from pathlib import Path

data_folder = Path("/relative/path")
file_to_open = data_folder / "file.pdf"

f = open(file_to_open)

print(f.read())

Python 3.4引入了一个新的用于处理文件和路径的标准库,称为pathlib。这个对我有用!

Try this:

from pathlib import Path

data_folder = Path("/relative/path")
file_to_open = data_folder / "file.pdf"

f = open(file_to_open)

print(f.read())

Python 3.4 introduced a new standard library for dealing with files and paths called pathlib. It works for me!


回答 8

不确定是否到处都能使用。

我在ubuntu中使用ipython。

如果要读取当前文件夹的子目录中的文件:

/current-folder/sub-directory/data.csv

您的脚本在当前文件夹中,只需尝试以下操作:

import pandas as pd
path = './sub-directory/data.csv'
pd.read_csv(path)

Not sure if this work everywhere.

I’m using ipython in ubuntu.

If you want to read file in current folder’s sub-directory:

/current-folder/sub-directory/data.csv

your script is in current-folder simply try this:

import pandas as pd
path = './sub-directory/data.csv'
pd.read_csv(path)

回答 9

Python只是将您提供的文件名传递给操作系统,然后将其打开。如果您的操作系统支持相对路径main/2091/data.txt(如:提示),则可以正常工作。

您可能会发现,回答此类问题的最简单方法是尝试一下,看看会发生什么。

Python just passes the filename you give it to the operating system, which opens it. If your operating system supports relative paths like main/2091/data.txt (hint: it does), then that will work fine.

You may find that the easiest way to answer a question like this is to try it and see what happens.


回答 10

import os
def file_path(relative_path):
    dir = os.path.dirname(os.path.abspath(__file__))
    split_path = relative_path.split("/")
    new_path = os.path.join(dir, *split_path)
    return new_path

with open(file_path("2091/data.txt"), "w") as f:
    f.write("Powerful you have become.")
import os
def file_path(relative_path):
    dir = os.path.dirname(os.path.abspath(__file__))
    split_path = relative_path.split("/")
    new_path = os.path.join(dir, *split_path)
    return new_path

with open(file_path("2091/data.txt"), "w") as f:
    f.write("Powerful you have become.")

回答 11

当我还是初学者时,我发现这些描述有些令人生畏。一开始我会尝试 For Windows

f= open('C:\Users\chidu\Desktop\Skipper New\Special_Note.txt','w+')
print(f) 

这将引发一个syntax error。我曾经很困惑。然后在Google上进行一些冲浪之后。找到了发生错误的原因。写给初学者

这是因为要以Unicode读取路径,您只需\在启动文件路径时添加一个

f= open('C:\\Users\chidu\Desktop\Skipper New\Special_Note.txt','w+')
print(f)

现在,它可以\在启动目录之前添加。

When I was a beginner I found these descriptions a bit intimidating. As at first I would try For Windows

f= open('C:\Users\chidu\Desktop\Skipper New\Special_Note.txt','w+')
print(f) 

and this would raise an syntax error. I used get confused alot. Then after some surfing across google. found why the error occurred. Writing this for beginners

It’s because for path to be read in Unicode you simple add a \ when starting file path

f= open('C:\\Users\chidu\Desktop\Skipper New\Special_Note.txt','w+')
print(f)

And now it works just add \ before starting the directory.


Python列表目录,子目录和文件

问题:Python列表目录,子目录和文件

我试图制作一个脚本来列出给定目录中的所有目录,子目录和文件。
我尝试了这个:

import sys,os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r,d,f in os.walk(path):
    for file in f:
        print os.path.join(root,file)

不幸的是,它无法正常工作。
我得到所有文件,但没有完整的路径。

例如,如果dir结构为:

/home/patate/directory/targetdirectory/123/456/789/file.txt

它会打印:

/home/patate/directory/targetdirectory/file.txt

我需要的是第一个结果。任何帮助将不胜感激!谢谢。

I’m trying to make a script to list all directory, subdirectory, and files in a given directory.
I tried this:

import sys,os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r,d,f in os.walk(path):
    for file in f:
        print os.path.join(root,file)

Unfortunatly it doesn’t work properly.
I get all the files, but not their complete paths.

For example if the dir struct would be:

/home/patate/directory/targetdirectory/123/456/789/file.txt

It would print:

/home/patate/directory/targetdirectory/file.txt

What I need is the first result. Any help would be greatly appreciated! Thanks.


回答 0

使用os.path.join来连接的目录和文件

for path, subdirs, files in os.walk(root):
    for name in files:
        print os.path.join(path, name)

请注意,path并没有root在串联中使用,因为使用root会不正确。


在Python 3.4中,添加了pathlib模块以简化路径操作。因此,等效于os.path.join

pathlib.PurePath(path, name)

好处pathlib是您可以在路径上使用各种有用的方法。如果使用具体的Path变体,您还可以通过它们进行实际的OS调用,例如,进入目录,删除路径,打开其指向的文件等等。

Use os.path.join to concatenate the directory and file name:

for path, subdirs, files in os.walk(root):
    for name in files:
        print os.path.join(path, name)

Note the usage of path and not root in the concatenation, since using root would be incorrect.


In Python 3.4, the pathlib module was added for easier path manipulations. So the equivalent to os.path.join would be:

pathlib.PurePath(path, name)

The advantage of pathlib is that you can use a variety of useful methods on paths. If you use the concrete Path variant you can also do actual OS calls through them, like changing into a directory, deleting the path, opening the file it points to and much more.


回答 1

以防万一…获取目录和子目录中与某个模式匹配的所有文件(例如* .py):

import os
from fnmatch import fnmatch

root = '/some/directory'
pattern = "*.py"

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print os.path.join(path, name)

Just in case… Getting all files in the directory and subdirectories matching some pattern (*.py for example):

import os
from fnmatch import fnmatch

root = '/some/directory'
pattern = "*.py"

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print os.path.join(path, name)

回答 2

这里是单线:

import os

[val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk('./')] for val in sublist]
# Meta comment to ease selecting text

最外面的val for sublist in ...循环将列表平整为一维。该j循环收集每个文件名前缀的列表,并将其加入到当前的路径。最后,i循环遍历所有目录和子目录。

本示例./os.walk(...)调用中使用了硬编码的路径,您可以补充所需的任何路径字符串。

注意:os.path.expanduser和/或os.path.expandvars可用于路径字符串,例如~/

扩展此示例:

它易于添加文件基本名测试和目录名测试。

例如,测试*.jpg文件:

... for j in i[2] if j.endswith('.jpg')] ...

此外,排除.git目录:

... for i in os.walk('./') if '.git' not in i[0].split('/')]

Here is a one-liner:

import os

[val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk('./')] for val in sublist]
# Meta comment to ease selecting text

The outer most val for sublist in ... loop flattens the list to be one dimensional. The j loop collects a list of every file basename and joins it to the current path. Finally, the i loop iterates over all directories and sub directories.

This example uses the hard-coded path ./ in the os.walk(...) call, you can supplement any path string you like.

Note: os.path.expanduser and/or os.path.expandvars can be used for paths strings like ~/

Extending this example:

Its easy to add in file basename tests and directoryname tests.

For Example, testing for *.jpg files:

... for j in i[2] if j.endswith('.jpg')] ...

Additionally, excluding the .git directory:

... for i in os.walk('./') if '.git' not in i[0].split('/')]

回答 3

无法发表评论,请在此处写答案。这是我所看到的最清晰的一行:

import os
[os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]

Couldn’t comment so writing answer here. This is the clearest one-line I have seen:

import os
[os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]

回答 4

您可以看一下我制作的这个样本。它使用了os.path.walk函数,该函数已被提倡,使用列表存储所有文件路径

root = "Your root directory"
ex = ".txt"
where_to = "Wherever you wanna write your file to"
def fileWalker(ext,dirname,names):
    '''
    checks files in names'''
    pat = "*" + ext[0]
    for f in names:
        if fnmatch.fnmatch(f,pat):
            ext[1].append(os.path.join(dirname,f))


def writeTo(fList):

    with open(where_to,"w") as f:
        for di_r in fList:
            f.write(di_r + "\n")






if __name__ == '__main__':
    li = []
    os.path.walk(root,fileWalker,[ex,li])

    writeTo(li)

You can take a look at this sample I made. It uses the os.path.walk function which is deprecated beware.Uses a list to store all the filepaths

root = "Your root directory"
ex = ".txt"
where_to = "Wherever you wanna write your file to"
def fileWalker(ext,dirname,names):
    '''
    checks files in names'''
    pat = "*" + ext[0]
    for f in names:
        if fnmatch.fnmatch(f,pat):
            ext[1].append(os.path.join(dirname,f))


def writeTo(fList):

    with open(where_to,"w") as f:
        for di_r in fList:
            f.write(di_r + "\n")






if __name__ == '__main__':
    li = []
    os.path.walk(root,fileWalker,[ex,li])

    writeTo(li)

回答 5

一线简单一点:

import os
from itertools import product, chain

chain.from_iterable([[os.sep.join(w) for w in product([i[0]], i[2])] for i in os.walk(dir)])

A bit simpler one-liner:

import os
from itertools import product, chain

chain.from_iterable([[os.sep.join(w) for w in product([i[0]], i[2])] for i in os.walk(dir)])

回答 6

由于这里的每个示例都仅使用walk(with join),因此我想展示一个不错的示例并与进行比较listdir

import os, time

def listFiles1(root): # listdir
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0)+"/"; items = os.listdir(folder) # items = folders + files
        for i in items: i=folder+i; (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles2(root): # listdir/join (takes ~1.4x as long) (and uses '\\' instead)
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0); items = os.listdir(folder) # items = folders + files
        for i in items: i=os.path.join(folder,i); (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles3(root): # walk (takes ~1.5x as long)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[folder.replace("\\","/")+"/"+file] # folder+"\\"+file still ~1.5x
    return allFiles

def listFiles4(root): # walk/join (takes ~1.6x as long) (and uses '\\' instead)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[os.path.join(folder,file)]
    return allFiles


for i in range(100): files = listFiles1("src") # warm up

start = time.time()
for i in range(100): files = listFiles1("src") # listdir
print("Time taken: %.2fs"%(time.time()-start)) # 0.28s

start = time.time()
for i in range(100): files = listFiles2("src") # listdir and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.38s

start = time.time()
for i in range(100): files = listFiles3("src") # walk
print("Time taken: %.2fs"%(time.time()-start)) # 0.42s

start = time.time()
for i in range(100): files = listFiles4("src") # walk and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.47s

因此,如您所见,该listdir版本效率更高。(那join很慢)

Since every example here is just using walk (with join), i’d like to show a nice example and comparison with listdir:

import os, time

def listFiles1(root): # listdir
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0)+"/"; items = os.listdir(folder) # items = folders + files
        for i in items: i=folder+i; (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles2(root): # listdir/join (takes ~1.4x as long) (and uses '\\' instead)
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0); items = os.listdir(folder) # items = folders + files
        for i in items: i=os.path.join(folder,i); (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles3(root): # walk (takes ~1.5x as long)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[folder.replace("\\","/")+"/"+file] # folder+"\\"+file still ~1.5x
    return allFiles

def listFiles4(root): # walk/join (takes ~1.6x as long) (and uses '\\' instead)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[os.path.join(folder,file)]
    return allFiles


for i in range(100): files = listFiles1("src") # warm up

start = time.time()
for i in range(100): files = listFiles1("src") # listdir
print("Time taken: %.2fs"%(time.time()-start)) # 0.28s

start = time.time()
for i in range(100): files = listFiles2("src") # listdir and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.38s

start = time.time()
for i in range(100): files = listFiles3("src") # walk
print("Time taken: %.2fs"%(time.time()-start)) # 0.42s

start = time.time()
for i in range(100): files = listFiles4("src") # walk and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.47s

So as you can see for yourself, the listdir version is much more efficient. (and that join is slow)


从列表中随机选择50个项目写入文件

问题:从列表中随机选择50个项目写入文件

到目前为止,我已经弄清楚了如何导入文件,创建新文件以及使列表随机化。

我在从列表中随机选择50个项目以写入文件时遇到麻烦吗?

def randomizer(input,output1='random_1.txt',output2='random_2.txt',output3='random_3.txt',output4='random_total.txt'):

#Input file 
    query=open(input,'r').read().split()
    dir,file=os.path.split(input)

    temp1 = os.path.join(dir,output1)
    temp2 = os.path.join(dir,output2)
    temp3 = os.path.join(dir,output3)
    temp4 = os.path.join(dir,output4)


    out_file4=open(temp4,'w')

    random.shuffle(query)

    for item in query:
        out_file4.write(item+'\n')   

因此,如果总随机文件为

example:

random_total = ['9','2','3','1','5','6','8','7','0','4']

我想要3个文件(out_file1 | 2 | 3),其中第一个随机集为3,第二个随机集为3,第三个随机集为3(对于此示例,但我要创建的文件应该有50个)

random_1 = ['9','2','3']
random_2 = ['1','5','6']
random_3 = ['8','7','0']

因此,不会包含最后一个“ 4”,这很好。

如何从随机选择的列表中选择50?

更好的是,如何从原始列表中随机选择50个?

So far I have figured out how to import the file, create new files, and randomize the list.

I’m having trouble selecting only 50 items from the list randomly to write to a file?

def randomizer(input,output1='random_1.txt',output2='random_2.txt',output3='random_3.txt',output4='random_total.txt'):

#Input file 
    query=open(input,'r').read().split()
    dir,file=os.path.split(input)

    temp1 = os.path.join(dir,output1)
    temp2 = os.path.join(dir,output2)
    temp3 = os.path.join(dir,output3)
    temp4 = os.path.join(dir,output4)


    out_file4=open(temp4,'w')

    random.shuffle(query)

    for item in query:
        out_file4.write(item+'\n')   

So if the total randomization file was

example:

random_total = ['9','2','3','1','5','6','8','7','0','4']

I would want 3 files (out_file1|2|3) with the first random set of 3, second random set of 3, and third random set of 3 (for this example, but the one I want to create should have 50)

random_1 = ['9','2','3']
random_2 = ['1','5','6']
random_3 = ['8','7','0']

So the last ‘4’ will not be included which is fine.

How can I select 50 from the list that I randomized ?

Even better, how could I select 50 at random from the original list ?


回答 0

如果列表按随机顺序排列,则可以只取前50个。

否则,使用

import random
random.sample(the_list, 50)

random.sample 帮助文字:

sample(self, population, k) method of random.Random instance
    Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged.  The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples.  This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique.  If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)

If the list is in random order, you can just take the first 50.

Otherwise, use

import random
random.sample(the_list, 50)

random.sample help text:

sample(self, population, k) method of random.Random instance
    Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged.  The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples.  This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique.  If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)

回答 1

选择随机项目的一种简单方法是先随机播放然后切片。

import random
a = [1,2,3,4,5,6,7,8,9]
random.shuffle(a)
print a[:4] # prints 4 random variables

One easy way to select random items is to shuffle then slice.

import random
a = [1,2,3,4,5,6,7,8,9]
random.shuffle(a)
print a[:4] # prints 4 random variables

回答 2

我认为random.choice()是更好的选择。

import numpy as np

mylist = [13,23,14,52,6,23]

np.random.choice(mylist, 3, replace=False)

该函数从列表中返回3个随机选择的值的数组

I think random.choice() is a better option.

import numpy as np

mylist = [13,23,14,52,6,23]

np.random.choice(mylist, 3, replace=False)

the function returns an array of 3 randomly chosen values from the list


回答 3

假设您的列表包含100个元素,并且您想随机选择50个元素。以下是要遵循的步骤:

  1. 导入库
  2. 为随机数生成器创建种子,我将其放在2
  3. 准备一个数字列表,从中随机抽取
  4. 从数字列表中进行随机选择

码:

from random import seed
from random import choice

seed(2)
numbers = [i for i in range(100)]

print(numbers)

for _ in range(50):
    selection = choice(numbers)
    print(selection)

Say your list has 100 elements and you want to pick 50 of them in a random way. Here are the steps to follow:

  1. Import the libraries
  2. Create the seed for random number generator, I have put it at 2
  3. Prepare a list of numbers from which to pick up in a random way
  4. Make the random choices from the numbers list

Code:

from random import seed
from random import choice

seed(2)
numbers = [i for i in range(100)]

print(numbers)

for _ in range(50):
    selection = choice(numbers)
    print(selection)

如何以相反的顺序读取文件?

问题:如何以相反的顺序读取文件?

如何使用python以相反的顺序读取文件?我想从最后一行读取文件。

How to read a file in reverse order using python? I want to read a file from last line to first line.


回答 0

for line in reversed(open("filename").readlines()):
    print line.rstrip()

在Python 3中:

for line in reversed(list(open("filename"))):
    print(line.rstrip())
for line in reversed(open("filename").readlines()):
    print line.rstrip()

And in Python 3:

for line in reversed(list(open("filename"))):
    print(line.rstrip())

回答 1

作为生成器编写的正确,有效的答案。

import os

def reverse_readline(filename, buf_size=8192):
    """A generator that returns the lines of a file in reverse order"""
    with open(filename) as fh:
        segment = None
        offset = 0
        fh.seek(0, os.SEEK_END)
        file_size = remaining_size = fh.tell()
        while remaining_size > 0:
            offset = min(file_size, offset + buf_size)
            fh.seek(file_size - offset)
            buffer = fh.read(min(remaining_size, buf_size))
            remaining_size -= buf_size
            lines = buffer.split('\n')
            # The first line of the buffer is probably not a complete line so
            # we'll save it and append it to the last line of the next buffer
            # we read
            if segment is not None:
                # If the previous chunk starts right from the beginning of line
                # do not concat the segment to the last line of new chunk.
                # Instead, yield the segment first 
                if buffer[-1] != '\n':
                    lines[-1] += segment
                else:
                    yield segment
            segment = lines[0]
            for index in range(len(lines) - 1, 0, -1):
                if lines[index]:
                    yield lines[index]
        # Don't yield None if the file was empty
        if segment is not None:
            yield segment

A correct, efficient answer written as a generator.

import os

def reverse_readline(filename, buf_size=8192):
    """A generator that returns the lines of a file in reverse order"""
    with open(filename) as fh:
        segment = None
        offset = 0
        fh.seek(0, os.SEEK_END)
        file_size = remaining_size = fh.tell()
        while remaining_size > 0:
            offset = min(file_size, offset + buf_size)
            fh.seek(file_size - offset)
            buffer = fh.read(min(remaining_size, buf_size))
            remaining_size -= buf_size
            lines = buffer.split('\n')
            # The first line of the buffer is probably not a complete line so
            # we'll save it and append it to the last line of the next buffer
            # we read
            if segment is not None:
                # If the previous chunk starts right from the beginning of line
                # do not concat the segment to the last line of new chunk.
                # Instead, yield the segment first 
                if buffer[-1] != '\n':
                    lines[-1] += segment
                else:
                    yield segment
            segment = lines[0]
            for index in range(len(lines) - 1, 0, -1):
                if lines[index]:
                    yield lines[index]
        # Don't yield None if the file was empty
        if segment is not None:
            yield segment

回答 2

这样的事情怎么样:

import os


def readlines_reverse(filename):
    with open(filename) as qfile:
        qfile.seek(0, os.SEEK_END)
        position = qfile.tell()
        line = ''
        while position >= 0:
            qfile.seek(position)
            next_char = qfile.read(1)
            if next_char == "\n":
                yield line[::-1]
                line = ''
            else:
                line += next_char
            position -= 1
        yield line[::-1]


if __name__ == '__main__':
    for qline in readlines_reverse(raw_input()):
        print qline

由于文件是按相反的顺序逐个字符读取的,因此即使在非常大的文件中也可以使用,只要将单独的行放入内存中即可。

How about something like this:

import os


def readlines_reverse(filename):
    with open(filename) as qfile:
        qfile.seek(0, os.SEEK_END)
        position = qfile.tell()
        line = ''
        while position >= 0:
            qfile.seek(position)
            next_char = qfile.read(1)
            if next_char == "\n":
                yield line[::-1]
                line = ''
            else:
                line += next_char
            position -= 1
        yield line[::-1]


if __name__ == '__main__':
    for qline in readlines_reverse(raw_input()):
        print qline

Since the file is read character by character in reverse order, it will work even on very large files, as long as individual lines fit into memory.


回答 3

您也可以使用python模块 file_read_backwards

通过pip install file_read_backwards(v1.2.1)安装后,您可以通过以下方式以内存高效的方式向后(逐行)读取整个文件:

#!/usr/bin/env python2.7

from file_read_backwards import FileReadBackwards

with FileReadBackwards("/path/to/file", encoding="utf-8") as frb:
    for l in frb:
         print l

它支持“ utf-8”,“ latin-1”和“ ascii”编码。

也支持python3。可以在http://file-read-backwards.readthedocs.io/en/latest/readme.html上找到更多文档。

You can also use python module file_read_backwards.

After installing it, via pip install file_read_backwards (v1.2.1), you can read the entire file backwards (line-wise) in a memory efficient manner via:

#!/usr/bin/env python2.7

from file_read_backwards import FileReadBackwards

with FileReadBackwards("/path/to/file", encoding="utf-8") as frb:
    for l in frb:
         print l

It supports “utf-8″,”latin-1”, and “ascii” encodings.

Support is also available for python3. Further documentation can be found at http://file-read-backwards.readthedocs.io/en/latest/readme.html


回答 4

for line in reversed(open("file").readlines()):
    print line.rstrip()

如果您使用的是Linux,则可以使用taccommand。

$ tac file

您可以在ActiveState的此处此处找到2个食谱

for line in reversed(open("file").readlines()):
    print line.rstrip()

If you are on linux, you can use tac command.

$ tac file

2 recipes you can find in ActiveState here and here


回答 5

import re

def filerev(somefile, buffer=0x20000):
  somefile.seek(0, os.SEEK_END)
  size = somefile.tell()
  lines = ['']
  rem = size % buffer
  pos = max(0, (size // buffer - 1) * buffer)
  while pos >= 0:
    somefile.seek(pos, os.SEEK_SET)
    data = somefile.read(rem + buffer) + lines[0]
    rem = 0
    lines = re.findall('[^\n]*\n?', data)
    ix = len(lines) - 2
    while ix > 0:
      yield lines[ix]
      ix -= 1
    pos -= buffer
  else:
    yield lines[0]

with open(sys.argv[1], 'r') as f:
  for line in filerev(f):
    sys.stdout.write(line)
import re

def filerev(somefile, buffer=0x20000):
  somefile.seek(0, os.SEEK_END)
  size = somefile.tell()
  lines = ['']
  rem = size % buffer
  pos = max(0, (size // buffer - 1) * buffer)
  while pos >= 0:
    somefile.seek(pos, os.SEEK_SET)
    data = somefile.read(rem + buffer) + lines[0]
    rem = 0
    lines = re.findall('[^\n]*\n?', data)
    ix = len(lines) - 2
    while ix > 0:
      yield lines[ix]
      ix -= 1
    pos -= buffer
  else:
    yield lines[0]

with open(sys.argv[1], 'r') as f:
  for line in filerev(f):
    sys.stdout.write(line)

回答 6

接受的答案不适用于文件大而内存不足的情况(这种情况很少见)。

正如其他人所指出的那样,@ srohde的答案看起来不错,但存在下一个问题:

  • 当我们可以传递文件对象并将其留给用户来决定应以哪种编码方式读取文件时,打开文件看起来是多余的,
  • 即使我们重构为接受文件对象,它也不适用于所有编码:我们可以选择具有utf-8编码和非ascii内容的文件,例如

    й

    通过buf_size等于1并将具有

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 0: invalid start byte

    当然文字可能更大,但 buf_size可能会被拾取,从而导致上述混淆错误,

  • 我们无法指定自定义行分隔符,
  • 我们不能选择保留行分隔符。

因此,考虑到所有这些问题,我编写了单独的函数:

  • 一种适用于字节流的
  • 第二个用于文本流,并将其基础字节流委托给第一个,并解码结果行。

首先,让我们定义下一个实用程序函数:

ceil_division用于天花板分隔(与标准//地板分隔相反,更多信息可以在此线程中找到)

def ceil_division(left_number, right_number):
    """
    Divides given numbers with ceiling.
    """
    return -(-left_number // right_number)

split 通过从右端给定的分隔符分割字符串并保持其能力:

def split(string, separator, keep_separator):
    """
    Splits given string by given separator.
    """
    parts = string.split(separator)
    if keep_separator:
        *parts, last_part = parts
        parts = [part + separator for part in parts]
        if last_part:
            return parts + [last_part]
    return parts

read_batch_from_end 从二进制流的右端读取批处理

def read_batch_from_end(byte_stream, size, end_position):
    """
    Reads batch from the end of given byte stream.
    """
    if end_position > size:
        offset = end_position - size
    else:
        offset = 0
        size = end_position
    byte_stream.seek(offset)
    return byte_stream.read(size)

之后,我们可以定义函数以相反的顺序读取字节流,例如

import functools
import itertools
import os
from operator import methodcaller, sub


def reverse_binary_stream(byte_stream, batch_size=None,
                          lines_separator=None,
                          keep_lines_separator=True):
    if lines_separator is None:
        lines_separator = (b'\r', b'\n', b'\r\n')
        lines_splitter = methodcaller(str.splitlines.__name__,
                                      keep_lines_separator)
    else:
        lines_splitter = functools.partial(split,
                                           separator=lines_separator,
                                           keep_separator=keep_lines_separator)
    stream_size = byte_stream.seek(0, os.SEEK_END)
    if batch_size is None:
        batch_size = stream_size or 1
    batches_count = ceil_division(stream_size, batch_size)
    remaining_bytes_indicator = itertools.islice(
            itertools.accumulate(itertools.chain([stream_size],
                                                 itertools.repeat(batch_size)),
                                 sub),
            batches_count)
    try:
        remaining_bytes_count = next(remaining_bytes_indicator)
    except StopIteration:
        return

    def read_batch(position):
        result = read_batch_from_end(byte_stream,
                                     size=batch_size,
                                     end_position=position)
        while result.startswith(lines_separator):
            try:
                position = next(remaining_bytes_indicator)
            except StopIteration:
                break
            result = (read_batch_from_end(byte_stream,
                                          size=batch_size,
                                          end_position=position)
                      + result)
        return result

    batch = read_batch(remaining_bytes_count)
    segment, *lines = lines_splitter(batch)
    yield from reverse(lines)
    for remaining_bytes_count in remaining_bytes_indicator:
        batch = read_batch(remaining_bytes_count)
        lines = lines_splitter(batch)
        if batch.endswith(lines_separator):
            yield segment
        else:
            lines[-1] += segment
        segment, *lines = lines
        yield from reverse(lines)
    yield segment

最后,可以将文本文件反转功能定义如下:

import codecs


def reverse_file(file, batch_size=None, 
                 lines_separator=None,
                 keep_lines_separator=True):
    encoding = file.encoding
    if lines_separator is not None:
        lines_separator = lines_separator.encode(encoding)
    yield from map(functools.partial(codecs.decode,
                                     encoding=encoding),
                   reverse_binary_stream(
                           file.buffer,
                           batch_size=batch_size,
                           lines_separator=lines_separator,
                           keep_lines_separator=keep_lines_separator))

测验

准备工作

我已经使用fsutilcommand生成了4个文件:

  1. 没有内容的empty.txt,大小为0MB
  2. tiny.txt,大小为1MB
  3. small.txt,大小为10MB
  4. large.txt,大小为50MB

我也重构了@srohde解决方案以使用文件对象而不是文件路径。

测试脚本

from timeit import Timer

repeats_count = 7
number = 1
create_setup = ('from collections import deque\n'
                'from __main__ import reverse_file, reverse_readline\n'
                'file = open("{}")').format
srohde_solution = ('with file:\n'
                   '    deque(reverse_readline(file,\n'
                   '                           buf_size=8192),'
                   '          maxlen=0)')
azat_ibrakov_solution = ('with file:\n'
                         '    deque(reverse_file(file,\n'
                         '                       lines_separator="\\n",\n'
                         '                       keep_lines_separator=False,\n'
                         '                       batch_size=8192), maxlen=0)')
print('reversing empty file by "srohde"',
      min(Timer(srohde_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing empty file by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))

注意:我已经使用了collections.deque类来耗尽生成器。

产出

对于Windows 10上的PyPy 3.5:

reversing empty file by "srohde" 8.31e-05
reversing empty file by "Azat Ibrakov" 0.00016090000000000028
reversing tiny file (1MB) by "srohde" 0.160081
reversing tiny file (1MB) by "Azat Ibrakov" 0.09594989999999998
reversing small file (10MB) by "srohde" 8.8891863
reversing small file (10MB) by "Azat Ibrakov" 5.323388100000001
reversing large file (50MB) by "srohde" 186.5338368
reversing large file (50MB) by "Azat Ibrakov" 99.07450229999998

对于Windows 10上的CPython 3.5:

reversing empty file by "srohde" 3.600000000000001e-05
reversing empty file by "Azat Ibrakov" 4.519999999999958e-05
reversing tiny file (1MB) by "srohde" 0.01965560000000001
reversing tiny file (1MB) by "Azat Ibrakov" 0.019207699999999994
reversing small file (10MB) by "srohde" 3.1341862999999996
reversing small file (10MB) by "Azat Ibrakov" 3.0872588000000007
reversing large file (50MB) by "srohde" 82.01206720000002
reversing large file (50MB) by "Azat Ibrakov" 82.16775059999998

因此,我们可以看到它的性能像原始解决方案一样,但它更具通用性,并且没有上面列出的缺点。


广告

我已经将此添加到具有许多经过良好测试的功能/迭代实用程序0.3.0lz软件包版本(需要Python 3.5 +)中。

可以像

 import io
 from lz.iterating import reverse
 ...
 with open('path/to/file') as file:
     for line in reverse(file, batch_size=io.DEFAULT_BUFFER_SIZE):
         print(line)

它支持所有标准编码(可能utf-7是因为我很难定义一种策略来生成可编码的字符串)。

Accepted answer won’t work for cases with large files that won’t fit in memory (which is not a rare case).

As it was noted by others, @srohde answer looks good, but it has next issues:

  • openning file looks redundant, when we can pass file object & leave it to user to decide in which encoding it should be read,
  • even if we refactor to accept file object, it won’t work for all encodings: we can choose file with utf-8 encoding and non-ascii contents like

    й
    

    pass buf_size equal to 1 and will have

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xb9 in position 0: invalid start byte
    

    of course text may be larger but buf_size may be picked up so it’ll lead to obfuscated error like above,

  • we can’t specify custom line separator,
  • we can’t choose to keep line separator.

So considering all these concerns I’ve written separate functions:

  • one which works with byte streams,
  • second one which works with text streams and delegates its underlying byte stream to the first one and decodes resulting lines.

First of all let’s define next utility functions:

ceil_division for making division with ceiling (in contrast with standard // division with floor, more info can be found in this thread)

def ceil_division(left_number, right_number):
    """
    Divides given numbers with ceiling.
    """
    return -(-left_number // right_number)

split for splitting string by given separator from right end with ability to keep it:

def split(string, separator, keep_separator):
    """
    Splits given string by given separator.
    """
    parts = string.split(separator)
    if keep_separator:
        *parts, last_part = parts
        parts = [part + separator for part in parts]
        if last_part:
            return parts + [last_part]
    return parts

read_batch_from_end to read batch from the right end of binary stream

def read_batch_from_end(byte_stream, size, end_position):
    """
    Reads batch from the end of given byte stream.
    """
    if end_position > size:
        offset = end_position - size
    else:
        offset = 0
        size = end_position
    byte_stream.seek(offset)
    return byte_stream.read(size)

After that we can define function for reading byte stream in reverse order like

import functools
import itertools
import os
from operator import methodcaller, sub


def reverse_binary_stream(byte_stream, batch_size=None,
                          lines_separator=None,
                          keep_lines_separator=True):
    if lines_separator is None:
        lines_separator = (b'\r', b'\n', b'\r\n')
        lines_splitter = methodcaller(str.splitlines.__name__,
                                      keep_lines_separator)
    else:
        lines_splitter = functools.partial(split,
                                           separator=lines_separator,
                                           keep_separator=keep_lines_separator)
    stream_size = byte_stream.seek(0, os.SEEK_END)
    if batch_size is None:
        batch_size = stream_size or 1
    batches_count = ceil_division(stream_size, batch_size)
    remaining_bytes_indicator = itertools.islice(
            itertools.accumulate(itertools.chain([stream_size],
                                                 itertools.repeat(batch_size)),
                                 sub),
            batches_count)
    try:
        remaining_bytes_count = next(remaining_bytes_indicator)
    except StopIteration:
        return

    def read_batch(position):
        result = read_batch_from_end(byte_stream,
                                     size=batch_size,
                                     end_position=position)
        while result.startswith(lines_separator):
            try:
                position = next(remaining_bytes_indicator)
            except StopIteration:
                break
            result = (read_batch_from_end(byte_stream,
                                          size=batch_size,
                                          end_position=position)
                      + result)
        return result

    batch = read_batch(remaining_bytes_count)
    segment, *lines = lines_splitter(batch)
    yield from reverse(lines)
    for remaining_bytes_count in remaining_bytes_indicator:
        batch = read_batch(remaining_bytes_count)
        lines = lines_splitter(batch)
        if batch.endswith(lines_separator):
            yield segment
        else:
            lines[-1] += segment
        segment, *lines = lines
        yield from reverse(lines)
    yield segment

and finally a function for reversing text file can be defined like:

import codecs


def reverse_file(file, batch_size=None, 
                 lines_separator=None,
                 keep_lines_separator=True):
    encoding = file.encoding
    if lines_separator is not None:
        lines_separator = lines_separator.encode(encoding)
    yield from map(functools.partial(codecs.decode,
                                     encoding=encoding),
                   reverse_binary_stream(
                           file.buffer,
                           batch_size=batch_size,
                           lines_separator=lines_separator,
                           keep_lines_separator=keep_lines_separator))

Tests

Preparations

I’ve generated 4 files using fsutil command:

  1. empty.txt with no contents, size 0MB
  2. tiny.txt with size of 1MB
  3. small.txt with size of 10MB
  4. large.txt with size of 50MB

also I’ve refactored @srohde solution to work with file object instead of file path.

Test script

from timeit import Timer

repeats_count = 7
number = 1
create_setup = ('from collections import deque\n'
                'from __main__ import reverse_file, reverse_readline\n'
                'file = open("{}")').format
srohde_solution = ('with file:\n'
                   '    deque(reverse_readline(file,\n'
                   '                           buf_size=8192),'
                   '          maxlen=0)')
azat_ibrakov_solution = ('with file:\n'
                         '    deque(reverse_file(file,\n'
                         '                       lines_separator="\\n",\n'
                         '                       keep_lines_separator=False,\n'
                         '                       batch_size=8192), maxlen=0)')
print('reversing empty file by "srohde"',
      min(Timer(srohde_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing empty file by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('empty.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing tiny file (1MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('tiny.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing small file (10MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('small.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "srohde"',
      min(Timer(srohde_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))
print('reversing large file (50MB) by "Azat Ibrakov"',
      min(Timer(azat_ibrakov_solution,
                create_setup('large.txt')).repeat(repeats_count, number)))

Note: I’ve used collections.deque class to exhaust generator.

Outputs

For PyPy 3.5 on Windows 10:

reversing empty file by "srohde" 8.31e-05
reversing empty file by "Azat Ibrakov" 0.00016090000000000028
reversing tiny file (1MB) by "srohde" 0.160081
reversing tiny file (1MB) by "Azat Ibrakov" 0.09594989999999998
reversing small file (10MB) by "srohde" 8.8891863
reversing small file (10MB) by "Azat Ibrakov" 5.323388100000001
reversing large file (50MB) by "srohde" 186.5338368
reversing large file (50MB) by "Azat Ibrakov" 99.07450229999998

For CPython 3.5 on Windows 10:

reversing empty file by "srohde" 3.600000000000001e-05
reversing empty file by "Azat Ibrakov" 4.519999999999958e-05
reversing tiny file (1MB) by "srohde" 0.01965560000000001
reversing tiny file (1MB) by "Azat Ibrakov" 0.019207699999999994
reversing small file (10MB) by "srohde" 3.1341862999999996
reversing small file (10MB) by "Azat Ibrakov" 3.0872588000000007
reversing large file (50MB) by "srohde" 82.01206720000002
reversing large file (50MB) by "Azat Ibrakov" 82.16775059999998

So as we can see it performs like original solution, but is more general and free of its disadvantages listed above.


Advertisement

I’ve added this to 0.3.0 version of lz package (requires Python 3.5+) that have many well-tested functional/iterating utilities.

Can be used like

 import io
 from lz.iterating import reverse
 ...
 with open('path/to/file') as file:
     for line in reverse(file, batch_size=io.DEFAULT_BUFFER_SIZE):
         print(line)

It supports all standard encodings (maybe except utf-7 since it is hard for me to define a strategy for generating strings encodable with it).


回答 7

在这里,您可以找到我的实现,可以通过更改“ buffer”变量来限制ram的使用,这是一个程序在开始时打印空行的错误。

如果没有新行超过缓冲区字节,内存使用也会增加,“ leak”变量将一直增加,直到看到新行(“ \ n”)为止。

这也适用于大于我的总内存的16 GB文件。

import os,sys
buffer = 1024*1024 # 1MB
f = open(sys.argv[1])
f.seek(0, os.SEEK_END)
filesize = f.tell()

division, remainder = divmod(filesize, buffer)
line_leak=''

for chunk_counter in range(1,division + 2):
    if division - chunk_counter < 0:
        f.seek(0, os.SEEK_SET)
        chunk = f.read(remainder)
    elif division - chunk_counter >= 0:
        f.seek(-(buffer*chunk_counter), os.SEEK_END)
        chunk = f.read(buffer)

    chunk_lines_reversed = list(reversed(chunk.split('\n')))
    if line_leak: # add line_leak from previous chunk to beginning
        chunk_lines_reversed[0] += line_leak

    # after reversed, save the leakedline for next chunk iteration
    line_leak = chunk_lines_reversed.pop()

    if chunk_lines_reversed:
        print "\n".join(chunk_lines_reversed)
    # print the last leaked line
    if division - chunk_counter < 0:
        print line_leak

Here you can find my my implementation, you can limit the ram usage by changing the “buffer” variable, there is a bug that the program prints an empty line in the beginning.

And also ram usage may be increase if there is no new lines for more than buffer bytes, “leak” variable will increase until seeing a new line (“\n”).

This is also working for 16 GB files which is bigger then my total memory.

import os,sys
buffer = 1024*1024 # 1MB
f = open(sys.argv[1])
f.seek(0, os.SEEK_END)
filesize = f.tell()

division, remainder = divmod(filesize, buffer)
line_leak=''

for chunk_counter in range(1,division + 2):
    if division - chunk_counter < 0:
        f.seek(0, os.SEEK_SET)
        chunk = f.read(remainder)
    elif division - chunk_counter >= 0:
        f.seek(-(buffer*chunk_counter), os.SEEK_END)
        chunk = f.read(buffer)

    chunk_lines_reversed = list(reversed(chunk.split('\n')))
    if line_leak: # add line_leak from previous chunk to beginning
        chunk_lines_reversed[0] += line_leak

    # after reversed, save the leakedline for next chunk iteration
    line_leak = chunk_lines_reversed.pop()

    if chunk_lines_reversed:
        print "\n".join(chunk_lines_reversed)
    # print the last leaked line
    if division - chunk_counter < 0:
        print line_leak

回答 8

感谢您的回答@srohde。它使用“ is”运算符检查换行符时有一个小错误,并且我无法对信誉为1的答案进行评论。我也想管理外部文件的打开,因为这使我可以将杂物嵌入luigi任务。

我需要更改的形式如下:

with open(filename) as fp:
    for line in fp:
        #print line,  # contains new line
        print '>{}<'.format(line)

我想更改为:

with open(filename) as fp:
    for line in reversed_fp_iter(fp, 4):
        #print line,  # contains new line
        print '>{}<'.format(line)

这是修改后的答案,需要文件句柄并保留换行符:

def reversed_fp_iter(fp, buf_size=8192):
    """a generator that returns the lines of a file in reverse order
    ref: https://stackoverflow.com/a/23646049/8776239
    """
    segment = None  # holds possible incomplete segment at the beginning of the buffer
    offset = 0
    fp.seek(0, os.SEEK_END)
    file_size = remaining_size = fp.tell()
    while remaining_size > 0:
        offset = min(file_size, offset + buf_size)
        fp.seek(file_size - offset)
        buffer = fp.read(min(remaining_size, buf_size))
        remaining_size -= buf_size
        lines = buffer.splitlines(True)
        # the first line of the buffer is probably not a complete line so
        # we'll save it and append it to the last line of the next buffer
        # we read
        if segment is not None:
            # if the previous chunk starts right from the beginning of line
            # do not concat the segment to the last line of new chunk
            # instead, yield the segment first
            if buffer[-1] == '\n':
                #print 'buffer ends with newline'
                yield segment
            else:
                lines[-1] += segment
                #print 'enlarged last line to >{}<, len {}'.format(lines[-1], len(lines))
        segment = lines[0]
        for index in range(len(lines) - 1, 0, -1):
            if len(lines[index]):
                yield lines[index]
    # Don't yield None if the file was empty
    if segment is not None:
        yield segment

Thanks for the answer @srohde. It has a small bug checking for newline character with ‘is’ operator, and I could not comment on the answer with 1 reputation. Also I’d like to manage file open outside because that enables me to embed my ramblings for luigi tasks.

What I needed to change has the form:

with open(filename) as fp:
    for line in fp:
        #print line,  # contains new line
        print '>{}<'.format(line)

I’d love to change to:

with open(filename) as fp:
    for line in reversed_fp_iter(fp, 4):
        #print line,  # contains new line
        print '>{}<'.format(line)

Here is a modified answer that wants a file handle and keeps newlines:

def reversed_fp_iter(fp, buf_size=8192):
    """a generator that returns the lines of a file in reverse order
    ref: https://stackoverflow.com/a/23646049/8776239
    """
    segment = None  # holds possible incomplete segment at the beginning of the buffer
    offset = 0
    fp.seek(0, os.SEEK_END)
    file_size = remaining_size = fp.tell()
    while remaining_size > 0:
        offset = min(file_size, offset + buf_size)
        fp.seek(file_size - offset)
        buffer = fp.read(min(remaining_size, buf_size))
        remaining_size -= buf_size
        lines = buffer.splitlines(True)
        # the first line of the buffer is probably not a complete line so
        # we'll save it and append it to the last line of the next buffer
        # we read
        if segment is not None:
            # if the previous chunk starts right from the beginning of line
            # do not concat the segment to the last line of new chunk
            # instead, yield the segment first
            if buffer[-1] == '\n':
                #print 'buffer ends with newline'
                yield segment
            else:
                lines[-1] += segment
                #print 'enlarged last line to >{}<, len {}'.format(lines[-1], len(lines))
        segment = lines[0]
        for index in range(len(lines) - 1, 0, -1):
            if len(lines[index]):
                yield lines[index]
    # Don't yield None if the file was empty
    if segment is not None:
        yield segment

回答 9

一个简单的函数来创建另一个反转的文件(仅适用于Linux):

import os
def tac(file1, file2):
     print(os.system('tac %s > %s' % (file1,file2)))

如何使用

tac('ordered.csv', 'reversed.csv')
f = open('reversed.csv')

a simple function to create a second file reversed (linux only):

import os
def tac(file1, file2):
     print(os.system('tac %s > %s' % (file1,file2)))

how to use

tac('ordered.csv', 'reversed.csv')
f = open('reversed.csv')

回答 10

如果您担心文件大小/内存使用情况,则可以对文件进行内存映射并向后扫描换行符,这是一种解决方案:

如何在文本文件中搜索字符串?

If you are concerned about file size / memory usage, memory-mapping the file and scanning backwards for newlines is a solution:

How to search for a string in text files?


回答 11

使用open(“ filename”)as f:

    print(f.read()[::-1])

with open(“filename”) as f:

    print(f.read()[::-1])

回答 12

def reverse_lines(filename):
    y=open(filename).readlines()
    return y[::-1]
def reverse_lines(filename):
    y=open(filename).readlines()
    return y[::-1]

回答 13

with处理文件时请务必使用,因为它可以为您处理所有事情:

with open('filename', 'r') as f:
    for line in reversed(f.readlines()):
        print line

或在Python 3中:

with open('filename', 'r') as f:
    for line in reversed(list(f.readlines())):
        print(line)

Always use with when working with files as it handles everything for you:

with open('filename', 'r') as f:
    for line in reversed(f.readlines()):
        print line

Or in Python 3:

with open('filename', 'r') as f:
    for line in reversed(list(f.readlines())):
        print(line)

回答 14

您需要首先以读取格式打开文件,将其保存到变量,然后以写入格式打开第二个文件,在该文件中您将使用[::-1]切片来写入或附加变量,从而完全反转文件。您还可以使用readlines()使其成为一行列表,您可以对其进行操作

def copy_and_reverse(filename, newfile):
    with open(filename) as file:
        text = file.read()
    with open(newfile, "w") as file2:
        file2.write(text[::-1])

you would need to first open your file in read format, save it to a variable, then open the second file in write format where you would write or append the variable using a the [::-1] slice, completely reversing the file. You can also use readlines() to make it into a list of lines, which you can manipulate

def copy_and_reverse(filename, newfile):
    with open(filename) as file:
        text = file.read()
    with open(newfile, "w") as file2:
        file2.write(text[::-1])

回答 15

大多数答案都需要在执行任何操作之前先读取整个文件。此样本从头开始读取越来越大的样本

在编写此答案时,我仅看到MuratYükselen的答案。几乎一样,我认为这是一件好事。下面的示例还处理\ r,并在每个步骤中增加其缓冲区大小。我也有一些单元测试来备份此代码。

def readlines_reversed(f):
    """ Iterate over the lines in a file in reverse. The file must be
    open in 'rb' mode. Yields the lines unencoded (as bytes), including the
    newline character. Produces the same result as readlines, but reversed.
    If this is used to reverse the line in a file twice, the result is
    exactly the same.
    """
    head = b""
    f.seek(0, 2)
    t = f.tell()
    buffersize, maxbuffersize = 64, 4096
    while True:
        if t <= 0:
            break
        # Read next block
        buffersize = min(buffersize * 2, maxbuffersize)
        tprev = t
        t = max(0, t - buffersize)
        f.seek(t)
        lines = f.read(tprev - t).splitlines(True)
        # Align to line breaks
        if not lines[-1].endswith((b"\n", b"\r")):
            lines[-1] += head  # current tail is previous head
        elif head == b"\n" and lines[-1].endswith(b"\r"):
            lines[-1] += head  # Keep \r\n together
        elif head:
            lines.append(head)
        head = lines.pop(0)  # can be '\n' (ok)
        # Iterate over current block in reverse
        for line in reversed(lines):
            yield line
    if head:
        yield head

Most of the answers need to read the whole file before doing anything. This sample reads increasingly large samples from the end.

I only saw Murat Yükselen’s answer while writing this answer. It’s nearly the same, which I suppose is a good thing. The sample below also deals with \r and increases its buffersize at each step. I also have some unit tests to back this code up.

def readlines_reversed(f):
    """ Iterate over the lines in a file in reverse. The file must be
    open in 'rb' mode. Yields the lines unencoded (as bytes), including the
    newline character. Produces the same result as readlines, but reversed.
    If this is used to reverse the line in a file twice, the result is
    exactly the same.
    """
    head = b""
    f.seek(0, 2)
    t = f.tell()
    buffersize, maxbuffersize = 64, 4096
    while True:
        if t <= 0:
            break
        # Read next block
        buffersize = min(buffersize * 2, maxbuffersize)
        tprev = t
        t = max(0, t - buffersize)
        f.seek(t)
        lines = f.read(tprev - t).splitlines(True)
        # Align to line breaks
        if not lines[-1].endswith((b"\n", b"\r")):
            lines[-1] += head  # current tail is previous head
        elif head == b"\n" and lines[-1].endswith(b"\r"):
            lines[-1] += head  # Keep \r\n together
        elif head:
            lines.append(head)
        head = lines.pop(0)  # can be '\n' (ok)
        # Iterate over current block in reverse
        for line in reversed(lines):
            yield line
    if head:
        yield head

回答 16

逐行读取文件,然后以相反顺序将其添加到列表中。

这是代码示例:

reverse = []
with open("file.txt", "r") as file:
    for line in file:
        line = line.strip()
         reverse[0:0] = line

Read the file line by line and then add it on a list in reverse order.

Here is an example of code :

reverse = []
with open("file.txt", "r") as file:
    for line in file:
        line = line.strip()
         reverse[0:0] = line

回答 17

import sys
f = open(sys.argv[1] , 'r')
for line in f.readlines()[::-1]:
    print line
import sys
f = open(sys.argv[1] , 'r')
for line in f.readlines()[::-1]:
    print line

回答 18

def previous_line(self, opened_file):
        opened_file.seek(0, os.SEEK_END)
        position = opened_file.tell()
        buffer = bytearray()
        while position >= 0:
            opened_file.seek(position)
            position -= 1
            new_byte = opened_file.read(1)
            if new_byte == self.NEW_LINE:
                parsed_string = buffer.decode()
                yield parsed_string
                buffer = bytearray()
            elif new_byte == self.EMPTY_BYTE:
                continue
            else:
                new_byte_array = bytearray(new_byte)
                new_byte_array.extend(buffer)
                buffer = new_byte_array
        yield None

使用:

opened_file = open(filepath, "rb")
iterator = self.previous_line(opened_file)
line = next(iterator) #one step
close(opened_file)
def previous_line(self, opened_file):
        opened_file.seek(0, os.SEEK_END)
        position = opened_file.tell()
        buffer = bytearray()
        while position >= 0:
            opened_file.seek(position)
            position -= 1
            new_byte = opened_file.read(1)
            if new_byte == self.NEW_LINE:
                parsed_string = buffer.decode()
                yield parsed_string
                buffer = bytearray()
            elif new_byte == self.EMPTY_BYTE:
                continue
            else:
                new_byte_array = bytearray(new_byte)
                new_byte_array.extend(buffer)
                buffer = new_byte_array
        yield None

to use:

opened_file = open(filepath, "rb")
iterator = self.previous_line(opened_file)
line = next(iterator) #one step
close(opened_file)

回答 19

我不得不在一段时间前使用以下代码。它通过管道传递给外壳。恐怕我没有完整的脚本了。如果您使用的是unixish操作系统,则可以使用“ tac”,但是,例如在Mac OSX上,tac命令不起作用,请使用tail -r。以下代码段测试了您所使用的平台,并相应地调整了命令

# We need a command to reverse the line order of the file. On Linux this
# is 'tac', on OSX it is 'tail -r'
# 'tac' is not supported on osx, 'tail -r' is not supported on linux.

if sys.platform == "darwin":
    command += "|tail -r"
elif sys.platform == "linux2":
    command += "|tac"
else:
    raise EnvironmentError('Platform %s not supported' % sys.platform)

I had to do this some time ago and used the below code. It pipes to the shell. I am afraid i do not have the complete script anymore. If you are on a unixish operating system, you can use “tac”, however on e.g. Mac OSX tac command does not work, use tail -r. The below code snippet tests for which platform you’re on, and adjusts the command accordingly

# We need a command to reverse the line order of the file. On Linux this
# is 'tac', on OSX it is 'tail -r'
# 'tac' is not supported on osx, 'tail -r' is not supported on linux.

if sys.platform == "darwin":
    command += "|tail -r"
elif sys.platform == "linux2":
    command += "|tac"
else:
    raise EnvironmentError('Platform %s not supported' % sys.platform)

python请求文件上传

问题:python请求文件上传

我正在执行一个使用Python请求库上传文件的简单任务。我搜索了Stack Overflow,似乎没有人遇到相同的问题,即服务器未收到该文件:

import requests
url='http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files={'files': open('file.txt','rb')}
values={'upload_file' : 'file.txt' , 'DB':'photcat' , 'OUT':'csv' , 'SHORT':'short'}
r=requests.post(url,files=files,data=values)

我用文件名填充了’upload_file’关键字的值,因为如果我将其保留为空白,则表示

Error - You must select a file to upload!

现在我明白了

File  file.txt  of size    bytes is  uploaded successfully!
Query service results:  There were 0 lines.

仅当文件为空时才会出现。因此,我对如何成功发送文件感到困惑。我知道该文件有效,因为如果我访问此网站并手动填写表格,它将返回一个很好的匹配对象列表,这就是我想要的。我非常感谢所有提示。

其他一些相关的线程(但不能回答我的问题):

I’m performing a simple task of uploading a file using Python requests library. I searched Stack Overflow and no one seemed to have the same problem, namely, that the file is not received by the server:

import requests
url='http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files={'files': open('file.txt','rb')}
values={'upload_file' : 'file.txt' , 'DB':'photcat' , 'OUT':'csv' , 'SHORT':'short'}
r=requests.post(url,files=files,data=values)

I’m filling the value of ‘upload_file’ keyword with my filename, because if I leave it blank, it says

Error - You must select a file to upload!

And now I get

File  file.txt  of size    bytes is  uploaded successfully!
Query service results:  There were 0 lines.

Which comes up only if the file is empty. So I’m stuck as to how to send my file successfully. I know that the file works because if I go to this website and manually fill in the form it returns a nice list of matched objects, which is what I’m after. I’d really appreciate all hints.

Some other threads related (but not answering my problem):


回答 0

如果upload_file要作为文件,请使用:

files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}

r = requests.post(url, files=files, data=values)

并且requests将派遣一个多部分表单POST体与upload_file字段设置为内容file.txt的文件。

文件名将包含在特定字段的mime标头中:

>>> import requests
>>> open('file.txt', 'wb')  # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"


--c226ce13d09842658ffbd31e0563c6bd--

注意filename="file.txt"参数。

files如果需要更多控制,则可以使用元组作为映射值,其中包含2到4个元素。第一个元素是文件名,其后是内容,以及可选的content-type标头值和可选的附加标头映射:

files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}

这将设置备用文件名和内容类型,而忽略可选的标题。

如果您要从文件中提取整个POST正文(未指定其他字段),则不要使用files参数,只需将文件直接发布为即可data。然后,您可能还需要设置Content-Type标头,否则将不会设置任何标头。请参阅Python请求-文件中的POST数据

If upload_file is meant to be the file, use:

files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}

r = requests.post(url, files=files, data=values)

and requests will send a multi-part form POST body with the upload_file field set to the contents of the file.txt file.

The filename will be included in the mime header for the specific field:

>>> import requests
>>> open('file.txt', 'wb')  # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"


--c226ce13d09842658ffbd31e0563c6bd--

Note the filename="file.txt" parameter.

You can use a tuple for the files mapping value, with between 2 and 4 elements, if you need more control. The first element is the filename, followed by the contents, and an optional content-type header value and an optional mapping of additional headers:

files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}

This sets an alternative filename and content type, leaving out the optional headers.

If you are meaning the whole POST body to be taken from a file (with no other fields specified), then don’t use the files parameter, just post the file directly as data. You then may want to set a Content-Type header too, as none will be set otherwise. See Python requests – POST data from a file.


回答 1

(2018)新的python请求库简化了此过程,我们可以使用’files’变量表示我们要上传经过多部分编码的文件

url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}

r = requests.post(url, files=files)
r.text

(2018) the new python requests library has simplified this process, we can use the ‘files’ variable to signal that we want to upload a multipart-encoded file

url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}

r = requests.post(url, files=files)
r.text

回答 2

客户上传

如果要使用Python requests库上传单个文件,则请求lib 支持流上传,这使您无需读取内存即可发送大文件或流。

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

服务器端

然后将文件存储在server.py侧面,这样就可以将流保存到文件中而不加载到内存中。以下是使用Flask文件上传的示例。

@app.route("/upload", methods=['POST'])
def upload_file():
    from werkzeug.datastructures import FileStorage
    FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    return 'OK', 200

或使用修复程序中提到的werkzeug表单数据解析来解决“ 大文件上传占用内存 ”的问题,以避免在大文件上传时(约60秒内无效使用 st 22 GiB文件。) 13 MiB。)。

@app.route("/upload", methods=['POST'])
def upload_file():
    def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
        import tempfile
        tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
        app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
        return tmpfile

    import werkzeug, flask
    stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
    for fil in files.values():
        app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
        # Do whatever with stored file at `fil.stream.name`
    return 'OK', 200

Client Upload

If you want to upload a single file with Python requests library, then requests lib supports streaming uploads, which allow you to send large files or streams without reading into memory.

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

Server Side

Then store the file on the server.py side such that save the stream into file without loading into the memory. Following is an example with using Flask file uploads.

@app.route("/upload", methods=['POST'])
def upload_file():
    from werkzeug.datastructures import FileStorage
    FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    return 'OK', 200

Or use werkzeug Form Data Parsing as mentioned in a fix for the issue of “large file uploads eating up memory” in order to avoid using memory inefficiently on large files upload (s.t. 22 GiB file in ~60 seconds. Memory usage is constant at about 13 MiB.).

@app.route("/upload", methods=['POST'])
def upload_file():
    def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
        import tempfile
        tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
        app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
        return tmpfile

    import werkzeug, flask
    stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
    for fil in files.values():
        app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
        # Do whatever with stored file at `fil.stream.name`
    return 'OK', 200

回答 3

在Ubuntu中,您可以采用这种方式,

将文件保存在某个位置(临时),然后打开并发送给API

      path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
      path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)
      data={} #can be anything u want to pass along with File
      file1 = open(path12, 'rb')
      header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}
       res= requests.post(url,data,header)

In Ubuntu you can apply this way,

to save file at some location (temporary) and then open and send it to API

      path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
      path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)
      data={} #can be anything u want to pass along with File
      file1 = open(path12, 'rb')
      header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}
       res= requests.post(url,data,header)

如何从Python的文件路径中提取文件夹路径?

问题:如何从Python的文件路径中提取文件夹路径?

我只想从完整路径到文件获取文件夹路径。

例如T:\Data\DBDesign\DBDesign_93_v141b.mdb,我想要得到T:\Data\DBDesign(不包括\DBDesign_93_v141b.mdb)。

我已经尝试过这样的事情:

existGDBPath = r'T:\Data\DBDesign\DBDesign_93_v141b.mdb'
wkspFldr = str(existGDBPath.split('\\')[0:-1])
print wkspFldr 

但是它给了我这样的结果:

['T:', 'Data', 'DBDesign']

这不是我需要的结果(为T:\Data\DBDesign)。

关于如何获取文件路径的任何想法?

I would like to get just the folder path from the full path to a file.

For example T:\Data\DBDesign\DBDesign_93_v141b.mdb and I would like to get just T:\Data\DBDesign (excluding the \DBDesign_93_v141b.mdb).

I have tried something like this:

existGDBPath = r'T:\Data\DBDesign\DBDesign_93_v141b.mdb'
wkspFldr = str(existGDBPath.split('\\')[0:-1])
print wkspFldr 

but it gave me a result like this:

['T:', 'Data', 'DBDesign']

which is not the result that I require (being T:\Data\DBDesign).

Any ideas on how I can get the path to my file?


回答 0

您几乎可以使用该split功能了。您只需要加入字符串,如下所示。

>>> import os
>>> '\\'.join(existGDBPath.split('\\')[0:-1])
'T:\\Data\\DBDesign'

虽然,我建议使用该os.path.dirname函数来执行此操作,但是您只需要传递字符串即可,它将为您完成工作。由于您似乎在Windows上,因此也考虑使用该abspath功能。一个例子:

>>> import os
>>> os.path.dirname(os.path.abspath(existGDBPath))
'T:\\Data\\DBDesign'

如果要在拆分后同时需要文件名和目录路径,则可以使用os.path.split返回元组的函数,如下所示。

>>> import os
>>> os.path.split(os.path.abspath(existGDBPath))
('T:\\Data\\DBDesign', 'DBDesign_93_v141b.mdb')

You were almost there with your use of the split function. You just needed to join the strings, like follows.

>>> import os
>>> '\\'.join(existGDBPath.split('\\')[0:-1])
'T:\\Data\\DBDesign'

Although, I would recommend using the os.path.dirname function to do this, you just need to pass the string, and it’ll do the work for you. Since, you seem to be on windows, consider using the abspath function too. An example:

>>> import os
>>> os.path.dirname(os.path.abspath(existGDBPath))
'T:\\Data\\DBDesign'

If you want both the file name and the directory path after being split, you can use the os.path.split function which returns a tuple, as follows.

>>> import os
>>> os.path.split(os.path.abspath(existGDBPath))
('T:\\Data\\DBDesign', 'DBDesign_93_v141b.mdb')

回答 1

带有PATHLIB模块(更新后的答案)

人们应该考虑使用pathlib进行新开发。它在适用于Python3.4的stdlib中,但在PyPI上可用于早期版本。该库提供了一种更面向对象的方法来操作路径<opinion>,并且使用更加容易阅读和编程</opinion>

>>> import pathlib
>>> existGDBPath = pathlib.Path(r'T:\Data\DBDesign\DBDesign_93_v141b.mdb')
>>> wkspFldr = existGDBPath.parent
>>> print wkspFldr
Path('T:\Data\DBDesign')

使用OS模块

使用os.path模块:

>>> import os
>>> existGDBPath = r'T:\Data\DBDesign\DBDesign_93_v141b.mdb'
>>> wkspFldr = os.path.dirname(existGDBPath)
>>> print wkspFldr 
'T:\Data\DBDesign'

您可以继续假设,如果您需要进行某种文件名操作,则已经在中实现了os.path。如果没有,您可能仍需要将此模块用作构建模块。

WITH PATHLIB MODULE (UPDATED ANSWER)

One should consider using pathlib for new development. It is in the stdlib for Python3.4, but available on PyPI for earlier versions. This library provides a more object-orented method to manipulate paths <opinion> and is much easier read and program with </opinion>.

>>> import pathlib
>>> existGDBPath = pathlib.Path(r'T:\Data\DBDesign\DBDesign_93_v141b.mdb')
>>> wkspFldr = existGDBPath.parent
>>> print wkspFldr
Path('T:\Data\DBDesign')

WITH OS MODULE

Use the os.path module:

>>> import os
>>> existGDBPath = r'T:\Data\DBDesign\DBDesign_93_v141b.mdb'
>>> wkspFldr = os.path.dirname(existGDBPath)
>>> print wkspFldr 
'T:\Data\DBDesign'

You can go ahead and assume that if you need to do some sort of filename manipulation it’s already been implemented in os.path. If not, you’ll still probably need to use this module as the building block.


回答 2

内置子模块os.path具有用于该任务的功能。

import os
os.path.dirname('T:\Data\DBDesign\DBDesign_93_v141b.mdb')

The built-in submodule os.path has a function for that very task.

import os
os.path.dirname('T:\Data\DBDesign\DBDesign_93_v141b.mdb')

回答 3

这是代码:

import os
existGDBPath = r'T:\Data\DBDesign\DBDesign_93_v141b.mdb'
wkspFldr = os.path.dirname(existGDBPath)
print wkspFldr # T:\Data\DBDesign

Here is the code:

import os
existGDBPath = r'T:\Data\DBDesign\DBDesign_93_v141b.mdb'
wkspFldr = os.path.dirname(existGDBPath)
print wkspFldr # T:\Data\DBDesign

回答 4

这是我的小实用程序帮助程序,用于拆分路径int文件和路径标记:

import os    
# usage: file, path = splitPath(s)
def splitPath(s):
    f = os.path.basename(s)
    p = s[:-(len(f))-1]
    return f, p

Here is my little utility helper for splitting paths int file, path tokens:

import os    
# usage: file, path = splitPath(s)
def splitPath(s):
    f = os.path.basename(s)
    p = s[:-(len(f))-1]
    return f, p

回答 5

任何在ESRI GIS Table字段计算器界面中尝试执行此操作的人都可以使用Python解析器执行此操作:

PathToContainingFolder =

"\\".join(!FullFilePathWithFileName!.split("\\")[0:-1])

以便

\ Users \ me \ Desktop \ New文件夹\ file.txt

变成

\ Users \ me \ Desktop \ New文件夹

Anyone trying to do this in the ESRI GIS Table field calculator interface can do this with the Python parser:

PathToContainingFolder =

"\\".join(!FullFilePathWithFileName!.split("\\")[0:-1])

so that

\Users\me\Desktop\New folder\file.txt

becomes

\Users\me\Desktop\New folder