Tag archive: file

What is the perfect counterpart in Python for "while not EOF"?

Question: What is the perfect counterpart in Python for "while not EOF"?


To read some text file, in C or Pascal, I always use the following snippets to read the data until EOF:

while not eof do begin
  readline(a);
  do_something;
end;

Thus, I wonder how I can do this simply and quickly in Python?


Answer 0


Loop over the file to read lines:

with open('somefile') as openfileobject:
    for line in openfileobject:
        do_something()

File objects are iterable and yield lines until EOF. Using the file object as an iterable uses a buffer to ensure performant reads.

You can do the same with stdin (no need to use raw_input()):

import sys

for line in sys.stdin:
    do_something()

To complete the picture, binary reads can be done with:

from functools import partial

with open('somefile', 'rb') as openfileobject:
    for chunk in iter(partial(openfileobject.read, 1024), b''):
        do_something()

where chunk will contain up to 1024 bytes at a time from the file, and iteration stops when openfileobject.read(1024) starts returning empty byte strings.


Answer 1


You can imitate the C idiom in Python.

To read a buffer up to max_size number of bytes, you can do this:

with open(filename, 'rb') as f:
    while True:
        buf = f.read(max_size)
        if not buf:
            break
        process(buf)

Or, a text file line by line:

# warning -- not idiomatic Python! See below...
with open(filename, 'rb') as f:
    while True:
        line = f.readline()
        if not line:
            break
        process(line)

You need to use the while True / break construct, since there is no EOF test in Python other than the lack of bytes returned from a read.

In C, you might have:

while ((ch != '\n') && (ch != EOF)) {
   // read the next ch and add to a buffer
   // ..
}

However, you cannot have this in Python:

 while (line = f.readline()):
     # syntax error

because assignments are not allowed in expressions in Python (although recent versions of Python can mimic this using assignment expressions, see below).

It is certainly more idiomatic in Python to do this:

# THIS IS IDIOMATIC Python. Do this:
with open('somefile') as f:
    for line in f:
        process(line)

Update: Since Python 3.8 you may also use assignment expressions:

while line := f.readline():
    process(line)

Answer 2


The Python idiom for opening a file and reading it line-by-line is:

with open('filename') as f:
    for line in f:
        do_something(line)

The file will be automatically closed at the end of the above code (the with construct takes care of that).

Finally, it is worth noting that line will preserve the trailing newline. This can be easily removed using:

line = line.rstrip()
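
Note that rstrip() with no argument removes all trailing whitespace, not just the newline. A minimal sketch, in case trailing tabs or spaces in the line matter:

line = line.rstrip('\n')  # strip only the trailing newline, keep other trailing whitespace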

Answer 3


You can use the code snippet below to read line by line, until the end of the file:

line = obj.readline()
while line != '':
    # Do Something
    line = obj.readline()

Answer 4


While there are suggestions above for "doing it the Python way", if you really want logic based on EOF, then I suppose using exception handling is the way to do it:

try:
    line = raw_input()
    ... whatever needs to be done in case of no EOF ...
except EOFError:
    ... whatever needs to be done in case of EOF ...

Example:

$ echo test | python -c "while True: print raw_input()"
test
Traceback (most recent call last):
  File "<string>", line 1, in <module> 
EOFError: EOF when reading a line

Or press Ctrl-Z (Windows) or Ctrl-D (Linux) at a raw_input() prompt.


Answer 5


You can use the following code snippet. readlines() reads the whole file at once and splits it into a list of lines:

lines = obj.readlines()
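
A minimal usage sketch (assuming obj is a file object opened for reading, and do_something is whatever processing you need):

with open('somefile') as obj:
    lines = obj.readlines()  # list of lines, each keeping its trailing '\n'
    for line in lines:
        do_something(line)

Keep in mind that this loads the whole file into memory, so it only suits reasonably small files.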

Answer 6


In addition to @dawg's great answer, here is an equivalent solution using the walrus operator (Python >= 3.8):

with open(filename, 'rb') as f:
    while buf := f.read(max_size):
        process(buf)

File name and line number of a Python script

Question: File name and line number of a Python script


How can I get the file name and line number in a Python script?

Exactly the file information we get from an exception traceback, but in this case without raising an exception.


Answer 0


Thanks to mcandre, the answer is:

#python3
from inspect import currentframe, getframeinfo

frameinfo = getframeinfo(currentframe())

print(frameinfo.filename, frameinfo.lineno)

Answer 1


Whether you use currentframe().f_back depends on whether you are using a function or not.

Calling inspect directly:

from inspect import currentframe, getframeinfo

cf = currentframe()
filename = getframeinfo(cf).filename

print "This is line 5, python says line ", cf.f_lineno 
print "The filename is ", filename

Calling a function that does it for you:

from inspect import currentframe

def get_linenumber():
    cf = currentframe()
    return cf.f_back.f_lineno

print "This is line 7, python says line ", get_linenumber()

Answer 2


Handy if used in a common file; it prints the file name, line number, and function of the caller:

import inspect

def getLineInfo():
    caller = inspect.stack()[1]
    # caller[1] = file name, caller[2] = line number, caller[3] = function name
    print(caller[1], ":", caller[2], ":", caller[3])

Answer 3


Filename:

import sys

__file__
# or
sys.argv[0]

Line:

import inspect

inspect.currentframe().f_lineno

(not inspect.currentframe().f_back.f_lineno as mentioned above)


Answer 4


You can also use sys (Python 2 shown, matching the output below):

import sys

print dir(sys._getframe())
print dir(sys._getframe().f_lineno)
print sys._getframe().f_lineno

The output is:

['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'f_back', 'f_builtins', 'f_code', 'f_exc_traceback', 'f_exc_type', 'f_exc_value', 'f_globals', 'f_lasti', 'f_lineno', 'f_locals', 'f_restricted', 'f_trace']
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__format__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'imag', 'numerator', 'real']
14

Answer 5


Just to contribute, there is a linecache module in Python; here are two links that can help.

linecache module documentation
linecache source code

In a sense, you can "dump" a whole file into its cache, and read it back from the linecache.cache data.

import linecache as allLines
## fileName in linecache behaves as in any other open statement: you will need a path to the file if it is not in the same directory as the script
linesList = allLines.updatecache(fileName, None)
for i, x in enumerate(linesList):
    print(i, x)  # prints the line number and content
# or, for more info:
print(allLines.cache)
# or, if you need a specific line:
specLine = allLines.getline(fileName, numbOfLine)
# returns the text of the line at that line number

For error handling, you can simply use:

from sys import exc_info
try:
    raise YourError  # or some other error
except Exception:
    print(exc_info())

Answer 6

import inspect

file_name = __file__
current_line_no = inspect.stack()[0][2]
current_function_name = inspect.stack()[0][3]

# Try printing inspect.stack(); you can see the current stack and pick whatever you want

Answer 7


In Python 3 you can use a variation on:

import sys

def Deb(msg=None):
    print(f"Debug {sys._getframe().f_back.f_lineno}: {msg if msg is not None else ''}")

In code, you can then use:

Deb("Some useful information")
Deb()

To produce:

123: Some useful information
124:

Where the 123 and 124 are the lines that the calls are made from.


Answer 8


Here’s what works for me to get the line number in Python 3.7.3 in VSCode 1.39.2 (dmsg is my mnemonic for debug message):

import inspect

def dmsg(text_s):
    print(str(inspect.currentframe().f_back.f_lineno) + '| ' + text_s)

To call showing a variable name_s and its value:

name_s = put_code_here
dmsg('name_s: ' + name_s)

Output looks like this:

37| name_s: value_of_variable_at_line_37

Python 2.7: print to file

Question: Python 2.7: print to file


Why does trying to print directly to a file instead of sys.stdout produce the following syntax error:

Python 2.7.2+ (default, Oct  4 2011, 20:06:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f1=open('./testfile', 'w+')
>>> print('This is a test', file=f1)
  File "<stdin>", line 1
    print('This is a test', file=f1)
                            ^
SyntaxError: invalid syntax

From help(__builtins__) I have the following info:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout)

    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file: a file-like object (stream); defaults to the current sys.stdout.
    sep:  string inserted between values, default a space.
    end:  string appended after the last value, default a newline.

So what would be the right syntax to change the standard stream print writes to?

I know that there are different, maybe better, ways to write to a file, but I really don't get why this should be a syntax error…

A nice explanation would be appreciated!


Answer 0


If you want to use the print function in Python 2, you have to import from __future__:

from __future__ import print_function

But you can have the same effect without using the function, too:

print >>f1, 'This is a test'

Answer 1


print is a keyword in Python 2.x. You should use the following:

f1=open('./testfile', 'w+')
f1.write('This is a test')
f1.close()

Answer 2


print(args, file=f1) is the python 3.x syntax. For python 2.x use print >> f1, args.


Answer 3


You can send your print output to a file without changing any code. Simply open a terminal window and run your code this way:

python yourcode.py >> log.txt

Note that >> appends to log.txt; a single > would overwrite it instead.

Answer 4


This will redirect your ‘print’ output to a file:

import sys
sys.stdout = open("file.txt", "w+")
print "this line will redirect to file.txt"

Answer 5


In Python 3.0+, print is a function, which you'd call with print(...). In earlier versions, print is a statement, which you'd write as print ....

To print to a file in Python earlier than 3.0, you'd do:

print >> f, 'what ever %d' % i

The >> operator directs print to the file f.


Copying multiple files in Python

Question: Copying multiple files in Python


How do I copy all of the files in one directory to another directory using Python? I have the source path and the destination path as strings.


Answer 0


You can use os.listdir() to get the files in the source directory, os.path.isfile() to see if they are regular files (including symbolic links on *nix systems), and shutil.copy to do the copying.

The following code copies only the regular files from the source directory into the destination directory (I’m assuming you don’t want any sub-directories copied).

import os
import shutil
src_files = os.listdir(src)
for file_name in src_files:
    full_file_name = os.path.join(src, file_name)
    if os.path.isfile(full_file_name):
        shutil.copy(full_file_name, dest)

Answer 1


If you don't want to copy the whole tree (with subdirs etc), use glob.glob("path/to/dir/*.*") to get a list of all the filenames, loop over the list and use shutil.copy to copy each file.

import glob
import os
import shutil

for filename in glob.glob(os.path.join(source_dir, '*.*')):
    shutil.copy(filename, dest_dir)

Answer 2


Look at shutil in the Python docs, specifically the copytree command.

If the destination directory already exists, try:

shutil.copytree(source, destination, dirs_exist_ok=True)  # dirs_exist_ok requires Python 3.8+

Answer 3

import glob
import os
import shutil

def recursive_copy_files(source_path, destination_path, override=False):
    """
    Recursively copies files from the source to the destination directory.
    :param source_path: source directory
    :param destination_path: destination directory
    :param override: if True all files will be overridden, otherwise skip if the file exists
    :return: count of copied files
    """
    files_count = 0
    if not os.path.exists(destination_path):
        os.mkdir(destination_path)
    items = glob.glob(source_path + '/*')
    for item in items:
        if os.path.isdir(item):
            path = os.path.join(destination_path, item.split('/')[-1])
            files_count += recursive_copy_files(source_path=item, destination_path=path, override=override)
        else:
            file = os.path.join(destination_path, item.split('/')[-1])
            if not os.path.exists(file) or override:
                shutil.copyfile(item, file)
                files_count += 1
    return files_count

Answer 4

import os
import shutil

os.chdir('C:\\')  # make sure you add your source and destination paths below

dir_src = "C:\\foooo\\"
dir_dst = "C:\\toooo\\"

for filename in os.listdir(dir_src):
    if filename.endswith('.txt'):
        shutil.copy(dir_src + filename, dir_dst)
    print(filename)

Answer 5


Here is another example of a recursive copy function that lets you copy the contents of the directory (including sub-directories) one file at a time, which I used to solve this problem.

import os
import shutil

def recursive_copy(src, dest):
    """
    Copy each file from src dir to dest dir, including sub-directories.
    """
    for item in os.listdir(src):
        file_path = os.path.join(src, item)

        # if item is a file, copy it
        if os.path.isfile(file_path):
            shutil.copy(file_path, dest)

        # else if item is a folder, recurse 
        elif os.path.isdir(file_path):
            new_dest = os.path.join(dest, item)
            os.mkdir(new_dest)
            recursive_copy(file_path, new_dest)

EDIT: If you can, definitely just use shutil.copytree(src, dest). This requires that the destination folder does not already exist, though. If you need to copy files into an existing folder, the above method works well!


Python's os.chmod(file, 664) does not change the permission to rw-rw-r-- but to -w--wx---

Question: Python's os.chmod(file, 664) does not change the permission to rw-rw-r-- but to -w--wx---


Recently I was using the Python os module. When I tried to change the permission of a file, I did not get the expected result. For example, I intended to change the permission to rw-rw-r--:

os.chmod("/tmp/test_file", 664)

The resulting permission is actually -w--wx--- (230):

--w--wx--- 1 ag ag 0 Mar 25 05:45 test_file

However, if I change 664 to 0664 in the code, the result is just what I need, e.g.

os.chmod("/tmp/test_file", 0664)

The result is:

-rw-rw-r-- 1 ag ag 0 Mar 25 05:55 test_file

Could anybody help explain why that leading 0 is so important for getting the correct result?


Answer 0

其他论坛上找到了这个

如果您想知道为什么前导零很重要,那是因为将权限设置为八进制整数,Python自动将任何带有前导零的整数视为八进制。因此os.chmod(“ file”,484)(十进制)将给出相同的结果。

您正在做的是通过664八进制的1230

在您的情况下,您将需要

os.chmod("/tmp/test_file", 436)

[更新]请注意,对于Python 3,您的前缀为0o(零哦)。例如,0o666

Found this on a different forum

If you’re wondering why that leading zero is important, it’s because permissions are set as an octal integer, and Python automagically treats any integer with a leading zero as octal. So os.chmod(“file”, 484) (in decimal) would give the same result.

What you are doing is passing 664, which in octal is 1230.

In your case you would need

os.chmod("/tmp/test_file", 436)

[Update] Note that for Python 3 you use the prefix 0o (zero-oh), e.g. 0o666.
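
To make the conversion concrete, here is a small sketch showing that the octal literal, the base-8 string conversion, and the decimal value all denote the same mode bits:

import os

assert 0o664 == 436           # the octal literal equals the decimal value
assert int('664', 8) == 436   # converting the string '664' from base 8
os.chmod("/tmp/test_file", 0o664)  # same effect as os.chmod("/tmp/test_file", 436)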


Answer 1


So for people who want semantics similar to:

$ chmod 755 somefile

Use:

$ python -c "import os; os.chmod('somefile', 0o755)"

If your Python is older than 2.6:

$ python -c "import os; os.chmod('somefile', 0755)"

Answer 2


A leading 0 means this is an octal constant, not a decimal one, and you need an octal value to change the file mode.

Permissions are a bit mask; for example, rwxrwx--- is 111111000 in binary, and it's much easier to group the bits by 3 and convert to octal than to calculate the decimal representation.

0644 (octal) is 0.110.100.100 in binary (I've added the dots for readability), or, as you may calculate, 420 in decimal.
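
A quick interpreter check of the arithmetic above (Python 3 shown):

>>> bin(0o644)  # the permission bit mask in binary
'0b110100100'
>>> oct(420)    # decimal 420 back to octal
'0o644'
>>> 0o644 == 420
True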


Answer 3


Use permission symbols instead of numbers

Your problem would have been avoided if you had used the more semantically named permission symbols rather than raw magic numbers, e.g. for 664:

#!/usr/bin/env python3

import os
import stat

os.chmod(
    'myfile',
    stat.S_IRUSR |
    stat.S_IWUSR |
    stat.S_IRGRP |
    stat.S_IWGRP |
    stat.S_IROTH
)

This is documented at https://docs.python.org/3/library/os.html#os.chmod and the names are the same as the POSIX C API values documented at man 2 stat.

Another advantage is the greater portability as mentioned in the docs:

Note: Although Windows supports chmod(), you can only set the file’s read-only flag with it (via the stat.S_IWRITE and stat.S_IREAD constants or a corresponding integer value). All other bits are ignored.

chmod +x is demonstrated at: How do you do a simple “chmod +x” from within python?

Tested in Ubuntu 16.04, Python 3.5.2.


Answer 4


If you have the desired permissions saved to a string, then do:

s = '660'
os.chmod(file_path, int(s, base=8))

Answer 5


Using the stat.* bit masks does seem to me the most portable and explicit way of doing this. But on the other hand, I often forget how best to handle that. So, here’s an example of masking out the ‘group’ and ‘other’ permissions and leaving ‘owner’ permissions untouched. Using bitmasks and subtraction is a useful pattern.

import os
import stat
def chmodme(pn):
    """Removes 'group' and 'other' perms. Doesn't touch 'owner' perms."""
    mode = os.stat(pn).st_mode
    mode -= (mode & (stat.S_IRWXG | stat.S_IRWXO))
    os.chmod(pn, mode)

Why do I get "Pickle - EOFError: Ran out of input" when reading an empty file?

Question: Why do I get "Pickle - EOFError: Ran out of input" when reading an empty file?


I am getting an interesting error while trying to use Unpickler.load(), here is the source code:

open(target, 'a').close()
scores = {};
with open(target, "rb") as file:
    unpickler = pickle.Unpickler(file);
    scores = unpickler.load();
    if not isinstance(scores, dict):
        scores = {};

Here is the traceback:

Traceback (most recent call last):
File "G:\python\pendu\user_test.py", line 3, in <module>:
    save_user_points("Magix", 30);
File "G:\python\pendu\user.py", line 22, in save_user_points:
    scores = unpickler.load();
EOFError: Ran out of input

The file I am trying to read is empty. How can I avoid getting this error, and get an empty variable instead?


Answer 0


I would check that the file is not empty first:

import os

scores = {} # scores is an empty dict already

if os.path.getsize(target) > 0:      
    with open(target, "rb") as f:
        unpickler = pickle.Unpickler(f)
        # if file is not empty scores will be equal
        # to the value unpickled
        scores = unpickler.load()

Also open(target, 'a').close() is doing nothing in your code and you don’t need to use ;.


Answer 1


Most of the answers here have dealt with how to manage EOFError exceptions, which is really handy if you're unsure about whether the pickled object is empty or not.

However, if you’re surprised that the pickle file is empty, it could be because you opened the filename through ‘wb’ or some other mode that could have over-written the file.

for example:

filename = 'cd.pkl'
with open(filename, 'wb') as f:
    classification_dict = pickle.load(f)

This will over-write the pickled file. You might have done this by mistake before using:

...
open(filename, 'rb') as f:

And then got the EOFError because the previous block of code over-wrote the cd.pkl file.

When working in Jupyter, or in the console (Spyder), I usually write a wrapper over the reading/writing code and call the wrapper subsequently. This avoids common read-write mistakes, and saves a bit of time if you're going to be reading the same file multiple times.
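
A minimal sketch of such a wrapper (the function names are my own, not from the original answer); keeping the modes in one place makes it hard to open a pickle file the wrong way:

import pickle

def save_pickle(obj, filename):
    # always 'wb' when writing a pickle
    with open(filename, 'wb') as f:
        pickle.dump(obj, f)

def load_pickle(filename):
    # always 'rb' when reading a pickle
    with open(filename, 'rb') as f:
        return pickle.load(f)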


Answer 2


As you see, that's actually a natural error.

A typical construct for reading from an Unpickler object would be like this:

try:
    data = unpickler.load()
except EOFError:
    data = list()  # or whatever you want

EOFError is simply raised because it was reading an empty file; it just means end of file.


Answer 3


It is very likely that the pickled file is empty.

It is surprisingly easy to overwrite a pickle file if you’re copying and pasting code.

For example the following writes a pickle file:

pickle.dump(df,open('df.p','wb'))

And if you copied this code to reopen it, but forgot to change 'wb' to 'rb' then you would overwrite the file:

df=pickle.load(open('df.p','wb'))

The correct syntax is

df = pickle.load(open('df.p', 'rb'))

Answer 4

if path.exists(Score_file):
    try:
        with open(Score_file, "rb") as prev_Scr:
            return Unpickler(prev_Scr).load()
    except EOFError:
        return dict()

Answer 5


You can catch that exception and return whatever you want from there.

open(target, 'a').close()
scores = {};
try:
    with open(target, "rb") as file:
        unpickler = pickle.Unpickler(file);
        scores = unpickler.load();
        if not isinstance(scores, dict):
            scores = {};
except EOFError:
    return {}

Answer 6


Note that opening the file in a mode containing 'a' (append) will also produce this error, because the initial file position is at the end of the file:

pointer = open('makeaafile.txt', 'ab+')
tes = pickle.load(pointer, encoding='utf-8')

Read and overwrite a file in Python

Question: Read and overwrite a file in Python


Currently I’m using this:

f = open(filename, 'r+')
text = f.read()
text = re.sub('foobar', 'bar', text)
f.seek(0)
f.write(text)
f.close()

But the problem is that the old file is larger than the new file. So I end up with a new file that has a part of the old file on the end of it.


Answer 0


If you don’t want to close and reopen the file, to avoid race conditions, you could truncate it:

f = open(filename, 'r+')
text = f.read()
text = re.sub('foobar', 'bar', text)
f.seek(0)
f.write(text)
f.truncate()
f.close()

The functionality will likely also be cleaner and safer using open as a context manager, which will close the file handler, even if an error occurs!

with open(filename, 'r+') as f:
    text = f.read()
    text = re.sub('foobar', 'bar', text)
    f.seek(0)
    f.write(text)
    f.truncate()

Answer 1


Probably it would be easier and neater to close the file after text = re.sub('foobar', 'bar', text), re-open it for writing (thus clearing old contents), and write your updated text to it.
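
A minimal sketch of that approach, reusing the names from the question:

import re

with open(filename) as f:
    text = f.read()

text = re.sub('foobar', 'bar', text)

# reopening with 'w' truncates the file, so no stale data is left at the end
with open(filename, 'w') as f:
    f.write(text)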


Answer 2

fileinput模块提供了一种inline模式,用于在不使用临时文件等的情况下将更改写入正在处理的文件。该模块很好地封装了通过对象透明地跟踪文件名来循环遍历文件列表中的行的常见操作,行号等,如果您想在循环中检查它们。

import fileinput
for line in fileinput.FileInput("file",inplace=1):
    if "foobar" in line:
         line=line.replace("foobar","bar")
    print line

The fileinput module has an inplace mode for writing changes to the file you are processing without using temporary files etc. The module nicely encapsulates the common operation of looping over the lines in a list of files, via an object which transparently keeps track of the file name, line number etc if you should want to inspect them inside the loop.

from fileinput import FileInput
for line in FileInput("file", inplace=1):
    line = line.replace("foobar", "bar")
    print(line, end='')  # line keeps its '\n'; end='' avoids writing a blank line after each one

Answer 3


Honestly, you can take a look at this class that I built; it does basic file operations. The write method overwrites, and append keeps the old data.

class IO:
    def read(self, filename):
        toRead = open(filename, "rb")

        out = toRead.read()
        toRead.close()
        
        return out
    
    def write(self, filename, data):
        toWrite = open(filename, "wb")

        out = toWrite.write(data)
        toWrite.close()

    def append(self, filename, data):
        append = self.read(filename)
        self.write(filename, append+data)
        

Answer 4


Try writing it to a new file:

f = open(filename, 'r+')
f2 = open(filename2, 'a+')
text = f.read()
text = re.sub('foobar', 'bar', text)
f.close()
f2.write(text)
f2.close()

Opening files in "rt" and "wt" modes

Question: Opening files in "rt" and "wt" modes


Several times here on SO I’ve seen people using rt and wt modes for reading and writing files.

For example:

with open('input.txt', 'rt') as input_file:
     with open('output.txt', 'wt') as output_file: 
         ...

I don't see the modes documented, but since open() doesn't throw an error, it looks like it's pretty much legal to use.

What is it for and is there any difference between using wt vs w and rt vs r?


Answer 0


t refers to the text mode. There is no difference between r and rt or w and wt since text mode is the default.

Documented here:

Character   Meaning
'r'     open for reading (default)
'w'     open for writing, truncating the file first
'x'     open for exclusive creation, failing if the file already exists
'a'     open for writing, appending to the end of the file if it exists
'b'     binary mode
't'     text mode (default)
'+'     open a disk file for updating (reading and writing)
'U'     universal newlines mode (deprecated)

The default mode is 'r' (open for reading text, synonym of 'rt').
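
In other words, these three calls open the file identically:

open('somefile')        # defaults: read, text mode
open('somefile', 'r')   # read explicit, text mode implied
open('somefile', 'rt')  # read and text mode both explicit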


Answer 1


The t indicates text mode, meaning that \n characters will be translated to the host OS line endings when writing to a file, and back again when reading. The flag is basically just noise, since text mode is the default.

Other than U, those mode flags come directly from the standard C library’s fopen() function, a fact that is documented in the sixth paragraph of the python2 documentation for open().

As far as I know, t is not and has never been part of the C standard, so although many implementations of the C library accept it anyway, there’s no guarantee that they all will, and therefore no guarantee that it will work on every build of python. That explains why the python2 docs didn’t list it, and why it generally worked anyway. The python3 docs make it official.


Answer 2


The ‘r’ is for reading, ‘w’ for writing and ‘a’ is for appending.

The 't' represents text mode, as opposed to binary mode.

Several times here on SO I’ve seen people using rt and wt modes for reading and writing files.

Edit: Are you sure you saw rt and not rb?

These functions generally wrap the fopen function which is described here:

http://www.cplusplus.com/reference/cstdio/fopen/

As you can see it mentions the use of b to open the file in binary mode.

The document link you provided also makes reference to this b mode:

Appending ‘b’ is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.


Answer 3


t indicates text mode.

https://docs.python.org/release/3.1.5/library/functions.html#open

On Linux, there's no difference between text mode and binary mode; on Windows, however, text mode converts \n to \r\n.

http://www.cygwin.com/cygwin-ug-net/using-textbinary.html
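
A small sketch that makes the translation visible (the file name is illustrative). On Windows, a text-mode write stores \r\n on disk, which a binary-mode read then exposes:

with open('demo.txt', 'wt') as f:
    f.write('line1\n')   # text mode: '\n' is written as '\r\n' on Windows

with open('demo.txt', 'rb') as f:
    print(f.read())      # binary mode shows the raw bytes, e.g. b'line1\r\n' on Windows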


In Python, what does "wb" mean in this code?

Question: In Python, what does "wb" mean in this code?


Code:

file('pinax/media/a.jpg', 'wb')

Answer 0


File mode, write and binary. Since you are writing a .jpg file, it looks fine.

But if you are supposed to read that jpg file, you need to use 'rb'.

More info

On Windows, ‘b’ appended to the mode opens the file in binary mode, so there are also modes like ‘rb’, ‘wb’, and ‘r+b’. Python on Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it’ll corrupt binary data like that in JPEG or EXE files.
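
For example, here is a minimal sketch of copying an image byte for byte (the destination path is illustrative); reading with 'rb' and writing with 'wb' ensures no newline translation can corrupt the data:

with open('pinax/media/a.jpg', 'rb') as src, open('copy.jpg', 'wb') as dst:
    dst.write(src.read())  # raw bytes in, raw bytes out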


Answer 1


The wb indicates that the file is opened for writing in binary mode.

When writing in binary mode, Python makes no changes to data as it is written to the file. In text mode (when the b is excluded as in just w or when you specify text mode with wt), however, Python will encode the text based on the default text encoding. Additionally, Python will convert line endings (\n) to whatever the platform-specific line ending is, which would corrupt a binary file like an exe or png file.

Text mode should therefore be used when writing text files (whether using plain text or a text-based format like CSV), while binary mode must be used when writing non-text files like images.

References:

https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files https://docs.python.org/3/library/functions.html#open


Answer 2


That is the mode with which you are opening the file. “wb” means that you are writing to the file (w), and that you are writing in binary mode (b).

Check out the documentation for more: clicky


Reading huge .csv files

Question: Reading huge .csv files


I’m currently trying to read data from .csv files in Python 2.7 with up to 1 million rows, and 200 columns (files range from 100mb to 1.6gb). I can do this (very slowly) for the files with under 300,000 rows, but once I go above that I get memory errors. My code looks like this:

def getdata(filename, criteria):
    data=[]
    for criterion in criteria:
        data.append(getstuff(filename, criterion))
    return data

def getstuff(filename, criterion):
    import csv
    data=[]
    with open(filename, "rb") as csvfile:
        datareader=csv.reader(csvfile)
        for row in datareader: 
            if row[3]=="column header":
                data.append(row)
            elif len(data)<2 and row[3]!=criterion:
                pass
            elif row[3]==criterion:
                data.append(row)
            else:
                return data

The reason for the else clause in the getstuff function is that all the elements which fit the criterion will be listed together in the csv file, so I leave the loop when I get past them to save time.

My questions are:

  1. How can I manage to get this to work with the bigger files?

  2. Is there any way I can make it faster?

My computer has 8gb RAM, running 64bit Windows 7, and the processor is 3.40 GHz (not certain what information you need).


Answer 0


You are reading all rows into a list, then processing that list. Don’t do that.

Process your rows as you produce them. If you need to filter the data first, use a generator function:

import csv

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        count = 0
        for row in datareader:
            if row[3] == criterion:
                yield row
                count += 1
            elif count:
                # done when having read a consecutive series of rows 
                return

I also simplified your filter test; the logic is the same but more concise.

Because you are only matching a single sequence of rows matching the criterion, you could also use:

import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # first row, plus any subsequent rows that match, then stop
        # reading altogether
        # Python 2: use `for row in takewhile(...): yield row`
        # instead of `yield from takewhile(...)`.
        yield from takewhile(
            lambda r: r[3] == criterion,
            dropwhile(lambda r: r[3] != criterion, datareader))
        return

You can now loop over getstuff() directly. Do the same in getdata():

def getdata(filename, criteria):
    for criterion in criteria:
        for row in getstuff(filename, criterion):
            yield row

Now loop directly over getdata() in your code:

for row in getdata(somefilename, sequence_of_criteria):
    # process row

You now only hold one row in memory, instead of your thousands of lines per criterion.

yield makes a function a generator function, which means it won’t do any work until you start looping over it.
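
A tiny demonstration of that laziness:

def gen():
    print('starting')
    yield 1

g = gen()   # nothing is printed yet; the function body has not run
next(g)     # only now is 'starting' printed and 1 yielded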


Answer 1


Although Martijn's answer is probably best, here is a more intuitive way for beginners to process large csv files. This allows you to process groups of rows, or chunks, at a time.

import pandas as pd
chunksize = 10 ** 8
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)

Answer 2


I do a fair amount of vibration analysis and look at large data sets (tens and hundreds of millions of points). My testing showed the pandas.read_csv() function to be 20 times faster than numpy.genfromtxt(). And the genfromtxt() function is 3 times faster than the numpy.loadtxt(). It seems that you need pandas for large data sets.

I posted the code and data sets I used in this testing on a blog discussing MATLAB vs Python for vibration analysis.


Answer 3


What worked for me, and is super fast, is:

import pandas as pd
import dask.dataframe as dd
import time
t=time.clock()
df_train = dd.read_csv('../data/train.csv', usecols=[col1, col2])
df_train=df_train.compute()
print("load train: " , time.clock()-t)

Another working solution is:

import pandas as pd 
from tqdm import tqdm

PATH = '../data/train.csv'
chunksize = 500000 
traintypes = {
'col1':'category',
'col2':'str'}

cols = list(traintypes.keys())

df_list = [] # list to hold the batch dataframe

for df_chunk in tqdm(pd.read_csv(PATH, usecols=cols, dtype=traintypes, chunksize=chunksize)):
    # Can process each chunk of dataframe here
    # clean_data(), feature_engineer(),fit()

    # Alternatively, append the chunk to list and merge all
    df_list.append(df_chunk) 

# Merge all dataframes into one dataframe
X = pd.concat(df_list)

# Delete the dataframe list to release memory
del df_list
del df_chunk

Answer 4


For anyone who lands on this question: using pandas with 'chunksize' and 'usecols' helped me to read a huge zip file faster than the other proposed options.

import pandas as pd

sample_cols_to_keep =['col_1', 'col_2', 'col_3', 'col_4','col_5']

# First setup dataframe iterator, ‘usecols’ parameter filters the columns, and 'chunksize' sets the number of rows per chunk in the csv. (you can change these parameters as you wish)
df_iter = pd.read_csv('../data/huge_csv_file.csv.gz', compression='gzip', chunksize=20000, usecols=sample_cols_to_keep) 

# this list will store the filtered dataframes for later concatenation 
df_lst = [] 

# Iterate over the file based on the criteria and append to the list
for df_ in df_iter: 
        tmp_df = (df_.rename(columns={col: col.lower() for col in df_.columns})
                  .pipe(lambda x: x[x.col_1 > 0]))  # e.g. keep rows where 'col_1' is greater than zero
        df_lst += [tmp_df.copy()] 

# And finally combine the filtered df_lst into the final larger output, say a 'df_final' dataframe
df_final = pd.concat(df_lst)

Answer 5


Here's another solution for Python 3:

import csv
with open(filename, "r") as csvfile:
    datareader = csv.reader(csvfile)
    count = 0
    for row in datareader:
        if row[3] in ("column header", criterion):
            doSomething(row)
            count += 1
        elif count > 2:
            break

Here, datareader is an iterator, yielding one row at a time.


Answer 6


If you are using pandas and have lots of RAM (enough to read the whole file into memory) try using pd.read_csv with low_memory=False, e.g.:

import pandas as pd
data = pd.read_csv('file.csv', low_memory=False)