问题:用python读取二进制文件

我发现用Python读取二进制文件特别困难。你能帮我个忙吗?我需要读取此文件,在Fortran 90中,该文件很容易被读取

int*4 n_particles, n_groups
real*4 group_id(n_particles)
read (*) n_particles, n_groups
read (*) (group_id(j),j=1,n_particles)

详细而言,文件格式为:

Bytes 1-4 -- The integer 8.
Bytes 5-8 -- The number of particles, N.
Bytes 9-12 -- The number of groups.
Bytes 13-16 -- The integer 8.
Bytes 17-20 -- The integer 4*N.
Next many bytes -- The group ID numbers for all the particles.
Last 4 bytes -- The integer 4*N. 

如何使用Python阅读?我尝试了一切,但没有成功。我是否有可能在python中使用f90程序,读取此二进制文件,然后保存需要使用的数据?

I find particularly difficult reading binary file with Python. Can you give me a hand? I need to read this file, which in Fortran 90 is easily read by

int*4 n_particles, n_groups
real*4 group_id(n_particles)
read (*) n_particles, n_groups
read (*) (group_id(j),j=1,n_particles)

In detail, the file format is:

Bytes 1-4 -- The integer 8.
Bytes 5-8 -- The number of particles, N.
Bytes 9-12 -- The number of groups.
Bytes 13-16 -- The integer 8.
Bytes 17-20 -- The integer 4*N.
Next many bytes -- The group ID numbers for all the particles.
Last 4 bytes -- The integer 4*N. 

How can I read this with Python? I tried everything but it never worked. Is there any chance I might use a f90 program in python, reading this binary file and then save the data that I need to use?


回答 0

读取二进制文件内容,如下所示:

with open(fileName, mode='rb') as file: # b is important -> binary
    fileContent = file.read()

然后使用struct.unpack “解压缩”二进制数据:

起始字节: struct.unpack("iiiii", fileContent[:20])

正文:忽略标题字节和尾随字节(= 24);其余部分构成主体,要知道主体中的字节数除以4,就可以得到整数。将获得的商乘以字符串'i'以为unpack方法创建正确的格式:

struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])

结束字节: struct.unpack("i", fileContent[-4:])

Read the binary file content like this:

with open(fileName, mode='rb') as file: # b is important -> binary
    fileContent = file.read()

then “unpack” binary data using struct.unpack:

The start bytes: struct.unpack("iiiii", fileContent[:20])

The body: ignore the heading bytes and the trailing byte (= 24); The remaining part forms the body, to know the number of bytes in the body do an integer division by 4; The obtained quotient is multiplied by the string 'i' to create the correct format for the unpack method:

struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])

The end byte: struct.unpack("i", fileContent[-4:])


回答 1

通常,我建议您考虑使用Python的struct模块。这是Python的标准,并且应该很容易将您的问题的说明翻译成适合的格式字符串struct.unpack()

请注意,如果字段之间/周围存在“不可见”填充,则需要弄清楚该填充并将其包含在unpack()调用中,否则您将读取错误的位。

读取文件内容以进行解压缩很简单:

import struct

data = open("from_fortran.bin", "rb").read()

(eight, N) = struct.unpack("@II", data)

这将解压缩前两个字段,假设它们从文件的开头开始(没有填充或无关的数据),并且还假设本机字节顺序(@符号)。I格式字符串中的s表示“ 32位无符号整数”。

In general, I would recommend that you look into using Python’s struct module for this. It’s standard with Python, and it should be easy to translate your question’s specification into a formatting string suitable for struct.unpack().

Do note that if there’s “invisible” padding between/around the fields, you will need to figure that out and include it in the unpack() call, or you will read the wrong bits.

Reading the contents of the file in order to have something to unpack is pretty trivial:

import struct

data = open("from_fortran.bin", "rb").read()

(eight, N) = struct.unpack("@II", data)

This unpacks the first two fields, assuming they start at the very beginning of the file (no padding or extraneous data), and also assuming native byte-order (the @ symbol). The Is in the formatting string mean “unsigned integer, 32 bits”.


回答 2

您可以使用,它可以从文本文件和二进制文件中读取数据。您首先要使用构造一个数据类型,该数据类型表示您的文件格式,然后使用来从文件中读取此类型numpy.fromfile

You could use , which can read data from both text and binary files. You would first construct a data type, which represents your file format, using , and then read this type from file using numpy.fromfile.


回答 3

要将二进制文件读取到bytes对象:

from pathlib import Path
data = Path('/path/to/file').read_bytes()  # Python 3.5+

要从int字节0-3 创建数据:

i = int.from_bytes(data[:4], byteorder='little', signed=False)

要从int数据中解压缩多个:

import struct
ints = struct.unpack('iiii', data[:16])

To read a binary file to a bytes object:

from pathlib import Path
data = Path('/path/to/file').read_bytes()  # Python 3.5+

To create an int from bytes 0-3 of the data:

i = int.from_bytes(data[:4], byteorder='little', signed=False)

To unpack multiple ints from the data:

import struct
ints = struct.unpack('iiii', data[:16])

回答 4

我也发现在读取和写入二进制文件方面缺少Python,因此我编写了一个小模块(适用于Python 3.6+)。

使用binaryfile,您将执行以下操作(我猜是因为我不了解Fortran):

import binaryfile

def particle_file(f):
    f.array('group_ids')  # Declare group_ids to be an array (so we can use it in a loop)
    f.skip(4)  # Bytes 1-4
    num_particles = f.count('num_particles', 'group_ids', 4)  # Bytes 5-8
    f.int('num_groups', 4)  # Bytes 9-12
    f.skip(8)  # Bytes 13-20
    for i in range(num_particles):
        f.struct('group_ids', '>f')  # 4 bytes x num_particles
    f.skip(4)

with open('myfile.bin', 'rb') as fh:
    result = binaryfile.read(fh, particle_file)
print(result)

产生如下输出:

{
    'group_ids': [(1.0,), (0.0,), (2.0,), (0.0,), (1.0,)],
    '__skipped': [b'\x00\x00\x00\x08', b'\x00\x00\x00\x08\x00\x00\x00\x14', b'\x00\x00\x00\x14'],
    'num_particles': 5,
    'num_groups': 3
}

我使用skip()跳过了Fortran添加的其他数据,但是您可能想添加一个实用程序来正确处理Fortran记录。如果您这样做,将欢迎提出请求。

I too found Python lacking when it comes to reading and writing binary files, so I wrote a small module (for Python 3.6+).

With binaryfile you’d do something like this (I’m guessing, since I don’t know Fortran):

import binaryfile

def particle_file(f):
    f.array('group_ids')  # Declare group_ids to be an array (so we can use it in a loop)
    f.skip(4)  # Bytes 1-4
    num_particles = f.count('num_particles', 'group_ids', 4)  # Bytes 5-8
    f.int('num_groups', 4)  # Bytes 9-12
    f.skip(8)  # Bytes 13-20
    for i in range(num_particles):
        f.struct('group_ids', '>f')  # 4 bytes x num_particles
    f.skip(4)

with open('myfile.bin', 'rb') as fh:
    result = binaryfile.read(fh, particle_file)
print(result)

Which produces an output like this:

{
    'group_ids': [(1.0,), (0.0,), (2.0,), (0.0,), (1.0,)],
    '__skipped': [b'\x00\x00\x00\x08', b'\x00\x00\x00\x08\x00\x00\x00\x14', b'\x00\x00\x00\x14'],
    'num_particles': 5,
    'num_groups': 3
}

I used skip() to skip the additional data Fortran adds, but you may want to add a utility to handle Fortran records properly instead. If you do, a pull request would be welcome.


回答 5

import pickle
f=open("filename.dat","rb")
try:
    while True:
        x=pickle.load(f)
        print x
except EOFError:
    pass
f.close()
import pickle
f=open("filename.dat","rb")
try:
    while True:
        x=pickle.load(f)
        print x
except EOFError:
    pass
f.close()

声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。