Tag archive: zip

Pairs from single list

Question: Pairs from single list

Often enough, I’ve found the need to process a list in pairs. I was wondering which would be the pythonic and efficient way to do it, and found this on Google:

pairs = zip(t[::2], t[1::2])

I thought that was pythonic enough, but after a recent discussion involving idioms versus efficiency, I decided to do some tests:

import time
from itertools import islice, izip

def pairs_1(t):
    return zip(t[::2], t[1::2]) 

def pairs_2(t):
    return izip(t[::2], t[1::2]) 

def pairs_3(t):
    return izip(islice(t,None,None,2), islice(t,1,None,2))

A = range(10000)
B = xrange(len(A))

def pairs_4(t):
    # ignore value of t!
    t = B
    return izip(islice(t,None,None,2), islice(t,1,None,2))

for f in pairs_1, pairs_2, pairs_3, pairs_4:
    # time the pairing
    s = time.time()
    for i in range(1000):
        p = f(A)
    t1 = time.time() - s

    # time using the pairs
    s = time.time()
    for i in range(1000):
        p = f(A)
        for a, b in p:
            pass
    t2 = time.time() - s
    print t1, t2, t2-t1
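The script above is Python 2 (`izip`, `xrange`, the `print` statement). A minimal Python 3 adaptation might look like the sketch below, assuming `timeit` instead of `time.time()` and a reduced repeat count (`NUMBER = 100`). The function names are illustrative, and absolute timings will differ from the numbers shown here:

```python
import timeit
from itertools import islice

A = list(range(10000))
NUMBER = 100  # repetitions per measurement (the original used 1000)

def pairs_slices(t):
    # Two slice copies; zip() is already lazy in Python 3, like izip
    return zip(t[::2], t[1::2])

def pairs_islice(t):
    # Two lazy islice views over the same list, no copies made
    return zip(islice(t, 0, None, 2), islice(t, 1, None, 2))

def pairs_one_iter(t):
    # One iterator passed twice: each tuple takes two consecutive items
    it = iter(t)
    return zip(it, it)

def consume(pairs):
    # Unpack every pair, like the inner loop of the original benchmark
    for a, b in pairs:
        pass

if __name__ == "__main__":
    for f in (pairs_slices, pairs_islice, pairs_one_iter):
        build = timeit.timeit(lambda: f(A), number=NUMBER)
        total = timeit.timeit(lambda: consume(f(A)), number=NUMBER)
        print(f"{f.__name__:15} pairing={build:.5f}s  pairing+use={total:.5f}s")
```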

These were the results on my computer:

1.48668909073 2.63187503815 1.14518594742
0.105381965637 1.35109519958 1.24571323395
0.00257992744446 1.46182489395 1.45924496651
0.00251388549805 1.70076990128 1.69825601578

If I’m interpreting them correctly, that should mean that the implementation of lists, list indexing, and list slicing in Python is very efficient. It’s a result both comforting and unexpected.

Is there another, “better” way of traversing a list in pairs?

Note that if the list has an odd number of elements then the last one will not be in any of the pairs.

Which would be the right way to ensure that all elements are included?

I added these two suggestions from the answers to the tests:

def pairwise(t):
    it = iter(t)
    return izip(it, it)

def chunkwise(t, size=2):
    it = iter(t)
    return izip(*[it]*size)

These are the results:

0.00159502029419 1.25745987892 1.25586485863
0.00222492218018 1.23795199394 1.23572707176

Results so far

Most pythonic and very efficient:

pairs = izip(t[::2], t[1::2])

Most efficient and very pythonic:

pairs = izip(*[iter(t)]*2)

It took me a moment to grok that the first answer uses two iterators while the second uses a single one.
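The difference is easy to see on a small list. In this Python 3 sketch (variable names are illustrative), `zip(it, it)` and `zip(*[iter(t)] * 2)` both hand `zip` the *same* iterator object twice, so each tuple takes two consecutive items, while two separate `iter(t)` calls advance independently:

```python
t = [1, 2, 3, 4, 5, 6]

# Two independent slices, zipped side by side
two_slices = list(zip(t[::2], t[1::2]))

# One iterator object, passed to zip twice
it = iter(t)
one_iter = list(zip(it, it))

# [iter(t)] * 2 is a list holding the same iterator twice
starred = list(zip(*[iter(t)] * 2))

# Two distinct iterators each walk the whole list on their own
two_iters = list(zip(iter(t), iter(t)))

print(two_slices)  # [(1, 2), (3, 4), (5, 6)]
print(one_iter)    # [(1, 2), (3, 4), (5, 6)]
print(starred)     # [(1, 2), (3, 4), (5, 6)]
print(two_iters)   # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
```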

To deal with sequences with an odd number of elements, the suggestion has been to augment the original sequence by adding one element (None) that gets paired with the previous last element, something that can be achieved with itertools.izip_longest().
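In Python 3 that function is spelled `itertools.zip_longest`. A minimal sketch of the padding idea applied to the two slices (the helper name `pairs_padded` is hypothetical): when `len(t)` is odd, `t[::2]` has one more item than `t[1::2]`, and `zip_longest` pairs that extra item with the fill value instead of dropping it:

```python
from itertools import zip_longest

def pairs_padded(t, fillvalue=None):
    # t[::2] is one item longer than t[1::2] when len(t) is odd;
    # zip_longest pads the shorter slice instead of truncating
    return zip_longest(t[::2], t[1::2], fillvalue=fillvalue)

print(list(pairs_padded([1, 2, 3, 4, 5])))  # [(1, 2), (3, 4), (5, None)]
print(list(pairs_padded([1, 2, 3, 4])))     # [(1, 2), (3, 4)]
```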

Finally

Note that, in Python 3.x, zip() behaves as itertools.izip(), and itertools.izip() is gone.


Answer 0

My favorite way to do it:

from itertools import izip

def pairwise(t):
    it = iter(t)
    return izip(it,it)

# for "pairs" of any length
def chunkwise(t, size=2):
    it = iter(t)
    return izip(*[it]*size)

When you want to pair all elements you obviously might need a fillvalue:

from itertools import izip_longest
def blockwise(t, size=2, fillvalue=None):
    it = iter(t)
    return izip_longest(*[it]*size, fillvalue=fillvalue)
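In Python 3, `izip` became the built-in `zip` and `izip_longest` became `itertools.zip_longest`. A sketch of the same three helpers under that assumption. Note that `itertools.pairwise`, added in Python 3.10, is a different thing: it yields *overlapping* pairs, so the `pairwise` here is only a local function:

```python
from itertools import zip_longest

def pairwise(t):
    # Non-overlapping pairs; a trailing odd item is dropped
    it = iter(t)
    return zip(it, it)

def chunkwise(t, size=2):
    # Same idea for any chunk size: the same iterator, size times
    it = iter(t)
    return zip(*[it] * size)

def blockwise(t, size=2, fillvalue=None):
    # Like chunkwise, but pads the last chunk instead of dropping it
    it = iter(t)
    return zip_longest(*[it] * size, fillvalue=fillvalue)

print(list(blockwise("abcde", 2, "-")))  # [('a', 'b'), ('c', 'd'), ('e', '-')]
```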

Answer 1

I’d say that your initial solution pairs = zip(t[::2], t[1::2]) is the best one because it is easiest to read (and in Python 3, zip automatically returns an iterator instead of a list).

To ensure that all elements are included, you could simply extend the list by None.

Then, if the list has an odd number of elements, the last pair will be (item, None).

>>> t = [1,2,3,4,5]
>>> t.append(None)
>>> zip(t[::2], t[1::2])
[(1, 2), (3, 4), (5, None)]
>>> t = [1,2,3,4,5,6]
>>> t.append(None)
>>> zip(t[::2], t[1::2])
[(1, 2), (3, 4), (5, 6)]
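Appending `None` modifies the caller's list, and under Python 3 `zip` returns an iterator, so the REPL lines above would need `list(...)` to show the pairs. A sketch that pads a copy instead (the name `pairs_all` is hypothetical):

```python
def pairs_all(t, fillvalue=None):
    # Pad a copy so the caller's list is left untouched
    padded = list(t) + [fillvalue]
    # For an even length the extra item has no partner and zip drops it
    return list(zip(padded[::2], padded[1::2]))

t = [1, 2, 3, 4, 5]
print(pairs_all(t))  # [(1, 2), (3, 4), (5, None)]
print(t)             # [1, 2, 3, 4, 5]
```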

Answer 2

I’ll start with a small disclaimer: don’t use the code below. It’s not Pythonic at all; I wrote it just for fun. It’s similar to @THC4k’s pairwise function, but it uses iter and lambda closures. It doesn’t use the itertools module and doesn’t support fillvalue. I’m posting it here because someone might find it interesting:

pairwise = lambda t: iter((lambda f: lambda: (f(), f()))(iter(t).next), None)
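In Python 3, iterators expose `__next__()` instead of `.next()`, so the one-liner needs only that change (and is still just for fun). It relies on the two-argument form `iter(callable, sentinel)`: the tuple-building lambda is called repeatedly, the `None` sentinel never equals a tuple, and iteration ends once the inner iterator raises `StopIteration`, which also discards an unpaired last item:

```python
# Same trick as above, with Python 3's __next__ in place of .next
pairwise = lambda t: iter((lambda f: lambda: (f(), f()))(iter(t).__next__), None)

print(list(pairwise([1, 2, 3, 4, 5])))  # [(1, 2), (3, 4)]
```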

Answer 3

As far as most pythonic goes, I’d say the recipes supplied in the Python docs (some of which look a lot like the answers that @JochenRitzel provided) are probably your best bet ;)

from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

On modern Python (3.x), you just use zip_longest(*args, fillvalue=fillvalue) instead, according to the corresponding doc page.
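With that change, the recipe becomes the following Python 3 sketch. Newer versions of the itertools documentation show an extended `grouper` with an `incomplete` keyword; that variant is left out here:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(["".join(g) for g in grouper("ABCDEFG", 3, "x")])  # ['ABC', 'DEF', 'Gxx']
```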


Answer 4

Is there another, “better” way of traversing a list in pairs?

I can’t say for sure, but I doubt it: any other traversal would involve more Python code that has to be interpreted. Built-in functions like zip() are written in C (in CPython), which is much faster.

Which would be the right way to ensure that all elements are included?

Check the length of the list and if it’s odd (len(list) & 1 == 1), copy the list and append an item.
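A sketch of that check (the helper name and `fill` parameter are hypothetical). In Python, `&` binds more tightly than `==`, so `len(lst) & 1 == 1` means `(len(lst) & 1) == 1`; `lst + [fill]` builds a new list, which leaves the original untouched:

```python
def pairs_including_last(lst, fill=None):
    if len(lst) & 1 == 1:      # odd length; same as len(lst) % 2 == 1
        lst = lst + [fill]     # concatenation copies; caller's list unchanged
    return list(zip(lst[::2], lst[1::2]))

data = [1, 2, 3]
print(pairs_including_last(data))  # [(1, 2), (3, None)]
print(data)                        # [1, 2, 3]
```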


Answer 5

>>> my_list = [1,2,3,4,5,6,7,8,9,10]
>>> my_pairs = list()
>>> while(my_list):
...     a = my_list.pop(0); b = my_list.pop(0)
...     my_pairs.append((a,b))
... 
>>> print(my_pairs)
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
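This loop empties `my_list`, each `pop(0)` has to shift every remaining element, and on an odd-length list the second `pop(0)` raises `IndexError`. A sketch of the same pop-based idea using `collections.deque`, whose `popleft()` is O(1) (the function name `pop_pairs` is hypothetical; a lone trailing item is simply left out):

```python
from collections import deque

def pop_pairs(items):
    queue = deque(items)         # a copy, so the caller's list survives
    pairs = []
    while len(queue) >= 2:       # stop before a lone trailing item
        a = queue.popleft()      # O(1), unlike list.pop(0)
        b = queue.popleft()
        pairs.append((a, b))
    return pairs

my_list = [1, 2, 3, 4, 5]
print(pop_pairs(my_list))  # [(1, 2), (3, 4)]
print(my_list)             # [1, 2, 3, 4, 5]
```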

Answer 6

Just do:

>>> l = [1, 2, 3, 4, 5, 6]
>>> [(x,y) for x,y in zip(l[:-1], l[1:])]
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
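Note that this produces *overlapping* pairs (each item paired with its successor), whereas the question asks for non-overlapping ones. A short Python 3 sketch contrasting the two; `zip(l, l[1:])` gives the same overlapping result without the list comprehension, and on Python 3.10+ `itertools.pairwise` produces it directly:

```python
import sys

l = [1, 2, 3, 4, 5, 6]

# Overlapping: (1, 2), (2, 3), ... as in the answer above
overlapping = list(zip(l, l[1:]))

# Non-overlapping: (1, 2), (3, 4), ... as the question asks
disjoint = list(zip(l[::2], l[1::2]))

if sys.version_info >= (3, 10):
    from itertools import pairwise  # added in Python 3.10
    assert list(pairwise(l)) == overlapping

print(overlapping)  # [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
print(disjoint)     # [(1, 2), (3, 4), (5, 6)]
```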

Answer 7

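Here is an example of creating pairs/legs by using a generator. Generators produce the pairs lazily, one at a time, instead of building the whole list up front:

def pairwise(data):
    # yield from turns this function into a generator over the zipped slices
    yield from zip(data[::2], data[1::2])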

Example:

print(list(pairwise(range(10))))

Output:

[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

Answer 8

Just in case someone needs the answer algorithm-wise, here it is:

>>> def getPairs(lst):
...     out = []
...     for i in range(len(lst)-1):
...         a = lst.pop(0)        # take the first remaining item
...         for j in lst:         # pair it with every item still left
...             out.append([a, j])
...     return out
>>> 
>>> k = [1, 2, 3, 4]
>>> l = getPairs(k)
>>> l
[[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]

But take note that your original list will also be reduced to its last element, because you used pop on it.

>>> k
[4]
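These pairs are all 2-element combinations of the list, which the standard library already provides without consuming the input. A sketch using `itertools.combinations`, converting each tuple to a list to match the output format above:

```python
from itertools import combinations

k = [1, 2, 3, 4]
pairs = [list(p) for p in combinations(k, 2)]

print(pairs)  # [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
print(k)      # [1, 2, 3, 4]
```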

Zip with list output instead of tuple

Question: Zip with list output instead of tuple

What is the fastest and most elegant way to make a list of lists from two lists?

I have

In [1]: a=[1,2,3,4,5,6]

In [2]: b=[7,8,9,10,11,12]

In [3]: zip(a,b)
Out[3]: [(1, 7), (2, 8), (3, 9), (4, 10), (5, 11), (6, 12)]

And I’d like to have

In [3]: some_method(a,b)
Out[3]: [[1, 7], [2, 8], [3, 9], [4, 10], [5, 11], [6, 12]]

I was thinking about using map instead of zip, but I don’t know if there is some standard library method to put as a first argument.

I can define my own function for this and use map; my question is whether something like this is already implemented. “No” is also an answer.


Answer 0

If you are zipping more than 2 lists (or even only 2, for that matter), a readable way would be:

[list(a) for a in zip([1,2,3], [4,5,6], [7,8,9])]

This uses list comprehensions and converts each element in the list (tuples) into lists.


Answer 1

You almost had the answer yourself. Don’t use map instead of zip. Use map AND zip.

You can use map along with zip for an elegant, functional approach:

list(map(list, zip(a, b)))

zip returns a list of tuples in Python 2 (an iterator of tuples in Python 3). map(list, [...]) calls list on each tuple. list(map(...)) turns the map object into a list.
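In Python 3 both `zip` and `map` are lazy, which is why the outer `list(...)` is needed. A quick check with the lists from the question, compared against the equivalent list comprehension:

```python
a = [1, 2, 3, 4, 5, 6]
b = [7, 8, 9, 10, 11, 12]

lazy = map(list, zip(a, b))   # nothing has been computed yet
via_map = list(lazy)          # materialize the inner lists
via_comprehension = [list(pair) for pair in zip(a, b)]

print(via_map)  # [[1, 7], [2, 8], [3, 9], [4, 10], [5, 11], [6, 12]]
```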


Answer 2

I love the elegance of the zip function, but using the itemgetter() function in the operator module appears to be much faster. I wrote a simple script to test this:

import time
from operator import itemgetter

list1 = list()
list2 = list()
origlist = list()
for i in range (1,5000000):
        t = (i, 2*i)
        origlist.append(t)

print "Using zip"
starttime = time.time()
list1, list2 = map(list, zip(*origlist))
elapsed = time.time()-starttime
print elapsed

print "Using itemgetter"
starttime = time.time()
list1 = map(itemgetter(0),origlist)
list2 = map(itemgetter(1),origlist)
elapsed = time.time()-starttime
print elapsed

I expected zip to be faster, but the itemgetter method wins by a long shot:

Using zip
6.1550450325
Using itemgetter
0.768098831177
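Two caveats when reading these numbers: the script measures the reverse operation (splitting a list of pairs into two lists), and it is Python 2 code, where `map` returns a list. Under Python 3 `map` is lazy, so timing `map(itemgetter(0), origlist)` without `list(...)` would measure almost nothing. A sketch of a fairer Python 3 comparison, assuming `timeit` and a smaller input; results will vary by machine and version:

```python
import timeit
from operator import itemgetter

# Smaller input than the original 5,000,000 so the sketch runs quickly
origlist = [(i, 2 * i) for i in range(1, 100_000)]

def with_zip():
    list1, list2 = map(list, zip(*origlist))
    return list1, list2

def with_itemgetter():
    # list() is required in Python 3: map() alone is lazy and would
    # return immediately without doing any work
    list1 = list(map(itemgetter(0), origlist))
    list2 = list(map(itemgetter(1), origlist))
    return list1, list2

if __name__ == "__main__":
    for f in (with_zip, with_itemgetter):
        print(f.__name__, timeit.timeit(f, number=10))
```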

Answer 3

I generally don’t like using lambda, but…

>>> a = [1, 2, 3, 4, 5]
>>> b = [6, 7, 8, 9, 10]
>>> c = lambda a, b: [list(c) for c in zip(a, b)]
>>> c(a, b)
[[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]

If you need the extra speed, map is slightly faster:

>>> d = lambda a, b: map(list, zip(a, b))
>>> d(a, b)
[[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]

However, map is considered unpythonic and should only be used for performance tuning.


Answer 4

How about this?

>>> def list_(*args): return list(args)

>>> map(list_, range(5), range(9,4,-1))
[[0, 9], [1, 8], [2, 7], [3, 6], [4, 5]]

Or even better:

>>> def zip_(*args): return map(list_, *args)
>>> zip_(range(5), range(9,4,-1))
[[0, 9], [1, 8], [2, 7], [3, 6], [4, 5]]

Answer 5

Using numpy

The definition of elegance can be quite questionable, but if you are working with numpy, creating an array and converting it to a list (if needed…) could be very practical, even though it is not as efficient as using the map function or a list comprehension.

import numpy as np
a = b = range(10)
zipped = list(zip(a, b))  # Python 3: np.array needs a sequence, not a zip iterator
result = np.array(zipped).tolist()
Out: [[0, 0],
 [1, 1],
 [2, 2],
 [3, 3],
 [4, 4],
 [5, 5],
 [6, 6],
 [7, 7],
 [8, 8],
 [9, 9]]

Alternatively, you can skip the zip function and use np.dstack directly:

np.dstack((a,b))[0].tolist()

Answer 6

A list comprehension would be a very simple solution, I guess.

a=[1,2,3,4,5,6]

b=[7,8,9,10,11,12]

x = [[i, j] for i, j in zip(a,b)]

print(x)

output : [[1, 7], [2, 8], [3, 9], [4, 10], [5, 11], [6, 12]]

Read a zipped file as a pandas DataFrame

Question: Read a zipped file as a pandas DataFrame

I’m trying to unzip a csv file and pass it into pandas so I can work on the file.
The code I have tried so far is:

import requests, zipfile, StringIO
r = requests.get('http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip')
z = zipfile.ZipFile(StringIO.StringIO(r.content))
crime2013 = pandas.read_csv(z.read('crime_incidents_2013_CSV.csv'))

After the last line, although python is able to get the file, I get a “does not exist” at the end of the error.

Can someone tell me what I’m doing incorrectly?


Answer 0

If you want to read a zipped or a tar.gz file into a pandas DataFrame, the read_csv method supports this directly:

df = pd.read_csv('filename.zip')

Or the long form:

df = pd.read_csv('filename.zip', compression='zip', header=0, sep=',', quotechar='"')

Description of the compression argument from the docs:

compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’ For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.

New in version 0.18.1: support for ‘zip’ and ‘xz’ compression.


Answer 1

I think you want to open the ZipFile, which returns a file-like object, rather than read:

In [11]: crime2013 = pd.read_csv(z.open('crime_incidents_2013_CSV.csv'))

In [12]: crime2013
Out[12]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 24567 entries, 0 to 24566
Data columns (total 15 columns):
CCN                            24567  non-null values
REPORTDATETIME                 24567  non-null values
SHIFT                          24567  non-null values
OFFENSE                        24567  non-null values
METHOD                         24567  non-null values
LASTMODIFIEDDATE               24567  non-null values
BLOCKSITEADDRESS               24567  non-null values
BLOCKXCOORD                    24567  non-null values
BLOCKYCOORD                    24567  non-null values
WARD                           24563  non-null values
ANC                            24567  non-null values
DISTRICT                       24567  non-null values
PSA                            24567  non-null values
NEIGHBORHOODCLUSTER            24263  non-null values
BUSINESSIMPROVEMENTDISTRICT    3613  non-null values
dtypes: float64(4), int64(1), object(10)
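Putting this together for Python 3, where `StringIO.StringIO` no longer exists and binary data belongs in `io.BytesIO`, a minimal sketch might look like this, assuming pandas is installed. The helper name `read_csv_from_zip_bytes` is hypothetical, and to stay self-contained the demo builds a small zip in memory instead of downloading the file from the question (`r.content` would be passed the same way):

```python
import io
import zipfile

import pandas as pd

def read_csv_from_zip_bytes(zip_bytes, member):
    # Wrap the raw bytes (e.g. requests' r.content) in a file-like object
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        # z.open() returns a file-like object that read_csv can consume;
        # z.read() returns the file's contents, which is what led to the
        # "does not exist" error in the question
        with z.open(member) as f:
            return pd.read_csv(f)

# Self-contained demo: a tiny zip archive built in memory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("sample.csv", "a,b\n1,2\n3,4\n")

df = read_csv_from_zip_bytes(buf.getvalue(), "sample.csv")
```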

Answer 2

It seems you don’t even have to specify the compression any more. The following snippet loads the data from filename.zip into df.

import pandas as pd
df = pd.read_csv('filename.zip')

(Of course you will need to specify separator, header, etc. if they are different from the defaults.)


Answer 3

For “zip” files, you can use the zipfile module, and your code will work with just these lines:

import zipfile
import pandas as pd

with zipfile.ZipFile("Crime_Incidents_in_2013.zip") as z:
    with z.open("Crime_Incidents_in_2013.csv") as f:
        train = pd.read_csv(f, header=0, delimiter="\t")
        print(train.head())    # print the first 5 rows

And the result will be:

X,Y,CCN,REPORT_DAT,SHIFT,METHOD,OFFENSE,BLOCK,XBLOCK,YBLOCK,WARD,ANC,DISTRICT,PSA,NEIGHBORHOOD_CLUSTER,BLOCK_GROUP,CENSUS_TRACT,VOTING_PRECINCT,XCOORD,YCOORD,LATITUDE,LONGITUDE,BID,START_DATE,END_DATE,OBJECTID
0  -77.054968548763071,38.899775938598317,0925135...                                                                                                                                                               
1  -76.967309569035052,38.872119553647011,1003352...                                                                                                                                                               
2  -76.996184958456539,38.927921847721443,1101010...                                                                                                                                                               
3  -76.943077541353617,38.883686046653935,1104551...                                                                                                                                                               
4  -76.939209158039446,38.892278093281632,1125028...

Answer 4

https://www.kaggle.com/jboysen/quick-gz-pandas-tutorial

Please follow this link.

import pandas as pd
traffic_station_df = pd.read_csv('C:\\Folders\\Jupiter_Feed.txt.gz', compression='gzip',
                                 header=1, sep='\t', quotechar='"')

#traffic_station_df['Address'] = 'address'

#traffic_station_df.append(traffic_station_df)
print(traffic_station_df)

How to create a full compressed tar file using Python?

Question: How to create a full compressed tar file using Python?

How can I create a .tar.gz file with compression in Python?


Answer 0

To build a .tar.gz (aka .tgz) for an entire directory tree:

import tarfile
import os.path

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

This will create a gzipped tar archive containing a single top-level folder with the same name and contents as source_dir.
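A self-contained check of that function, building a small tree in a temporary directory (the directory and file names are made up for the demo). One caveat: if `source_dir` ends with a path separator, `os.path.basename` returns an empty string, so this sketch normalizes the path first:

```python
import os
import tarfile
import tempfile

def make_tarfile(output_filename, source_dir):
    # normpath drops a trailing separator, so basename is never ""
    source_dir = os.path.normpath(source_dir)
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")
    os.makedirs(os.path.join(src, "sub"))
    with open(os.path.join(src, "sub", "notes.txt"), "w") as fh:
        fh.write("hello")

    out = os.path.join(tmp, "project.tar.gz")
    make_tarfile(out, src + os.sep)  # trailing separator on purpose

    with tarfile.open(out, "r:gz") as tar:
        names = sorted(tar.getnames())

print(names)  # ['project', 'project/sub', 'project/sub/notes.txt']
```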


Answer 1

import tarfile
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in ["file1", "file2", "file3"]:
    tar.add(name)
tar.close()

If you want to create a tar.bz2 compressed file, just replace the file extension with “.tar.bz2” and “w:gz” with “w:bz2”.
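The same loop can be written with a `with` block, so the archive is closed even if `add` fails, and with the compressor chosen by the mode suffix (`w:gz`, `w:bz2`, or `w:xz` for LZMA). A sketch with a hypothetical helper `write_tar`, storing each file under its bare name via `arcname`; the demo files live in a temporary directory:

```python
import os
import tarfile
import tempfile

def write_tar(output_filename, paths, compression="bz2"):
    # "gz", "bz2" or "xz" selects the compressor via the mode suffix
    with tarfile.open(output_filename, "w:" + compression) as tar:
        for path in paths:
            # Store each file under its bare name, like "file1" above
            tar.add(path, arcname=os.path.basename(path))

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("file1", "file2", "file3"):
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(name)
        paths.append(path)

    names_by_mode = {}
    for compression in ("gz", "bz2"):
        out = os.path.join(tmp, "sample.tar." + compression)
        write_tar(out, paths, compression)
        # Plain tarfile.open() detects the compression when reading back
        with tarfile.open(out) as tar:
            names_by_mode[compression] = tar.getnames()
```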


Answer 2

You call tarfile.open with mode='w:gz', meaning “Open for gzip compressed writing.”

You’ll probably want to end the filename (the name argument to open) with .tar.gz, but that doesn’t affect compression abilities.

BTW, you usually get better compression with a mode of 'w:bz2', just like tar can usually compress even better with bzip2 than it can compress with gzip.


Answer 3

Previous answers advise using the tarfile Python module for creating a .tar.gz file in Python. That’s obviously a good, Python-style solution, but it has a serious drawback in archiving speed. This question mentions that tarfile is approximately two times slower than the tar utility in Linux. In my experience, this estimate is pretty accurate.

So, for faster archiving, you can call the tar command through the subprocess module:

import subprocess
subprocess.call(['tar', '-czf', output_filename, file_to_archive])
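`subprocess.call` only returns tar's exit status, so a failure is easy to miss. A sketch using `subprocess.run(..., check=True)`, which raises `CalledProcessError` on a non-zero exit status. It assumes a `tar` executable on `PATH`; the helper name and its `cwd` parameter are illustrative (running tar from the file's directory keeps absolute paths out of the archive):

```python
import os
import shutil
import subprocess
import tarfile
import tempfile

def tar_gz_with_cli(output_filename, file_to_archive, cwd=None):
    # check=True raises CalledProcessError if tar exits with a non-zero status
    subprocess.run(["tar", "-czf", output_filename, file_to_archive],
                   check=True, cwd=cwd)

names = None
if shutil.which("tar"):  # skip the demo when no tar binary is available
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "data.txt"), "w") as fh:
            fh.write("hello")
        tar_gz_with_cli("data.tar.gz", "data.txt", cwd=tmp)
        # The result is an ordinary .tar.gz that tarfile can read back
        with tarfile.open(os.path.join(tmp, "data.tar.gz"), "r:gz") as tar:
            names = tar.getnames()
```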

Answer 4

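To store the files in the tar.gz archive without their directory paths, use os.path.basename(file_path) as the name inside the archive:

import os
import tarfile
with tarfile.open("save.tar.gz", "w:gz") as tar:
    for file in ["a.txt", "b.log", "c.png"]:
        tar.add(file, arcname=os.path.basename(file))

Each file then sits at the top level of the archive rather than inside its original directory.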


Answer 5

In addition to @Aleksandr Tukallo’s answer, you could also obtain the output and the error message (if any). Compressing a folder using tar is explained pretty well in the following answer.

import traceback
import subprocess

try:
    # -c create, -z gzip, -f output file name
    # (adding j for bzip2 as well would conflict with z)
    cmd = ['tar', '-czf', output_filename, file_to_archive]
    output = subprocess.check_output(cmd).decode("utf-8").strip()
    print(output)
except Exception:
    print(f"E: {traceback.format_exc()}")
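`check_output` captures only stdout, while tar writes its error messages to stderr, so the traceback above shows the exit status but not tar's own message. A sketch using `subprocess.run` with `capture_output=True, text=True` (Python 3.7+) to collect both. It assumes `tar` is on `PATH`, and the helper name `run_tar` is hypothetical:

```python
import shutil
import subprocess
import tempfile

def run_tar(output_filename, file_to_archive, cwd=None):
    # Returns (succeeded, message) instead of raising on failure
    result = subprocess.run(["tar", "-czf", output_filename, file_to_archive],
                            capture_output=True, text=True, cwd=cwd)
    if result.returncode != 0:
        return False, result.stderr.strip()   # tar reports errors on stderr
    return True, result.stdout.strip()

outcome = None
if shutil.which("tar"):
    with tempfile.TemporaryDirectory() as tmp:
        # Archiving a file that does not exist makes tar fail
        outcome = run_tar("out.tar.gz", "missing.txt", cwd=tmp)
```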

Is there a zip-like function that pads to the longest length in Python?

Question: Is there a zip-like function that pads to the longest length in Python?

Is there a built-in function that works like zip() but that will pad the results so that the length of the resultant list is the length of the longest input rather than the shortest input?

>>> a = ['a1']
>>> b = ['b1', 'b2', 'b3']
>>> c = ['c1', 'c2']

>>> zip(a, b, c)
[('a1', 'b1', 'c1')]

>>> What command goes here?
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Answer 0

In Python 3 you can use itertools.zip_longest

>>> list(itertools.zip_longest(a, b, c))
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

You can pad with a different value than None by using the fillvalue parameter:

>>> list(itertools.zip_longest(a, b, c, fillvalue='foo'))
[('a1', 'b1', 'c1'), ('foo', 'b2', 'c2'), ('foo', 'b3', 'foo')]

With Python 2 you can either use itertools.izip_longest (Python 2.6+), or you can use map with None. It is a little known feature of map (but map changed in Python 3.x, so this only works in Python 2.x).

>>> map(None, a, b, c)
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Answer 1

For Python 2.6+, use the itertools module’s izip_longest.

For Python 3 use zip_longest instead (no leading i).

>>> list(itertools.izip_longest(a, b, c))
[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]

Answer 2

A non-itertools Python 3 solution:

def zip_longest(*lists):
    def g(l):
        for item in l:
            yield item
        while True:
            yield None
    gens = [g(l) for l in lists]    
    for _ in range(max(map(len, lists))):
        yield tuple(next(g) for g in gens)
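The version above calls `len()` on every input, so it rejects generators, and it has no `fillvalue`. A sketch that works on arbitrary iterables, in the spirit of the pure-Python equivalent given in the itertools documentation (the name `zip_longest_any` is hypothetical; in real code, prefer `itertools.zip_longest`):

```python
from itertools import repeat

def zip_longest_any(*iterables, fillvalue=None):
    iterators = [iter(it) for it in iterables]
    active = len(iterators)          # inputs not yet exhausted
    if not active:
        return
    while True:
        row = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                active -= 1
                if not active:       # every input is exhausted: stop
                    return
                # Replace the exhausted input with an endless fill source
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            row.append(value)
        yield tuple(row)
```

Usage: `list(zip_longest_any(['a1'], ['b1', 'b2', 'b3'], ['c1', 'c2']))` gives the same result as the question's expected output.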

Answer 3

My non-itertools Python 2 solution:

if len(list1) < len(list2):
    list1.extend([None] * (len(list2) - len(list1)))
else:
    list2.extend([None] * (len(list1) - len(list2)))

Answer 4

I’m using a 2D array, but the concept is similar (Python 2.x):

if len(set([len(p) for p in printer])) > 1:
    printer = [column+['']*(max([len(p) for p in printer])-len(column)) for column in printer]
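A Python 3 sketch of the same padding as a reusable function (the name `pad_columns` is hypothetical). It builds new lists rather than modifying the input, and `max(..., default=0)` also copes with an empty list of columns:

```python
def pad_columns(columns, fill=""):
    # Length of the longest column (0 if there are no columns at all)
    width = max((len(col) for col in columns), default=0)
    # Concatenation builds new lists, so the input columns are not modified
    return [col + [fill] * (width - len(col)) for col in columns]

print(pad_columns([["a"], ["b", "c"], []]))  # [['a', ''], ['b', 'c'], ['', '']]
```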

How to create a zip archive of a directory in Python?

Question: How to create a zip archive of a directory in Python?

How can I create a zip archive of a directory structure in Python?


Answer 0

As others have pointed out, you should use zipfile. The documentation tells you what functions are available, but doesn’t really explain how you can use them to zip an entire directory. I think it’s easiest to explain with some example code:

#!/usr/bin/env python
import os
import zipfile

def zipdir(path, ziph):
    # ziph is zipfile handle
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file))

if __name__ == '__main__':
    zipf = zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED)
    zipdir('tmp/', zipf)
    zipf.close()

Adapted from: http://www.devshed.com/c/a/Python/Python-UnZipped/
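
zipdir stores each file under the path it has on disk, so the archive above will contain entries starting with tmp/. If you would rather store names relative to the directory being zipped, one option (a sketch, not part of the original answer) is to pass an arcname computed with os.path.relpath; zipdir_relative is a hypothetical name, and the temporary files exist only to make the example runnable:

```python
import os
import tempfile
import zipfile


def zipdir_relative(path, ziph):
    # Like zipdir, but store each file relative to `path`
    for root, dirs, files in os.walk(path):
        for file in files:
            full = os.path.join(root, file)
            ziph.write(full, os.path.relpath(full, path))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: src/a.txt and src/sub/b.txt
    src = os.path.join(tmp, "src")
    os.makedirs(os.path.join(src, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(src, name), "w") as fh:
            fh.write("hello\n")

    out = os.path.join(tmp, "out.zip")
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zipdir_relative(src, zf)
    with zipfile.ZipFile(out) as zf:
        names = sorted(zf.namelist())

print(names)  # ['a.txt', 'sub/b.txt']
```

zipfile converts the OS path separator to "/" in archive names, so the listing looks the same on Windows.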


Answer 1

The easiest way is to use shutil.make_archive. It supports both zip and tar formats.

import shutil
shutil.make_archive(output_filename, 'zip', dir_name)

If you need to do something more complicated than zipping the whole directory (such as skipping certain files), then you’ll need to dig into the zipfile module as others have suggested.
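
Note that the first argument to shutil.make_archive is a base name: the function appends the extension for the chosen format and returns the path of the archive it wrote. A runnable sketch using a throwaway directory (the names project, pkg/mod.py and backup are made up):

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small directory tree to archive (hypothetical layout)
    src = os.path.join(tmp, "project")
    os.makedirs(os.path.join(src, "pkg"))
    with open(os.path.join(src, "pkg", "mod.py"), "w") as fh:
        fh.write("x = 1\n")

    # Pass the base name only; make_archive appends ".zip" itself
    archive = shutil.make_archive(os.path.join(tmp, "backup"), "zip", src)

    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    archive_name = os.path.basename(archive)

print(archive_name)  # backup.zip
print(names)
```

Because src is passed as the root directory, the entries are stored relative to it (pkg/mod.py rather than project/pkg/mod.py).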


Answer 2

To add the contents of mydirectory to a new zip file, including all files and subdirectories:

import os
import zipfile

zf = zipfile.ZipFile("myzipfile.zip", "w")
for dirname, subdirs, files in os.walk("mydirectory"):
    zf.write(dirname)
    for filename in files:
        zf.write(os.path.join(dirname, filename))
zf.close()

Answer 3

How can I create a zip archive of a directory structure in Python?

In a Python script

In Python 2.7+, shutil has a make_archive function.

from shutil import make_archive
make_archive(
  'zipfile_name', 
  'zip',           # the archive format - or tar, bztar, gztar 
  root_dir=None,   # root for archive - current working dir if None
  base_dir=None)   # start archiving from here - cwd if None too

Here the zipped archive will be named zipfile_name.zip. If base_dir is farther down from root_dir, files outside base_dir are excluded, but the archived paths are still relative to root_dir, so they keep the directory components between root_dir and base_dir.

I did have an issue testing this on Cygwin with 2.7: it wants a root_dir argument, for the cwd:

make_archive('zipfile_name', 'zip', root_dir='.')
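
A small experiment to see how root_dir and base_dir interact (the directory layout root/a/other.txt and root/a/b/f.txt is made up for illustration): only files under base_dir are archived, and their names are recorded relative to root_dir.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "root")
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in (("a", "other.txt"), ("a", "b", "f.txt")):
        with open(os.path.join(root, *rel), "w") as fh:
            fh.write("data\n")

    # Archive only root/a/b, with names relative to root
    archive = shutil.make_archive(os.path.join(tmp, "out"), "zip",
                                  root_dir=root,
                                  base_dir=os.path.join("a", "b"))
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)
```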

Using Python from the shell

You can also do this from the shell, using the zipfile module's command-line interface:

$ python -m zipfile -c zipname sourcedir

Here zipname is the name of the destination file you want (add .zip yourself if you want it; it is not added automatically) and sourcedir is the path to the directory.

Zipping up a Python package (or if you just don't want the parent dir):

If you're zipping up a Python package with an __init__.py and a __main__.py and you don't want the parent dir, use

$ python -m zipfile -c zipname sourcedir/*

Then

$ python zipname

runs the package. (Note that you can't run subpackages as the entry point from a zipped archive.)

Zipping a Python app:

If you have Python 3.5+ and specifically want to zip up a Python package, use zipapp:

$ python -m zipapp myapp
$ python myapp.pyz
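
The zipapp module also has a Python API. A minimal sketch using zipapp.create_archive on a throwaway myapp directory whose __main__.py is made up for the example; running the resulting .pyz with the current interpreter executes that __main__.py:

```python
import os
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    # A tiny application directory with a __main__.py entry point
    app_dir = os.path.join(tmp, "myapp")
    os.mkdir(app_dir)
    with open(os.path.join(app_dir, "__main__.py"), "w") as fh:
        fh.write('print("hello from the zip")\n')

    target = os.path.join(tmp, "myapp.pyz")
    zipapp.create_archive(app_dir, target)

    # Equivalent of "$ python myapp.pyz"
    result = subprocess.run([sys.executable, target],
                            capture_output=True, text=True, check=True)
    output = result.stdout.strip()

print(output)  # hello from the zip
```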

Answer 4

This function will recursively zip up a directory tree, compressing the files, and recording the correct relative filenames in the archive. The archive entries are the same as those generated by zip -r output.zip source_dir.

import os
import zipfile
def make_zipfile(output_filename, source_dir):
    relroot = os.path.abspath(os.path.join(source_dir, os.pardir))
    with zipfile.ZipFile(output_filename, "w", zipfile.ZIP_DEFLATED) as zip:
        for root, dirs, files in os.walk(source_dir):
            # add directory (needed for empty dirs)
            zip.write(root, os.path.relpath(root, relroot))
            for file in files:
                filename = os.path.join(root, file)
                if os.path.isfile(filename): # regular files only
                    arcname = os.path.join(os.path.relpath(root, relroot), file)
                    zip.write(filename, arcname)

Answer 5

Use shutil, which is part of the Python standard library. Using shutil is very simple (see the code below):

  • 1st arg: base name of the resulting zip/tar file (the extension is added for you),
  • 2nd arg: archive format, e.g. 'zip' or 'tar',
  • 3rd arg: dir_name, the directory to archive

Code:

import shutil
shutil.make_archive('/home/user/Desktop/Filename','zip','/home/username/Desktop/Directory')

Answer 6

For adding compression to the resulting zip file, check out this link.

You need to change:

zip = zipfile.ZipFile('Python.zip', 'w')

to

zip = zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED)
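
ZIP_DEFLATED requires the zlib module. To check that it has an effect, you can compare the compress_size recorded in each entry's ZipInfo. A small sketch with made-up, highly repetitive data written to an in-memory buffer (compressed_size is a hypothetical helper):

```python
import io
import zipfile

data = b"abc" * 10_000  # highly repetitive, so it compresses well


def compressed_size(method):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", method) as zf:
        zf.writestr("data.bin", data)
    # Reopen the same buffer for reading and inspect the entry
    with zipfile.ZipFile(buf) as zf:
        return zf.getinfo("data.bin").compress_size


stored = compressed_size(zipfile.ZIP_STORED)      # no compression
deflated = compressed_size(zipfile.ZIP_DEFLATED)  # zlib deflate
print(stored, deflated)
```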

Answer 7

I've made some changes to the code given by Mark Byers. The function below also adds empty directories, if there are any. The examples should make it clearer which paths are added to the zip.

#!/usr/bin/env python
import os
import zipfile

def addDirToZip(zipHandle, path, basePath=""):
    """
    Adding directory given by \a path to opened zip file \a zipHandle

    @param basePath path that will be removed from \a path when adding to archive

    Examples:
        # add whole "dir" to "test.zip" (when you open "test.zip" you will see only "dir")
        zipHandle = zipfile.ZipFile('test.zip', 'w')
        addDirToZip(zipHandle, 'dir')
        zipHandle.close()

        # add contents of "dir" to "test.zip" (when you open "test.zip" you will see only its contents)
        zipHandle = zipfile.ZipFile('test.zip', 'w')
        addDirToZip(zipHandle, 'dir', 'dir')
        zipHandle.close()

        # add contents of "dir/subdir" to "test.zip" (when you open "test.zip" you will see only contents of "subdir")
        zipHandle = zipfile.ZipFile('test.zip', 'w')
        addDirToZip(zipHandle, 'dir/subdir', 'dir/subdir')
        zipHandle.close()

        # add whole "dir/subdir" to "test.zip" (when you open "test.zip" you will see only "subdir")
        zipHandle = zipfile.ZipFile('test.zip', 'w')
        addDirToZip(zipHandle, 'dir/subdir', 'dir')
        zipHandle.close()

        # add whole "dir/subdir" with full path to "test.zip" (when you open "test.zip" you will see only "dir" and inside it only "subdir")
        zipHandle = zipfile.ZipFile('test.zip', 'w')
        addDirToZip(zipHandle, 'dir/subdir')
        zipHandle.close()

        # add whole "dir" and "otherDir" (with full path) to "test.zip" (when you open "test.zip" you will see only "dir" and "otherDir")
        zipHandle = zipfile.ZipFile('test.zip', 'w')
        addDirToZip(zipHandle, 'dir')
        addDirToZip(zipHandle, 'otherDir')
        zipHandle.close()
    """
    basePath = basePath.rstrip("\\/")
    for root, dirs, files in os.walk(path):
        # add the dir itself (needed for empty dirs), with basePath stripped like the files
        inZipDir = root.replace(basePath, "", 1).lstrip("\\/")
        if inZipDir:
            zipHandle.write(root, inZipDir)
        # add files
        for file in files:
            filePath = os.path.join(root, file)
            inZipPath = filePath.replace(basePath, "", 1).lstrip("\\/")
            #print filePath + " , " + inZipPath
            zipHandle.write(filePath, inZipPath)

Above is a simple function that should work for simple cases. You can find a more elegant class in my Gist: https://gist.github.com/Eccenux/17526123107ca0ac28e6


Answer 8

Modern Python (3.6+) using the pathlib module for concise OOP-like handling of paths, and pathlib.Path.rglob() for recursive globbing. As far as I can tell, this is equivalent to George V. Reilly’s answer: zips with compression, the topmost element is a directory, keeps empty dirs, uses relative paths.

from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile

from os import PathLike
from typing import Union


def zip_dir(zip_name: str, source_dir: Union[str, PathLike]):
    src_path = Path(source_dir).expanduser().resolve(strict=True)
    with ZipFile(zip_name, 'w', ZIP_DEFLATED) as zf:
        for file in src_path.rglob('*'):
            zf.write(file, file.relative_to(src_path.parent))

Note: as the optional type hints indicate, zip_name can't be a Path object (fixed in 3.6.2+).


Answer 9

I have another code example that may help, using Python 3, pathlib and zipfile. It should work on any OS.

from pathlib import Path
import zipfile
from datetime import datetime

DATE_FORMAT = '%y%m%d'


def date_str():
    """returns the today string year, month, day"""
    return '{}'.format(datetime.now().strftime(DATE_FORMAT))


def zip_name(path):
    """returns the zip filename as string"""
    cur_dir = Path(path).resolve()
    parent_dir = cur_dir.parents[0]
    zip_filename = '{}/{}_{}.zip'.format(parent_dir, cur_dir.name, date_str())
    p_zip = Path(zip_filename)
    n = 1
    while p_zip.exists():
        zip_filename = ('{}/{}_{}_{}.zip'.format(parent_dir, cur_dir.name,
                                             date_str(), n))
        p_zip = Path(zip_filename)
        n += 1
    return zip_filename


def all_files(path):
    """iterator returns all files and folders from path as absolute path string
    """
    for child in Path(path).iterdir():
        yield str(child)
        if child.is_dir():
            for grand_child in all_files(str(child)):
                yield str(Path(grand_child))


def zip_dir(path):
    """generate a zip"""
    zip_filename = zip_name(path)
    zip_file = zipfile.ZipFile(zip_filename, 'w')
    print('create:', zip_filename)
    for file in all_files(path):
        print('adding... ', file)
        zip_file.write(file)
    zip_file.close()


if __name__ == '__main__':
    zip_dir('.')
    print('end!')

Answer 10

You probably want to look at the zipfile module; there’s documentation at http://docs.python.org/library/zipfile.html.

You may also want os.walk() to index the directory structure.


Answer 11

Here is a variation on the answer given by Nux that works for me:

import os
import zipfile

def WriteDirectoryToZipFile( zipHandle, srcPath, zipLocalPath = "", zipOperation = zipfile.ZIP_DEFLATED ):
    basePath = os.path.split( srcPath )[ 0 ]
    for root, dirs, files in os.walk( srcPath ):
        # Path of root relative to the parent of srcPath (the whole root if srcPath has no parent);
        # slicing root[len(basePath) + 1:] would drop the first character when basePath is ""
        rel = os.path.relpath( root, basePath ) if basePath else root
        p = os.path.join( zipLocalPath, rel )
        # add dir
        zipHandle.write( root, p, zipOperation )
        # add files
        for f in files:
            filePath = os.path.join( root, f )
            fileInZipPath = os.path.join( p, f )
            zipHandle.write( filePath, fileInZipPath, zipOperation )

Answer 12

Try the code below; it worked for me.

import zipfile, os
zipf = "compress.zip"
def main():
    directory = r"Filepath"
    toZip(directory)
def toZip(directory):
    zippedHelp = zipfile.ZipFile(zipf, "w", compression=zipfile.ZIP_DEFLATED )

    file_names = os.listdir(directory)  # avoid shadowing the built-in list
    for file_list in file_names:
        file_name = os.path.join(directory,file_list)

        if os.path.isfile(file_name):
            print(file_name)
            zippedHelp.write(file_name)
        else:
            addFolderToZip(zippedHelp,file_list,directory)
            print("---------------Directory Found-----------------------")
    zippedHelp.close()

def addFolderToZip(zippedHelp,folder,directory):
    path=os.path.join(directory,folder)
    print(path)
    file_list=os.listdir(path)
    for file_name in file_list:
        file_path=os.path.join(path,file_name)
        if os.path.isfile(file_path):
            zippedHelp.write(file_path)
        elif os.path.isdir(file_path):  # test the full path, not the bare name
            print("------------------sub directory found--------------------")
            addFolderToZip(zippedHelp,file_name,path)


if __name__=="__main__":
    main()

Answer 13

If you want functionality like the "Compress folder" action of a typical graphical file manager, you can use the following code, which uses the zipfile module. With this code, the directory at path becomes the root folder of the zip file.

import os
import zipfile

def zipdir(path, ziph):
    # Iterate all the directories and files
    for root, dirs, files in os.walk(path):
        # Create a prefix variable with the folder structure inside the path folder.
        # If a file is directly in the path directory, it goes to the root of the zip file,
        # so the prefix is empty. If the file is in a subfolder of path,
        # the prefix is that subfolder.
        if root.replace(path, '') == '':
            prefix = ''
        else:
            # Keep the folder structure after the path folder, append a '/' at the end
            # and remove the first character if it is a '/', to get a path like
            # folder1/folder2/file.txt
            prefix = root.replace(path, '') + '/'
            if prefix[0] == '/':
                prefix = prefix[1:]
        for filename in files:
            actual_file_path = root + '/' + filename
            zipped_file_path = prefix + filename
            # write through the handle that was passed in, not a global
            ziph.write(actual_file_path, zipped_file_path)


zipf = zipfile.ZipFile('Python.zip', 'w', zipfile.ZIP_DEFLATED)
zipdir('/tmp/justtest/', zipf)
zipf.close()

Answer 14

For more flexibility, e.g. to select directories/files by name, use:

import os
import zipfile

def zipall(ob, path, rel=""):
    basename = os.path.basename(path)
    if os.path.isdir(path):
        if rel == "":
            rel = basename
        ob.write(path, os.path.join(rel))
        for root, dirs, files in os.walk(path):
            for d in dirs:
                zipall(ob, os.path.join(root, d), os.path.join(rel, d))
            for f in files:
                ob.write(os.path.join(root, f), os.path.join(rel, f))
            break
    elif os.path.isfile(path):
        ob.write(path, os.path.join(rel, basename))
    else:
        pass

For a file tree:

.
├── dir
│   ├── dir2
│   │   └── file2.txt
│   ├── dir3
│   │   └── file3.txt
│   └── file.txt
├── dir4
│   ├── dir5
│   └── file4.txt
├── listdir.zip
├── main.py
├── root.txt
└── selective.zip

You can e.g. select only dir4 and root.txt:

cwd = os.getcwd()
files = [os.path.join(cwd, f) for f in ['dir4', 'root.txt']]

with zipfile.ZipFile("selective.zip", "w" ) as myzip:
    for f in files:
        zipall(myzip, f)

Or just list the directory the script is run from and add everything in it:

with zipfile.ZipFile("listdir.zip", "w" ) as myzip:
    for f in os.listdir():
        if f == "listdir.zip":
            # Creating a listdir.zip in the same directory
            # will include listdir.zip inside itself, beware of this
            continue
        zipall(myzip, f)

Answer 15

Say you want to zip all the folders (subdirectories) in the current directory:

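import os
import zipfile

# Zip each immediate subdirectory of the current directory into <name>.zip
for sub_dir in next(os.walk("."))[1]:
    zip_you_want = sub_dir + ".zip"
    with zipfile.ZipFile(zip_you_want, "w", zipfile.ZIP_DEFLATED) as zip_process:
        # add every file below sub_dir, keeping the sub_dir/ prefix
        for root, dirs, files in os.walk(sub_dir):
            for file_name in files:
                zip_process.write(os.path.join(root, file_name))

    print("Successfully zipped directory: {sub_dir}".format(sub_dir=sub_dir))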
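
Since shutil.make_archive accepts root_dir and base_dir, another option is to let it do the walking. A sketch that zips each immediate subdirectory of a folder into <name>.zip next to it; zip_each_subdir is a hypothetical helper, and the test layout (folders a and b, each holding x.txt) is made up:

```python
import os
import shutil
import tempfile
import zipfile


def zip_each_subdir(parent="."):
    """Create <parent>/<name>.zip for every immediate subdirectory <name>."""
    # Snapshot the entries first so newly created .zip files are not revisited
    with os.scandir(parent) as entries:
        subdirs = sorted(e.name for e in entries if e.is_dir())
    archives = []
    for name in subdirs:
        # root_dir=parent, base_dir=name: entries are stored as name/...
        archive = shutil.make_archive(os.path.join(parent, name), "zip",
                                      root_dir=parent, base_dir=name)
        archives.append(archive)
    return archives


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a", "b"):
        os.mkdir(os.path.join(tmp, name))
        with open(os.path.join(tmp, name, "x.txt"), "w") as fh:
            fh.write(name)
    made = [os.path.basename(p) for p in zip_each_subdir(tmp)]
    with zipfile.ZipFile(os.path.join(tmp, "a.zip")) as zf:
        a_names = zf.namelist()

print(made)  # ['a.zip', 'b.zip']
```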

Answer 16

For a concise way to retain the folder hierarchy under the parent directory to be archived:

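import os
import zipfile
from glob import glob  # the function, not the module

parent = "path/to/parent"  # directory to archive (placeholder)
fp_zip = "archive.zip"     # output file, kept outside parent (placeholder)

with zipfile.ZipFile(fp_zip, "w", zipfile.ZIP_DEFLATED) as zipf:
    # recursive=True is required for "**" to match nested directories
    for fp in glob(os.path.join(parent, "**/*"), recursive=True):
        base = os.path.commonpath([parent, fp])
        zipf.write(fp, arcname=fp.replace(base, ""))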

If you want, you could change this to use pathlib for file globbing.
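
A sketch of that pathlib variant, using Path.rglob() and Path.relative_to(); zip_tree is a hypothetical name and the sample tree is made up. rglob("*") yields directories as well as files, so empty folders get their own entries:

```python
import tempfile
import zipfile
from pathlib import Path


def zip_tree(parent, fp_zip):
    """Zip everything under `parent`, storing paths relative to `parent`."""
    parent = Path(parent)
    with zipfile.ZipFile(fp_zip, "w", zipfile.ZIP_DEFLATED) as zipf:
        for fp in sorted(parent.rglob("*")):
            # relative_to() drops the parent prefix; str() keeps older Pythons happy
            zipf.write(str(fp), arcname=str(fp.relative_to(parent)))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "data")
    (src / "sub" / "empty").mkdir(parents=True)
    (src / "top.txt").write_text("top")
    (src / "sub" / "inner.txt").write_text("inner")
    out = Path(tmp, "out.zip")  # outside src, so the zip never includes itself
    zip_tree(src, out)
    with zipfile.ZipFile(out) as zf:
        names = sorted(zf.namelist())

print(names)
# ['sub/', 'sub/empty/', 'sub/inner.txt', 'top.txt']
```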


Answer 17

There are many answers here already, and I hope I can contribute my own version. It is based on the original answer, but takes a more visual approach; it also uses a context manager for each zipfile and sorts os.walk() so that the output is ordered.

Given these folders and their files (among other folders), I wanted to create a .zip for each cap_ folder:

$ tree -d
.
├── cap_01
|    ├── 0101000001.json
|    ├── 0101000002.json
|    ├── 0101000003.json
|
├── cap_02
|    ├── 0201000001.json
|    ├── 0201000002.json
|    ├── 0201001003.json
|
├── cap_03
|    ├── 0301000001.json
|    ├── 0301000002.json
|    ├── 0301000003.json
| 
├── docs
|    ├── map.txt
|    ├── main_data.xml
|
├── core_files
     ├── core_master
     ├── core_slave

Here’s what I applied, with comments for better understanding of the process.

$ cat zip_cap_dirs.py
""" Zip 'cap_*' directories. """
import os
import zipfile as zf


for root, dirs, files in sorted(os.walk('.')):
    if 'cap_' in root:
        print(f"Compressing: {root}")
        # Defining .zip name, according to Capítulo.
        cap_dir_zip = '{}.zip'.format(root)
        # Opening zipfile context for current root dir.
        with zf.ZipFile(cap_dir_zip, 'w', zf.ZIP_DEFLATED) as new_zip:
            # Iterating over os.walk list of files for the current root dir.
            for f in files:
                # Defining relative path to files from current root dir.
                f_path = os.path.join(root, f)
                # Writing the file on the .zip file of the context
                new_zip.write(f_path)

Basically, for each iteration over os.walk('.'), I open a zipfile context; then I iterate over files (the list of files in the current root directory), build each file's relative path by joining it with the current root directory, and write it into the zipfile opened by that context.
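The same loop can be wrapped in a function that takes the base directory as a parameter. This is only a sketch under assumptions: the name zip_prefixed_dirs and its prefix parameter are hypothetical, and it zips only directories directly under base whose names start with the prefix (the original 'cap_' in root test also matches nested paths). Entries are stored relative to base, so they look like cap_01/0101000001.json, as in the listing below.

```python
import os
import zipfile


def zip_prefixed_dirs(base, prefix='cap_'):
    """Create <name>.zip next to every directory in `base` whose name starts
    with `prefix`. Hypothetical helper sketched from the loop above."""
    created = []
    # listdir() is evaluated once, before any .zip file is created
    for name in sorted(os.listdir(base)):
        src = os.path.join(base, name)
        if not (name.startswith(prefix) and os.path.isdir(src)):
            continue
        zip_path = os.path.join(base, name + '.zip')
        with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as new_zip:
            for root, dirs, files in os.walk(src):
                dirs.sort()  # visit subdirectories in a stable order
                for f in sorted(files):
                    f_path = os.path.join(root, f)
                    # Entry name relative to base, e.g. 'cap_01/0101000001.json'
                    new_zip.write(f_path, os.path.relpath(f_path, base))
        created.append(zip_path)
    return created
```

Calling zip_prefixed_dirs('.') from the folder shown in the tree would be the equivalent of running the script above.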

And the output is presented like this:

$ python3 zip_cap_dirs.py
Compressing: ./cap_01
Compressing: ./cap_02
Compressing: ./cap_03

To see the contents of each .zip file, you can use the less command (provided your less is configured to preview archives, e.g. via lesspipe):

$ less cap_01.zip

Archive:  cap_01.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
  22017  Defl:N     2471  89% 2019-09-05 08:05 7a3b5ec6  cap_01/0101000001.json
  21998  Defl:N     2471  89% 2019-09-05 08:05 155bece7  cap_01/0101000002.json
  23236  Defl:N     2573  89% 2019-09-05 08:05 55fced20  cap_01/0101000003.json
--------          ------- ---                           -------
  67251             7515  89%                            3 files
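If less is not set up to preview archives, a similar listing can be produced from Python with the standard zipfile module. This minimal sketch (the helper name list_zip is hypothetical) returns each entry's name, uncompressed size and compressed size from ZipFile.infolist(); ZipFile.printdir() prints a ready-made table instead.

```python
import zipfile


def list_zip(zip_path):
    """Return (name, size, compressed size) for each entry in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return [(info.filename, info.file_size, info.compress_size)
                for info in zf.infolist()]
```

For example: for name, size, csize in list_zip('cap_01.zip'): print(name, size, csize)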

Answer 18

Here’s a modern approach using pathlib and a context manager. It puts the files directly in the zip, rather than in a subfolder.

import pathlib
import zipfile


def zip_dir(filename: str, dir_to_zip: pathlib.Path):
    with zipfile.ZipFile(filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Use glob('**') instead of iterdir() alone, to cover all subdirectories.
        for directory in dir_to_zip.glob('**'):
            for file in directory.iterdir():
                if not file.is_file():
                    continue
                # Store paths relative to dir_to_zip, so we don't create an unneeded
                # subdirectory containing everything (this also works when
                # dir_to_zip has more than one path component).
                zip_path = file.relative_to(dir_to_zip)
                # str() keeps this compatible with older Python versions whose
                # zipfile did not accept pathlib objects.
                zipf.write(str(file), str(zip_path))
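An equivalent sketch of the same idea swaps the nested glob('**')/iterdir() loops for Path.rglob('*'), which walks all subdirectories in one call. The name zip_dir_flat is hypothetical; entries are still stored relative to dir_to_zip, so no top-level folder appears in the archive.

```python
import pathlib
import zipfile


def zip_dir_flat(filename, dir_to_zip):
    dir_to_zip = pathlib.Path(dir_to_zip)
    with zipfile.ZipFile(filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        # rglob('*') yields every file and directory below dir_to_zip
        for file in sorted(dir_to_zip.rglob('*')):
            if file.is_file():
                # Entry names are relative to dir_to_zip, e.g. 'sub/b.txt'
                zipf.write(str(file), file.relative_to(dir_to_zip).as_posix())
```

Keep the output file outside dir_to_zip; otherwise the walk may pick up the archive being written.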

Answer 19

I prepared a function by combining Mark Byers’ solution with Reimund’s and Morten Zilmer’s comments (relative paths and including empty directories). As a best practice, the ZipFile is constructed in a with statement.

The function also builds a default zip file name from the zipped directory’s name plus a ‘.zip’ extension, so it works with only one argument: the source directory to be zipped.

import os
import zipfile

def zip_dir(path_dir, path_file_zip=''):
    # Drop any trailing separator, so basename() below is not empty
    path_dir = os.path.normpath(path_dir)
    if not path_file_zip:
        path_file_zip = os.path.join(
            os.path.dirname(path_dir), os.path.basename(path_dir) + '.zip')
    # ZipFile modes are 'r', 'w', 'x' or 'a'; 'wb' is rejected
    with zipfile.ZipFile(path_file_zip, 'w', zipfile.ZIP_DEFLATED) as zip_file:
        for root, dirs, files in os.walk(path_dir):
            # Directories are written too, so empty ones are kept in the archive
            for file_or_dir in files + dirs:
                zip_file.write(
                    os.path.join(root, file_or_dir),
                    os.path.relpath(os.path.join(root, file_or_dir),
                                    os.path.join(path_dir, os.path.pardir)))
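The key step in this function is os.path.relpath(..., os.path.join(path_dir, os.path.pardir)): taking paths relative to the parent of path_dir keeps the zipped directory’s own name as the top-level folder inside the archive. A small worked example with POSIX-style paths (the paths and the helper name arcname_for are hypothetical):

```python
import posixpath


def arcname_for(path_dir, full_path):
    # Relative to the parent of path_dir, so the directory name itself is kept
    return posixpath.relpath(full_path, posixpath.join(path_dir, posixpath.pardir))


print(arcname_for('/data/project', '/data/project/sub/a.txt'))  # project/sub/a.txt
print(arcname_for('/data/project', '/data/project/empty_dir'))  # project/empty_dir
```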

Answer 20

# import required python modules
# (zipfile and os are part of the standard library; nothing needs to be installed)

import os
import zipfile

# Change to the directory where you want your new zip file to be

os.chdir('Type your destination')

# Create a new zipfile (I called it myfile); "with" closes it when done

with zipfile.ZipFile('myfile.zip', 'w') as zf:
    # os.walk gives a directory tree. Access the files using a for loop
    for root, folders, files in os.walk('Type your directory'):
        # Add an entry for the current directory (keeps empty ones too)
        zf.write(root)
        for file in files:
            # Join with the current root so files in subfolders are found
            zf.write(os.path.join(root, file))

Answer 21

Well, after reading the suggestions, I came up with a very similar approach that works with 2.7.x, doesn’t create “funny” (absolute-like) directory names, and only creates the specified folder inside the zip.

It is also useful in case you need your zip to contain a single folder holding the contents of the selected directory.

import os
import zipfile

def zipDir(path, ziph):
    """
    Inserts directory (path) into zipfile instance (ziph)
    """
    folder = os.path.basename(os.path.normpath(path))
    for root, dirs, files in os.walk(path):
        for file in files:
            file_path = os.path.join(root, file)
            # Store each file under the folder name, keeping its subdirectory
            # path (zipfile converts os.sep to '/' in entry names)
            ziph.write(file_path, os.path.join(folder, os.path.relpath(file_path, path)))

def makeZip(pathToFolder):
    """
    Creates a zip file with the specified folder, saved next to it as <folder>.zip
    """
    zip_path = os.path.normpath(pathToFolder) + '.zip'
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
        zipDir(pathToFolder, zipf)
    print("Zip file saved to: " + zip_path)

makeZip("c:\\path\\to\\folder\\to\\insert\\into\\zipfile")
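For comparison, the standard library’s shutil.make_archive() can produce the same layout (one folder inside the zip holding the selected directory’s contents) by passing the folder’s parent as root_dir and the folder name as base_dir. This is a swapped-in technique, not the author’s code, and the wrapper name make_folder_zip is hypothetical.

```python
import os
import shutil


def make_folder_zip(path_to_folder):
    folder = os.path.normpath(path_to_folder)
    parent, name = os.path.split(folder)
    # base_name is the archive path without '.zip'; make_archive appends it.
    # root_dir/base_dir make every entry start with '<name>/'.
    return shutil.make_archive(folder, 'zip', root_dir=parent or '.', base_dir=name)
```

make_archive returns the path of the archive it wrote, e.g. ...\zipfile.zip for the call above.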

Answer 22

A function to create a zip file.

import os
import zipfile

def CREATEZIPFILE(zipname, path):
    # Function to create a zip file
    # Parameters: zipname - name of the zip file; path - name of folder/file to be put in zip file

    # Note: zipfile cannot write password-protected archives; ZipFile.setpassword()
    # only sets the default password for *extracting* encrypted members,
    # so it is not used here.
    with zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Check whether the path is a file or a directory
        if os.path.isdir(path):
            # Only the top level of the directory is added (os.listdir does not recurse)
            for name in os.listdir(path):
                zipf.write(os.path.join(path, name), name)
        elif os.path.isfile(path):
            zipf.write(path, os.path.basename(path))
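As written, the directory branch only adds the top level of the folder, because os.listdir() does not recurse. If subfolders should be included too, one option is to walk the tree with os.walk(). This sketch uses the hypothetical name create_zip_recursive and stores entries relative to the given folder; like the function above, it does not set a password.

```python
import os
import zipfile


def create_zip_recursive(zipname, path):
    with zipfile.ZipFile(zipname, 'w', zipfile.ZIP_DEFLATED) as zipf:
        if os.path.isfile(path):
            zipf.write(path, os.path.basename(path))
            return
        for root, dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                # e.g. 'sub/b.txt' for <path>/sub/b.txt
                zipf.write(full, os.path.relpath(full, path))
```

Keep zipname outside path, so the archive being written is not walked into itself.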

Answer 23

Using the third-party zipfly package:

import zipfly

paths = [
    {
        'fs': '/path/to/large/file'
    },
]

zfly = zipfly.ZipFly( paths = paths )

with open("large.zip", "wb") as f:
    for i in zfly.generator():
        f.write(i)

Unzipping files in Python

Question: Unzipping files in Python

I read through the zipfile documentation, but couldn’t understand how to unzip a file, only how to zip a file. How do I unzip all the contents of a zip file into the same directory?


Answer 0

import zipfile
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

That’s pretty much it!


Answer 1

If you are using Python 3.2 or later:

import zipfile
with zipfile.ZipFile("file.zip","r") as zip_ref:
    zip_ref.extractall("targetdir")

You don’t need to call close() or wrap this in try/finally, because the with statement (context manager) closes the archive automatically, even if an error occurs.
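A self-contained round trip that extracts an archive with the same with ZipFile(...) pattern; the helper name unzip_to and the file names in the usage are hypothetical. extractall() creates any missing directories it needs under the target.

```python
import zipfile


def unzip_to(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return their names."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(target_dir)
        return zip_ref.namelist()
```

For example, unzip_to("file.zip", "targetdir") extracts everything and returns the list of extracted entry names.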


Answer 2

Use the extractall method if you’re using Python 2.6+:

from zipfile import ZipFile
zf = ZipFile('file.zip')
zf.extractall()  # extracts into the current working directory
zf.close()
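extractall() also takes an optional members argument, a subset of the names returned by namelist(), which helps when only part of an archive is needed. A sketch assuming we want only entries with a given suffix; the helper name extract_matching is hypothetical.

```python
import zipfile


def extract_matching(zip_path, target_dir, suffix):
    """Extract only the members whose names end with `suffix`."""
    with zipfile.ZipFile(zip_path) as zf:
        wanted = [name for name in zf.namelist() if name.endswith(suffix)]
        zf.extractall(target_dir, members=wanted)
    return wanted
```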

Answer 3

You can also import only ZipFile:

from zipfile import ZipFile
zf = ZipFile('path_to_file/file.zip', 'r')
zf.extractall('path_to_extract_folder')
zf.close()

Works in Python 2 and Python 3.


Answer 4

Try this:

import os
import zipfile

def un_zipFiles(path):
    # Extract every .zip archive found directly in `path` into `path` itself
    files = os.listdir(path)
    for file in files:
        if file.endswith('.zip'):
            filePath = os.path.join(path, file)
            with zipfile.ZipFile(filePath) as zip_file:
                for names in zip_file.namelist():
                    zip_file.extract(names, path)

path: the directory that contains the zip files; their contents are extracted into that same directory.
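Because the function above extracts every archive into the same path, a file in one archive overwrites a file with the same name from another. A variant that extracts each archive into its own subfolder named after the archive; the folder-naming scheme and the name unzip_each_to_own_dir are assumptions, not part of the answer above.

```python
import os
import zipfile


def unzip_each_to_own_dir(path):
    """Extract <path>/<name>.zip into <path>/<name>/ for every zip in path."""
    extracted = []
    for file in sorted(os.listdir(path)):
        if not file.endswith('.zip'):
            continue
        target = os.path.join(path, file[:-len('.zip')])
        with zipfile.ZipFile(os.path.join(path, file)) as zip_file:
            zip_file.extractall(target)
        extracted.append(target)
    return extracted
```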


Answer 5

import os
import zipfile

zip_file_path = r"C:\AA\BB"
file_list = os.listdir(zip_file_path)
abs_path = []
for a in file_list:
    x = os.path.join(zip_file_path, a)
    print(x)
    abs_path.append(x)
for f in abs_path:
    with zipfile.ZipFile(f) as zf:
        zf.extractall(zip_file_path)

This does not check whether each file is actually a zip archive: if the folder contains any non-.zip file, it will fail.
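One way to add the missing check is zipfile.is_zipfile(), which returns True only for files that look like zip archives; everything else (including subdirectories) is skipped. A sketch, with the hypothetical function name extract_all_zips:

```python
import os
import zipfile


def extract_all_zips(folder):
    """Extract every zip archive in `folder` into `folder`, skipping other files."""
    extracted = []
    # listdir() is evaluated once, so newly extracted files are not revisited
    for name in sorted(os.listdir(folder)):
        full = os.path.join(folder, name)
        # Skip subdirectories and files that are not zip archives
        if not os.path.isfile(full) or not zipfile.is_zipfile(full):
            continue
        with zipfile.ZipFile(full) as zf:
            zf.extractall(folder)
        extracted.append(name)
    return extracted
```

For example, extract_all_zips(r"C:\AA\BB") returns the names of the archives it actually extracted.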