Tag archive: directory-listing

Non-alphanumeric list order from os.listdir()

Question: Non-alphanumeric list order from os.listdir()

I often use Python to process directories of data. Recently, I have noticed that the default order of the lists has changed to something almost nonsensical. For example, if I am in a current directory containing the following subdirectories: run01, run02, …, run19, run20, and then I generate a list with the following command:

dir = os.listdir(os.getcwd())

then I usually get a list in this order:

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]

and so on. The order used to be alphanumeric. But this new order has remained with me for a while now.

What is determining the (displayed) order of these lists?


Answer 0

I think the order has to do with the way the files are indexed on your filesystem. If you really want it to follow a particular order, you can always sort the list after getting the files.


Answer 1

You can use the builtin sorted function to sort the strings however you want. Based on what you describe,

sorted(os.listdir(whatever_directory))

Alternatively, you can use the .sort method of a list:

lst = os.listdir(whatever_directory)
lst.sort()

I think that should do the trick.

Note that the order that os.listdir gets the filenames is probably completely dependent on your filesystem.


Answer 2

Per the documentation:

os.listdir(path)

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries ‘.’ and ‘..’ even if they are present in the directory.

Order cannot be relied upon and is an artifact of the filesystem.

To sort the result, use sorted(os.listdir(path)).
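
A minimal sketch of that advice. The helper name `listdir_sorted` is hypothetical and not part of the `os` module; it only wraps the `sorted(os.listdir(path))` call quoted above:

```python
import os


def listdir_sorted(path="."):
    """Return the entry names in `path` in plain lexicographic order."""
    # os.listdir() makes no promise about order, so sort explicitly.
    return sorted(os.listdir(path))


# Usage: names come back in the same order on every run and every filesystem.
print(listdir_sorted("."))
```

Note that this is string ordering, so it only matches numeric order when the numbers are zero-padded, as in run01 ... run20.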


Answer 3

Python for whatever reason does not come with a built-in way to have natural sorting (meaning 1, 2, 10 instead of 1, 10, 2), so you have to write it yourself:

import re
def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(data, key=alphanum_key)

You can now use this function to sort a list:

dirlist = sorted_alphanumeric(os.listdir(...))

PROBLEMS: In case you use the above function to sort strings (for example folder names) and want them sorted like Windows Explorer does, it will not work properly in some edge cases.
This sorting function will return incorrect results on Windows, if you have folder names with certain ‘special’ characters in them. For example this function will sort 1, !1, !a, a, whereas Windows Explorer would sort !1, 1, !a, a.

So if you want to sort exactly like Windows Explorer does in Python you have to use the Windows built-in function StrCmpLogicalW via ctypes (this of course won’t work on Unix):

from ctypes import wintypes, windll
from functools import cmp_to_key
def winsort(data):
    _StrCmpLogicalW = windll.Shlwapi.StrCmpLogicalW
    _StrCmpLogicalW.argtypes = [wintypes.LPWSTR, wintypes.LPWSTR]
    _StrCmpLogicalW.restype  = wintypes.INT

    cmp_fnc = lambda psz1, psz2: _StrCmpLogicalW(psz1, psz2)
    return sorted(data, key=cmp_to_key(cmp_fnc))

This function is slightly slower than sorted_alphanumeric().

Bonus: winsort can also sort full paths on Windows.

Alternatively, especially if you use Unix, you can use the natsort library (pip install natsort) to sort by full paths in a correct way (meaning subfolders at the correct position).

You can use it like this to sort full paths:

from natsort import natsorted, ns
dirlist = natsorted(dirlist, alg=ns.PATH | ns.IGNORECASE)

Don’t use it for normal sorting of just folder names (or strings in general), as it’s quite a bit slower than the sorted_alphanumeric() function above.
The natsort library will give you incorrect results if you expect Windows Explorer sorting, so use winsort() for that.


Answer 4

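I think by default the order is determined by the ASCII value. One way to get the order you want is to sort by name length first and then by name, because key=len alone leaves names of equal length in their original order:

dir = sorted(os.listdir(os.getcwd()), key=lambda name: (len(name), name))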

Answer 5

It’s probably just the order that C’s readdir() returns. Try running this C program:

#include <dirent.h>
#include <stdio.h>
int main(void)
{   DIR *dirp;
    struct dirent* de;
    dirp = opendir(".");
    while(de = readdir(dirp)) // Yes, one '='.
        printf("%s\n", de->d_name);
    closedir(dirp);
    return 0;
}

The build line should be something like gcc -o foo foo.c.

P.S. Just ran this and your Python code, and they both gave me sorted output, so I can’t reproduce what you’re seeing.
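
For a rough Python-side counterpart to the C program, `os.scandir()` is documented to yield entries in arbitrary order, just like `os.listdir()`. This is only a sketch, and `raw_order` is a hypothetical helper name:

```python
import os


def raw_order(path="."):
    """Return entry names in whatever order the OS hands them back (unsorted)."""
    with os.scandir(path) as entries:
        return [entry.name for entry in entries]


print(raw_order("."))          # arbitrary, filesystem-dependent order
print(sorted(raw_order(".")))  # deterministic order
```

Comparing the two printed lists on your own machine shows whether the underlying order happens to be sorted there.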


Answer 6

import os
import re

aaa = ['row_163.pkl', 'row_394.pkl', 'row_679.pkl', 'row_202.pkl', 'row_1449.pkl', 'row_247.pkl', 'row_1353.pkl', 'row_749.pkl', 'row_1293.pkl', 'row_1304.pkl', 'row_78.pkl', 'row_532.pkl', 'row_9.pkl', 'row_1435.pkl']
# Sort by the number between '_' and the extension.
sorted(aaa, key=lambda x: int(os.path.splitext(x.split('_')[1])[0]))

In my case the names look like row_163.pkl. os.path.splitext('row_163.pkl') splits that into ('row_163', '.pkl'), so the name also has to be split on '_'.

For your case you can do something like

sorted(aa, key=lambda x: (int(re.sub(r'\D', '', x)), x))

where

aa = ['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']

For directory retrieval you can also use sorted(os.listdir(path)).

For names like 'run01.txt' or 'run01.csv', strip the extension and the non-digit characters before converting to int:

sorted(files, key=lambda x: int(re.sub(r'\D', '', os.path.splitext(x)[0])))

Answer 7

I found that sorted() does not always do what I expected. For example, I have the directory below, and sorted() gives me a result that looks very strange at first:

>>> os.listdir(pathon)
['2', '3', '4', '5', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472']
>>> sorted([ f for f in os.listdir(pathon)])
['2', '3', '4', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472', '5']

It seems to compare the names as strings, character by character: '403' sorts before '5' because '4' is smaller than '5', so the name with the largest first character ends up last.
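
The difference can be reproduced without touching the filesystem. This sketch uses a shortened, made-up subset of the names above and assumes every name consists only of digits:

```python
names = ['2', '3', '4', '5', '403', '404', '472']

# Plain sorted() compares strings character by character, so '403' < '5'.
lexicographic = sorted(names)

# Converting each name to int in the key gives numeric order instead.
numeric = sorted(names, key=int)

print(lexicographic)
print(numeric)
```

If some names contain non-digit characters, `key=int` raises ValueError, and a natural-sort key is needed instead.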


Answer 8

From the documentation:

The list is in arbitrary order, and does not include the special entries ‘.’ and ‘..’ even if they are present in the directory.

This means that the order is probably OS/filesystem dependent, has no particularly meaningful order, and is therefore not guaranteed to be anything in particular. As many answers mentioned: if preferred, the retrieved list can be sorted.

Cheers :)


Answer 9

Elliot’s answer solves it perfectly but because it is a comment, it goes unnoticed so with the aim of helping someone, I am reiterating it as a solution.

Use natsort library:

Install the library with the following command for Ubuntu and other Debian versions

Python 2

sudo pip install natsort

Python 3

sudo pip3 install natsort

Details on how to use this library can be found in the natsort documentation.


Answer 10

In [6]: os.listdir?

Type:       builtin_function_or_method
String Form:<built-in function listdir>
Docstring:
listdir(path) -> list_of_strings
Return a list containing the names of the entries in the directory.
path: path of directory to list
The list is in **arbitrary order**.  It does not include the special
entries '.' and '..' even if they are present in the directory.

Answer 11

The proposed combination of os.listdir and sorted commands generates the same result as ls -l command under Linux. The following example verifies this assumption:

user@user-PC:/tmp/test$ touch 3a 4a 5a b c d1 d2 d3 k l p0 p1 p3 q 410a 409a 408a 407a
user@user-PC:/tmp/test$ ls -l
total 0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 3a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 407a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 408a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 409a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 410a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 4a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 5a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 b
-rw-rw-r-- 1 user user 0 Feb  15 10:31 c
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d2
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 k
-rw-rw-r-- 1 user user 0 Feb  15 10:31 l
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 q

user@user-PC:/tmp/test$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir( './' )
['d3', 'k', 'p1', 'b', '410a', '5a', 'l', 'p0', '407a', '409a', '408a', 'd2', '4a', 'p3', '3a', 'q', 'c', 'd1']
>>> sorted( os.listdir( './' ) )
['3a', '407a', '408a', '409a', '410a', '4a', '5a', 'b', 'c', 'd1', 'd2', 'd3', 'k', 'l', 'p0', 'p1', 'p3', 'q']
>>> exit()
user@user-PC:/tmp/test$ 

So, for someone who wants to reproduce the result of the well-known ls -l command in their python code, sorted( os.listdir( DIR ) ) works pretty well.


Get a filtered list of files in a directory

Question: Get a filtered list of files in a directory

I am trying to get a list of files in a directory using Python, but I do not want a list of ALL the files.

What I essentially want is the ability to do something like the following but using Python and not executing ls.

ls 145592*.jpg

If there is no built-in method for this, I am currently thinking of writing a for loop to iterate through the results of an os.listdir() and to append all the matching files to a new list.

However, there are a lot of files in that directory and therefore I am hoping there is a more efficient method (or a built-in method).


Answer 0

import glob

jpgFilenamesList = glob.glob('145592*.jpg')

See the glob module in the Python documentation.
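
The pattern above is resolved relative to the current working directory. A hedged sketch for searching some other directory follows; `matching_jpgs` and its parameters are hypothetical names chosen for illustration:

```python
import glob
import os


def matching_jpgs(directory, prefix):
    """Return sorted paths in `directory` that match '<prefix>*.jpg'."""
    # glob.escape() keeps special characters in the directory name literal.
    pattern = os.path.join(glob.escape(directory), prefix + "*.jpg")
    # glob.glob() returns matches in arbitrary order, so sort them.
    return sorted(glob.glob(pattern))


print(matching_jpgs(".", "145592"))
```

The returned strings include the directory part; use `os.path.basename()` if only the file names are wanted.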


Answer 1

glob.glob() is definitely the way to do it (as per Ignacio). However, if you do need more complicated matching, you can do it with a list comprehension and re.match(), something like so:

files = [f for f in os.listdir('.') if re.match(r'[0-9]+.*\.jpg', f)]

More flexible, but as you note, less efficient.
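
A small variation on the same idea, as a sketch only: the pattern is compiled once and `fullmatch()` is used so that the whole name has to match. The pattern and the sample names are assumptions made for illustration:

```python
import re

# Names must start with 145592 and end with .jpg; the middle is free-form.
PATTERN = re.compile(r"145592.*\.jpg")


def filter_names(names):
    """Keep only the names that match PATTERN from start to end."""
    # match() anchors only at the start; fullmatch() also anchors at the end.
    return [name for name in names if PATTERN.fullmatch(name)]


names = ["145592_01.jpg", "145592_02.jpg.bak", "999_145592.jpg", "145592.png"]
print(filter_names(names))
```

In practice `names` would come from `os.listdir()`.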


Answer 2

Keep it simple:

import os
relevant_path = "[path to folder]"
included_extensions = ['jpg','jpeg', 'bmp', 'png', 'gif']
file_names = [fn for fn in os.listdir(relevant_path)
              if any(fn.endswith(ext) for ext in included_extensions)]

I prefer this form of list comprehensions because it reads well in English.

I read the fourth line as: For each fn in os.listdir for my path, give me only the ones that match any one of my included extensions.

It may be hard for novice Python programmers to really get used to using list comprehensions for filtering, and it can have some memory overhead for very large data sets, but for listing a directory and other simple string filtering tasks, list comprehensions lead to cleaner, more documentable code.

The only thing about this design is that it doesn’t protect you against making the mistake of passing a string instead of a list. For example if you accidentally convert a string to a list and end up checking against all the characters of a string, you could end up getting a slew of false positives.

But it’s better to have a problem that’s easy to fix than a solution that’s hard to understand.
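
One way to guard against the string-instead-of-list mistake described above is sketched here. The helper name is hypothetical, and the leading dots and case folding are assumptions that go beyond the original snippet:

```python
INCLUDED_EXTENSIONS = (".jpg", ".jpeg", ".bmp", ".png", ".gif")


def has_included_extension(filename, extensions=INCLUDED_EXTENSIONS):
    """Return True if `filename` ends with one of `extensions`, ignoring case."""
    # A bare string would otherwise be treated as a sequence of characters.
    if isinstance(extensions, str):
        extensions = (extensions,)
    # str.endswith() accepts a tuple of suffixes.
    return filename.lower().endswith(tuple(extensions))


names = ["photo.JPG", "notes.txt", "myjpg", "icon.png"]
print([n for n in names if has_included_extension(n)])
```

Including the dot in each suffix keeps a name such as `myjpg` from matching.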


Answer 3

Another option:

>>> import os, fnmatch
>>> fnmatch.filter(os.listdir('.'), '*.py')
['manage.py']

https://docs.python.org/3/library/fnmatch.html
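
Applied to the pattern from the question, a sketch with made-up file names might look like this. It uses `fnmatch.fnmatchcase()`, which always compares case-sensitively, whereas `fnmatch.filter()` normalizes case according to the operating system's conventions:

```python
import fnmatch

names = ["145592_1.jpg", "145592_2.JPG", "145593_1.jpg", "notes.txt"]

# Keep only names that match the shell-style pattern exactly, including case.
matches = [n for n in names if fnmatch.fnmatchcase(n, "145592*.jpg")]
print(matches)
```

In practice `names` would be the result of `os.listdir()`.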


Answer 4

Filter with glob module:

Import glob

import glob

Wild Cards:

files=glob.glob("data/*")
print(files)

Out:

['data/ks_10000_0', 'data/ks_1000_0', 'data/ks_100_0', 'data/ks_100_1',
'data/ks_100_2', 'data/ks_106_0', 'data/ks_19_0', 'data/ks_200_0', 'data/ks_200_1', 
'data/ks_300_0', 'data/ks_30_0', 'data/ks_400_0', 'data/ks_40_0', 'data/ks_45_0', 
'data/ks_4_0', 'data/ks_500_0', 'data/ks_50_0', 'data/ks_50_1', 'data/ks_60_0', 
'data/ks_82_0', 'data/ks_lecture_dp_1', 'data/ks_lecture_dp_2']

Filter by extension .txt:

files = glob.glob("/home/ach/*/*.txt")

A single character

glob.glob("/home/ach/file?.txt")

Number Ranges

glob.glob("/home/ach/*[0-9]*")

Alphabet Ranges

glob.glob("/home/ach/[a-c]*")

Answer 5

Preliminary code

import glob
import fnmatch
import pathlib
import os

pattern = '*.py'
path = '.'

Solution 1 – use “glob”

# lookup in current dir
glob.glob(pattern)

In [2]: glob.glob(pattern)
Out[2]: ['wsgi.py', 'manage.py', 'tasks.py']

Solution 2 – use “os” + “fnmatch”

Variant 2.1 – Lookup in current dir

# lookup in current dir
fnmatch.filter(os.listdir(path), pattern)

In [3]: fnmatch.filter(os.listdir(path), pattern)
Out[3]: ['wsgi.py', 'manage.py', 'tasks.py']

Variant 2.2 – Lookup recursive

# lookup recursive
for dirpath, dirnames, filenames in os.walk(path):

    if not filenames:
        continue

    pythonic_files = fnmatch.filter(filenames, pattern)
    if pythonic_files:
        for file in pythonic_files:
            print('{}/{}'.format(dirpath, file))

Result

./wsgi.py
./manage.py
./tasks.py
./temp/temp.py
./apps/diaries/urls.py
./apps/diaries/signals.py
./apps/diaries/actions.py
./apps/diaries/querysets.py
./apps/library/tests/test_forms.py
./apps/library/migrations/0001_initial.py
./apps/polls/views.py
./apps/polls/formsets.py
./apps/polls/reports.py
./apps/polls/admin.py

Solution 3 – use “pathlib”

# lookup in current dir
path_ = pathlib.Path('.')
tuple(path_.glob(pattern))

# lookup recursive
tuple(path_.rglob(pattern))

Notes:

  1. Tested on Python 3.4.
  2. The pathlib module was added in Python 3.4.
  3. Python 3.5 added recursive lookup to glob.glob: https://docs.python.org/3.5/library/glob.html#glob.glob. Since my machine has Python 3.4 installed, I have not tested that.
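
The recursive lookup with glob.glob mentioned in the notes can be sketched as follows, assuming Python 3.5 or later. The helper name `find_recursive` is hypothetical:

```python
import glob
import os


def find_recursive(root, pattern="*.py"):
    """Return sorted paths under `root`, at any depth, whose name matches `pattern`."""
    # '**' matches zero or more directory levels, but only with recursive=True.
    search = os.path.join(glob.escape(root), "**", pattern)
    return sorted(glob.glob(search, recursive=True))


# Usage (path is a placeholder): find_recursive("/path/to/project")
```

Because `**` also matches zero directories, files directly inside `root` are included.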

Answer 6

Use os.walk to recursively list your files:

import os
root = "/home"
pattern = "145992"
alist_filter = ['jpg','bmp','png','gif'] 
path=os.path.join(root,"mydir_to_scan")
for r,d,f in os.walk(path):
    for file in f:
        if file[-3:] in alist_filter and pattern in file:
            # Join with r, the directory currently being walked, not the top-level root.
            print(os.path.join(r, file))

Answer 7

import os

dir="/path/to/dir"
[x[0]+"/"+f for x in os.walk(dir) for f in x[2] if f.endswith(".jpg")]

This will give you a list of jpg files with their full path. You can replace x[0]+"/"+f with f for just filenames. You can also replace f.endswith(".jpg") with whatever string condition you wish.


Answer 8

You might also like a more high-level approach (which I have implemented and packaged as findtools):

from findtools.find_files import (find_files, Match)


# Recursively find all *.txt files in **/home/**
txt_files_pattern = Match(filetype='f', name='*.txt')
found_files = find_files(path='/home', match=txt_files_pattern)

for found_file in found_files:
    print(found_file)

It can be installed with

pip install findtools

Answer 9

Filenames with “jpg” and “png” extensions in “path/to/images”:

import os
accepted_extensions = ["jpg", "png"]
filenames = [fn for fn in os.listdir("path/to/images") if fn.split(".")[-1] in accepted_extensions]

Answer 10

You can use pathlib, which is available in the standard library in Python 3.4 and above.

from pathlib import Path

files = [f for f in Path.cwd().iterdir() if f.match("145592*.jpg")]
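
A closely related sketch lets `Path.glob()` do the matching directly instead of filtering `iterdir()`. The helper name and its default prefix are assumptions for illustration:

```python
from pathlib import Path


def jpgs_with_prefix(directory, prefix="145592"):
    """Return sorted names of '<prefix>*.jpg' entries directly inside `directory`."""
    # Path.glob() yields matches in arbitrary order, so sort the names.
    return sorted(p.name for p in Path(directory).glob(prefix + "*.jpg"))


print(jpgs_with_prefix(Path.cwd()))
```

This variant searches only the given directory and does not descend into subdirectories.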

Answer 11

You can define a pattern and check for it. Here I take both a start pattern and an end pattern and look for them in the filename. FILES contains the list of all the files in a directory.

import os
PATTERN_START = "145592"
PATTERN_END = ".jpg"
CURRENT_DIR = os.path.dirname(os.path.realpath(__file__))
for r,d,FILES in os.walk(CURRENT_DIR):
    for FILE in FILES:
        if FILE.startswith(PATTERN_START) and FILE.endswith(PATTERN_END):
            print(FILE)

Answer 12

How about str.split()? Nothing to import.

import os

image_names = [f for f in os.listdir(path) if len(f.split('.jpg')) == 2]
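
One caveat: the split-based check accepts any name that contains '.jpg' exactly once, so a name like 'a.jpg.txt' also passes. A hedged alternative using `os.path.splitext()` is sketched below; `is_jpg` is a hypothetical helper and the sample names are made up:

```python
import os


def is_jpg(filename):
    """Return True if the final extension of `filename` is .jpg, ignoring case."""
    # splitext() splits off only the last extension: 'a.jpg.txt' -> ('a.jpg', '.txt').
    return os.path.splitext(filename)[1].lower() == ".jpg"


names = ["a.jpg", "a.jpg.txt", "b.JPG", "c.png"]
print([n for n in names if is_jpg(n)])
```

This still needs only the `os` import that the original snippet already uses.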

Answer 13

You can use subprocess.check_output() as follows:

import subprocess

list_files = subprocess.check_output("ls 145992*.jpg", shell=True) 

Of course, the string between quotes can be anything you want to execute in the shell, and store the output.