Question: glob exclude pattern

I have a directory with a bunch of files inside: eee2314, asd3442 … and eph.

I want to use the glob function to exclude every file whose name starts with eph.

How can I do it?


Answer 0

The pattern rules for glob are not regular expressions. Instead, they follow standard Unix path expansion rules. There are only a few special characters: two different wild-cards, and character ranges are supported [from glob].

So you can exclude some files with a pattern.
For example, to exclude manifest files (files starting with _) with glob, you can use:

import glob

files = glob.glob('files_path/[!_]*')
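A character class negates a single character only, so the same trick needs more than one pattern for a multi-character prefix such as eph. The following is a minimal sketch of that idea, using fnmatch (the matcher that glob relies on) against a made-up list of names rather than a real directory. Note one caveat of this approach: names shorter than the prefix, such as "e" or "ep", match none of the three patterns and would be dropped too.

```python
import fnmatch

# Hypothetical file names, chosen only for illustration.
names = ["eee2314", "asd3442", "eph", "eph_001", "_manifest", "epx9"]

# [!_]* keeps every name whose first character is not an underscore.
no_underscore = [n for n in names if fnmatch.fnmatchcase(n, "[!_]*")]

# "Does not start with eph" needs one pattern per position at which
# the name may differ from the prefix.
patterns = ["[!e]*", "e[!p]*", "ep[!h]*"]
no_eph = [n for n in names
          if any(fnmatch.fnmatchcase(n, p) for p in patterns)]

print(no_underscore)
print(no_eph)
```

With glob itself, the equivalent would be to run one glob call per pattern and concatenate the results.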

Answer 1

You can subtract sets (the result is an unordered set, not a list):

from glob import glob

set(glob("*")) - set(glob("eph*"))
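As a runnable sketch of the set difference, the snippet below builds a throwaway directory and wraps the idea in a helper. The helper name glob_minus and the file names are assumptions made for this example, and sorted() is used only to get a stable order back from the set.

```python
import glob
import os
import tempfile


def glob_minus(directory, include, exclude):
    """Return sorted paths in `directory` matching `include` but not `exclude`."""
    base = glob.escape(directory)  # guard against glob characters in the path
    included = set(glob.glob(os.path.join(base, include)))
    excluded = set(glob.glob(os.path.join(base, exclude)))
    return sorted(included - excluded)


with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files to search through.
    for name in ("eee2314", "asd3442", "eph", "eph_001"):
        open(os.path.join(tmp, name), "w").close()
    kept = [os.path.basename(p) for p in glob_minus(tmp, "*", "eph*")]

print(kept)
```

Both glob calls must share the same directory prefix, otherwise the two sets contain differently spelled paths and nothing is subtracted.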

Answer 2

You can't exclude patterns with the glob function; globs only allow for inclusion patterns. Globbing syntax is very limited (even a [!..] character class must match a character, so it is an inclusion pattern for every character that is not in the class).

You’ll have to do your own filtering; a list comprehension usually works nicely here:

import os
from glob import glob

files = [fn for fn in glob('somepath/*.txt')
         if not os.path.basename(fn).startswith('eph')]

Answer 3

Late to the game, but you could alternatively just apply Python's filter to the result of a glob:

import glob

files = glob.iglob('your_path_here')
files_i_care_about = filter(lambda x: not x.startswith("eph"), files)

or replacing the lambda with an appropriate regex search, etc…

EDIT: I just realized that startswith won't work if you're using full paths, so you'd need a regex (in Python 3, wrap filter in list() to see the result):

In [10]: a
Out[10]: ['/some/path/foo', 'some/path/bar', 'some/path/eph_thing']

In [11]: list(filter(lambda x: not re.search('/eph', x), a))
Out[11]: ['/some/path/foo', 'some/path/bar']
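One thing to watch with the '/eph' regex is that it also matches a directory whose name starts with eph, so every file underneath it would be dropped. A possible alternative, sketched here with made-up paths and a hypothetical helper name, is to test only the final path component:

```python
import os

# Hypothetical paths; the last one lives in a directory that starts with "eph".
paths = [
    "/some/path/foo",
    "some/path/bar",
    "some/path/eph_thing",
    "ephemeral/data/keep_me",
]


def name_starts_with(path, prefix):
    """True if the file name (not the whole path) starts with `prefix`."""
    return os.path.basename(path).startswith(prefix)


kept = list(filter(lambda p: not name_starts_with(p, "eph"), paths))
print(kept)
```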

Answer 4

How about skipping particular files while iterating over all the files in the folder? The code below skips all Excel files whose names start with 'eph':

import glob
import re

for file in glob.glob('*.xlsx'):
    # re.match anchors at the start of the name, so only an "eph" prefix is skipped
    if re.match(r'eph.*\.xlsx', file):
        continue
    # do your stuff here
    print(file)

This way you can use more complex regex patterns to include/exclude a particular set of files in a folder.
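As an illustration of a more complex exclusion, the sketch below compiles one regex that rejects two kinds of names at once. The "_backup" suffix rule and the file names are invented for the example; fullmatch is used so the whole name has to fit the pattern.

```python
import re

# Exclude names that start with "eph" OR end with "_backup", .xlsx files only.
EXCLUDE = re.compile(r"(eph.*|.*_backup)\.xlsx")

# Hypothetical names, standing in for the result of glob.glob('*').
names = ["report.xlsx", "eph_2020.xlsx", "sales_backup.xlsx", "ephemeris.csv"]

kept = [n for n in names if not EXCLUDE.fullmatch(n)]
print(kept)
```

Compiling the pattern once keeps the loop body short and makes the rule easy to change in one place.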


Answer 5

Compared with glob, I recommend pathlib; filtering on a single pattern is very simple.

from pathlib import Path

p = Path(YOUR_PATH)
filtered = [x for x in p.glob("**/*") if not x.name.startswith("eph")]

If you want to filter on a more complex pattern, you can define a function to do that, like this:

def not_in_pattern(x):
    return (not x.name.startswith("eph")) and not x.name.startswith("epi")


filtered = [x for x in p.glob("**/*") if not_in_pattern(x)]

With that code, you can filter out all files whose names start with eph or epi.
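Since str.startswith also accepts a tuple of prefixes, the two checks can be folded into one call. Here is a small sketch of that variant, run against a temporary directory with made-up file names; the constant EXCLUDED_PREFIXES and the function name are assumptions for this example.

```python
import tempfile
from pathlib import Path

EXCLUDED_PREFIXES = ("eph", "epi")


def not_excluded(path):
    # startswith returns True if the name begins with any prefix in the tuple
    return not path.name.startswith(EXCLUDED_PREFIXES)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create a few empty files to filter.
    for name in ("eee2314", "asd3442", "eph", "epi_7", "epsilon"):
        (root / name).touch()
    kept = sorted(p.name for p in root.glob("**/*") if not_excluded(p))

print(kept)
```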


Answer 6

More generally, to drop files that don't match a shell-style wildcard pattern, you could use the fnmatch module:

import fnmatch
from glob import glob

file_list = glob('somepath')
# Build a new list rather than popping while iterating, which would skip entries
file_list = [ii for ii in file_list
             if fnmatch.fnmatch(ii, 'bash_regexp_with_exclude')]

This first generates a list from the given path and then keeps only the files that match the pattern expressing the desired constraint.
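To turn this around and drop the names that do match, one option is a small helper that takes several exclusion patterns. This is only a sketch: the helper name, the patterns, and the names are hypothetical, and fnmatchcase is chosen so matching stays case-sensitive on every platform.

```python
import fnmatch


def exclude_patterns(names, patterns):
    """Keep the names that match none of the shell-style `patterns`."""
    return [n for n in names
            if not any(fnmatch.fnmatchcase(n, p) for p in patterns)]


# Hypothetical names, standing in for the result of a glob call.
names = ["eee2314", "asd3442", "eph", "eph_001", "notes.bak", "Eph_upper"]

kept = exclude_patterns(names, ["eph*", "*.bak"])
print(kept)
```

When working with full paths, apply the helper to os.path.basename of each path, because fnmatch's * also matches path separators.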


Answer 7

As mentioned by the accepted answer, you can’t exclude patterns with glob, so the following is a method to filter your glob result.

The accepted answer is probably the best pythonic way to do things but if you think list comprehensions look a bit ugly and want to make your code maximally numpythonic anyway (like I did) then you can do this (but note that this is probably less efficient than the list comprehension method):

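import glob

import numpy as np

data_files = glob.glob("path_to_files/*.fits")

# The exclusion globs must use the same directory prefix, or no paths will match
light_files = np.setdiff1d(data_files, glob.glob("path_to_files/*BIAS*"))
light_files = np.setdiff1d(light_files, glob.glob("path_to_files/*FLAT*"))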

(In my case, I had some image frames, bias frames, and flat frames all in one directory and I just wanted the image frames)


Answer 8

If the position of the character isn't important, for example to exclude manifest files wherever the _ appears in the name, you can combine glob with re regular expression operations:

import glob
import re
for file in glob.glob('*.txt'):
    if re.match(r'.*\_.*', file):
        continue
    else:
        print(file)

Or, more elegantly, with a list comprehension:

filtered = [f for f in glob.glob('*.txt') if not re.match(r'.*\_.*', f)]

for mach in filtered:
    print(mach)

Answer 9

You can use the below method:

import glob

# Get all the files
allFiles = glob.glob("*")
# Files starting with eph
ephFiles = glob.glob("eph*")
# Files which don't start with eph
noephFiles = []
for file in allFiles:
    if file not in ephFiles:
        noephFiles.append(file)
# noephFiles has all the files which don't start with eph.

Thank you.  
