My initial thought was to use heuristics to do the sorting, like:
There’s a ~60-40% ratio in weight bearing between the front and hind paws;
The hind paws are generally smaller in surface;
The paws are (often) spatially divided in left and right.
However, I’m a bit skeptical about my heuristics, as they would fail as soon as I encounter a variation I hadn’t thought of. They also won’t be able to cope with measurements from lame dogs, which probably have rules of their own.
Furthermore, the annotation suggested by Joe sometimes gets messed up and doesn’t take into account what the paw actually looks like.
Based on the answers I received on my question about peak detection within the paw, I’m hoping there are more advanced solutions to sort the paws. Especially because the pressure distribution and the progression thereof are different for each separate paw, almost like a fingerprint. I hope there’s a method that can use this to cluster my paws, rather than just sorting them in order of occurrence.
So I’m looking for a better way to sort the results with their corresponding paw.
To clarify: walk_sliced_data is a dictionary that contains [‘ser_3’, ‘ser_2’, ‘sel_1’, ‘sel_2’, ‘ser_1’, ‘sel_3’], which are the names of the measurements. Each measurement contains another dictionary, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] (example from ‘sel_1’), which represents the impacts that were extracted.
Also note that ‘false’ impacts, such as where the paw is partially measured (in space or time), can be ignored. They are only useful because they can help with recognizing a pattern, but won’t be analyzed.
Alright! I’ve finally managed to get something working consistently! This problem pulled me in for several days… Fun stuff! Sorry for the length of this answer, but I need to elaborate a bit on some things… (Though I may set a record for the longest non-spam stackoverflow answer ever!)
As a side note, I’m using the full dataset that Ivo provided a link to in his original question. It’s a series of rar files (one-per-dog) each containing several different experiment runs stored as ascii arrays. Rather than try to copy-paste stand-alone code examples into this question, here’s a bitbucket mercurial repository with full, stand-alone code. You can clone it with
There are essentially two ways to approach the problem, as you noted in your question. I’m actually going to use both in different ways.
Use the (temporal and spatial) order of the paw impacts to determine which paw is which.
Try to identify the “pawprint” based purely on its shape.
Basically, the first method works when the dog’s paws follow the trapezoid-like pattern shown in Ivo’s question above, but fails whenever the paws don’t follow that pattern. It’s fairly easy to programmatically detect when it doesn’t work.
Therefore, we can use the measurements where it did work to build up a training dataset (of ~2000 paw impacts from ~30 different dogs) to recognize which paw is which, and the problem reduces to a supervised classification (With some additional wrinkles… Image recognition is a bit harder than a “normal” supervised classification problem).
Pattern Analysis
To elaborate on the first method, when a dog is walking (not running!) normally (which some of these dogs may not be), we expect paws to impact in the order of: Front Left, Hind Right, Front Right, Hind Left, Front Left, etc. The pattern may start with either the front left or front right paw.
If this were always the case, we could simply sort the impacts by initial contact time and use a modulo 4 to group them by paw.
However, even when everything is “normal”, this doesn’t work. This is due to the trapezoid-like shape of the pattern. A hind paw spatially falls behind the previous front paw.
Therefore, the hind paw impact after the initial front paw impact often falls off the sensor plate, and isn’t recorded. Similarly, the last paw impact is often not the next paw in the sequence, as the paw impact before it occurred off the sensor plate and wasn’t recorded.
Nonetheless, we can use the shape of the paw impact pattern to determine when this has happened, and whether we’ve started with a left or right front paw. (I’m actually ignoring problems with the last impact here. It’s not too hard to add it, though.)
def group_paws(data_slices, time):
# Sort slices by initial contact time
data_slices.sort(key=lambda s: s[-1].start)
# Get the centroid for each paw impact...
paw_coords = []
for x,y,z in data_slices:
paw_coords.append([(item.stop + item.start) / 2.0 for item in (x,y)])
paw_coords = np.array(paw_coords)
# Make a vector between each successive impact...
dx, dy = np.diff(paw_coords, axis=0).T
#-- Group paws -------------------------------------------
paw_code = {0:'LF', 1:'RH', 2:'RF', 3:'LH'}
paw_number = np.arange(len(paw_coords))
# Did we miss the hind paw impact after the first
# front paw impact? If so, first dx will be positive...
if dx[0] > 0:
paw_number[1:] += 1
# Are we starting with the left or right front paw...
# We assume we're starting with the left, and check dy[0].
# If dy[0] > 0 (i.e. the next paw impacts to the left), then
# it's actually the right front paw, instead of the left.
if dy[0] > 0: # Right front paw impact...
paw_number += 2
# Now we can determine the paw with a simple modulo 4..
paw_codes = paw_number % 4
paw_labels = [paw_code[code] for code in paw_codes]
return paw_labels
In spite of all of this, it frequently doesn’t work correctly. Many of the dogs in the full dataset appear to be running, and the paw impacts don’t follow the same temporal order as when the dog is walking. (Or perhaps the dog just has severe hip problems…)
Fortunately, we can still programmatically detect whether or not the paw impacts follow our expected spatial pattern:
def paw_pattern_problems(paw_labels, dx, dy):
"""Check whether or not the label sequence "paw_labels" conforms to our
expected spatial pattern of paw impacts. "paw_labels" should be a sequence
of the strings: "LH", "RH", "LF", "RF" corresponding to the different paws"""
# Check for problems... (This could be written a _lot_ more cleanly...)
problems = False
last = paw_labels[0]
for paw, dy, dx in zip(paw_labels[1:], dy, dx):
# Going from a left paw to a right, dy should be negative
if last.startswith('L') and paw.startswith('R') and (dy > 0):
problems = True
break
# Going from a right paw to a left, dy should be positive
if last.startswith('R') and paw.startswith('L') and (dy < 0):
problems = True
break
# Going from a front paw to a hind paw, dx should be negative
if last.endswith('F') and paw.endswith('H') and (dx > 0):
problems = True
break
# Going from a hind paw to a front paw, dx should be positive
if last.endswith('H') and paw.endswith('F') and (dx < 0):
problems = True
break
last = paw
return problems
Therefore, even though the simple spatial classification doesn’t work all of the time, we can determine when it does work with reasonable confidence.
Training Dataset
From the pattern-based classifications where it worked correctly, we can build up a very large training dataset of correctly classified paws (~2400 paw impacts from 32 different dogs!).
We can now start to look at what an “average” front left, etc, paw looks like.
To do this, we need some sort of “paw metric” that is the same dimensionality for any dog. (In the full dataset, there are both very large and very small dogs!) A paw print from an Irish elkhound will be both much wider and much “heavier” than a paw print from a toy poodle. We need to rescale each paw print so that a) they have the same number of pixels, and b) the pressure values are standardized. To do this, I resampled each paw print onto a 20×20 grid and rescaled the pressure values based on the maximum, minimum, and mean pressure value for the paw impact.
def paw_image(paw):
from scipy.ndimage import map_coordinates
ny, nx = paw.shape
# Trim off any "blank" edges around the paw...
mask = paw > 0.01 * paw.max()
y, x = np.mgrid[:ny, :nx]
ymin, ymax = y[mask].min(), y[mask].max()
xmin, xmax = x[mask].min(), x[mask].max()
# Make a 20x20 grid to resample the paw pressure values onto
numx, numy = 20, 20
xi = np.linspace(xmin, xmax, numx)
yi = np.linspace(ymin, ymax, numy)
xi, yi = np.meshgrid(xi, yi)
# Resample the values onto the 20x20 grid
coords = np.vstack([yi.flatten(), xi.flatten()])
zi = map_coordinates(paw, coords)
zi = zi.reshape((numy, numx))
# Rescale the pressure values
zi -= zi.min()
zi /= zi.max()
zi -= zi.mean() #<- Helps distinguish front from hind paws...
return zi
After all of this, we can finally take a look at what an average left front, hind right, etc paw looks like. Note that this is averaged across >30 dogs of greatly different sizes, and we seem to be getting consistent results!
However, before we do any analysis on these, we need to subtract the mean (the average paw for all legs of all dogs).
Now we can analyze the differences from the mean, which are a bit easier to recognize:
Image-based Paw Recognition
Ok… We finally have a set of patterns that we can begin to try to match the paws against. Each paw can be treated as a 400-dimensional vector (returned by the paw_image function) that can be compared to these four 400-dimensional vectors.
Unfortunately, if we just use a “normal” supervised classification algorithm (i.e. find which of the 4 patterns is closest to a particular paw print using a simple distance), it doesn’t work consistently. In fact, it doesn’t do much better than random chance on the training dataset.
This is a common problem in image recognition. Due to the high dimensionality of the input data, and the somewhat “fuzzy” nature of images (i.e. adjacent pixels have a high covariance), simply looking at the difference of an image from a template image does not give a very good measure of the similarity of their shapes.
Eigenpaws
To get around this we need to build a set of “eigenpaws” (just like “eigenfaces” in facial recognition), and describe each paw print as a combination of these eigenpaws. This is identical to principal components analysis, and basically provides a way to reduce the dimensionality of our data, so that distance is a good measure of shape.
Because we have more training images than dimensions (2400 vs 400), there’s no need to do “fancy” linear algebra for speed. We can work directly with the covariance matrix of the training data set:
def make_eigenpaws(paw_data):
"""Creates a set of eigenpaws based on paw_data.
paw_data is a numdata by numdimensions matrix of all of the observations."""
average_paw = paw_data.mean(axis=0)
paw_data -= average_paw
# Determine the eigenvectors of the covariance matrix of the data
cov = np.cov(paw_data.T)
eigvals, eigvecs = np.linalg.eig(cov)
# Sort the eigenvectors by ascending eigenvalue (largest is last)
eig_idx = np.argsort(eigvals)
sorted_eigvecs = eigvecs[:,eig_idx]
sorted_eigvals = eigvals[:,eig_idx]
# Now choose a cutoff number of eigenvectors to use
# (50 seems to work well, but it's arbitrary...)
num_basis_vecs = 50
basis_vecs = sorted_eigvecs[:,-num_basis_vecs:]
return basis_vecs
These basis_vecs are the “eigenpaws”.
To use these, we simply dot (i.e. matrix multiplication) each paw image (as a 400-dimensional vector, rather than a 20×20 image) with the basis vectors. This gives us a 50-dimensional vector (one element per basis vector) that we can use to classify the image. Instead of comparing a 20×20 image to the 20×20 image of each “template” paw, we compare the 50-dimensional, transformed image to each 50-dimensional transformed template paw. This is much less sensitive to small variations in exactly how each toe is positioned, etc, and basically reduces the dimensionality of the problem to just the relevant dimensions.
Eigenpaw-based Paw Classification
Now we can simply use the distance between the 50-dimensional vectors and the “template” vectors for each leg to classify which paw is which:
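For illustration, here is a minimal sketch of that classification step (the names classify_paw and template_vecs are mine, not from the repository); it assumes template_vecs is a dict mapping each label ('LF', 'RF', 'LH', 'RH') to the 50-dimensional projection of its template paw:

def classify_paw(paw, basis_vecs, template_vecs):
    # Project the 400-dimensional paw image onto the eigenpaw basis (400 -> 50 dims)
    paw_vector = paw_image(paw).flatten().dot(basis_vecs)
    # Pick the template with the smallest Euclidean distance
    distances = {label: np.linalg.norm(paw_vector - template)
                 for label, template in template_vecs.items()}
    return min(distances, key=distances.get)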
There are still some problems, particularly with dogs too small to make a clear pawprint… (It works best with large dogs, as the toes are more clearly separated at the sensor’s resolution.) Also, partial pawprints aren’t recognized with this system, while they can be with the trapezoidal-pattern-based system.
However, because the eigenpaw analysis inherently uses a distance metric, we can classify the paws both ways, and fall back to the trapezoidal-pattern-based system when the eigenpaw analysis’s smallest distance from the “codebook” is over some threshold. I haven’t implemented this yet, though.
Phew… That was long! My hat is off to Ivo for having such a fun question!
Using the information purely based on duration, I think you could apply techniques from modeling kinematics, namely inverse kinematics. Combined with orientation, length, duration, and total weight, it gives some level of periodicity which, I would hope, could be the first step towards solving your “sorting of paws” problem.
All that data could be used to create a list of bounded polygons (or tuples), which you could use to sort by step size then by paw-ness [index].
I understand that you can get the image size using PIL in the following fashion
from PIL import Image
im = Image.open(image_filename)
width, height = im.size
However, I would like to get the image width and height without having to load the image into memory. Is that possible? I am only doing statistics on image sizes and don’t care about the image contents. I just want to make my processing faster.
As the comments allude, PIL does not load the image into memory when calling .open. Looking at the docs of PIL 1.1.7, the docstring for .open says:
def open(fp, mode="r"):
"Open an image file, without loading the raster data"
There are a few file operations in the source like:
...
prefix = fp.read(16)
...
fp.seek(0)
...
but these hardly constitute reading the whole file. In fact .open simply returns a file object and the filename on success. In addition the docs say:
open(file, mode=”r”)
Opens and identifies the given image file.
This is a lazy operation; this function identifies the file, but the actual image data is not read from the file until you try to process the data (or call the load method).
Digging deeper, we see that .open calls _open which is a image-format specific overload. Each of the implementations to _open can be found in a new file, eg. .jpeg files are in JpegImagePlugin.py. Let’s look at that one in depth.
Here things seem to get a bit tricky: there is an infinite loop that is broken out of when the JPEG start-of-scan marker is found:
while True:
s = s + self.fp.read(1)
i = i16(s)
if i in MARKER:
name, description, handler = MARKER[i]
# print hex(i), name, description
if handler is not None:
handler(self, i)
if i == 0xFFDA: # start of scan
rawmode = self.mode
if self.mode == "CMYK":
rawmode = "CMYK;I" # assume adobe conventions
self.tile = [("jpeg", (0,0) + self.size, 0, (rawmode, ""))]
# self.__offset = self.fp.tell()
break
s = self.fp.read(1)
elif i == 0 or i == 65535:
# padded marker or junk; move on
s = "\xff"
else:
raise SyntaxError("no marker found")
This looks like it could read the whole file if it were malformed. If it reads the info marker OK, however, it breaks out early. The handler function ultimately sets self.size, which holds the dimensions of the image.
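In practice, then, something like this minimal sketch should give you the dimensions without reading the raster data (the path is a placeholder):

from PIL import Image

im = Image.open("photo.jpg")   # lazy: only the header/markers are parsed
width, height = im.size        # available without calling im.load()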
If you don’t care about the image contents, PIL is probably overkill.
I suggest parsing the output of the python magic module:
>>> t = magic.from_file('teste.png')
>>> t
'PNG image data, 782 x 602, 8-bit/color RGBA, non-interlaced'
>>> re.search('(\d+) x (\d+)', t).groups()
('782', '602')
I often fetch image sizes on the Internet. Of course, downloading the whole image and then loading it just to parse the information is too time consuming. My method is to feed chunks to an image parser and test whether it can parse the image each time, stopping the loop once I get the information I want.
I extracted the core of my code and modified it to parse local files.
from PIL import ImageFile
ImPar=ImageFile.Parser()
with open(r"D:\testpic\test.jpg", "rb") as f:
ImPar=ImageFile.Parser()
chunk = f.read(2048)
count=2048
while chunk:  # f.read() returns bytes, so test truthiness rather than comparing to ""
ImPar.feed(chunk)
if ImPar.image:
break
chunk = f.read(2048)
count+=2048
print(ImPar.image.size)
print(count)
Output:
(2240, 1488)
38912
The actual file size is 1,543,580 bytes and you only read 38,912 bytes to get the image size. Hope this will help.
Another short way of doing it on Unix systems is to parse the output of the file utility, which I am not sure is standardized across all systems, so this should probably not be used in production code. Moreover, most JPEGs don’t report the image size in its output.
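As a sketch of that idea (parsing the output of the Unix file utility; the helper name is mine, and this assumes file is on the PATH):

import re
import subprocess

def size_from_file_utility(path):
    # e.g. "teste.png: PNG image data, 782 x 602, 8-bit/color RGBA, non-interlaced"
    out = subprocess.check_output(["file", path]).decode()
    match = re.search(r"(\d+)\s*x\s*(\d+)", out)
    return tuple(int(v) for v in match.groups()) if match else None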
I’m using opencv 2.4.2, python 2.7
The following simple code created a window of the correct name, but its content is just blank and doesn’t show the image:
I faced the same issue. I tried to read an image from IDLE and tried to display it using cv2.imshow(), but the display window freezes and shows pythonw.exe is not responding when trying to close the window.
The post below gives a possible explanation for why this is happening
“Basically, don’t do this from IDLE. Write a script and run it from the shell or the script directly if in windows, by naming it with a .pyw extension and double clicking it. There is apparently a conflict between IDLE’s own event loop and the ones from GUI toolkits.”
When I used imshow() in a script and execute it rather than running it directly over IDLE, it worked.
If you choose to use “cv2.waitKey(0)”, be sure that you have written “cv2.waitKey(0)” instead of “cv2.waitkey(0)”, because that lowercase “k” might freeze your program too.
I also had a -215 error. I thought imshow was the issue, but when I changed imread to read in a non-existent file I got no error there. So I put the image file in the working folder and added cv2.waitKey(0) and it worked.
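For reference, a minimal script along those lines (run it from a shell rather than from IDLE; the file name is a placeholder):

import cv2

img = cv2.imread("test.png")   # placeholder path
if img is None:
    raise IOError("could not read the image file")
cv2.imshow("window", img)
cv2.waitKey(0)                 # note the capital K
cv2.destroyAllWindows()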
But I think the image is not getting converted to CV format. The window shows me a large brown image.
Where am I going wrong in converting the image from PIL to CV format?
Also, why do I need to type cv.cv to access functions?
I want to use OpenCV2.0 and Python2.6 to show resized images. I used and adapted this example, but unfortunately, this code is for OpenCV2.1 and does not seem to be working on 2.0. Here is my code:
import os, glob
import cv
ulpath = "exampleshq/"
for infile in glob.glob( os.path.join(ulpath, "*.jpg") ):
im = cv.LoadImage(infile)
thumbnail = cv.CreateMat(im.rows/10, im.cols/10, cv.CV_8UC3)
cv.Resize(im, thumbnail)
cv.NamedWindow(infile)
cv.ShowImage(infile, thumbnail)
cv.WaitKey(0)
cv.DestroyWindow(name)
Since I cannot use
cv.LoadImageM
I used
cv.LoadImage
instead, which was no problem in other applications. Nevertheless, cv.iplimage has no attribute rows, cols or size. Can anyone give me a hint, how to solve this problem?
dst = cv2.resize(src, None, fx = 2, fy = 2, interpolation = cv2.INTER_CUBIC),
where fx is the scaling factor along the horizontal axis and fy along the vertical axis.
To shrink an image, it will generally look best with INTER_AREA interpolation, whereas to enlarge an image, it will generally look best with INTER_CUBIC (slow) or INTER_LINEAR (faster but still looks OK).
Example shrink image to fit a max height/width (keeping aspect ratio)
import cv2
img = cv2.imread('YOUR_PATH_TO_IMG')
height, width = img.shape[:2]
max_height = 300
max_width = 300
# only shrink if img is bigger than required
if max_height < height or max_width < width:
# get scaling factor
scaling_factor = max_height / float(height)
if max_width/float(width) < scaling_factor:
scaling_factor = max_width / float(width)
# resize image
img = cv2.resize(img, None, fx=scaling_factor, fy=scaling_factor, interpolation=cv2.INTER_AREA)
cv2.imshow("Shrinked image", img)
key = cv2.waitKey()
You could use the GetSize function to get those information,
cv.GetSize(im)
would return a tuple with the width and height of the image.
You can also use im.depth and img.nChan to get some more information.
And to resize an image, I would use a slightly different process, with another image instead of a matrix. It is better to try to work with the same type of data:
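Along those lines, a sketch with the old cv API (not the answer’s original snippet; the path is a placeholder, and I haven’t tested this against OpenCV 2.0):

import cv

infile = "exampleshq/example.jpg"   # placeholder path
im = cv.LoadImage(infile)
width, height = cv.GetSize(im)
# Create a destination image with the same depth and channel count, rather than a matrix
thumbnail = cv.CreateImage((width // 10, height // 10), im.depth, im.nChannels)
cv.Resize(im, thumbnail)
cv.ShowImage(infile, thumbnail)
cv.WaitKey(0)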
Here’s a function to upscale or downscale an image by desired width or height while maintaining aspect ratio
# Resizes a image and maintains aspect ratio
def maintain_aspect_ratio_resize(image, width=None, height=None, inter=cv2.INTER_AREA):
# Grab the image size and initialize dimensions
dim = None
(h, w) = image.shape[:2]
# Return original image if no need to resize
if width is None and height is None:
return image
# We are resizing height if width is none
if width is None:
# Calculate the ratio of the height and construct the dimensions
r = height / float(h)
dim = (int(w * r), height)
# We are resizing width if height is none
else:
# Calculate the ratio of the width and construct the dimensions
r = width / float(w)
dim = (width, int(h * r))
# Return the resized image
return cv2.resize(image, dim, interpolation=inter)
The first parameter to .paste() is the image to paste. The second is the coordinates, and the secret sauce is the third parameter. It indicates a mask that will be used to paste the image. If you pass an image with transparency, then the alpha channel is used as the mask.
EDIT: Both images need to be of the type RGBA. So you need to call convert('RGBA') if they are paletted, etc.. If the background does not have an alpha channel, then you can use the regular paste method (which should be faster).
produces the following image (the alpha part of the overlaid red pixels is completely taken from the 2nd layer; the pixels are not blended correctly):
Compositing the images with Image.alpha_composite like so:
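A minimal sketch of that call (not the answer’s original snippet; file names are placeholders); both images must be RGBA and have the same size:

from PIL import Image

background = Image.open("background.png").convert("RGBA")
overlay = Image.open("overlay.png").convert("RGBA")
combined = Image.alpha_composite(background, overlay)
combined.save("combined.png")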
Had a similar question and had difficulty finding an answer. The following function allows you to paste an image with a transparency parameter over another image at a specific offset.
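The original function isn’t reproduced here, but a sketch of the idea (scale the overlay’s alpha channel by an opacity value and use it as the paste mask; the names are mine):

from PIL import Image

def paste_with_opacity(background, overlay, offset=(0, 0), opacity=1.0):
    """Paste overlay onto background at offset, with opacity between 0.0 and 1.0."""
    overlay = overlay.convert("RGBA")
    # Scale the overlay's alpha channel and use it as the paste mask
    alpha = overlay.split()[3].point(lambda a: int(a * opacity))
    background.paste(overlay, offset, mask=alpha)
    return background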
I ended up implementing the suggestion from this comment made by the user @P.Melch and pointed out by @Mithril, in a project I’m working on.
I added out-of-bounds safety as well; here’s the code for it. (I linked a specific commit because things can change in the future of this repository.)
Note: I expect the images as numpy arrays, i.e. np.array(Image.open(...)), for the inputs A and B of copy_from and for the arguments of the linked overlay function.
The dependencies are the function right before it, the copy_from method, and numpy arrays holding the PIL Image content for slicing.
Though the file is very class-oriented, if you want to use the overlay_transparent function on its own, be sure to replace self.frame with your background image’s numpy array.
Or you can just copy the whole file (probably remove some imports and the Utils class) and interact with this Frame class like so:
# Assuming you named the file frame.py in the same directory
from frame import Frame
background = Frame()
overlay = Frame()
background.load_from_path("your path here")
overlay.load_from_path("your path here")
background.overlay_transparent(overlay.frame, x=300, y=200)
Then you have background.frame as the overlaid, alpha-composited array; you can get a PIL image from it with overlayed = Image.fromarray(background.frame), or something like:
Or just call background.save("save path"), as that reads directly from the alpha-composited internal self.frame variable.
You can read the file and find some other nice functions in this implementation, like the methods get_rgb_frame_array, resize_by_ratio, resize_to_resolution, rotate, gaussian_blur, transparency, and vignetting :)
You’d probably want to remove the resolve_pending method, as that is specific to that project.
Glad if I helped you; be sure to check out the repo of the project I’m talking about. This question and thread helped me a lot during development :)
After my previous question on finding toes within each paw, I started loading up other measurements to see how it would hold up. Unfortunately, I quickly ran into a problem with one of the preceding steps: recognizing the paws.
You see, my proof of concept basically takes the maximal pressure of each sensor over time and starts looking at the sum of each row, until it finds one that != 0.0. Then it does the same for the columns, and cuts off as soon as it finds more than 2 rows that are zero again. It stores the minimal and maximal row and column values to some index.
As you can see in the figure, this works quite well in most cases. However, there are a lot of downsides to this approach (other than being very primitive):
Humans can have ‘hollow feet’ which means there are several empty rows within the footprint itself. Since I feared this could happen with (large) dogs too, I waited for at least 2 or 3 empty rows before cutting off the paw.
This creates a problem if another contact is made in a different column before several empty rows are reached, thus expanding the area. I figure I could compare the columns and, if the gap exceeds a certain value, treat them as separate paws.
The problem gets worse when the dog is very small or walks at a higher pace. What happens is that the front paw’s toes are still making contact, while the hind paw’s toes just start to make contact within the same area as the front paw!
With my simple script, it won’t be able to split these two, because it would have to determine which frames of that area belong to which paw, whereas currently I only have to look at the maximal values over all frames.
Examples of where it starts going wrong:
So now I’m looking for a better way of recognizing and separating the paws (after which I’ll get to the problem of deciding which paw it is!).
Update:
I’ve been tinkering to get Joe’s (awesome!) answer implemented, but I’m having difficulties extracting the actual paw data from my files.
The coded_paws shows me all the different paws, when applied to the maximal pressure image (see above). However, the solution goes over each frame (to separate overlapping paws) and sets the four Rectangle attributes, such as coordinates or height/width.
I can’t figure out how to take these attributes and store them in some variable that I can apply to the measurement data. I need to know, for each paw, what its location is during which frames, and couple this to which paw it is (front/hind, left/right).
So how can I use the Rectangles attributes to extract these values for each paw?
If you’re just wanting (semi) contiguous regions, there’s already an easy implementation in Python: SciPy‘s ndimage.morphology module. This is a fairly common image morphology operation.
Blur the input data a bit to make sure the paws have a continuous footprint. (It would be more efficient to just use a larger kernel (the structure kwarg to the various scipy.ndimage.morphology functions) but this isn’t quite working properly for some reason…)
Threshold the array so that you have a boolean array of places where the pressure is over some threshold value (i.e. thresh = data > value)
Fill any internal holes, so that you have cleaner regions (filled = sp.ndimage.morphology.binary_fill_holes(thresh))
Find the separate contiguous regions (coded_paws, num_paws = sp.ndimage.label(filled)). This returns an array with the regions coded by number: each region is a contiguous area of a unique integer (1 up to the number of paws), with zeros everywhere else.
Isolate the contiguous regions using data_slices = sp.ndimage.find_objects(coded_paws). This returns a list of tuples of slice objects, so you could get the region of the data for each paw with [data[x] for x in data_slices]. Instead, we’ll draw a rectangle based on these slices, which takes slightly more work.
The two animations below show your “Overlapping Paws” and “Grouped Paws” example data. This method seems to be working perfectly. (And for whatever it’s worth, this runs much more smoothly than the GIF images below on my machine, so the paw detection algorithm is fairly fast…)
Here’s a full example (now with much more detailed explanations). The vast majority of this is reading the input and making an animation. The actual paw detection is only 5 lines of code.
import numpy as np
import scipy as sp
import scipy.ndimage
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
def animate(input_filename):
"""Detects paws and animates the position and raw data of each frame
in the input file"""
# With matplotlib, it's much, much faster to just update the properties
# of a display object than it is to create a new one, so we'll just update
# the data and position of the same objects throughout this animation...
infile = paw_file(input_filename)
# Since we're making an animation with matplotlib, we need
# ion() instead of show()...
plt.ion()
fig = plt.figure()
ax = fig.add_subplot(111)
fig.suptitle(input_filename)
# Make an image based on the first frame that we'll update later
# (The first frame is never actually displayed)
im = ax.imshow(infile.next()[1])
# Make 4 rectangles that we can later move to the position of each paw
rects = [Rectangle((0,0), 1,1, fc='none', ec='red') for i in range(4)]
[ax.add_patch(rect) for rect in rects]
title = ax.set_title('Time 0.0 ms')
# Process and display each frame
for time, frame in infile:
paw_slices = find_paws(frame)
# Hide any rectangles that might be visible
[rect.set_visible(False) for rect in rects]
# Set the position and size of a rectangle for each paw and display it
for slice, rect in zip(paw_slices, rects):
dy, dx = slice
rect.set_xy((dx.start, dy.start))
rect.set_width(dx.stop - dx.start + 1)
rect.set_height(dy.stop - dy.start + 1)
rect.set_visible(True)
# Update the image data and title of the plot
title.set_text('Time %0.2f ms' % time)
im.set_data(frame)
im.set_clim([frame.min(), frame.max()])
fig.canvas.draw()
def find_paws(data, smooth_radius=5, threshold=0.0001):
"""Detects and isolates contiguous regions in the input array"""
# Blur the input data a bit so the paws have a continuous footprint
data = sp.ndimage.uniform_filter(data, smooth_radius)
# Threshold the blurred data (this needs to be a bit > 0 due to the blur)
thresh = data > threshold
# Fill any interior holes in the paws to get cleaner regions...
filled = sp.ndimage.morphology.binary_fill_holes(thresh)
# Label each contiguous paw
coded_paws, num_paws = sp.ndimage.label(filled)
# Isolate the extent of each paw
data_slices = sp.ndimage.find_objects(coded_paws)
return data_slices
def paw_file(filename):
"""Returns a iterator that yields the time and data in each frame
The infile is an ascii file of timesteps formatted similar to this:
Frame 0 (0.00 ms)
0.0 0.0 0.0
0.0 0.0 0.0
Frame 1 (0.53 ms)
0.0 0.0 0.0
0.0 0.0 0.0
...
"""
with open(filename) as infile:
while True:
try:
time, data = read_frame(infile)
yield time, data
except StopIteration:
break
def read_frame(infile):
"""Reads a frame from the infile."""
frame_header = infile.next().strip().split()
time = float(frame_header[-2][1:])
data = []
while True:
line = infile.next().strip().split()
if line == []:
break
data.append(line)
return time, np.array(data, dtype=np.float)
if __name__ == '__main__':
animate('Overlapping paws.bin')
animate('Grouped up paws.bin')
animate('Normal measurement.bin')
Update: As far as identifying which paw is in contact with the sensor at what times, the simplest solution is to just do the same analysis, but use all of the data at once. (i.e. stack the input into a 3D array, and work with it, instead of the individual time frames.) Because SciPy’s ndimage functions are meant to work with n-dimensional arrays, we don’t have to modify the original paw-finding function at all.
# This uses functions (and imports) in the previous code example!!
def paw_regions(infile):
# Read in and stack all data together into a 3D array
data, time = [], []
for t, frame in paw_file(infile):
time.append(t)
data.append(frame)
data = np.dstack(data)
time = np.asarray(time)
# Find and label the paw impacts
data_slices, coded_paws = find_paws(data, smooth_radius=4)
# Sort by time of initial paw impact... This way we can determine which
# paws are which relative to the first paw with a simple modulo 4.
# (Assuming a 4-legged dog, where all 4 paws contacted the sensor)
data_slices.sort(key=lambda dat_slice: dat_slice[2].start)
# Plot up a simple analysis
fig = plt.figure()
ax1 = fig.add_subplot(2,1,1)
annotate_paw_prints(time, data, data_slices, ax=ax1)
ax2 = fig.add_subplot(2,1,2)
plot_paw_impacts(time, data_slices, ax=ax2)
fig.suptitle(infile)
def plot_paw_impacts(time, data_slices, ax=None):
if ax is None:
ax = plt.gca()
# Group impacts by paw...
for i, dat_slice in enumerate(data_slices):
dx, dy, dt = dat_slice
paw = i%4 + 1
# Draw a bar over the time interval where each paw is in contact
ax.barh(bottom=paw, width=time[dt].ptp(), height=0.2,
left=time[dt].min(), align='center', color='red')
ax.set_yticks(range(1, 5))
ax.set_yticklabels(['Paw 1', 'Paw 2', 'Paw 3', 'Paw 4'])
ax.set_xlabel('Time (ms) Since Beginning of Experiment')
ax.yaxis.grid(True)
ax.set_title('Periods of Paw Contact')
def annotate_paw_prints(time, data, data_slices, ax=None):
if ax is None:
ax = plt.gca()
# Display all paw impacts (sum over time)
ax.imshow(data.sum(axis=2).T)
# Annotate each impact with which paw it is
# (Relative to the first paw to hit the sensor)
x, y = [], []
for i, region in enumerate(data_slices):
dx, dy, dz = region
# Get x,y center of slice...
x0 = 0.5 * (dx.start + dx.stop)
y0 = 0.5 * (dy.start + dy.stop)
x.append(x0); y.append(y0)
# Annotate the paw impacts
ax.annotate('Paw %i' % (i%4 +1), (x0, y0),
color='red', ha='center', va='bottom')
# Plot line connecting paw impacts
ax.plot(x,y, '-wo')
ax.axis('image')
ax.set_title('Order of Steps')
I’m no expert in image detection, and I don’t know Python, but I’ll give it a whack…
To detect individual paws, you should first only select everything with a pressure greater than some small threshold, very close to no pressure at all. Every pixel/point that is above this should be “marked.” Then, every pixel adjacent to all “marked” pixels becomes marked, and this process is repeated a few times. Masses that are totally connected would be formed, so you have distinct objects. Then, each “object” has a minimum and maximum x and y value, so bounding boxes can be packed neatly around them.
Pseudocode:
(MARK) ALL PIXELS ABOVE (0.5)
(MARK) ALL PIXELS (ADJACENT) TO (MARK) PIXELS
REPEAT (STEP 2) (5) TIMES
SEPARATE EACH TOTALLY CONNECTED MASS INTO A SINGLE OBJECT
MARK THE EDGES OF EACH OBJECT, AND CUT APART TO FORM SLICES.
Note: I say pixel, but this could be regions using an average of the pixels. Optimization is another issue…
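For reference, a rough SciPy translation of the pseudocode above (it mirrors the morphology-based answer earlier; the names are mine):

from scipy import ndimage

def find_paw_boxes(pressure, threshold=0.5, grow_iterations=5):
    marked = pressure > threshold                                         # 1: mark pixels above the threshold
    grown = ndimage.binary_dilation(marked, iterations=grow_iterations)   # 2-3: grow the marked regions a few times
    labeled, n_objects = ndimage.label(grown)                             # 4: separate connected masses into objects
    return ndimage.find_objects(labeled)                                  # 5: bounding slices around each object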
Sounds like you need to analyze a function (pressure over time) for each pixel and determine where the function turns (when it changes by more than X in the other direction, it is considered a turn, to counter errors).
If you know at what frames it turns, you will know the frame where the pressure was hardest and the frame where it was least hard between the two paws. In theory, you would then know the two frames where the paws pressed hardest and could calculate an average over those intervals.
after which I’ll get to the problem of deciding which paw it is!
This is the same idea as before: knowing when each paw applies the most pressure helps you decide.
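A small sketch of that idea, assuming region is a 3D NumPy array (rows x columns x frames) for one detected impact area:

import numpy as np

def turning_frames(region):
    # Frames where the total pressure curve changes direction
    total_pressure = region.sum(axis=(0, 1))   # total pressure per frame
    slope = np.diff(total_pressure)
    return np.where(np.diff(np.sign(slope)) != 0)[0] + 1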
I’m taking pictures with a webcam at regular intervals. Sort of like a time lapse thing. However, if nothing has really changed, that is, the picture pretty much looks the same, I don’t want to store the latest snapshot.
I imagine there’s some way of quantifying the difference, and I would have to empirically determine a threshold.
I’m looking for simplicity rather than perfection.
I’m using python.
Option 1: Load both images as arrays (scipy.misc.imread) and calculate an element-wise (pixel-by-pixel) difference. Calculate the norm of the difference.
Option 2: Load both images. Calculate some feature vector for each of them (like a histogram). Calculate distance between feature vectors rather than images.
However, there are some decisions to make first.
Questions
You should answer these questions first:
Are images of the same shape and dimension?
If not, you may need to resize or crop them. The PIL library will help to do it in Python.
If they are taken with the same settings and the same device, they are probably the same.
Are images well-aligned?
If not, you may want to run a cross-correlation first, to find the best alignment. SciPy has functions to do it.
If the camera and the scene are still, the images are likely to be well-aligned.
Is exposure of the images always the same? (Is lightness/contrast the same?)
If not, you may want to normalize the two images (as done in compare_images below). But be careful: in some situations this may do more wrong than good. For example, a single bright pixel on a dark background will make the normalized image very different.
Is color information important?
If you want to notice color changes, you will have a vector of color values per point, rather than a scalar value as in gray-scale image. You need more attention when writing such code.
Are there distinct edges in the image? Are they likely to move?
If yes, you can apply edge detection algorithm first (e.g. calculate gradient with Sobel or Prewitt transform, apply some threshold), then compare edges on the first image to edges on the second.
Is there noise in the image?
All sensors pollute the image with some amount of noise. Low-cost sensors have more noise. You may wish to apply some noise reduction before you compare images. Blur is the most simple (but not the best) approach here.
What kind of changes do you want to notice?
This may affect the choice of norm to use for the difference between images.
Consider using Manhattan norm (the sum of the absolute values) or zero norm (the number of elements not equal to zero) to measure how much the image has changed. The former will tell you how much the image is off, the latter will tell only how many pixels differ.
Example
I assume your images are well-aligned, the same size and shape, possibly with different exposure. For simplicity, I convert them to grayscale even if they are color (RGB) images.
You will need these imports:
import sys
from scipy.misc import imread
from scipy.linalg import norm
from scipy import sum, average
Main function, read two images, convert to grayscale, compare and print results:
def main():
file1, file2 = sys.argv[1:1+2]
# read images as 2D arrays (convert to grayscale for simplicity)
img1 = to_grayscale(imread(file1).astype(float))
img2 = to_grayscale(imread(file2).astype(float))
# compare
n_m, n_0 = compare_images(img1, img2)
print "Manhattan norm:", n_m, "/ per pixel:", n_m/img1.size
print "Zero norm:", n_0, "/ per pixel:", n_0*1.0/img1.size
How to compare. img1 and img2 are 2D SciPy arrays here:
def compare_images(img1, img2):
# normalize to compensate for exposure difference, this may be unnecessary
# consider disabling it
img1 = normalize(img1)
img2 = normalize(img2)
# calculate the difference and its norms
diff = img1 - img2 # elementwise for scipy arrays
m_norm = sum(abs(diff)) # Manhattan norm
z_norm = norm(diff.ravel(), 0) # Zero norm
return (m_norm, z_norm)
If the file is a color image, imread returns a 3D array; average the RGB channels (the last array axis) to obtain intensity. There is no need to do it for grayscale images (e.g. .pgm):
def to_grayscale(arr):
"If arr is a color image (3D array), convert it to grayscale (2D array)."
if len(arr.shape) == 3:
return average(arr, -1) # average over the last axis (color channels)
else:
return arr
Normalization is trivial, you may choose to normalize to [0,1] instead of [0,255]. arr is a SciPy array here, so all operations are element-wise:
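For completeness, a sketch of such a normalize function (rescaling the array to the range [0, 255]):

def normalize(arr):
    rng = arr.max() - arr.min()
    amin = arr.min()
    return (arr - amin) * 255 / rng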
As the question is about a video sequence, where frames are likely to be almost the same, and you look for something unusual, I’d like to mention some alternative approaches which may be relevant:
background subtraction and segmentation (to detect foreground objects)
sparse optical flow (to detect motion)
comparing histograms or some other statistics instead of images
I strongly recommend taking a look at the “Learning OpenCV” book, Chapters 9 (Image parts and segmentation) and 10 (Tracking and motion). The former teaches how to use the background subtraction method, the latter gives some info on optical flow methods. All methods are implemented in the OpenCV library. If you use Python, I suggest using OpenCV ≥ 2.3 and its cv2 Python module.
The most simple version of the background subtraction:
learn the average value μ and standard deviation σ for every pixel of the background
compare current pixel values to the range of (μ-2σ,μ+2σ) or (μ-σ,μ+σ)
More advanced versions may take into account the time series for every pixel and handle non-static scenes (like moving trees or grass).
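A bare-bones sketch of that simple version (per-pixel mean and standard deviation; variable names are mine):

import numpy as np

def learn_background(frames):
    # frames: array-like of shape (n_frames, height, width) containing background-only images
    stack = np.asarray(frames, dtype=float)
    return stack.mean(axis=0), stack.std(axis=0)

def foreground_mask(frame, mu, sigma, k=2.0):
    # Pixels outside the (mu - k*sigma, mu + k*sigma) range are flagged as foreground
    return np.abs(frame - mu) > k * sigma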
The idea of optical flow is to take two or more frames, and assign velocity vector to every pixel (dense optical flow) or to some of them (sparse optical flow). To estimate sparse optical flow, you may use Lucas-Kanade method (it is also implemented in OpenCV). Obviously, if there is a lot of flow (high average over max values of the velocity field), then something is moving in the frame, and subsequent images are more different.
Comparing histograms may help to detect sudden changes between consecutive frames. This approach was used in Courbon et al, 2010:
Similarity of consecutive frames. The distance between two consecutive frames is measured. If it is too high, it means that the second frame is corrupted and thus the image is eliminated. The Kullback–Leibler distance, or mutual entropy, on the histograms of the two frames,
d(p, q) = Σ p(i) log( p(i) / q(i) ),
where p and q are the histograms of the frames, is used. The threshold is fixed at 0.2.
The diff object is an image in which every pixel is the result of the subtraction of the color values of that pixel in the second image from the first image. Using the diff image you can do several things. The simplest one is the diff.getbbox() function. It will tell you the minimal rectangle that contains all the changes between your two images.
You can probably implement approximations of the other stuff mentioned here using functions from PIL as well.
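Such a diff image, as described above, can be produced with PIL’s ImageChops.difference; a minimal sketch (paths are placeholders):

from PIL import Image, ImageChops

img1 = Image.open("snapshot_old.png")
img2 = Image.open("snapshot_new.png")
diff = ImageChops.difference(img1, img2)
# getbbox() returns None when the images are identical,
# otherwise the bounding box of all changed pixels
print(diff.getbbox())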
Two popular and relatively simple methods are: (a) the Euclidean distance already suggested, or (b) normalized cross-correlation. Normalized cross-correlation tends to be noticeably more robust to lighting changes than simple cross-correlation. Wikipedia gives a formula for the normalized cross-correlation. More sophisticated methods exist too, but they require quite a bit more work.
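A short sketch of the zero-mean normalized cross-correlation for two equally-sized grayscale arrays (my own illustration, not taken from the answer):

import numpy as np

def normalized_cross_correlation(img1, img2):
    a = img1 - img1.mean()
    b = img2 - img2.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())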
Resample both images to small thumbnails (e.g. 64 x 64) and compare the thumbnails pixel-by-pixel with a certain threshold. If the original images are almost the same, the resampled thumbnails will be very similar or even exactly the same. This method takes care of noise that can occur especially in low-light scenes. It may even be better if you go grayscale.
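A sketch of that thumbnail comparison (the size and threshold are arbitrary choices; the function name is mine):

from PIL import Image, ImageChops

def roughly_equal(path1, path2, size=(64, 64), threshold=10):
    a = Image.open(path1).convert("L").resize(size)
    b = Image.open(path2).convert("L").resize(size)
    diff = ImageChops.difference(a, b)
    # "Same" if no resampled pixel differs by more than the threshold
    return max(diff.getdata()) <= threshold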
I am addressing specifically the question of how to compute if they are “different enough”. I assume you can figure out how to subtract the pixels one by one.
First, I would take a bunch of images with nothing changing, and find out the maximum amount that any pixel changes just because of variations in the capture, noise in the imaging system, JPEG compression artifacts, and moment-to-moment changes in lighting. Perhaps you’ll find that 1 or 2 bit differences are to be expected even when nothing moves.
Then for the “real” test, you want a criterion like this:
same if up to P pixels differ by no more than E.
So, perhaps, if E = 0.02, P = 1000, that would mean (approximately) that it would be “different” if any single pixel changes by more than ~5 units (assuming 8-bit images), or if more than 1000 pixels had any errors at all.
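In code, that criterion might look like this sketch (images as float arrays scaled to [0, 1]; names are mine):

import numpy as np

def close_enough(img1, img2, e=0.02, p=1000):
    diff = np.abs(img1 - img2)
    # "Different" if any pixel changes by more than e, or if more than p pixels change at all
    return diff.max() <= e and np.count_nonzero(diff) <= p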
This is intended mainly as a good “triage” technique to quickly identify images that are close enough to not need further examination. The images that “fail” may then move on to a more elaborate/expensive technique that wouldn’t have false positives if the camera shook a bit, for example, or that is more robust to lighting changes.
I run an open source project, OpenImageIO, that contains a utility called “idiff” that compares differences with thresholds like this (even more elaborate, actually). Even if you don’t want to use this software, you may want to look at the source to see how we did it. It’s used commercially quite a bit, and this thresholding technique was developed so that we could have a test suite for rendering and image processing software, with “reference images” that might have small differences from platform to platform or as we made minor tweaks to the algorithms, so we wanted a “match within tolerance” operation.
I had a similar problem at work, I was rewriting our image transform endpoint and I wanted to check that the new version was producing the same or nearly the same output as the old version. So I wrote this:
It operates on images of the same size and, at a per-pixel level, measures the difference in values at each channel: R, G, B(, A); takes the average difference of those channels; and then averages the difference over all pixels and returns a ratio.
For example, with a 10×10 image of white pixels, and the same image but one pixel has changed to red, the difference at that pixel is 1/3 or 0.33… (RGB 0,0,0 vs 255,0,0) and at all other pixels is 0. With 100 pixels total, 0.33…/100 = a ~0.33% difference in image.
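A sketch of that calculation (my own reconstruction of the description, not the original code), using PIL and NumPy on two equally-sized RGB images:

import numpy as np
from PIL import Image

def image_difference_ratio(path1, path2):
    a = np.asarray(Image.open(path1), dtype=float)
    b = np.asarray(Image.open(path2), dtype=float)
    # Average the per-channel differences at each pixel, as a fraction of the full range
    per_pixel = np.abs(a - b).mean(axis=-1) / 255.0
    # Average over all pixels to get the overall difference ratio
    return per_pixel.mean()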
I believe this would work perfectly for OP’s project (I realize this is a very old post now, but posting for future StackOverflowers who also want to compare images in python).
Most of the answers given won’t deal with lighting levels.
I would first normalize the image to a standard light level before doing the comparison.
Another nice, simple way to measure the similarity between two images:
import sys
from skimage.measure import compare_ssim
from skimage.transform import resize
from scipy.ndimage import imread
# get two images - resize both to 1024 x 1024
img_a = resize(imread(sys.argv[1]), (2**10, 2**10))
img_b = resize(imread(sys.argv[2]), (2**10, 2**10))
# score: {-1:1} measure of the structural similarity between the images
score, diff = compare_ssim(img_a, img_b, full=True)
print(score)
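Note that in recent library versions scipy.ndimage.imread has been removed and compare_ssim has moved to skimage.metrics.structural_similarity, so on a current scikit-image an equivalent sketch (assuming RGB input files) would be:
import sys
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.transform import resize
from skimage.metrics import structural_similarity

# get two images - convert to grayscale and resize both to 1024 x 1024
img_a = resize(rgb2gray(imread(sys.argv[1])), (2**10, 2**10))
img_b = resize(rgb2gray(imread(sys.argv[2])), (2**10, 2**10))

# score: {-1:1} measure of the structural similarity between the images
# (data_range=1.0 because rgb2gray/resize produce floats in [0, 1])
score, diff = structural_similarity(img_a, img_b, full=True, data_range=1.0)
print(score)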
If others are interested in a more powerful way to compare image similarity, I put together a tutorial and web app for measuring and visualizing similar images using Tensorflow.
I would suggest a wavelet transformation of your frames (I’ve written a C extension for that using Haar transformation); then, comparing the indexes of the largest (proportionally) wavelet factors between the two pictures, you should get a numerical similarity approximation.
I apologize if this is too late to reply, but since I’ve been doing something similar I thought I could contribute somehow.
Maybe with OpenCV you could use template matching. Assuming you’re using a webcam as you said:
Simplify the images (thresholding maybe?)
Apply template matching and check the max_val with minMaxLoc
Tip: max_val (or min_val depending on the method used) will give you numbers, large numbers. To get the difference in percentage, use template matching with the same image — the result will be your 100%.
Pseudo code to exemplify:
previous_screenshot = ...
current_screenshot = ...
# simplify both images somehow
# get the 100% corresponding value
res = matchTemplate(previous_screenshot, previous_screenshot, TM_CCOEFF)
_, hundred_p_val, _, _ = minMaxLoc(res)
# hundred_p_val is now the 100%
res = matchTemplate(previous_screenshot, current_screenshot, TM_CCOEFF)
_, max_val, _, _ = minMaxLoc(res)
difference_percentage = max_val / hundred_p_val
# the tolerance is now up to you
What about calculating the Manhattan distance of the two images? That gives you n*n values. Then you could do something like a row average to reduce to n values, and a function over that to get one single value.
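A hedged numpy sketch of that reduction, using a plain mean as the final function over the row averages:
import numpy as np

def manhattan_summary(img_a, img_b):
    """Reduce the per-pixel absolute differences to a single number, as described above."""
    diff = np.abs(img_a.astype(np.float64) - img_b.astype(np.float64))  # n*n values
    row_means = diff.mean(axis=1)  # n values, one per row
    return float(row_means.mean())  # one single value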
I have been having a lot of luck with jpg images taken with the same camera on a tripod by
(1) simplifying greatly (like going from 3000 pixels wide to 100 pixels wide or even fewer)
(2) flattening each jpg array into a single vector
(3) pairwise correlating sequential images with a simple correlate algorithm to get correlation coefficient
(4) squaring the correlation coefficient to get r-square (i.e. the fraction of variability in one image explained by variation in the next)
(5) generally in my application if r-square < 0.9, I say the two images are different and something happened in between.
This is robust and fast in my implementation (Mathematica 7)
It’s worth playing around with the part of the image you are interested in and focussing on that by cropping all images to that little area, otherwise a distant-from-the-camera but important change will be missed.
I don’t know how to use Python, but am sure it does correlations, too, no?
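Python does indeed do correlations; a sketch of steps (1)-(5) with Pillow and numpy might look like this:
import numpy as np
from PIL import Image

def r_square(path_a, path_b, size=(100, 100)):
    # (1) simplify greatly and (2) flatten each image into a single vector
    a = np.asarray(Image.open(path_a).convert("L").resize(size), dtype=np.float64).ravel()
    b = np.asarray(Image.open(path_b).convert("L").resize(size), dtype=np.float64).ravel()
    # (3) correlate the two vectors and (4) square the correlation coefficient
    r = np.corrcoef(a, b)[0, 1]
    return r * r

# (5) e.g. treat the pair as "different" when r_square(img1, img2) < 0.9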
You can compute the histogram of both images and then calculate the Bhattacharyya coefficient; this is a very fast algorithm and I have used it to detect shot changes in a cricket video (in C, using OpenCV).
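A sketch of that idea in Python/numpy on grayscale histograms (OpenCV users can get a closely related distance from cv2.compareHist with its Bhattacharyya method):
import numpy as np

def bhattacharyya_coefficient(img_a, img_b, bins=256):
    """Overlap of the normalized grayscale histograms of two uint8 images (1.0 = identical)."""
    h_a, _ = np.histogram(img_a.ravel(), bins=bins, range=(0, 256), density=True)
    h_b, _ = np.histogram(img_b.ravel(), bins=bins, range=(0, 256), density=True)
    return float(np.sum(np.sqrt(h_a * h_b)))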
Check out how Haar Wavelets are implemented by isk-daemon. You could use its imgdb C++ code to calculate the difference between images on-the-fly:
isk-daemon is an open source database server capable of adding content-based (visual) image searching to any image related website or software.
This technology allows users of any image-related website or software to sketch on a widget which image they want to find and have the website reply to them the most similar images or simply request for more similar photos at each image detail page.
I had the same problem and wrote a simple python module which compares two same-size images using pillow’s ImageChops to create a black/white diff image and sums up the histogram values.
You can get either this score directly, or a percentage value compared to a full black vs. white diff.
It also contains a simple is_equal function, with the option to supply a fuzzy threshold under (and including) which the images pass as equal.
The approach is not very elaborate, but maybe it is of use for others out there struggling with the same issue.
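The module itself isn’t linked here, but the core of the described approach might look roughly like this sketch:
from PIL import Image, ImageChops

def diff_percentage(path_a, path_b):
    """Sum the histogram of a grayscale diff image, scaled against an all-white diff."""
    a = Image.open(path_a).convert("L")
    b = Image.open(path_b).convert("L")
    diff = ImageChops.difference(a, b)
    hist = diff.histogram()  # 256 counts for the grayscale diff image
    score = sum(level * count for level, count in enumerate(hist))
    max_score = 255 * a.size[0] * a.size[1]  # every pixel maximally different
    return 100.0 * score / max_score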
A somewhat more principled approach is to use a global descriptor to compare images, such as GIST or CENTRIST. A hash function, as described here, also provides a similar solution.
import os
from PIL import Image
from PIL import ImageFile
import imagehash
# simple direct comparison of two picture files
def compare_image(img_file1, img_file2):
if img_file1 == img_file2:
return True
fp1 = open(img_file1, 'rb')
fp2 = open(img_file2, 'rb')
img1 = Image.open(fp1)
img2 = Image.open(fp2)
ImageFile.LOAD_TRUNCATED_IMAGES = True
b = img1 == img2
fp1.close()
fp2.close()
return b
# compare pictures through their perceptual hashes
def get_hash_dict(dir):
hash_dict = {}
image_quantity = 0
for _, _, files in os.walk(dir):
for i, fileName in enumerate(files):
with open(dir + fileName, 'rb') as fp:
hash_dict[dir + fileName] = imagehash.average_hash(Image.open(fp))
image_quantity += 1
return hash_dict, image_quantity
def compare_image_with_hash(image_file_name_1, image_file_name_2, max_dif=0):
"""
    max_dif: the maximum allowed hash difference; the smaller the value, the stricter (and more accurate) the match. The minimum is 0.
"""
ImageFile.LOAD_TRUNCATED_IMAGES = True
hash_1 = None
hash_2 = None
with open(image_file_name_1, 'rb') as fp:
hash_1 = imagehash.average_hash(Image.open(fp))
with open(image_file_name_2, 'rb') as fp:
hash_2 = imagehash.average_hash(Image.open(fp))
dif = hash_1 - hash_2
if dif < 0:
dif = -dif
if dif <= max_dif:
return True
else:
return False
def compare_image_dir_with_hash(dir_1, dir_2, max_dif=0):
"""
    max_dif: the maximum allowed hash difference; the smaller the value, the stricter (and more accurate) the match. The minimum is 0.
"""
ImageFile.LOAD_TRUNCATED_IMAGES = True
hash_dict_1, image_quantity_1 = get_hash_dict(dir_1)
hash_dict_2, image_quantity_2 = get_hash_dict(dir_2)
if image_quantity_1 > image_quantity_2:
tmp = image_quantity_1
image_quantity_1 = image_quantity_2
image_quantity_2 = tmp
tmp = hash_dict_1
hash_dict_1 = hash_dict_2
hash_dict_2 = tmp
result_dict = {}
for k in hash_dict_1.keys():
result_dict[k] = None
for dif_i in range(0, max_dif + 1):
have_none = False
for k_1 in result_dict.keys():
if result_dict.get(k_1) is None:
have_none = True
if not have_none:
return result_dict
for k_1, v_1 in hash_dict_1.items():
for k_2, v_2 in hash_dict_2.items():
sub = (v_1 - v_2)
if sub < 0:
sub = -sub
if sub == dif_i and result_dict.get(k_1) is None:
result_dict[k_1] = k_2
break
return result_dict
def main():
print(compare_image('image1\\815.jpg', 'image2\\5.jpg'))
print(compare_image_with_hash('image1\\815.jpg', 'image2\\5.jpg', 7))
r = compare_image_dir_with_hash('image1\\', 'image2\\', 10)
for k in r.keys():
print(k, r.get(k))
if __name__ == '__main__':
main()
I think you could simply compute the Euclidean distance (i.e. sqrt of the sum of squared differences, pixel by pixel) between the luminance of the two images, and consider them equal if this falls under some empirical threshold. And you would do better to wrap it in a C function.
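In Python, numpy’s vectorized operations make a plain sketch of this fast enough for many cases:
import numpy as np
from PIL import Image

def luminance_distance(path_a, path_b):
    """Euclidean distance between the luminance planes of two same-size images."""
    a = np.asarray(Image.open(path_a).convert("L"), dtype=np.float64)
    b = np.asarray(Image.open(path_b).convert("L"), dtype=np.float64)
    return float(np.sqrt(np.sum((a - b) ** 2)))

# e.g. treat the frames as equal if luminance_distance(f1, f2) < some empirical threshold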
There are many more other approaches. Take a look at Google Scholar and search for something like “visual difference”, “image quality assessment”, etc, if you are interested/really care about the art.
There is a simple and fast solution using numpy, by calculating the mean squared error:
before = np.array(get_picture())
while True:
    now = np.array(get_picture())
    MSE = np.mean((now - before) ** 2)
    if MSE > threshold:
        break
    before = now
What is the best way to represent and solve a maze given an image?
Given a JPEG image (as seen above), what’s the best way to read it in, parse it into some data structure and solve the maze? My first instinct is to read the image in pixel by pixel and store it in a list (array) of boolean values: True for a white pixel, and False for a non-white pixel (the colours can be discarded). The issue with this method is that the image may not be “pixel perfect”. By that I simply mean that if there is a white pixel somewhere on a wall, it may create an unintended path.
Another method (which came to me after a bit of thought) is to convert the image to an SVG file – which is a list of paths drawn on a canvas. This way, the paths could be read into the same sort of list (boolean values), where True indicates a path or wall and False indicates travelable space. An issue with this method arises if the conversion is not 100% accurate and does not fully connect all of the walls, creating gaps.
Also an issue with converting to SVG is that the lines are not “perfectly” straight. This results in the paths being cubic Bézier curves. With a list (array) of boolean values indexed by integers, the curves would not transfer easily: all the points that lie on a curve would have to be calculated, and they won’t exactly match list indices.
I assume that while one of these methods may work (though probably not) that they are woefully inefficient given such a large image, and that there exists a better way. How is this best (most efficiently and/or with the least complexity) done? Is there even a best way?
Then comes the solving of the maze. If I use either of the first two methods, I will essentially end up with a matrix. According to this answer, a good way to represent a maze is using a tree, and a good way to solve it is using the A* algorithm. How would one create a tree from the image? Any ideas?
TL;DR
Best way to parse? Into what data structure? How would said structure help/hinder solving?
UPDATE
I’ve tried my hand at implementing what @Mikhail has written in Python, using numpy, as @Thomas recommended. I feel that the algorithm is correct, but it’s not working as hoped. (Code below.) The PNG library is PyPNG.
import png, numpy, Queue, operator, itertools
def is_white(coord, image):
""" Returns whether (x, y) is approx. a white pixel."""
a = True
for i in xrange(3):
if not a: break
a = image[coord[1]][coord[0] * 3 + i] > 240
return a
def bfs(s, e, i, visited):
""" Perform a breadth-first search. """
frontier = Queue.Queue()
while s != e:
for d in [(-1, 0), (0, -1), (1, 0), (0, 1)]:
np = tuple(map(operator.add, s, d))
if is_white(np, i) and np not in visited:
frontier.put(np)
visited.append(s)
s = frontier.get()
return visited
def main():
r = png.Reader(filename = "thescope-134.png")
rows, cols, pixels, meta = r.asDirect()
assert meta['planes'] == 3 # ensure the file is RGB
image2d = numpy.vstack(itertools.imap(numpy.uint8, pixels))
start, end = (402, 985), (398, 27)
print bfs(start, end, image2d, [])
Convert image to grayscale (not yet binary), adjusting weights for the colors so that final grayscale image is approximately uniform. You can do it simply by controlling sliders in Photoshop in Image -> Adjustments -> Black & White.
Convert image to binary by setting appropriate threshold in Photoshop in Image -> Adjustments -> Threshold.
Make sure threshold is selected right. Use the Magic Wand Tool with 0 tolerance, point sample, contiguous, no anti-aliasing. Check that edges at which selection breaks are not false edges introduced by wrong threshold. In fact, all interior points of this maze are accessible from the start.
Add artificial borders on the maze to make sure virtual traveler will not walk around it :)
Implement breadth-first search (BFS) in your favorite language and run it from the start. I prefer MATLAB for this task. As @Thomas already mentioned, there is no need to mess with regular representation of graphs. You can work with binarized image directly.
Here is the MATLAB code for BFS:
function path = solve_maze(img_file)
%% Init data
img = imread(img_file);
img = rgb2gray(img);
maze = img > 0;
start = [985 398];
finish = [26 399];
%% Init BFS
n = numel(maze);
Q = zeros(n, 2);
M = zeros([size(maze) 2]);
front = 0;
back = 1;
function push(p, d)
q = p + d;
if maze(q(1), q(2)) && M(q(1), q(2), 1) == 0
front = front + 1;
Q(front, :) = q;
M(q(1), q(2), :) = reshape(p, [1 1 2]);
end
end
push(start, [0 0]);
d = [0 1; 0 -1; 1 0; -1 0];
%% Run BFS
while back <= front
p = Q(back, :);
back = back + 1;
for i = 1:4
push(p, d(i, :));
end
end
%% Extracting path
path = finish;
while true
q = path(end, :);
p = reshape(M(q(1), q(2), :), 1, 2);
path(end + 1, :) = p;
if isequal(p, start)
break;
end
end
end
It is really very simple and standard; there should be no difficulty implementing this in Python or whatever you prefer.
And here is the answer:
This solution is written in Python. Thanks Mikhail for the pointers on the image preparation.
An animated Breadth-First Search:
The Completed Maze:
#!/usr/bin/env python
import sys
from Queue import Queue
from PIL import Image
start = (400,984)
end = (398,25)
def iswhite(value):
if value == (255,255,255):
return True
def getadjacent(n):
x,y = n
return [(x-1,y),(x,y-1),(x+1,y),(x,y+1)]
def BFS(start, end, pixels):
queue = Queue()
queue.put([start]) # Wrapping the start tuple in a list
while not queue.empty():
path = queue.get()
pixel = path[-1]
if pixel == end:
return path
for adjacent in getadjacent(pixel):
x,y = adjacent
if iswhite(pixels[x,y]):
pixels[x,y] = (127,127,127) # see note
new_path = list(path)
new_path.append(adjacent)
queue.put(new_path)
print "Queue has been exhausted. No answer was found."
if __name__ == '__main__':
# invoke: python mazesolver.py <mazefile> <outputfile>[.jpg|.png|etc.]
base_img = Image.open(sys.argv[1])
base_pixels = base_img.load()
path = BFS(start, end, base_pixels)
path_img = Image.open(sys.argv[1])
path_pixels = path_img.load()
for position in path:
x,y = position
path_pixels[x,y] = (255,0,0) # red
path_img.save(sys.argv[2])
Note: Marks a white visited pixel grey. This removes the need for a visited list, but this requires a second load of the image file from disk before drawing a path (if you don’t want a composite image of the final path and ALL paths taken).
I tried implementing A-Star search for this problem myself. I closely followed the implementation by Joseph Kern for the framework, and the algorithm pseudocode given here:
def AStar(start, goal, neighbor_nodes, distance, cost_estimate):
def reconstruct_path(came_from, current_node):
path = []
while current_node is not None:
path.append(current_node)
current_node = came_from[current_node]
return list(reversed(path))
g_score = {start: 0}
f_score = {start: g_score[start] + cost_estimate(start, goal)}
openset = {start}
closedset = set()
came_from = {start: None}
while openset:
current = min(openset, key=lambda x: f_score[x])
if current == goal:
return reconstruct_path(came_from, goal)
openset.remove(current)
closedset.add(current)
for neighbor in neighbor_nodes(current):
if neighbor in closedset:
continue
if neighbor not in openset:
openset.add(neighbor)
tentative_g_score = g_score[current] + distance(current, neighbor)
if tentative_g_score >= g_score.get(neighbor, float('inf')):
continue
came_from[neighbor] = current
g_score[neighbor] = tentative_g_score
f_score[neighbor] = tentative_g_score + cost_estimate(neighbor, goal)
return []
As A-Star is a heuristic search algorithm, you need to come up with a function that estimates the remaining cost (here: distance) until the goal is reached. Unless you’re comfortable with a suboptimal solution, it should not overestimate the cost. A conservative choice here would be the manhattan (or taxicab) distance, since for the Von Neumann (4-connected) neighborhood used here it equals the length of the shortest possible grid path between two points and therefore never overestimates the cost.
This would, however, significantly underestimate the actual cost for the given maze at hand. Therefore I’ve added two other distance metrics for comparison: squared euclidean distance, and the manhattan distance multiplied by four. These, however, might overestimate the actual cost and might therefore yield suboptimal results.
Here’s the code:
import sys
from PIL import Image
def is_blocked(p):
x,y = p
pixel = path_pixels[x,y]
if any(c < 225 for c in pixel):
return True
def von_neumann_neighbors(p):
x, y = p
neighbors = [(x-1, y), (x, y-1), (x+1, y), (x, y+1)]
return [p for p in neighbors if not is_blocked(p)]
def manhattan(p1, p2):
return abs(p1[0]-p2[0]) + abs(p1[1]-p2[1])
def squared_euclidean(p1, p2):
return (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
start = (400, 984)
goal = (398, 25)
# invoke: python mazesolver.py <mazefile> <outputfile>[.jpg|.png|etc.]
path_img = Image.open(sys.argv[1])
path_pixels = path_img.load()
distance = manhattan
heuristic = manhattan
path = AStar(start, goal, von_neumann_neighbors, distance, heuristic)
for position in path:
x,y = position
path_pixels[x,y] = (255,0,0) # red
path_img.save(sys.argv[2])
Here are some images for a visualization of the results (inspired by the one posted by Joseph Kern). The animations show a new frame after every 10000 iterations of the main while-loop.
Breadth-First Search:
A-Star Manhattan Distance:
A-Star Squared Euclidean Distance:
A-Star Manhattan Distance multiplied by four:
The results show that the explored regions of the maze differ considerably depending on the heuristic being used. Notably, squared euclidean distance even produces a different (suboptimal) path than the other metrics.
Concerning the performance of the A-Star algorithm in terms of runtime until termination, note that many evaluations of the distance and cost functions add up, compared to Breadth-First Search (BFS), which only needs to evaluate the “goaliness” of each candidate position. Whether the cost of these additional function evaluations (A-Star) outweighs the cost of the larger number of nodes to check (BFS), and especially whether performance is an issue for your application at all, is a matter of individual perception and can of course not be answered in general.
One thing that can be said in general about whether an informed search algorithm (such as A-Star) is the better choice over an exhaustive search (e.g., BFS) is the following: with the number of dimensions of the maze, i.e., the branching factor of the search tree, the disadvantage of searching exhaustively grows exponentially. With growing complexity it becomes less and less feasible to do so, and at some point you are pretty much happy with any resulting path, be it (approximately) optimal or not.
Tree search is too much. The maze is inherently separable along the solution path(s).
(Thanks to rainman002 from Reddit for pointing this out to me.)
Because of this, you can quickly use connected components to identify the connected sections of maze wall. This iterates over the pixels twice.
If you want to turn that into a nice diagram of the solution path(s), you can then use binary operations with structuring elements to fill in the “dead end” pathways for each connected region.
Demo code for MATLAB follows. It could use tweaking to clean up the result better, make it more generalizable, and make it run faster. (Sometime when it’s not 2:30 AM.)
% read in and invert the image
im = 255 - imread('maze.jpg');
% sharpen it to address small fuzzy channels
% threshold to binary 15%
% run connected components
result = bwlabel(im2bw(imfilter(im,fspecial('unsharp')),0.15));
% purge small components (e.g. letters)
for i = 1:max(reshape(result,1,1002*800))
[count,~] = size(find(result==i));
if count < 500
result(result==i) = 0;
end
end
% close dead-end channels
closed = zeros(1002,800);
for i = 1:max(reshape(result,1,1002*800))
k = zeros(1002,800);
k(result==i) = 1; k = imclose(k,strel('square',8));
closed(k==1) = i;
end
% do output
out = 255 - im;
for x = 1:1002
for y = 1:800
if closed(x,y) == 0
out(x,y,:) = 0;
end
end
end
imshow(out);
Uses a work list for a threshold flood fill (the code below pops from the end of the list, so it actually behaves as a stack rather than a queue, but the idea is the same). It pushes the pixel left of the entrance onto the list and then starts the loop. If a popped pixel is dark enough, it’s colored light gray (above threshold), and all of its neighbors are pushed onto the list.
from PIL import Image
img = Image.open("/tmp/in.jpg")
(w,h) = img.size
scan = [(394,23)]
while(len(scan) > 0):
(i,j) = scan.pop()
(r,g,b) = img.getpixel((i,j))
if(r*g*b < 9000000):
img.putpixel((i,j),(210,210,210))
for x in [i-1,i,i+1]:
for y in [j-1,j,j+1]:
scan.append((x,y))
img.save("/tmp/out.png")
The solution is the corridor between the gray wall and the colored wall. Note that this maze has multiple solutions. Also, this merely appears to work.
I had fun playing around with this and extended on Joseph Kern‘s answer. Not to detract from it; I just made some minor additions for anyone else who may be interested in playing around with this.
It’s a python-based solver which uses BFS to find the shortest path. My main additions, at the time, are:
The image is cleaned before the search (i.e. converted to pure black & white)
Automatically generate a GIF.
Automatically generate an AVI.
As it stands, the start/end-points are hard-coded for this sample maze, but I plan on extending it such that you can pick the appropriate pixels.
I’d go for the matrix-of-bools option. If you find that standard Python lists are too inefficient for this, you could use a numpy.bool array instead. Storage for a 1000×1000 pixel maze is then just 1 MB.
Don’t bother with creating any tree or graph data structures. That’s just a way of thinking about it, but not necessarily a good way to represent it in memory; a boolean matrix is both easier to code and more efficient.
Then use the A* algorithm to solve it. For the distance heuristic, use the Manhattan distance (distance_x + distance_y).
Represent nodes by a tuple of (row, column) coordinates. Whenever the algorithm (Wikipedia pseudocode) calls for “neighbours”, it’s a simple matter of looping over the four possible neighbours (mind the edges of the image!).
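For illustration, that neighbour loop over a boolean matrix could be as simple as the following sketch (the names are illustrative, not from the answer):
def neighbours(node, free):
    """Yield the 4-connected neighbours of a (row, col) node that are inside the image and walkable.

    free is a 2-D boolean numpy array (True = traversable)."""
    r, c = node
    rows, cols = free.shape
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and free[nr, nc]:
            yield (nr, nc)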
If you find that it’s still too slow, you could try downscaling the image before you load it. Be careful not to lose any narrow paths in the process.
Maybe it’s possible to do a 1:2 downscaling in Python as well, checking that you don’t actually lose any possible paths. An interesting option, but it needs a bit more thought.
1.1 Load the image as an RGB pixel map. In C# it is trivial using system.drawing.bitmap. In languages with no simple support for imaging, just convert the image to portable pixmap format (PPM) (a Unix text representation that produces large files) or some simple binary file format you can easily read, such as BMP or TGA. You can do the conversion with ImageMagick on Unix or IrfanView on Windows.
1.2 You may, as mentioned earlier, simplify the data by taking the (R+G+B)/3 for each pixel as an indicator of gray tone and then threshold the value to produce a black and white table. Something close to 200 assuming 0=black and 255=white will take out the JPEG artifacts.
(2. Solutions:)
2.1 Depth-First Search: Initialize a stack with the starting location, collect the available follow-up moves, pick one at random and push it onto the stack, and proceed until the end is reached or you hit a dead end. On a dead end, backtrack by popping the stack. You need to keep track of which positions were visited on the map, so that when you collect available moves you never take the same path twice. Very interesting to animate.
2.2 Breadth-First Search: Mentioned before, similar as above but only using queues. Also interesting to animate. This works like flood-fill in image editing software. I think you may be able to solve a maze in Photoshop using this trick.
2.3 Wall Follower: Geometrically speaking, a maze is a folded/convoluted tube. If you keep your hand on the wall you will eventually find the exit ;) This does not always work. There are certain assumptions re: perfect mazes, etc.; for instance, certain mazes contain islands. Do look it up; it is fascinating.
(3. Comments:)
This is the tricky one. It is easy to solve mazes if they are represented in some simple array format, with each element being a cell type with north, east, south and west walls and a visited flag field. However, given that you are trying to do this from a hand-drawn sketch, it becomes messy. I honestly think that trying to rationalize the sketch will drive you nuts. This is akin to computer vision problems, which are fairly involved. Perhaps going directly onto the image map may be easier yet more wasteful.
Here is a solution using R.
### download the image, read it into R, converting to something we can play with...
library(jpeg)
url <- "https://i.stack.imgur.com/TqKCM.jpg"
download.file(url, "./maze.jpg", mode = "wb")
jpg <- readJPEG("./maze.jpg")
### reshape array into data.frame
library(reshape2)
img3 <- melt(jpg, varnames = c("y","x","rgb"))
img3$rgb <- as.character(factor(img3$rgb, levels = c(1,2,3), labels=c("r","g","b")))
## split out rgb values into separate columns
img3 <- dcast(img3, x + y ~ rgb)
# convert rgb to greyscale (0, 1)
img3$v <- img3$r*.21 + img3$g*.72 + img3$b*.07
# v: values closer to 1 are white, closer to 0 are black
## strategically fill in some border pixels so the solver doesn't "go around":
img3$v2 <- img3$v
img3[(img3$x == 300 | img3$x == 500) & (img3$y %in% c(0:23,988:1002)),"v2"] = 0
# define some start/end point coordinates
pts_df <- data.frame(x = c(398, 399),
y = c(985, 26))
# set a reference value as the mean of the start and end point greyscale "v"s
ref_val <- mean(c(subset(img3, x==pts_df[1,1] & y==pts_df[1,2])$v,
subset(img3, x==pts_df[2,1] & y==pts_df[2,2])$v))
library(sp)
library(gdistance)
spdf3 <- SpatialPixelsDataFrame(points = img3[c("x","y")], data = img3["v2"])
r3 <- rasterFromXYZ(spdf3)
# transition layer defines a "conductance" function between any two points, and the number of connections (4 = Manhattan distances)
# x in the function represents the greyscale values ("v2") of two adjacent points (pixels), i.e., = (x1$v2, x2$v2)
# the function(x) encourages transitions between cells with small changes in greyscale compared to the reference value, such that:
# when v2 is closer to 0 (black) = poor conductance
# when v2 is closer to 1 (white) = good conductance
tl3 <- transition(r3, function(x) (1/max( abs( (x/ref_val)-1 ) )^2)-1, 4)
## get the shortest path between start, end points
sPath3 <- shortestPath(tl3, as.numeric(pts_df[1,]), as.numeric(pts_df[2,]), output = "SpatialLines")
## fortify for ggplot
sldf3 <- fortify(SpatialLinesDataFrame(sPath3, data = data.frame(ID = 1)))
# plot the image greyscale with start/end points (red) and shortest path (green)
ggplot(img3) +
geom_raster(aes(x, y, fill=v2)) +
scale_fill_continuous(high="white", low="black") +
scale_y_reverse() +
geom_point(data=pts_df, aes(x, y), color="red") +
geom_path(data=sldf3, aes(x=long, y=lat), color="green")
Voila!
This is what happens if you don’t fill in some border pixels (Ha!)…
Full disclosure: I asked and answered a very similar question myself before I found this one. Then through the magic of SO, found this one as one of the top “Related Questions”. I thought I’d use this maze as an additional test case… I was very pleased to find that my answer there also works for this application with very little modification.
A better solution would be to find neighbors by cell instead of by pixel: a corridor can be 15 px wide, so within the same corridor the solver can take actions like “left” or “right”, whereas if the displacement were done cell by cell, each move would be a single simple action like UP, DOWN, LEFT or RIGHT.
Which image processing techniques could be used to implement an application that detects the Christmas trees displayed in the following images?
I’m searching for solutions that are going to work on all these images. Therefore, approaches that require training haar cascade classifiers or template matching are not very interesting.
I’m looking for something that can be written in any programming language, as long as it uses only Open Source technologies. The solution must be tested with the images that are shared on this question. There are 6 input images and the answer should display the results of processing each of them. Finally, for each output image there must be red lines drawn to surround the detected tree.
How would you go about programmatically detecting the trees in these images?
I have an approach which I think is interesting and a bit different from the rest. The main difference in my approach, compared to some of the others, is in how the image segmentation step is performed–I used the DBSCAN clustering algorithm from Python’s scikit-learn; it’s optimized for finding somewhat amorphous shapes that may not necessarily have a single clear centroid.
At the top level, my approach is fairly simple and can be broken down into about 3 steps. First I apply a threshold (or actually, the logical “or” of two separate and distinct thresholds). As with many of the other answers, I assumed that the Christmas tree would be one of the brighter objects in the scene, so the first threshold is just a simple monochrome brightness test; any pixels with values above 220 on a 0-255 scale (where black is 0 and white is 255) are saved to a binary black-and-white image. The second threshold tries to look for red and yellow lights, which are particularly prominent in the trees in the upper left and lower right of the six images, and stand out well against the blue-green background which is prevalent in most of the photos. I convert the rgb image to hsv space, and require that the hue is either less than 0.2 on a 0.0-1.0 scale (corresponding roughly to the border between yellow and green) or greater than 0.95 (corresponding to the border between purple and red) and additionally I require bright, saturated colors: saturation and value must both be above 0.7. The results of the two threshold procedures are logically “or”-ed together, and the resulting matrix of black-and-white binary images is shown below:
You can clearly see that each image has one large cluster of pixels roughly corresponding to the location of each tree, plus a few of the images also have some other small clusters corresponding either to lights in the windows of some of the buildings, or to a background scene on the horizon. The next step is to get the computer to recognize that these are separate clusters, and label each pixel correctly with a cluster membership ID number.
For this task I chose DBSCAN. There is a pretty good visual comparison of how DBSCAN typically behaves, relative to other clustering algorithms, available here. As I said earlier, it does well with amorphous shapes. The output of DBSCAN, with each cluster plotted in a different color, is shown here:
There are a few things to be aware of when looking at this result. First is that DBSCAN requires the user to set a “proximity” parameter in order to regulate its behavior, which effectively controls how separated a pair of points must be in order for the algorithm to declare a new separate cluster rather than agglomerating a test point onto an already pre-existing cluster. I set this value to be 0.04 times the size along the diagonal of each image. Since the images vary in size from roughly VGA up to about HD 1080, this type of scale-relative definition is critical.
Another point worth noting is that the DBSCAN algorithm as it is implemented in scikit-learn has memory limits which are fairly challenging for some of the larger images in this sample. Therefore, for a few of the larger images, I actually had to “decimate” (i.e., retain only every 3rd or 4th pixel and drop the others) each cluster in order to stay within this limit. As a result of this culling process, the remaining individual sparse pixels are difficult to see on some of the larger images. Therefore, for display purposes only, the color-coded pixels in the above images have been effectively “dilated” just slightly so that they stand out better. It’s purely a cosmetic operation for the sake of the narrative; although there are comments mentioning this dilation in my code, rest assured that it has nothing to do with any calculations that actually matter.
Once the clusters are identified and labeled, the third and final step is easy: I simply take the largest cluster in each image (in this case, I chose to measure “size” in terms of the total number of member pixels, although one could have just as easily instead used some type of metric that gauges physical extent) and compute the convex hull for that cluster. The convex hull then becomes the tree border. The six convex hulls computed via this method are shown below in red:
The source code is written for Python 2.7.6 and it depends on numpy, scipy, matplotlib and scikit-learn. I’ve divided it into two parts. The first part is responsible for the actual image processing:
from PIL import Image
import numpy as np
import scipy as sp
import matplotlib.colors as colors
from sklearn.cluster import DBSCAN
from math import ceil, sqrt
"""
Inputs:
rgbimg: [M,N,3] numpy array containing (uint, 0-255) color image
hueleftthr: Scalar constant to select maximum allowed hue in the
yellow-green region
huerightthr: Scalar constant to select minimum allowed hue in the
blue-purple region
satthr: Scalar constant to select minimum allowed saturation
valthr: Scalar constant to select minimum allowed value
monothr: Scalar constant to select minimum allowed monochrome
brightness
maxpoints: Scalar constant maximum number of pixels to forward to
the DBSCAN clustering algorithm
proxthresh: Proximity threshold to use for DBSCAN, as a fraction of
the diagonal size of the image
Outputs:
borderseg: [K,2,2] Nested list containing K pairs of x- and y- pixel
values for drawing the tree border
X: [P,2] List of pixels that passed the threshold step
labels: [Q,2] List of cluster labels for points in Xslice (see
below)
Xslice: [Q,2] Reduced list of pixels to be passed to DBSCAN
"""
def findtree(rgbimg, hueleftthr=0.2, huerightthr=0.95, satthr=0.7,
valthr=0.7, monothr=220, maxpoints=5000, proxthresh=0.04):
    # Convert rgb image to monochrome for the brightness threshold
gryimg = np.asarray(Image.fromarray(rgbimg).convert('L'))
# Convert rgb image (uint, 0-255) to hsv (float, 0.0-1.0)
hsvimg = colors.rgb_to_hsv(rgbimg.astype(float)/255)
# Initialize binary thresholded image
binimg = np.zeros((rgbimg.shape[0], rgbimg.shape[1]))
# Find pixels with hue<0.2 or hue>0.95 (red or yellow) and saturation/value
# both greater than 0.7 (saturated and bright)--tends to coincide with
# ornamental lights on trees in some of the images
boolidx = np.logical_and(
np.logical_and(
np.logical_or((hsvimg[:,:,0] < hueleftthr),
(hsvimg[:,:,0] > huerightthr)),
(hsvimg[:,:,1] > satthr)),
(hsvimg[:,:,2] > valthr))
# Find pixels that meet hsv criterion
binimg[np.where(boolidx)] = 255
# Add pixels that meet grayscale brightness criterion
binimg[np.where(gryimg > monothr)] = 255
# Prepare thresholded points for DBSCAN clustering algorithm
X = np.transpose(np.where(binimg == 255))
Xslice = X
nsample = len(Xslice)
if nsample > maxpoints:
# Make sure number of points does not exceed DBSCAN maximum capacity
Xslice = X[range(0,nsample,int(ceil(float(nsample)/maxpoints)))]
# Translate DBSCAN proximity threshold to units of pixels and run DBSCAN
pixproxthr = proxthresh * sqrt(binimg.shape[0]**2 + binimg.shape[1]**2)
db = DBSCAN(eps=pixproxthr, min_samples=10).fit(Xslice)
labels = db.labels_.astype(int)
# Find the largest cluster (i.e., with most points) and obtain convex hull
unique_labels = set(labels)
maxclustpt = 0
for k in unique_labels:
class_members = [index[0] for index in np.argwhere(labels == k)]
if len(class_members) > maxclustpt:
points = Xslice[class_members]
hull = sp.spatial.ConvexHull(points)
maxclustpt = len(class_members)
borderseg = [[points[simplex,0], points[simplex,1]] for simplex
in hull.simplices]
return borderseg, X, labels, Xslice
and the second part is a user-level script which calls the first file and generates all of the plots above:
#!/usr/bin/env python
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from findtree import findtree
# Image files to process
fname = ['nmzwj.png', 'aVZhC.png', '2K9EF.png',
'YowlH.png', '2y4o5.png', 'FWhSP.png']
# Initialize figures
fgsz = (16,7)
figthresh = plt.figure(figsize=fgsz, facecolor='w')
figclust = plt.figure(figsize=fgsz, facecolor='w')
figcltwo = plt.figure(figsize=fgsz, facecolor='w')
figborder = plt.figure(figsize=fgsz, facecolor='w')
figthresh.canvas.set_window_title('Thresholded HSV and Monochrome Brightness')
figclust.canvas.set_window_title('DBSCAN Clusters (Raw Pixel Output)')
figcltwo.canvas.set_window_title('DBSCAN Clusters (Slightly Dilated for Display)')
figborder.canvas.set_window_title('Trees with Borders')
for ii, name in zip(range(len(fname)), fname):
# Open the file and convert to rgb image
rgbimg = np.asarray(Image.open(name))
# Get the tree borders as well as a bunch of other intermediate values
# that will be used to illustrate how the algorithm works
borderseg, X, labels, Xslice = findtree(rgbimg)
# Display thresholded images
axthresh = figthresh.add_subplot(2,3,ii+1)
axthresh.set_xticks([])
axthresh.set_yticks([])
binimg = np.zeros((rgbimg.shape[0], rgbimg.shape[1]))
for v, h in X:
binimg[v,h] = 255
axthresh.imshow(binimg, interpolation='nearest', cmap='Greys')
# Display color-coded clusters
axclust = figclust.add_subplot(2,3,ii+1) # Raw version
axclust.set_xticks([])
axclust.set_yticks([])
axcltwo = figcltwo.add_subplot(2,3,ii+1) # Dilated slightly for display only
axcltwo.set_xticks([])
axcltwo.set_yticks([])
axcltwo.imshow(binimg, interpolation='nearest', cmap='Greys')
clustimg = np.ones(rgbimg.shape)
unique_labels = set(labels)
# Generate a unique color for each cluster
plcol = cm.rainbow_r(np.linspace(0, 1, len(unique_labels)))
for lbl, pix in zip(labels, Xslice):
for col, unqlbl in zip(plcol, unique_labels):
if lbl == unqlbl:
# Cluster label of -1 indicates no cluster membership;
# override default color with black
if lbl == -1:
col = [0.0, 0.0, 0.0, 1.0]
# Raw version
for ij in range(3):
clustimg[pix[0],pix[1],ij] = col[ij]
# Dilated just for display
axcltwo.plot(pix[1], pix[0], 'o', markerfacecolor=col,
markersize=1, markeredgecolor=col)
axclust.imshow(clustimg)
axcltwo.set_xlim(0, binimg.shape[1]-1)
axcltwo.set_ylim(binimg.shape[0], -1)
    # Plot original images with red borders around the trees
axborder = figborder.add_subplot(2,3,ii+1)
axborder.set_axis_off()
axborder.imshow(rgbimg, interpolation='nearest')
for vseg, hseg in borderseg:
axborder.plot(hseg, vseg, 'r-', lw=3)
axborder.set_xlim(0, binimg.shape[1]-1)
axborder.set_ylim(binimg.shape[0], -1)
plt.show()
EDIT NOTE: I edited this post to (i) process each tree image individually, as requested in the requirements, (ii) to consider both object brightness and shape in order to improve the quality of the result.
Below is an approach that takes into consideration object brightness and shape. In other words, it seeks objects with a triangle-like shape and with significant brightness. It was implemented in Java, using the Marvin image processing framework.
The first step is the color thresholding. The objective here is to focus the analysis on objects with significant brightness.
output images:
source code:
public class ChristmasTree {
private MarvinImagePlugin fill = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.fill.boundaryFill");
private MarvinImagePlugin threshold = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.thresholding");
private MarvinImagePlugin invert = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.invert");
private MarvinImagePlugin dilation = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.morphological.dilation");
public ChristmasTree(){
MarvinImage tree;
// Iterate each image
for(int i=1; i<=6; i++){
tree = MarvinImageIO.loadImage("./res/trees/tree"+i+".png");
// 1. Threshold
threshold.setAttribute("threshold", 200);
threshold.process(tree.clone(), tree);
}
}
public static void main(String[] args) {
new ChristmasTree();
}
}
In the second step, the brightest points in the image are dilated in order to form shapes. The result of this process is the probable shape of the objects with significant brightness. Applying flood fill segmentation, disconnected shapes are detected.
output images:
source code:
public class ChristmasTree {
private MarvinImagePlugin fill = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.fill.boundaryFill");
private MarvinImagePlugin threshold = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.thresholding");
private MarvinImagePlugin invert = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.invert");
private MarvinImagePlugin dilation = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.morphological.dilation");
public ChristmasTree(){
MarvinImage tree;
// Iterate each image
for(int i=1; i<=6; i++){
tree = MarvinImageIO.loadImage("./res/trees/tree"+i+".png");
// 1. Threshold
threshold.setAttribute("threshold", 200);
threshold.process(tree.clone(), tree);
// 2. Dilate
invert.process(tree.clone(), tree);
tree = MarvinColorModelConverter.rgbToBinary(tree, 127);
MarvinImageIO.saveImage(tree, "./res/trees/new/tree_"+i+"threshold.png");
dilation.setAttribute("matrix", MarvinMath.getTrueMatrix(50, 50));
dilation.process(tree.clone(), tree);
MarvinImageIO.saveImage(tree, "./res/trees/new/tree_"+1+"_dilation.png");
tree = MarvinColorModelConverter.binaryToRgb(tree);
// 3. Segment shapes
MarvinImage trees2 = tree.clone();
fill(tree, trees2);
MarvinImageIO.saveImage(trees2, "./res/trees/new/tree_"+i+"_fill.png");
        }
    }
private void fill(MarvinImage imageIn, MarvinImage imageOut){
boolean found;
int color= 0xFFFF0000;
while(true){
found=false;
Outerloop:
for(int y=0; y<imageIn.getHeight(); y++){
for(int x=0; x<imageIn.getWidth(); x++){
if(imageOut.getIntComponent0(x, y) == 0){
fill.setAttribute("x", x);
fill.setAttribute("y", y);
fill.setAttribute("color", color);
fill.setAttribute("threshold", 120);
fill.process(imageIn, imageOut);
color = newColor(color);
found = true;
break Outerloop;
}
}
}
if(!found){
break;
}
}
}
private int newColor(int color){
int red = (color & 0x00FF0000) >> 16;
int green = (color & 0x0000FF00) >> 8;
int blue = (color & 0x000000FF);
if(red <= green && red <= blue){
red+=5;
}
else if(green <= red && green <= blue){
green+=5;
}
else{
blue+=5;
}
return 0xFF000000 + (red << 16) + (green << 8) + blue;
}
public static void main(String[] args) {
new ChristmasTree();
}
}
As shown in the output image, multiple shapes were detected. In this problem, there are just a few bright points in the images. However, this approach was implemented to deal with more complex scenarios.
In the next step each shape is analyzed. A simple algorithm detects shapes with a pattern similar to a triangle. The algorithm analyzes the object shape line by line. If the center of mass of each shape line is almost the same (given a threshold) and the mass increases as y increases, the object has a triangle-like shape. The mass of a shape line is the number of pixels in that line that belong to the shape. Imagine you slice the object horizontally and analyze each horizontal segment. If they are centralized relative to each other and the length increases from the first segment to the last one in a linear pattern, you probably have an object that resembles a triangle.
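That per-line check is not tied to Java or the Marvin framework; as a hedged sketch, the same idea applied to a binary numpy mask could look like this (the tolerances are invented for illustration and would need tuning):
import numpy as np

def looks_like_triangle(mask, center_tol=0.15, min_rows=10):
    """Rough per-row check of the heuristic described above.

    mask is a 2-D boolean array, True where the pixel belongs to the shape."""
    rows = [r for r in range(mask.shape[0]) if mask[r].any()]
    if len(rows) < min_rows:
        return False
    centers, masses = [], []
    for r in rows:
        cols = np.flatnonzero(mask[r])
        centers.append(cols.mean())  # centre of mass of this shape line
        masses.append(len(cols))     # "mass" = number of shape pixels in the line
    # the line centres should stay roughly aligned with each other...
    if np.std(centers) > center_tol * mask.shape[1]:
        return False
    # ...and the mass should grow fairly linearly from top to bottom
    return float(np.corrcoef(rows, masses)[0, 1]) > 0.8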
Finally, the position of each shape similar to a triangle and with significant brightness, in this case a Christmas tree, is highlighted in the original image, as shown below.
The advantage of this approach is that it will probably also work with images containing other luminous objects, since it analyzes the object’s shape.
Merry Christmas!
EDIT NOTE 2
There is a discussion about the similarity between the output images of this solution and some of the others. They are indeed very similar. But this approach does not just segment objects; it also analyzes the object shapes to some extent. It can handle multiple luminous objects in the same scene, and the Christmas tree does not need to be the brightest one. I’m only raising this to enrich the discussion. There is a bias in the samples: just by looking for the brightest object, you will find the trees. But do we really want to stop the discussion at this point? How far is the computer actually recognizing an object that resembles a Christmas tree? Let’s try to close this gap.
A result is presented below just to illustrate this point:
The first step is to detect the brightest pixels in the picture, but we have to distinguish between the tree itself and the snow that reflects its light. Here we try to exclude the snow by applying a really simple filter on the color codes:
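The actual color filter is not reproduced above; the snippet below is only a hypothetical OpenCV/Python illustration of one such simple rule, assuming snow shows up as bright but bluish/neutral pixels while the tree lights are warmer (the file name and thresholds are made up):
import cv2
import numpy as np
img = cv2.imread("tree.jpg").astype(np.int16)        # hypothetical input file
b, g, r = img[:, :, 0], img[:, :, 1], img[:, :, 2]   # OpenCV loads images as BGR
bright = (r + g + b) > 400                           # "bright enough" (threshold assumed)
warm = (r - b) > 30                                  # warm light: red clearly above blue (assumed)
mask = (bright & warm).astype(np.uint8) * 255        # keep tree lights, drop bluish-white snow
cv2.imwrite("bright_no_snow.png", mask)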
Now we are almost done, but there are still some imperfections due to the snow.
To cut them off, we build a mask from a circle and a rectangle that approximates the shape of a tree, and use it to delete the unwanted pieces:
// Centroid and bounding box of the j-th contour (the bright-pixel blob)
m = moments(contours[j]);
boundrect = boundingRect(contours[j]);
center = Point2f(m.m10/m.m00, m.m01/m.m00);
// Circle radius: 3/4 of the distance from the centroid up to the top of the bounding box
radius = (center.y - (boundrect.tl().y))/4.0*3.0;
// Vertical strip centered on the tree, 2/5 of the image width, as tall as the bounding box
Rect heightrect(center.x-original.cols/5, boundrect.tl().y, original.cols/5*2, boundrect.size().height);
// Draw the rectangle and the circle into a blank mask and AND it with the earlier mask (tmp1)
tmp = Mat::zeros(original.size(), CV_8U);
rectangle(tmp, heightrect, Scalar(255, 255, 255), -1);
circle(tmp, center, radius, Scalar(255, 255, 255), -1);
bitwise_and(tmp, tmp1, tmp1);
The last step is to find the contour of our tree and draw it on the original picture.
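The code for this last step is not shown above; as a rough illustration (OpenCV from Python, with assumed file names), it amounts to something like:
import cv2
mask = cv2.imread("tree_mask.png", cv2.IMREAD_GRAYSCALE)   # assumed mask from the previous steps
original = cv2.imread("tree.jpg")                          # assumed original picture
# OpenCV >= 4 returns (contours, hierarchy); version 3 returns an extra image first
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
tree = max(contours, key=cv2.contourArea)                  # the tree should be the biggest blob
cv2.drawContours(original, [tree], -1, (0, 255, 0), 2)     # draw its contour on the original
cv2.imwrite("tree_outlined.png", original)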
I wrote the code in MATLAB R2007a and used k-means to roughly extract the christmas tree. I will show my intermediate results with only one image, and the final results with all six.
First, I mapped the RGB space onto Lab space, which could enhance the contrast of red in its b channel:
colorTransform = makecform('srgb2lab');
I = applycform(I, colorTransform);
L = double(I(:,:,1));
a = double(I(:,:,2));
b = double(I(:,:,3));
Besides the features in color space, I also used a texture feature that depends on the neighborhood rather than on each pixel by itself. Here I linearly combined the intensities from the 3 original channels (R, G, B) into a single image I0. I chose this combination because the christmas trees in the pictures all have red lights on them, and sometimes green or blue illumination as well.
I applied a 3x3 local binary pattern on I0, using the center pixel as the threshold, and obtained the contrast by calculating the difference between the mean intensity of the pixels above the threshold and the mean intensity of those below it.
I0_copy = zeros(size(I0));
for i = 2 : size(I0,1) - 1
for j = 2 : size(I0,2) - 1
tmp = I0(i-1:i+1,j-1:j+1) >= I0(i,j);
I0_copy(i,j) = mean(mean(tmp.*I0(i-1:i+1,j-1:j+1))) - ...
mean(mean(~tmp.*I0(i-1:i+1,j-1:j+1))); % Contrast
end
end
Since I have 4 features in total, I chose K=5 in my clustering method. The code for k-means is shown below (it is from Dr. Andrew Ng’s machine learning course; I took the course earlier and wrote the code myself for his programming assignment).
[centroids, idx] = runkMeans(X, initial_centroids, max_iters);
mask=reshape(idx,img_size(1),img_size(2));
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [centroids, idx] = runkMeans(X, initial_centroids, ...
max_iters, plot_progress)
[m n] = size(X);
K = size(initial_centroids, 1);
centroids = initial_centroids;
previous_centroids = centroids;
idx = zeros(m, 1);
for i=1:max_iters
% For each example in X, assign it to the closest centroid
idx = findClosestCentroids(X, centroids);
% Given the memberships, compute new centroids
centroids = computeCentroids(X, idx, K);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function idx = findClosestCentroids(X, centroids)
K = size(centroids, 1);
idx = zeros(size(X,1), 1);
for xi = 1:size(X,1)
x = X(xi, :);
% Find closest centroid for x.
best = Inf;
for mui = 1:K
mu = centroids(mui, :);
d = dot(x - mu, x - mu);
if d < best
best = d;
idx(xi) = mui;
end
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function centroids = computeCentroids(X, idx, K)
[m n] = size(X);
centroids = zeros(K, n);
for mui = 1:K
centroids(mui, :) = sum(X(idx == mui, :)) / sum(idx == mui);
end
Since the program runs very slowly on my computer, I only ran 3 iterations. Normally the stopping criterion is (i) at least 10 iterations, or (ii) no further change in the centroids. In my tests, increasing the number of iterations differentiated the background (sky and tree, sky and building, …) more accurately, but did not show a drastic change in the christmas tree extraction. Also note that k-means is not immune to the random centroid initialization, so running the program several times and comparing the results is recommended.
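As a minimal sketch of such a stopping rule (plain NumPy here, not the MATLAB code above), one could write:
import numpy as np
def run_kmeans(X, centroids, max_iters=10):
    # stop after max_iters iterations (criterion i) or when the centroids stop moving (criterion ii)
    for _ in range(max_iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        idx = dists.argmin(axis=1)                      # assign each sample to its closest centroid
        new_centroids = np.array([X[idx == k].mean(axis=0) if np.any(idx == k) else centroids[k]
                                  for k in range(len(centroids))])
        if np.allclose(new_centroids, centroids):       # criterion (ii): no change any more
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, idx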
After the k-means, the labelled region with the maximum intensity of I0 was chosen, and boundary tracing was used to extract its boundaries. To me, the last christmas tree is the most difficult one to extract, since the contrast in that picture is not as high as in the first five. Another issue with my method is that I used the bwboundaries function in MATLAB to trace the boundary, and sometimes the inner boundaries are included as well, as you can observe in the 3rd, 5th and 6th results. The dark side within the christmas trees not only fails to be clustered with the illuminated side, but it also leads to many tiny inner boundaries being traced (imfill does not improve this very much). All in all, my algorithm still has a lot of room for improvement.
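Purely as an illustration of that selection step (not my MATLAB code), and assuming mask holds the k-means labels and I0 the combined-intensity image from earlier, a Python version could be:
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage import measure
def extract_tree(mask, I0):
    # pick the k-means cluster whose pixels are brightest in I0
    labels = np.unique(mask)
    best = max(labels, key=lambda k: I0[mask == k].mean())
    region = binary_fill_holes(mask == best)             # fill holes to avoid inner boundaries
    contours = measure.find_contours(region.astype(float), 0.5)
    return region, max(contours, key=len)                # keep the longest (outer) boundary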
Some publications indicate that mean-shift may be more robust than k-means, and many graph-cut based algorithms are also very competitive on complicated boundary segmentation. I wrote a mean-shift algorithm myself; it seems to extract the regions without enough light somewhat better. But mean-shift over-segments a little, and some merging strategy is needed. It ran even slower than k-means on my computer, so I am afraid I have to give it up. I eagerly look forward to seeing others submit excellent results here with the modern algorithms mentioned above.
Yet I always believe that feature selection is the key component in image segmentation. With a proper feature selection that maximizes the margin between object and background, many segmentation algorithms will work well. Different algorithms may improve the result from 1 to 10, but the feature selection may improve it from 0 to 1.
This is my final post using the traditional image processing approaches…
Here I somehow combine my two other proposals, achieving even better results. As a matter of fact I cannot see how these results could be better (especially when you look at the masked images that the method produces).
At the heart of the approach is the combination of three key assumptions:
Images should have high fluctuations in the tree regions
Images should have higher intensity in the tree regions
Background regions should have low intensity and be mostly blue-ish
With these assumptions in mind the method works as follows:
Convert the images to HSV
Filter the V channel with a LoG filter
Apply hard thresholding on LoG filtered image to get ‘activity’ mask A
Apply hard thresholding to V channel to get intensity mask B
Apply H channel thresholding to capture low intensity blue-ish regions into background mask C
Combine the masks (A AND B AND NOT C) to get a preliminary mask
Dilate the mask to enlarge regions and connect dispersed pixels
Eliminate small regions and get the final mask which will eventually represent only the tree
Here is the code in MATLAB (again, the script loads all jpg images in the current folder and, again, this is far from being an optimized piece of code):
% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;
% initialization
ims=dir('./*.jpg');
imgs={};
images={};
blur_images={};
log_image={};
dilated_image={};
int_image={};
back_image={};
bin_image={};
measurements={};
box={};
num=length(ims);
thres_div = 3;
for i=1:num,
% load original image
imgs{end+1}=imread(ims(i).name);
% convert to HSV colorspace
images{end+1}=rgb2hsv(imgs{i});
% apply laplacian filtering and heuristic hard thresholding
val_thres = (max(max(images{i}(:,:,3)))/thres_div);
log_image{end+1} = imfilter( images{i}(:,:,3),fspecial('log')) > val_thres;
% get the most bright regions of the image
int_thres = 0.26*max(max( images{i}(:,:,3)));
int_image{end+1} = images{i}(:,:,3) > int_thres;
% get the most probable background regions of the image
back_image{end+1} = images{i}(:,:,1)>(150/360) & images{i}(:,:,1)<(320/360) & images{i}(:,:,3)<0.5;
% compute the final binary image by combining
% high 'activity' with high intensity
bin_image{end+1} = logical( log_image{i}) & logical( int_image{i}) & ~logical( back_image{i});
% apply morphological dilation to connect disconnected components
strel_size = round(0.01*max(size(imgs{i}))); % structuring element for morphological dilation
dilated_image{end+1} = imdilate( bin_image{i}, strel('disk',strel_size));
% do some measurements to eliminate small objects
measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');
% iterative enlargement of the structuring element for better connectivity
while length(measurements{i})>14 && strel_size<(min(size(imgs{i}(:,:,1)))/2),
strel_size = round( 1.5 * strel_size);
dilated_image{i} = imdilate( bin_image{i}, strel('disk',strel_size));
measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');
end
for m=1:length(measurements{i})
if measurements{i}(m).Area < 0.05*numel( dilated_image{i})
dilated_image{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
end
end
% make sure the dilated image is the same size as the original
dilated_image{i} = dilated_image{i}(1:size(imgs{i},1),1:size(imgs{i},2));
% compute the bounding box
[y,x] = find( dilated_image{i});
if isempty( y)
box{end+1}=[];
else
box{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
end
end
%%% additional code to display things
for i=1:num,
figure;
subplot(121);
colormap gray;
imshow( imgs{i});
if ~isempty(box{i})
hold on;
rr = rectangle( 'position', box{i});
set( rr, 'EdgeColor', 'r');
hold off;
end
subplot(122);
imshow( imgs{i}.*uint8(repmat(dilated_image{i},[1 1 3])));
end
Get the R channel (from RGB) – all further operations are performed on this channel:
Create a Region of Interest (ROI)
Threshold the R channel with a minimum value of 149 (top right image)
Dilate the resulting region (middle left image)
Detect edges in the computed ROI. The tree has a lot of edges (middle right image)
Dilate the result
Erode with a bigger radius (bottom left image)
Select the biggest object (by area) – it is the result region
Convex hull (the tree is a convex polygon) (bottom right image)
Bounding box (bottom right image – green box)
Step by step:
The first result – the simplest, but not in open source software – “Adaptive Vision Studio + Adaptive Vision Library”:
This is not open source, but it is really fast to prototype with:
The whole algorithm to detect the christmas tree (11 blocks):
Next step: we want an open source solution, so we change the AVL filters to OpenCV filters:
Here I made a few small changes: edge detection uses the cvCanny filter, I multiplied the region image with the edges image to respect the ROI, and I used findContours + contourArea to select the biggest element, but the idea is the same.
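The OpenCV version itself is not pasted here; the following Python/OpenCV sketch only illustrates the same sequence of blocks, with the input file name and the kernel sizes assumed:
import cv2
import numpy as np
img = cv2.imread("tree.jpg")                                  # assumed input file
r = img[:, :, 2]                                              # R channel (OpenCV stores BGR)
_, roi = cv2.threshold(r, 149, 255, cv2.THRESH_BINARY)        # threshold R with min value 149
roi = cv2.dilate(roi, np.ones((15, 15), np.uint8))            # dilate the result region
edges = cv2.Canny(r, 100, 200)                                # the tree has a lot of edges
edges = cv2.bitwise_and(edges, roi)                           # respect the ROI
edges = cv2.dilate(edges, np.ones((15, 15), np.uint8))        # dilate result
edges = cv2.erode(edges, np.ones((25, 25), np.uint8))         # erode with a bigger radius
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
biggest = max(contours, key=cv2.contourArea)                  # select the biggest object by area
hull = cv2.convexHull(biggest)                                # convex hull of the tree
x, y, w, h = cv2.boundingRect(hull)                           # bounding box
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)    # green box on the original
cv2.imwrite("tree_detected.png", img)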
…another old fashioned solution – purely based on HSV processing:
Convert images to the HSV colorspace
Create masks according to heuristics in the HSV (see below)
Apply morphological dilation to the mask to connect disconnected areas
Discard small areas and horizontal blocks (remember trees are vertical blocks)
Compute the bounding box
A word on the heuristics in the HSV processing:
everything with Hues (H) between 210 – 320 degrees is discarded as blue-magenta, which is supposed to belong to the background or to non-relevant areas
everything with Values (V) lower than 40% is also discarded as being too dark to be relevant
Of course one may experiment with numerous other possibilities to fine-tune this approach…
Here is the MATLAB code to do the trick (warning: the code is far from being optimized!!! I used techniques not recommended for MATLAB programming just to be able to track everything in the process; this can be greatly optimized):
% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;
% initialization
ims=dir('./*.jpg');
num=length(ims);
imgs={};
hsvs={};
masks={};
dilated_images={};
measurements={};
boxs={};
for i=1:num,
% load original image
imgs{end+1} = imread(ims(i).name);
flt_x_size = round(size(imgs{i},2)*0.005);
flt_y_size = round(size(imgs{i},1)*0.005);
flt = fspecial( 'average', max( flt_y_size, flt_x_size));
imgs{i} = imfilter( imgs{i}, flt, 'same');
% convert to HSV colorspace
hsvs{end+1} = rgb2hsv(imgs{i});
% apply a hard thresholding and binary operation to construct the mask
masks{end+1} = medfilt2( ~(hsvs{i}(:,:,1)>(210/360) & hsvs{i}(:,:,1)<(320/360))&hsvs{i}(:,:,3)>0.4);
% apply morphological dilation to connect disconnected components
strel_size = round(0.03*max(size(imgs{i}))); % structuring element for morphological dilation
dilated_images{end+1} = imdilate( masks{i}, strel('disk',strel_size));
% do some measurements to eliminate small objects
measurements{i} = regionprops( dilated_images{i},'Perimeter','Area','BoundingBox');
for m=1:length(measurements{i})
if (measurements{i}(m).Area < 0.02*numel( dilated_images{i})) || (measurements{i}(m).BoundingBox(3)>1.2*measurements{i}(m).BoundingBox(4))
dilated_images{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
end
end
dilated_images{i} = dilated_images{i}(1:size(imgs{i},1),1:size(imgs{i},2));
% compute the bounding box
[y,x] = find( dilated_images{i});
if isempty( y)
boxs{end+1}=[];
else
boxs{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
end
end
%%% additional code to display things
for i=1:num,
figure;
subplot(121);
colormap gray;
imshow( imgs{i});
if ~isempty(boxs{i})
hold on;
rr = rectangle( 'position', boxs{i});
set( rr, 'EdgeColor', 'r');
hold off;
end
subplot(122);
imshow( imgs{i}.*uint8(repmat(dilated_images{i},[1 1 3])));
end
Results:
In the results I show the masked image and the bounding box.
Some old-fashioned image processing approach…
The idea is based on the assumption that images depict lighted trees on typically darker and smoother backgrounds (or foregrounds in some cases). The lighted tree area is more “energetic” and has higher intensity.
The process is as follows:
Convert to graylevel
Apply LoG filtering to get the most “active” areas
Apply an intensity threshold to get the brightest areas
Combine the previous 2 to get a preliminary mask
Apply a morphological dilation to enlarge areas and connect neighboring components
Eliminate small candidate areas according to their area size
What you get is a binary mask and a bounding box for each image.
Here are the results using this naive technique:
The MATLAB code follows. The script runs on a folder with JPG images, loads all of them and returns the detected results.
% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;
% initialization
ims=dir('./*.jpg');
imgs={};
images={};
blur_images={};
log_image={};
dilated_image={};
int_image={};
bin_image={};
measurements={};
box={};
num=length(ims);
thres_div = 3;
for i=1:num,
% load original image
imgs{end+1}=imread(ims(i).name);
% convert to grayscale
images{end+1}=rgb2gray(imgs{i});
% apply laplacian filtering and heuristic hard thresholding
val_thres = (max(max(images{i}))/thres_div);
log_image{end+1} = imfilter( images{i},fspecial('log')) > val_thres;
% get the most bright regions of the image
int_thres = 0.26*max(max( images{i}));
int_image{end+1} = images{i} > int_thres;
% compute the final binary image by combining
% high 'activity' with high intensity
bin_image{end+1} = log_image{i} .* int_image{i};
% apply morphological dilation to connect disconnected components
strel_size = round(0.01*max(size(imgs{i}))); % structuring element for morphological dilation
dilated_image{end+1} = imdilate( bin_image{i}, strel('disk',strel_size));
% do some measurements to eliminate small objects
measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');
for m=1:length(measurements{i})
if measurements{i}(m).Area < 0.05*numel( dilated_image{i})
dilated_image{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
end
end
% make sure the dilated image is the same size as the original
dilated_image{i} = dilated_image{i}(1:size(imgs{i},1),1:size(imgs{i},2));
% compute the bounding box
[y,x] = find( dilated_image{i});
if isempty( y)
box{end+1}=[];
else
box{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
end
end
%%% additional code to display things
for i=1:num,
figure;
subplot(121);
colormap gray;
imshow( imgs{i});
if ~isempty(box{i})
hold on;
rr = rectangle( 'position', box{i});
set( rr, 'EdgeColor', 'r');
hold off;
end
subplot(122);
imshow( imgs{i}.*uint8(repmat(dilated_image{i},[1 1 3])));
end
Using a quite different approach from what I’ve seen, I created a PHP script that detects christmas trees by their lights. The result is always a symmetrical triangle and, if needed, numeric values like the angle (“fatness”) of the tree.
The biggest threat to this algorithm is obviously lights next to the tree (in great numbers) or in front of it (the greater problem until further optimization).
Edit (added): What it can’t do: find out whether there is a christmas tree at all, find multiple christmas trees in one image, correctly detect a christmas tree in the middle of Las Vegas, detect christmas trees that are heavily bent, upside-down or chopped down… ;)
The different stages are:
Calculate the added brightness (R+G+B) for each pixel
Add up this value of all 8 neighbouring pixels on top of each pixel
Rank all pixels by this value (brightest first) – I know, not really subtle…
Choose N of these, starting from the top, skipping ones that are too close
Calculate the median of these top N (gives us the approximate center of the tree)
Start from the median position upwards in a widening search beam for the topmost light from the selected brightest ones (people tend to put at least one light at the very top)
From there, imagine lines going 60 degrees left and right downwards (christmas trees shouldn’t be that fat)
Decrease those 60 degrees until 20% of the brightest lights are outside this triangle
Find the light at the very bottom of the triangle, giving you the lower horizontal border of the tree
Done
Explanation of the markings:
Big red cross in the center of the tree: Median of the top N brightest lights
Dotted line from there upwards: “search beam” for the top of the tree
Smaller red cross: top of the tree
Really small red crosses: All of the top N brightest lights
Red triangle: D’uh!
Source code:
<?php
ini_set('memory_limit', '1024M');
header("Content-type: image/png");
$chosenImage = 6;
switch($chosenImage){
case 1:
$inputImage = imagecreatefromjpeg("nmzwj.jpg");
break;
case 2:
$inputImage = imagecreatefromjpeg("2y4o5.jpg");
break;
case 3:
$inputImage = imagecreatefromjpeg("YowlH.jpg");
break;
case 4:
$inputImage = imagecreatefromjpeg("2K9Ef.jpg");
break;
case 5:
$inputImage = imagecreatefromjpeg("aVZhC.jpg");
break;
case 6:
$inputImage = imagecreatefromjpeg("FWhSP.jpg");
break;
case 7:
$inputImage = imagecreatefromjpeg("roemerberg.jpg");
break;
default:
exit();
}
// Process the loaded image
$topNspots = processImage($inputImage);
imagejpeg($inputImage);
imagedestroy($inputImage);
// Here be functions
function processImage($image) {
$orange = imagecolorallocate($image, 220, 210, 60);
$black = imagecolorallocate($image, 0, 0, 0);
$red = imagecolorallocate($image, 255, 0, 0);
$maxX = imagesx($image)-1;
$maxY = imagesy($image)-1;
// Parameters
$spread = 1; // Number of pixels to each direction that will be added up
$topPositions = 80; // Number of (brightest) lights taken into account
$minLightDistance = round(min(array($maxX, $maxY)) / 30); // Minimum number of pixels between the brightest lights
$searchYperX = 5; // spread of the "search beam" from the median point to the top
$renderStage = 3; // 1 to 3; exits the process early
// STAGE 1
// Calculate the brightness of each pixel (R+G+B)
$maxBrightness = 0;
$stage1array = array();
for($row = 0; $row <= $maxY; $row++) {
$stage1array[$row] = array();
for($col = 0; $col <= $maxX; $col++) {
$rgb = imagecolorat($image, $col, $row);
$brightness = getBrightnessFromRgb($rgb);
$stage1array[$row][$col] = $brightness;
if($renderStage == 1){
$brightnessToGrey = round($brightness / 765 * 256);
$greyRgb = imagecolorallocate($image, $brightnessToGrey, $brightnessToGrey, $brightnessToGrey);
imagesetpixel($image, $col, $row, $greyRgb);
}
if($brightness > $maxBrightness) {
$maxBrightness = $brightness;
if($renderStage == 1){
imagesetpixel($image, $col, $row, $red);
}
}
}
}
if($renderStage == 1) {
return;
}
// STAGE 2
// Add up brightness of neighbouring pixels
$stage2array = array();
$maxStage2 = 0;
for($row = 0; $row <= $maxY; $row++) {
$stage2array[$row] = array();
for($col = 0; $col <= $maxX; $col++) {
if(!isset($stage2array[$row][$col])) $stage2array[$row][$col] = 0;
// Look around the current pixel, add brightness
for($y = $row-$spread; $y <= $row+$spread; $y++) {
for($x = $col-$spread; $x <= $col+$spread; $x++) {
// Don't read values from outside the image
if($x >= 0 && $x <= $maxX && $y >= 0 && $y <= $maxY){
$stage2array[$row][$col] += $stage1array[$y][$x]+10;
}
}
}
$stage2value = $stage2array[$row][$col];
if($stage2value > $maxStage2) {
$maxStage2 = $stage2value;
}
}
}
if($renderStage >= 2){
// Paint the accumulated light, dimmed by the maximum value from stage 2
for($row = 0; $row <= $maxY; $row++) {
for($col = 0; $col <= $maxX; $col++) {
$brightness = round($stage2array[$row][$col] / $maxStage2 * 255);
$greyRgb = imagecolorallocate($image, $brightness, $brightness, $brightness);
imagesetpixel($image, $col, $row, $greyRgb);
}
}
}
if($renderStage == 2) {
return;
}
// STAGE 3
// Create a ranking of bright spots (like "Top 20")
$topN = array();
for($row = 0; $row <= $maxY; $row++) {
for($col = 0; $col <= $maxX; $col++) {
$stage2Brightness = $stage2array[$row][$col];
$topN[$col.":".$row] = $stage2Brightness;
}
}
arsort($topN);
$topNused = array();
$topPositionCountdown = $topPositions;
if($renderStage == 3){
foreach ($topN as $key => $val) {
if($topPositionCountdown <= 0){
break;
}
$position = explode(":", $key);
foreach($topNused as $usedPosition => $usedValue) {
$usedPosition = explode(":", $usedPosition);
$distance = abs($usedPosition[0] - $position[0]) + abs($usedPosition[1] - $position[1]);
if($distance < $minLightDistance) {
continue 2;
}
}
$topNused[$key] = $val;
paintCrosshair($image, $position[0], $position[1], $red, 2);
$topPositionCountdown--;
}
}
// STAGE 4
// Median of all Top N lights
$topNxValues = array();
$topNyValues = array();
foreach ($topNused as $key => $val) {
$position = explode(":", $key);
array_push($topNxValues, $position[0]);
array_push($topNyValues, $position[1]);
}
$medianXvalue = round(calculate_median($topNxValues));
$medianYvalue = round(calculate_median($topNyValues));
paintCrosshair($image, $medianXvalue, $medianYvalue, $red, 15);
// STAGE 5
// Find treetop
$filename = 'debug.log';
$handle = fopen($filename, "w");
fwrite($handle, "\n\n STAGE 5");
$treetopX = $medianXvalue;
$treetopY = $medianYvalue;
$searchXmin = $medianXvalue;
$searchXmax = $medianXvalue;
$width = 0;
for($y = $medianYvalue; $y >= 0; $y--) {
fwrite($handle, "\nAt y = ".$y);
if(($y % $searchYperX) == 0) { // Modulo
$width++;
$searchXmin = $medianXvalue - $width;
$searchXmax = $medianXvalue + $width;
imagesetpixel($image, $searchXmin, $y, $red);
imagesetpixel($image, $searchXmax, $y, $red);
}
foreach ($topNused as $key => $val) {
$position = explode(":", $key); // "x:y"
if($position[1] != $y){
continue;
}
if($position[0] >= $searchXmin && $position[0] <= $searchXmax){
$treetopX = $position[0];
$treetopY = $y;
}
}
}
paintCrosshair($image, $treetopX, $treetopY, $red, 5);
// STAGE 6
// Find tree sides
fwrite($handle, "\n\n STAGE 6");
$treesideAngle = 60; // The extremely "fat" end of a christmas tree
$treeBottomY = $treetopY;
$topPositionsExcluded = 0;
$xymultiplier = 0;
while(($topPositionsExcluded < ($topPositions / 5)) && $treesideAngle >= 1){
fwrite($handle, "\n\nWe're at angle ".$treesideAngle);
$xymultiplier = sin(deg2rad($treesideAngle));
fwrite($handle, "\nMultiplier: ".$xymultiplier);
$topPositionsExcluded = 0;
foreach ($topNused as $key => $val) {
$position = explode(":", $key);
fwrite($handle, "\nAt position ".$key);
if($position[1] > $treeBottomY) {
$treeBottomY = $position[1];
}
// Lights above the tree are outside of it, but don't matter
if($position[1] < $treetopY){
$topPositionsExcluded++;
fwrite($handle, "\nTOO HIGH");
continue;
}
// Top light will generate division by zero
if($treetopY-$position[1] == 0) {
fwrite($handle, "\nDIVISION BY ZERO");
continue;
}
// Lights left and right of it are also not inside
fwrite($handle, "\nLight position factor: ".(abs($treetopX-$position[0]) / abs($treetopY-$position[1])));
if((abs($treetopX-$position[0]) / abs($treetopY-$position[1])) > $xymultiplier){
$topPositionsExcluded++;
fwrite($handle, "\n --- Outside tree ---");
}
}
$treesideAngle--;
}
fclose($handle);
// Paint tree's outline
$treeHeight = abs($treetopY-$treeBottomY);
$treeBottomLeft = 0;
$treeBottomRight = 0;
$previousState = false; // line has not started; assumes the tree does not "leave"^^
for($x = 0; $x <= $maxX; $x++){
if(abs($treetopX-$x) != 0 && abs($treetopX-$x) / $treeHeight > $xymultiplier){
if($previousState == true){
$treeBottomRight = $x;
$previousState = false;
}
continue;
}
imagesetpixel($image, $x, $treeBottomY, $red);
if($previousState == false){
$treeBottomLeft = $x;
$previousState = true;
}
}
imageline($image, $treeBottomLeft, $treeBottomY, $treetopX, $treetopY, $red);
imageline($image, $treeBottomRight, $treeBottomY, $treetopX, $treetopY, $red);
// Print out some parameters
$string = "Min dist: ".$minLightDistance." | Tree angle: ".$treesideAngle." deg | Tree bottom: ".$treeBottomY;
$px = (imagesx($image) - 6.5 * strlen($string)) / 2;
imagestring($image, 2, $px, 5, $string, $orange);
return $topN;
}
/**
* Returns values from 0 to 765
*/
function getBrightnessFromRgb($rgb) {
$r = ($rgb >> 16) & 0xFF;
$g = ($rgb >> 8) & 0xFF;
$b = $rgb & 0xFF;
return $r+$g+$b;
}
function paintCrosshair($image, $posX, $posY, $color, $size=5) {
for($x = $posX-$size; $x <= $posX+$size; $x++) {
if($x>=0 && $x < imagesx($image)){
imagesetpixel($image, $x, $posY, $color);
}
}
for($y = $posY-$size; $y <= $posY+$size; $y++) {
if($y>=0 && $y < imagesy($image)){
imagesetpixel($image, $posX, $y, $color);
}
}
}
// From http://www.mdj.us/web-development/php-programming/calculating-the-median-average-values-of-an-array-with-php/
function calculate_median($arr) {
sort($arr);
$count = count($arr); //total numbers in array
$middleval = floor(($count-1)/2); // find the middle value, or the lowest middle value
if($count % 2) { // odd number, middle is the median
$median = $arr[$middleval];
} else { // even number, calculate avg of 2 medians
$low = $arr[$middleval];
$high = $arr[$middleval+1];
$median = (($low+$high)/2);
}
return $median;
}
?>