Tag Archives: c

How to calculate the angle between a line and the horizontal axis?

Question: How to calculate the angle between a line and the horizontal axis?


In a programming language (Python, C#, etc.), I need to determine how to calculate the angle between a line and the horizontal axis.

I think an image best describes what I want:

[image: the angle between the line from P1 to P2 and the horizontal axis]

Given (P1x,P1y) and (P2x,P2y), what is the best way to calculate this angle? The origin is in the top left and only the positive quadrant is used.


Answer 0


First find the difference between the start point and the end point (here, this is more of a directed line segment, not a “line”, since lines extend infinitely and don’t start at a particular point).

deltaY = P2_y - P1_y
deltaX = P2_x - P1_x

Then calculate the angle (which runs from the positive X axis at P1 to the positive Y axis at P1).

angleInDegrees = arctan(deltaY / deltaX) * 180 / PI

But arctan may not be ideal, because dividing the differences this way erases the information needed to distinguish which quadrant the angle is in (see below). Use the following instead if your language includes an atan2 function:

angleInDegrees = atan2(deltaY, deltaX) * 180 / PI

EDIT (Feb. 22, 2017): In general, however, calling atan2(deltaY,deltaX) just to get the proper angle for cos and sin may be inelegant. In those cases, you can often do the following instead:

  1. Treat (deltaX, deltaY) as a vector.
  2. Normalize that vector to a unit vector. To do so, divide deltaX and deltaY by the vector’s length (sqrt(deltaX*deltaX+deltaY*deltaY)), unless the length is 0.
  3. After that, deltaX will now be the cosine of the angle between the vector and the horizontal axis (in the direction from the positive X to the positive Y axis at P1).
  4. And deltaY will now be the sine of that angle.
  5. If the vector’s length is 0, it won’t have an angle between it and the horizontal axis (so it won’t have a meaningful sine and cosine).
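For illustration, here is a minimal Python sketch of steps 1-5 (the function name is my own):

from math import sqrt

def cos_sin_of_angle(deltaX, deltaY):
    # Steps 1-2: treat (deltaX, deltaY) as a vector and normalize it.
    length = sqrt(deltaX*deltaX + deltaY*deltaY)
    if length == 0:
        return None  # Step 5: a zero-length vector has no meaningful angle
    # Steps 3-4: the normalized components are the cosine and sine of the angle.
    return (deltaX/length, deltaY/length)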

EDIT (Feb. 28, 2017): Even without normalizing (deltaX, deltaY):

  • The sign of deltaX will tell you whether the cosine described in step 3 is positive or negative.
  • The sign of deltaY will tell you whether the sine described in step 4 is positive or negative.
  • The signs of deltaX and deltaY will tell you which quadrant the angle is in, in relation to the positive X axis at P1:
    • +deltaX, +deltaY: 0 to 90 degrees.
    • -deltaX, +deltaY: 90 to 180 degrees.
    • -deltaX, -deltaY: 180 to 270 degrees (-180 to -90 degrees).
    • +deltaX, -deltaY: 270 to 360 degrees (-90 to 0 degrees).
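A quick sketch of the same sign test in Python (degree ranges as listed above; points on an axis fall through to the last case):

def quadrant(deltaX, deltaY):
    # Classify the angle's quadrant from the signs alone.
    if deltaX > 0 and deltaY > 0:
        return "0 to 90 degrees"
    if deltaX < 0 and deltaY > 0:
        return "90 to 180 degrees"
    if deltaX < 0 and deltaY < 0:
        return "180 to 270 degrees (-180 to -90)"
    if deltaX > 0 and deltaY < 0:
        return "270 to 360 degrees (-90 to 0)"
    return "on an axis (or a zero-length vector)"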

An implementation in Python using radians (provided on July 19, 2015 by Eric Leschinski, who edited my answer):

from math import *
def angle_trunc(a):
    while a < 0.0:
        a += pi * 2
    return a

def getAngleBetweenPoints(x_orig, y_orig, x_landmark, y_landmark):
    deltaY = y_landmark - y_orig
    deltaX = x_landmark - x_orig
    return angle_trunc(atan2(deltaY, deltaX))

angle = getAngleBetweenPoints(5, 2, 1, 4)
assert angle >= 0, "angle must be >= 0"
angle = getAngleBetweenPoints(1, 1, 2, 1)
assert angle == 0, "expecting angle to be 0"
angle = getAngleBetweenPoints(2, 1, 1, 1)
assert abs(pi - angle) <= 0.01, "expecting angle to be pi, it is: " + str(angle)
angle = getAngleBetweenPoints(2, 1, 2, 3)
assert abs(angle - pi/2) <= 0.01, "expecting angle to be pi/2, it is: " + str(angle)
angle = getAngleBetweenPoints(2, 1, 2, 0)
assert abs(angle - (pi+pi/2)) <= 0.01, "expecting angle to be pi+pi/2, it is: " + str(angle)
angle = getAngleBetweenPoints(1, 1, 2, 2)
assert abs(angle - (pi/4)) <= 0.01, "expecting angle to be pi/4, it is: " + str(angle)
angle = getAngleBetweenPoints(-1, -1, -2, -2)
assert abs(angle - (pi+pi/4)) <= 0.01, "expecting angle to be pi+pi/4, it is: " + str(angle)
angle = getAngleBetweenPoints(-1, -1, -1, 2)
assert abs(angle - (pi/2)) <= 0.01, "expecting angle to be pi/2, it is: " + str(angle)

All tests pass. See https://en.wikipedia.org/wiki/Unit_circle


Answer 1


Sorry, but I’m pretty sure Peter’s answer is wrong. Note that the y axis goes down the page (common in graphics). As such the deltaY calculation has to be reversed, or you get the wrong answer.

Consider:

System.out.println (Math.toDegrees(Math.atan2(1,1)));
System.out.println (Math.toDegrees(Math.atan2(-1,1)));
System.out.println (Math.toDegrees(Math.atan2(1,-1)));
System.out.println (Math.toDegrees(Math.atan2(-1,-1)));

gives

45.0
-45.0
135.0
-135.0

So if in the example above, P1 is (1,1) and P2 is (2,2) [because Y increases down the page], the code above will give 45.0 degrees for the example shown, which is wrong. Change the order of the deltaY calculation and it works properly.
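For example, a small Python sketch of the fix (my own illustration, using the same P1 and P2):

from math import atan2, degrees

p1, p2 = (1, 1), (2, 2)   # screen coordinates: y grows down the page
deltaX = p2[0] - p1[0]
deltaY = p1[1] - p2[1]    # order reversed to account for the downward y axis
print(degrees(atan2(deltaY, deltaX)))  # -45.0 (i.e., 315 degrees), not 45.0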


Answer 2


I have found a solution in Python that works well!

from math import atan2, degrees

def GetAngleOfLineBetweenTwoPoints(p1, p2):
    # Note: p1 and p2 are single numbers treated as y-values one horizontal
    # unit apart, so this returns the angle of a line with slope p2 - p1.
    return degrees(atan2(p2 - p1, 1))

print GetAngleOfLineBetweenTwoPoints(1, 3)

Answer 3


Considering the exact question, which puts us in a "special" coordinate system where the positive Y axis means moving DOWN (like a screen or an interface view), you need to adapt this function by negating the Y coordinates, like this:

Example in Swift 2.0

func angle_between_two_points(pa:CGPoint,pb:CGPoint)->Double{
    let deltaY:Double = (Double(-pb.y) - Double(-pa.y))
    let deltaX:Double = (Double(pb.x) - Double(pa.x))
    var a = atan2(deltaY,deltaX)
    while a < 0.0 {
        a = a + M_PI*2
    }
    return a
}

This function gives a correct answer to the question. The answer is in radians, so to view the angle in degrees, the usage is:

let p1 = CGPoint(x: 1.5, y: 2) //estimated coords of p1 in question
let p2 = CGPoint(x: 2, y : 3) //estimated coords of p2 in question

print(angle_between_two_points(p1, pb: p2) / (M_PI/180))
//returns 296.56

Answer 4


Based on Peter O.'s answer, here is the Java version:

private static final float angleBetweenPoints(PointF a, PointF b) {
    float deltaY = b.y - a.y;
    float deltaX = b.x - a.x;
    return (float) Math.atan2(deltaY, deltaX);
}

Answer 5


MATLAB function:

function [lineAngle] = getLineAngle(x1, y1, x2, y2) 
    deltaY = y2 - y1;
    deltaX = x2 - x1;

    lineAngle = rad2deg(atan2(deltaY, deltaX));

    if deltaY < 0
        lineAngle = lineAngle + 360;
    end
end

Answer 6


A formula for an angle from 0 to 2*pi.

Let x = x2-x1 and y = y2-y1. The formula works for any values of x and y; for x = y = 0 the result is undefined.

f(x,y) = pi() - pi()/2*(1+sign(x))*(1-sign(y^2))
       - pi()/4*(2+sign(x))*sign(y)
       - sign(x*y)*atan((abs(x)-abs(y))/(abs(x)+abs(y)))
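A direct Python transcription of the formula (the helper names are my own; pi(), sign(), abs() and atan() map straight onto Python):

from math import atan, pi

def sign(v):
    return (v > 0) - (v < 0)

def angle_0_to_2pi(x, y):
    # Transcription of the formula above; undefined for x == y == 0.
    return (pi
            - pi/2 * (1 + sign(x)) * (1 - sign(y*y))
            - pi/4 * (2 + sign(x)) * sign(y)
            - sign(x*y) * atan((abs(x) - abs(y)) / (abs(x) + abs(y))))

print(angle_0_to_2pi(1, 1))   # 0.785... (pi/4)
print(angle_0_to_2pi(0, -1))  # 4.712... (3*pi/2)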

Answer 7

deltaY = Math.Abs(p2.y - p1.y);
deltaX = Math.Abs(p2.x - p1.x);

angleInDegrees = Math.Atan2(deltaY, deltaX) * 180 / PI;

if (p2.y > p1.y) // Second point is lower than first, angle goes down (180-360)
{
  if (p2.x < p1.x) // Second point is to the left of first (180-270)
    angleInDegrees += 180;
  else // (270-360)
    angleInDegrees += 270;
}
else if (p2.x < p1.x) // Second point is top left of first (90-180)
  angleInDegrees += 90;

Answer 8

import math
from collections import namedtuple


Point = namedtuple("Point", ["x", "y"])


def get_angle(p1: Point, p2: Point) -> float:
    """Get the angle of this line with the horizontal axis."""
    dx = p2.x - p1.x
    dy = p2.y - p1.y
    theta = math.atan2(dy, dx)
    angle = math.degrees(theta)  # angle is in (-180, 180]
    if angle < 0:
        angle = 360 + angle
    return angle

Testing

For testing, I let Hypothesis generate test cases.


import hypothesis.strategies as s
from hypothesis import given


@given(s.floats(min_value=0.0, max_value=360.0))
def test_angle(angle: float):
    epsilon = 0.0001
    x = math.cos(math.radians(angle))
    y = math.sin(math.radians(angle))
    p1 = Point(0, 0)
    p2 = Point(x, y)
    assert abs(get_angle(p1, p2) - angle) < epsilon

Wrapping a C library in Python: C, Cython or ctypes?

Question: Wrapping a C library in Python: C, Cython or ctypes?


I want to call a C library from a Python application. I don’t want to wrap the whole API, only the functions and datatypes that are relevant to my case. As I see it, I have three choices:

  1. Create an actual extension module in C. Probably overkill, and I’d also like to avoid the overhead of learning extension writing.
  2. Use Cython to expose the relevant parts from the C library to Python.
  3. Do the whole thing in Python, using ctypes to communicate with the external library.

I’m not sure whether 2) or 3) is the better choice. The advantage of 3) is that ctypes is part of the standard library, and the resulting code would be pure Python – although I’m not sure how big that advantage actually is.

Are there more advantages / disadvantages with either choice? Which approach do you recommend?


Edit: Thanks for all your answers, they provide a good resource for anyone looking to do something similar. The decision, of course, is still to be made for the single case—there’s no one “This is the right thing” sort of answer. For my own case, I’ll probably go with ctypes, but I’m also looking forward to trying out Cython in some other project.

With there being no single true answer, accepting one is somewhat arbitrary; I chose FogleBird’s answer as it provides some good insight into ctypes and it is currently also the highest-voted answer. However, I suggest reading all the answers to get a good overview.

Thanks again.


Answer 0


ctypes is your best bet for getting it done quickly, and it’s a pleasure to work with as you’re still writing Python!

I recently wrapped an FTDI driver for communicating with a USB chip using ctypes and it was great. I had it all done and working in less than one work day. (I only implemented the functions we needed, about 15 functions).

We were previously using a third-party module, PyUSB, for the same purpose. PyUSB is an actual C/Python extension module. But PyUSB wasn’t releasing the GIL when doing blocking reads/writes, which was causing problems for us. So I wrote our own module using ctypes, which does release the GIL when calling the native functions.

One thing to note is that ctypes won’t know about #define constants and stuff in the library you’re using, only the functions, so you’ll have to redefine those constants in your own code.

Here’s an example of how the code ended up looking (lots snipped out, just trying to show you the gist of it):

from ctypes import *

d2xx = WinDLL('ftd2xx')

OK = 0
INVALID_HANDLE = 1
DEVICE_NOT_FOUND = 2
DEVICE_NOT_OPENED = 3

...

def openEx(serial):
    serial = create_string_buffer(serial)
    handle = c_int()
    if d2xx.FT_OpenEx(serial, OPEN_BY_SERIAL_NUMBER, byref(handle)) == OK:
        return Handle(handle.value)
    raise D2XXException

class Handle(object):
    def __init__(self, handle):
        self.handle = handle
    ...
    def read(self, bytes):
        buffer = create_string_buffer(bytes)
        count = c_int()
        if d2xx.FT_Read(self.handle, buffer, bytes, byref(count)) == OK:
            return buffer.raw[:count.value]
        raise D2XXException
    def write(self, data):
        buffer = create_string_buffer(data)
        count = c_int()
        bytes = len(data)
        if d2xx.FT_Write(self.handle, buffer, bytes, byref(count)) == OK:
            return count.value
        raise D2XXException

Someone did some benchmarks on the various options.

I might be more hesitant if I had to wrap a C++ library with lots of classes/templates/etc. But ctypes works well with structs and can even callback into Python.
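As a rough sketch of those last two points (the library and its register function here are hypothetical, not the FTDI driver above):

from ctypes import CDLL, CFUNCTYPE, POINTER, Structure, c_double

class Point(Structure):
    # Mirrors a C struct: struct Point { double x; double y; };
    _fields_ = [("x", c_double), ("y", c_double)]

# A callback type: a C function pointer taking a Point* and returning void.
HANDLER = CFUNCTYPE(None, POINTER(Point))

def on_point(p):
    # p.contents is the Point struct passed from C
    print(p.contents.x)

# lib = CDLL("./mylib.so")                 # hypothetical library
# lib.register_handler(HANDLER(on_point))  # hypothetical C function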


Answer 1


Warning: a Cython core developer’s opinion ahead.

I almost always recommend Cython over ctypes. The reason is that it has a much smoother upgrade path. If you use ctypes, many things will be simple at first, and it’s certainly cool to write your FFI code in plain Python, without compilation, build dependencies and all that. However, at some point, you will almost certainly find that you have to call into your C library a lot, either in a loop or in a longer series of interdependent calls, and you would like to speed that up. That’s the point where you’ll notice that you can’t do that with ctypes. Or, when you need callback functions and you find that your Python callback code becomes a bottleneck, you’d like to speed it up and/or move it down into C as well. Again, you cannot do that with ctypes. So you have to switch languages at that point and start rewriting parts of your code, potentially reverse engineering your Python/ctypes code into plain C, thus spoiling the whole benefit of writing your code in plain Python in the first place.

With Cython, OTOH, you’re completely free to make the wrapping and calling code as thin or thick as you want. You can start with simple calls into your C code from regular Python code, and Cython will translate them into native C calls, without any additional calling overhead, and with an extremely low conversion overhead for Python parameters. When you notice that you need even more performance at some point where you are making too many expensive calls into your C library, you can start annotating your surrounding Python code with static types and let Cython optimise it straight down into C for you. Or, you can start rewriting parts of your C code in Cython in order to avoid calls and to specialise and tighten your loops algorithmically. And if you need a fast callback, just write a function with the appropriate signature and pass it into the C callback registry directly. Again, no overhead, and it gives you plain C calling performance. And in the much less likely case that you really cannot get your code fast enough in Cython, you can still consider rewriting the truly critical parts of it in C (or C++ or Fortran) and call it from your Cython code naturally and natively. But then, this really becomes the last resort instead of the only option.

So, ctypes is nice to do simple things and to quickly get something running. However, as soon as things start to grow, you’ll most likely come to the point where you notice that you’d better used Cython right from the start.


Answer 2


Cython is a pretty cool tool in itself, well worth learning, and is surprisingly close to the Python syntax. If you do any scientific computing with Numpy, then Cython is the way to go because it integrates with Numpy for fast matrix operations.

Cython is a superset of the Python language. You can throw any valid Python file at it, and it will spit out a valid C program. In that case, Cython just maps the Python calls to the underlying CPython API. This results in perhaps a 50% speedup, because your code is no longer interpreted.

To get some optimizations, you have to start telling Cython additional facts about your code, such as type declarations. If you tell it enough, it can boil the code down to pure C. That is, a for loop in Python becomes a for loop in C. Here you will see massive speed gains. You can also link to external C programs here.

Using Cython code is also incredibly easy. I thought the manual made it sound difficult. You literally just do:

$ cython mymodule.pyx
$ gcc [some arguments here] mymodule.c -o mymodule.so

and then you can import mymodule in your Python code and forget entirely that it compiles down to C.
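If you'd rather not invoke gcc by hand, the usual alternative is a small setup.py (a minimal sketch, assuming the same mymodule.pyx and an installed Cython):

# setup.py -- build with: python setup.py build_ext --inplace
from distutils.core import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("mymodule.pyx"))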

In any case, because Cython is so easy to set up and start using, I suggest trying it to see if it suits your needs. It won’t be a waste if it turns out not to be the tool you’re looking for.


Answer 3


For calling a C library from a Python application there is also cffi, a newer alternative to ctypes. It brings a fresh look to FFI:

  • it handles the problem in a fascinating, clean way (as opposed to ctypes)
  • it doesn’t require writing non-Python code (as SWIG, Cython, etc. do)
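As a minimal ABI-mode sketch of what cffi usage looks like (here calling atan2 from the system math library; the "m" library name is Linux-style and platform-dependent):

from cffi import FFI

ffi = FFI()
ffi.cdef("double atan2(double y, double x);")  # declare only what we need
libm = ffi.dlopen("m")                         # load libm
print(libm.atan2(1.0, 1.0))                    # 0.785...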

Answer 4


I’ll throw another one out there: SWIG

It’s easy to learn, does a lot of things right, and supports many more languages, so the time spent learning it can be pretty useful.

If you use SWIG, you are creating a new python extension module, but with SWIG doing most of the heavy lifting for you.


Answer 5


Personally, I’d write an extension module in C. Don’t be intimidated by Python C extensions — they’re not hard at all to write. The documentation is very clear and helpful. When I first wrote a C extension in Python, I think it took me about an hour to figure out how to write one — not much time at all.


Answer 6


ctypes is great when you’ve already got a compiled library blob to deal with (such as OS libraries). The calling overhead is severe, however, so if you’ll be making a lot of calls into the library, and you’re going to be writing the C code anyway (or at least compiling it), I’d say to go for cython. It’s not much more work, and it’ll be much faster and more pythonic to use the resulting pyd file.

I personally tend to use cython for quick speedups of python code (loops and integer comparisons are two areas where cython particularly shines), and when there is some more involved code/wrapping of other libraries involved, I’ll turn to Boost.Python. Boost.Python can be finicky to set up, but once you’ve got it working, it makes wrapping C/C++ code straightforward.

cython is also great at wrapping numpy (which I learned from the SciPy 2009 proceedings), but I haven’t used numpy, so I can’t comment on that.


Answer 7


If you already have a library with a defined API, I think ctypes is the best option, as you only have to do a little initialization and then more or less call the library the way you’re used to.

I think Cython, or creating an extension module in C (which is not very difficult), is more useful when you need new code, e.g. calling that library to do some complex, time-consuming task and then passing the result to Python.

Another approach, for simple programs, is to run the work as a separate process (compiled externally), have it write the result to standard output, and call it with the subprocess module. Sometimes that’s the easiest approach.

For example, if you make a console C program that works more or less like this:

$miCcode 10
Result: 12345678

you can call it from Python:

>>> import subprocess
>>> p = subprocess.Popen(['miCcode', '10'], stdout=subprocess.PIPE)  # argument list, so no shell=True
>>> std_out, std_err = p.communicate()
>>> print std_out
Result: 12345678

With a little string formatting, you can take the result any way you want. You can also capture the standard error output, so it’s quite flexible.


Answer 8


There is one issue that made me use ctypes and not Cython, and which is not mentioned in other answers.

Using ctypes, the result does not depend at all on the compiler you are using. You may write a library using more or less any language that can be compiled to a native shared library. It does not matter much which system, which language and which compiler. Cython, however, is limited by the infrastructure. E.g., if you want to use the Intel compiler on Windows, it is much trickier to make Cython work: you should “explain” the compiler to Cython, recompile something with this exact compiler, etc. This significantly limits portability.


Answer 9


If you are targeting Windows and choose to wrap some proprietary C++ libraries, you may soon discover that different versions of msvcrt***.dll (Visual C++ Runtime) are slightly incompatible.

This means that you may not be able to use Cython, since the resulting wrapper.pyd is linked against msvcr90.dll (Python 2.7) or msvcr100.dll (Python 3.x). If the library that you are wrapping is linked against a different version of the runtime, then you’re out of luck.

Then, to make things work, you’ll need to create C wrappers for the C++ libraries and link that wrapper dll against the same version of msvcrt***.dll as your C++ library. And then use ctypes to load your hand-rolled wrapper dll dynamically at runtime.

So there are lots of small details, which are described in great detail in the following article:

"Beautiful Native Libraries (in Python)": http://lucumr.pocoo.org/2013/8/18/beautiful-native-libraries/


Answer 10


There’s also the possibility of using GObject Introspection for libraries that use GLib.


Answer 11


I know this is an old question, but this thread comes up on Google when you search for things like ctypes vs cython, and most of the answers here are written by people who are already proficient in Cython or C, which might not reflect the actual time you need to invest to learn them and implement your solution. I am a complete beginner in both: I had never touched Cython before, and have very little experience with C/C++.

For the last two days, I was looking for a way to delegate a performance-heavy part of my code to something more low-level than Python. I implemented my code in both ctypes and Cython; it consisted basically of two simple functions.

I had a huge list of strings that needed to be processed. Notice list and string: neither type corresponds perfectly to a type in C, because Python strings are unicode by default and C strings are not, and lists in Python are simply NOT arrays in C.

Here is my verdict: use Cython. It integrates more fluently with Python and is easier to work with in general. When something goes wrong, ctypes just throws you a segfault; Cython will at least give you compile warnings, with a stack trace whenever possible, and you can easily return a valid Python object with Cython.

Here is a detailed account of how much time I needed to invest in each of them to implement the same function. I have done very little C/C++ programming, by the way:

  • Ctypes:

    • About 2 hours researching how to transform my list of unicode strings into a C-compatible type.
    • About an hour on how to return a string properly from a C function. Here I actually provided my own solution to SO, once I had written the functions.
    • About half an hour to write the code in C and compile it into a dynamic library.
    • 10 minutes to write test code in Python to check that the C code works.
    • About an hour of doing some tests and rearranging the C code.
    • Then I plugged the C code into the actual code base and saw that ctypes does not play well with the multiprocessing module, as its handles are not picklable by default.
    • About 20 minutes rearranging my code to not use the multiprocessing module, then retrying.
    • Then the second function in my C code generated segfaults in my code base, although it passed my testing code. Well, this is probably my fault for not checking edge cases well; I was looking for a quick solution.
    • For about 40 minutes I tried to determine possible causes of these segfaults.
    • I split my functions into two libraries and tried again. Still had segfaults for my second function.
    • I decided to let go of the second function and use only the first function of the C code, and at the second or third iteration of the Python loop that uses it, I got a UnicodeError about not being able to decode a byte at some position, even though I had encoded and decoded everything explicitly.

At this point, I decided to search for an alternative and decided to look into Cython:

  • Cython
    • 10 min of reading the Cython hello world.
    • 15 min of checking SO on how to use Cython with setuptools instead of distutils.
    • 10 min of reading on Cython types and Python types. I learnt I can use most of the built-in Python types for static typing.
    • 15 min of reannotating my Python code with Cython types.
    • 10 min of modifying my setup.py to use the compiled module in my codebase.
    • Plugged the module directly into the multiprocessing version of the codebase. It works.

For the record, I did not, of course, measure the exact timings of my investment. It may very well be the case that my perception of time was a little too attentive due to the mental effort required while I was dealing with ctypes. But it should convey the feel of dealing with Cython and ctypes.


How to detect a Christmas tree? [closed]

Question: How to detect a Christmas tree? [closed]


Which image processing techniques could be used to implement an application that detects the Christmas trees displayed in the following images?

I’m searching for solutions that are going to work on all these images. Therefore, approaches that require training haar cascade classifiers or template matching are not very interesting.

I’m looking for something that can be written in any programming language, as long as it uses only Open Source technologies. The solution must be tested with the images that are shared on this question. There are 6 input images and the answer should display the results of processing each of them. Finally, for each output image there must be red lines drawn to surround the detected tree.

How would you go about programmatically detecting the trees in these images?


Answer 0


I have an approach which I think is interesting and a bit different from the rest. The main difference in my approach, compared to some of the others, is in how the image segmentation step is performed–I used the DBSCAN clustering algorithm from Python’s scikit-learn; it’s optimized for finding somewhat amorphous shapes that may not necessarily have a single clear centroid.

At the top level, my approach is fairly simple and can be broken down into about 3 steps. First I apply a threshold (or actually, the logical “or” of two separate and distinct thresholds). As with many of the other answers, I assumed that the Christmas tree would be one of the brighter objects in the scene, so the first threshold is just a simple monochrome brightness test; any pixels with values above 220 on a 0-255 scale (where black is 0 and white is 255) are saved to a binary black-and-white image. The second threshold tries to look for red and yellow lights, which are particularly prominent in the trees in the upper left and lower right of the six images, and stand out well against the blue-green background which is prevalent in most of the photos. I convert the rgb image to hsv space, and require that the hue is either less than 0.2 on a 0.0-1.0 scale (corresponding roughly to the border between yellow and green) or greater than 0.95 (corresponding to the border between purple and red) and additionally I require bright, saturated colors: saturation and value must both be above 0.7. The results of the two threshold procedures are logically “or”-ed together, and the resulting matrix of black-and-white binary images is shown below:

Christmas trees, after thresholding on HSV as well as monochrome brightness

You can clearly see that each image has one large cluster of pixels roughly corresponding to the location of each tree, plus a few of the images also have some other small clusters corresponding either to lights in the windows of some of the buildings, or to a background scene on the horizon. The next step is to get the computer to recognize that these are separate clusters, and label each pixel correctly with a cluster membership ID number.

For this task I chose DBSCAN. There is a pretty good visual comparison of how DBSCAN typically behaves, relative to other clustering algorithms, available here. As I said earlier, it does well with amorphous shapes. The output of DBSCAN, with each cluster plotted in a different color, is shown here:

DBSCAN clustering output

There are a few things to be aware of when looking at this result. First is that DBSCAN requires the user to set a “proximity” parameter in order to regulate its behavior, which effectively controls how separated a pair of points must be in order for the algorithm to declare a new separate cluster rather than agglomerating a test point onto an already pre-existing cluster. I set this value to be 0.04 times the size along the diagonal of each image. Since the images vary in size from roughly VGA up to about HD 1080, this type of scale-relative definition is critical.

Another point worth noting is that the DBSCAN algorithm as it is implemented in scikit-learn has memory limits which are fairly challenging for some of the larger images in this sample. Therefore, for a few of the larger images, I actually had to “decimate” (i.e., retain only every 3rd or 4th pixel and drop the others) each cluster in order to stay within this limit. As a result of this culling process, the remaining individual sparse pixels are difficult to see on some of the larger images. Therefore, for display purposes only, the color-coded pixels in the above images have been effectively “dilated” just slightly so that they stand out better. It’s purely a cosmetic operation for the sake of the narrative; although there are comments mentioning this dilation in my code, rest assured that it has nothing to do with any calculations that actually matter.

Once the clusters are identified and labeled, the third and final step is easy: I simply take the largest cluster in each image (in this case, I chose to measure “size” in terms of the total number of member pixels, although one could have just as easily instead used some type of metric that gauges physical extent) and compute the convex hull for that cluster. The convex hull then becomes the tree border. The six convex hulls computed via this method are shown below in red:

Christmas trees with their calculated borders

The source code is written for Python 2.7.6 and it depends on numpy, scipy, matplotlib and scikit-learn. I’ve divided it into two parts. The first part is responsible for the actual image processing:

from PIL import Image
import numpy as np
import scipy as sp
import scipy.spatial  # make sp.spatial.ConvexHull available
import matplotlib.colors as colors
from sklearn.cluster import DBSCAN
from math import ceil, sqrt

"""
Inputs:

    rgbimg:         [M,N,3] numpy array containing (uint, 0-255) color image

    hueleftthr:     Scalar constant to select maximum allowed hue in the
                    yellow-green region

    huerightthr:    Scalar constant to select minimum allowed hue in the
                    blue-purple region

    satthr:         Scalar constant to select minimum allowed saturation

    valthr:         Scalar constant to select minimum allowed value

    monothr:        Scalar constant to select minimum allowed monochrome
                    brightness

    maxpoints:      Scalar constant maximum number of pixels to forward to
                    the DBSCAN clustering algorithm

    proxthresh:     Proximity threshold to use for DBSCAN, as a fraction of
                    the diagonal size of the image

Outputs:

    borderseg:      [K,2,2] Nested list containing K pairs of x- and y- pixel
                    values for drawing the tree border

    X:              [P,2] List of pixels that passed the threshold step

    labels:         [Q,2] List of cluster labels for points in Xslice (see
                    below)

    Xslice:         [Q,2] Reduced list of pixels to be passed to DBSCAN

"""

def findtree(rgbimg, hueleftthr=0.2, huerightthr=0.95, satthr=0.7, 
             valthr=0.7, monothr=220, maxpoints=5000, proxthresh=0.04):

    # Convert rgb image to monochrome
    gryimg = np.asarray(Image.fromarray(rgbimg).convert('L'))
    # Convert rgb image (uint, 0-255) to hsv (float, 0.0-1.0)
    hsvimg = colors.rgb_to_hsv(rgbimg.astype(float)/255)

    # Initialize binary thresholded image
    binimg = np.zeros((rgbimg.shape[0], rgbimg.shape[1]))
    # Find pixels with hue<0.2 or hue>0.95 (red or yellow) and saturation/value
    # both greater than 0.7 (saturated and bright)--tends to coincide with
    # ornamental lights on trees in some of the images
    boolidx = np.logical_and(
                np.logical_and(
                  np.logical_or((hsvimg[:,:,0] < hueleftthr),
                                (hsvimg[:,:,0] > huerightthr)),
                                (hsvimg[:,:,1] > satthr)),
                                (hsvimg[:,:,2] > valthr))
    # Find pixels that meet hsv criterion
    binimg[np.where(boolidx)] = 255
    # Add pixels that meet grayscale brightness criterion
    binimg[np.where(gryimg > monothr)] = 255

    # Prepare thresholded points for DBSCAN clustering algorithm
    X = np.transpose(np.where(binimg == 255))
    Xslice = X
    nsample = len(Xslice)
    if nsample > maxpoints:
        # Make sure number of points does not exceed DBSCAN maximum capacity
        Xslice = X[range(0,nsample,int(ceil(float(nsample)/maxpoints)))]

    # Translate DBSCAN proximity threshold to units of pixels and run DBSCAN
    pixproxthr = proxthresh * sqrt(binimg.shape[0]**2 + binimg.shape[1]**2)
    db = DBSCAN(eps=pixproxthr, min_samples=10).fit(Xslice)
    labels = db.labels_.astype(int)

    # Find the largest cluster (i.e., with most points) and obtain convex hull   
    unique_labels = set(labels)
    maxclustpt = 0
    for k in unique_labels:
        class_members = [index[0] for index in np.argwhere(labels == k)]
        if len(class_members) > maxclustpt:
            points = Xslice[class_members]
            hull = sp.spatial.ConvexHull(points)
            maxclustpt = len(class_members)
            borderseg = [[points[simplex,0], points[simplex,1]] for simplex
                          in hull.simplices]

    return borderseg, X, labels, Xslice

and the second part is a user-level script which calls the first file and generates all of the plots above:

#!/usr/bin/env python

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from findtree import findtree

# Image files to process
fname = ['nmzwj.png', 'aVZhC.png', '2K9EF.png',
         'YowlH.png', '2y4o5.png', 'FWhSP.png']

# Initialize figures
fgsz = (16,7)        
figthresh = plt.figure(figsize=fgsz, facecolor='w')
figclust  = plt.figure(figsize=fgsz, facecolor='w')
figcltwo  = plt.figure(figsize=fgsz, facecolor='w')
figborder = plt.figure(figsize=fgsz, facecolor='w')
figthresh.canvas.set_window_title('Thresholded HSV and Monochrome Brightness')
figclust.canvas.set_window_title('DBSCAN Clusters (Raw Pixel Output)')
figcltwo.canvas.set_window_title('DBSCAN Clusters (Slightly Dilated for Display)')
figborder.canvas.set_window_title('Trees with Borders')

for ii, name in zip(range(len(fname)), fname):
    # Open the file and convert to rgb image
    rgbimg = np.asarray(Image.open(name))

    # Get the tree borders as well as a bunch of other intermediate values
    # that will be used to illustrate how the algorithm works
    borderseg, X, labels, Xslice = findtree(rgbimg)

    # Display thresholded images
    axthresh = figthresh.add_subplot(2,3,ii+1)
    axthresh.set_xticks([])
    axthresh.set_yticks([])
    binimg = np.zeros((rgbimg.shape[0], rgbimg.shape[1]))
    for v, h in X:
        binimg[v,h] = 255
    axthresh.imshow(binimg, interpolation='nearest', cmap='Greys')

    # Display color-coded clusters
    axclust = figclust.add_subplot(2,3,ii+1) # Raw version
    axclust.set_xticks([])
    axclust.set_yticks([])
    axcltwo = figcltwo.add_subplot(2,3,ii+1) # Dilated slightly for display only
    axcltwo.set_xticks([])
    axcltwo.set_yticks([])
    axcltwo.imshow(binimg, interpolation='nearest', cmap='Greys')
    clustimg = np.ones(rgbimg.shape)    
    unique_labels = set(labels)
    # Generate a unique color for each cluster 
    plcol = cm.rainbow_r(np.linspace(0, 1, len(unique_labels)))
    for lbl, pix in zip(labels, Xslice):
        for col, unqlbl in zip(plcol, unique_labels):
            if lbl == unqlbl:
                # Cluster label of -1 indicates no cluster membership;
                # override default color with black
                if lbl == -1:
                    col = [0.0, 0.0, 0.0, 1.0]
                # Raw version
                for ij in range(3):
                    clustimg[pix[0],pix[1],ij] = col[ij]
                # Dilated just for display
                axcltwo.plot(pix[1], pix[0], 'o', markerfacecolor=col, 
                    markersize=1, markeredgecolor=col)
    axclust.imshow(clustimg)
    axcltwo.set_xlim(0, binimg.shape[1]-1)
    axcltwo.set_ylim(binimg.shape[0], -1)

    # Plot original images with red borders around the trees
    axborder = figborder.add_subplot(2,3,ii+1)
    axborder.set_axis_off()
    axborder.imshow(rgbimg, interpolation='nearest')
    for vseg, hseg in borderseg:
        axborder.plot(hseg, vseg, 'r-', lw=3)
    axborder.set_xlim(0, binimg.shape[1]-1)
    axborder.set_ylim(binimg.shape[0], -1)

plt.show()

Answer 1

EDIT NOTE: I edited this post to (i) process each tree image individually, as requested, and (ii) consider both the brightness and the shape of the object in order to improve the quality of the results.


Below, an approach is presented that considers both the brightness and the shape of the object. In other words, it looks for objects with a triangle-like shape and significant brightness. It was implemented in Java, using the Marvin image processing framework.

The first step is color thresholding. The objective here is to focus the analysis on objects with significant brightness.

Output images:

Source code:

public class ChristmasTree {

private MarvinImagePlugin fill = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.fill.boundaryFill");
private MarvinImagePlugin threshold = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.thresholding");
private MarvinImagePlugin invert = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.invert");
private MarvinImagePlugin dilation = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.morphological.dilation");

public ChristmasTree(){
    MarvinImage tree;

    // Iterate each image
    for(int i=1; i<=6; i++){
        tree = MarvinImageIO.loadImage("./res/trees/tree"+i+".png");

        // 1. Threshold
        threshold.setAttribute("threshold", 200);
        threshold.process(tree.clone(), tree);
    }
}
public static void main(String[] args) {
    new ChristmasTree();
}
}

In the second step, the brightest points in the image are dilated in order to form shapes. The result of this process is the probable shape of the objects with significant brightness. Flood-fill segmentation is then applied to detect the disconnected shapes (see also the short Python note after the code below).

Output images:

Source code:

public class ChristmasTree {

private MarvinImagePlugin fill = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.fill.boundaryFill");
private MarvinImagePlugin threshold = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.thresholding");
private MarvinImagePlugin invert = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.invert");
private MarvinImagePlugin dilation = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.morphological.dilation");

public ChristmasTree(){
    MarvinImage tree;

    // Iterate each image
    for(int i=1; i<=6; i++){
        tree = MarvinImageIO.loadImage("./res/trees/tree"+i+".png");

        // 1. Threshold
        threshold.setAttribute("threshold", 200);
        threshold.process(tree.clone(), tree);

        // 2. Dilate
        invert.process(tree.clone(), tree);
        tree = MarvinColorModelConverter.rgbToBinary(tree, 127);
        MarvinImageIO.saveImage(tree, "./res/trees/new/tree_"+i+"threshold.png");
        dilation.setAttribute("matrix", MarvinMath.getTrueMatrix(50, 50));
        dilation.process(tree.clone(), tree);
        MarvinImageIO.saveImage(tree, "./res/trees/new/tree_"+1+"_dilation.png");
        tree = MarvinColorModelConverter.binaryToRgb(tree);

        // 3. Segment shapes
        MarvinImage trees2 = tree.clone();
        fill(tree, trees2);
        MarvinImageIO.saveImage(trees2, "./res/trees/new/tree_"+i+"_fill.png");
    }
}

private void fill(MarvinImage imageIn, MarvinImage imageOut){
    boolean found;
    int color= 0xFFFF0000;

    while(true){
        found=false;

        Outerloop:
        for(int y=0; y<imageIn.getHeight(); y++){
            for(int x=0; x<imageIn.getWidth(); x++){
                if(imageOut.getIntComponent0(x, y) == 0){
                    fill.setAttribute("x", x);
                    fill.setAttribute("y", y);
                    fill.setAttribute("color", color);
                    fill.setAttribute("threshold", 120);
                    fill.process(imageIn, imageOut);
                    color = newColor(color);

                    found = true;
                    break Outerloop;
                }
            }
        }

        if(!found){
            break;
        }
    }

}

private int newColor(int color){
    int red = (color & 0x00FF0000) >> 16;
    int green = (color & 0x0000FF00) >> 8;
    int blue = (color & 0x000000FF);

    if(red <= green && red <= blue){
        red+=5;
    }
    else if(green <= red && green <= blue){
        green+=5;
    }
    else{
        blue+=5;
    }

    return 0xFF000000 + (red << 16) + (green << 8) + blue;
}

public static void main(String[] args) {
    new ChristmasTree();
}
}
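
Side note (mine, not part of the original answer): the flood-fill segmentation above is essentially connected-component labelling. For comparison, the same effect can be obtained in Python with scipy.ndimage:

import numpy as np
from scipy import ndimage as ndi

# binary_mask: 2-D boolean array, True where the dilated bright pixels are
binary_mask = np.zeros((100, 100), dtype=bool)
binary_mask[10:30, 10:30] = True    # toy example with two separate shapes
binary_mask[60:90, 50:70] = True

labels, num_shapes = ndi.label(binary_mask)  # 0 = background, 1..num = shapes
print(num_shapes)                            # -> 2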

As shown in the output images, multiple shapes were detected. In this problem there are just a few bright points in the images, but this approach was implemented to deal with more complex scenarios.

In the next step, each shape is analyzed. A simple algorithm detects shapes with a pattern similar to a triangle. The algorithm analyzes the object shape line by line: if the center of mass of each shape line is almost the same (given a threshold) and the mass increases as y increases, the object has a triangle-like shape. The mass of a shape line is the number of pixels in that line that belong to the shape. Imagine you slice the object horizontally and analyze each horizontal segment: if they are centered on each other and the length increases from the first segment to the last in a linear pattern, you probably have an object that resembles a triangle. (A small Python sketch of this test follows the Java source below.)

Source code:

private int[] detectTrees(MarvinImage image){
    HashSet<Integer> analysed = new HashSet<Integer>();
    boolean found;
    while(true){
        found = false;
        for(int y=0; y<image.getHeight(); y++){
            for(int x=0; x<image.getWidth(); x++){
                int color = image.getIntColor(x, y);

                if(!analysed.contains(color)){
                    if(isTree(image, color)){
                        return getObjectRect(image, color);
                    }

                    analysed.add(color);
                    found=true;
                }
            }
        }

        if(!found){
            break;
        }
    }
    return null;
}

private boolean isTree(MarvinImage image, int color){

    int mass[][] = new int[image.getHeight()][3];
    int yStart=-1;
    int xStart=-1;
    for(int y=0; y<image.getHeight(); y++){
        int mc = 0;
        int xs=-1;
        int xe=-1;
        for(int x=0; x<image.getWidth(); x++){
            if(image.getIntColor(x, y) == color){
                mc++;

                if(yStart == -1){
                    yStart=y;
                    xStart=x;
                }

                if(xs == -1){
                    xs = x;
                }
                if(x > xe){
                    xe = x;
                }
            }
        }
        mass[y][0] = xs;
        mass[y][1] = xe;
        mass[y][2] = mc;
    }

    int validLines=0;
    for(int y=0; y<image.getHeight(); y++){
        if
        ( 
            mass[y][2] > 0 &&
            Math.abs(((mass[y][0]+mass[y][1])/2)-xStart) <= 50 &&
            mass[y][2] >= (mass[yStart][2] + (y-yStart)*0.3) &&
            mass[y][2] <= (mass[yStart][2] + (y-yStart)*1.5)
        )
        {
            validLines++;
        }
    }

    if(validLines > 100){
        return true;
    }
    return false;
}
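
For readers who do not follow Java, here is a minimal Python sketch of the same per-row triangle test (my own illustration; the constants 50, 0.3, 1.5 and 100 are taken from the Java code above):

import numpy as np

def is_tree_like(mask):
    """Heuristic triangle test on a boolean 2-D mask (rows = y, cols = x)."""
    rows = np.flatnonzero(mask.any(axis=1))
    if rows.size == 0:
        return False
    y_start = rows[0]
    x_start = np.flatnonzero(mask[y_start])[0]  # first pixel found = apex
    base_mass = int(mask[y_start].sum())

    valid = 0
    for y in rows:
        xs = np.flatnonzero(mask[y])
        center = (xs[0] + xs[-1]) / 2           # center of this shape line
        mass = xs.size                          # pixels on this line
        # A line is valid if its center stays near the apex column and its
        # mass grows roughly linearly with the distance from the apex.
        if (abs(center - x_start) <= 50 and
                base_mass + (y - y_start) * 0.3 <= mass <= base_mass + (y - y_start) * 1.5):
            valid += 1
    return valid > 100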

Finally, the position of each shape that is similar to a triangle and has significant brightness (in this case, a Christmas tree) is highlighted in the original image, as shown below.

Final output images:

Final source code:

public class ChristmasTree {

private MarvinImagePlugin fill = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.fill.boundaryFill");
private MarvinImagePlugin threshold = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.thresholding");
private MarvinImagePlugin invert = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.color.invert");
private MarvinImagePlugin dilation = MarvinPluginLoader.loadImagePlugin("org.marvinproject.image.morphological.dilation");

public ChristmasTree(){
    MarvinImage tree;

    // Iterate each image
    for(int i=1; i<=6; i++){
        tree = MarvinImageIO.loadImage("./res/trees/tree"+i+".png");

        // 1. Threshold
        threshold.setAttribute("threshold", 200);
        threshold.process(tree.clone(), tree);

        // 2. Dilate
        invert.process(tree.clone(), tree);
        tree = MarvinColorModelConverter.rgbToBinary(tree, 127);
        MarvinImageIO.saveImage(tree, "./res/trees/new/tree_"+i+"threshold.png");
        dilation.setAttribute("matrix", MarvinMath.getTrueMatrix(50, 50));
        dilation.process(tree.clone(), tree);
        MarvinImageIO.saveImage(tree, "./res/trees/new/tree_"+1+"_dilation.png");
        tree = MarvinColorModelConverter.binaryToRgb(tree);

        // 3. Segment shapes
        MarvinImage trees2 = tree.clone();
        fill(tree, trees2);
        MarvinImageIO.saveImage(trees2, "./res/trees/new/tree_"+i+"_fill.png");

        // 4. Detect tree-like shapes
        int[] rect = detectTrees(trees2);

        // 5. Draw the result
        MarvinImage original = MarvinImageIO.loadImage("./res/trees/tree"+i+".png");
        drawBoundary(trees2, original, rect);
        MarvinImageIO.saveImage(original, "./res/trees/new/tree_"+i+"_out_2.jpg");
    }
}

private void drawBoundary(MarvinImage shape, MarvinImage original, int[] rect){
    int yLines[] = new int[6];
    yLines[0] = rect[1];
    yLines[1] = rect[1]+(int)((rect[3]/5));
    yLines[2] = rect[1]+((rect[3]/5)*2);
    yLines[3] = rect[1]+((rect[3]/5)*3);
    yLines[4] = rect[1]+(int)((rect[3]/5)*4);
    yLines[5] = rect[1]+rect[3];

    List<Point> points = new ArrayList<Point>();
    for(int i=0; i<yLines.length; i++){
        boolean in=false;
        Point startPoint=null;
        Point endPoint=null;
        for(int x=rect[0]; x<rect[0]+rect[2]; x++){

            if(shape.getIntColor(x, yLines[i]) != 0xFFFFFFFF){
                if(!in){
                    if(startPoint == null){
                        startPoint = new Point(x, yLines[i]);
                    }
                }
                in = true;
            }
            else{
                if(in){
                    endPoint = new Point(x, yLines[i]);
                }
                in = false;
            }
        }

        if(endPoint == null){
            endPoint = new Point((rect[0]+rect[2])-1, yLines[i]);
        }

        points.add(startPoint);
        points.add(endPoint);
    }

    drawLine(points.get(0).x, points.get(0).y, points.get(1).x, points.get(1).y, 15, original);
    drawLine(points.get(1).x, points.get(1).y, points.get(3).x, points.get(3).y, 15, original);
    drawLine(points.get(3).x, points.get(3).y, points.get(5).x, points.get(5).y, 15, original);
    drawLine(points.get(5).x, points.get(5).y, points.get(7).x, points.get(7).y, 15, original);
    drawLine(points.get(7).x, points.get(7).y, points.get(9).x, points.get(9).y, 15, original);
    drawLine(points.get(9).x, points.get(9).y, points.get(11).x, points.get(11).y, 15, original);
    drawLine(points.get(11).x, points.get(11).y, points.get(10).x, points.get(10).y, 15, original);
    drawLine(points.get(10).x, points.get(10).y, points.get(8).x, points.get(8).y, 15, original);
    drawLine(points.get(8).x, points.get(8).y, points.get(6).x, points.get(6).y, 15, original);
    drawLine(points.get(6).x, points.get(6).y, points.get(4).x, points.get(4).y, 15, original);
    drawLine(points.get(4).x, points.get(4).y, points.get(2).x, points.get(2).y, 15, original);
    drawLine(points.get(2).x, points.get(2).y, points.get(0).x, points.get(0).y, 15, original);
}

private void drawLine(int x1, int y1, int x2, int y2, int length, MarvinImage image){
    int lx1, lx2, ly1, ly2;
    for(int i=0; i<length; i++){
        lx1 = (x1+i >= image.getWidth() ? (image.getWidth()-1)-i: x1);
        lx2 = (x2+i >= image.getWidth() ? (image.getWidth()-1)-i: x2);
        ly1 = (y1+i >= image.getHeight() ? (image.getHeight()-1)-i: y1);
        ly2 = (y2+i >= image.getHeight() ? (image.getHeight()-1)-i: y2);

        image.drawLine(lx1+i, ly1, lx2+i, ly2, Color.red);
        image.drawLine(lx1, ly1+i, lx2, ly2+i, Color.red);
    }
}

private void fillRect(MarvinImage image, int[] rect, int length){
    for(int i=0; i<length; i++){
        image.drawRect(rect[0]+i, rect[1]+i, rect[2]-(i*2), rect[3]-(i*2), Color.red);
    }
}

private void fill(MarvinImage imageIn, MarvinImage imageOut){
    boolean found;
    int color= 0xFFFF0000;

    while(true){
        found=false;

        Outerloop:
        for(int y=0; y<imageIn.getHeight(); y++){
            for(int x=0; x<imageIn.getWidth(); x++){
                if(imageOut.getIntComponent0(x, y) == 0){
                    fill.setAttribute("x", x);
                    fill.setAttribute("y", y);
                    fill.setAttribute("color", color);
                    fill.setAttribute("threshold", 120);
                    fill.process(imageIn, imageOut);
                    color = newColor(color);

                    found = true;
                    break Outerloop;
                }
            }
        }

        if(!found){
            break;
        }
    }

}

private int[] detectTrees(MarvinImage image){
    HashSet<Integer> analysed = new HashSet<Integer>();
    boolean found;
    while(true){
        found = false;
        for(int y=0; y<image.getHeight(); y++){
            for(int x=0; x<image.getWidth(); x++){
                int color = image.getIntColor(x, y);

                if(!analysed.contains(color)){
                    if(isTree(image, color)){
                        return getObjectRect(image, color);
                    }

                    analysed.add(color);
                    found=true;
                }
            }
        }

        if(!found){
            break;
        }
    }
    return null;
}

private boolean isTree(MarvinImage image, int color){

    int mass[][] = new int[image.getHeight()][3];
    int yStart=-1;
    int xStart=-1;
    for(int y=0; y<image.getHeight(); y++){
        int mc = 0;
        int xs=-1;
        int xe=-1;
        for(int x=0; x<image.getWidth(); x++){
            if(image.getIntColor(x, y) == color){
                mc++;

                if(yStart == -1){
                    yStart=y;
                    xStart=x;
                }

                if(xs == -1){
                    xs = x;
                }
                if(x > xe){
                    xe = x;
                }
            }
        }
        mass[y][0] = xs;
        mass[y][1] = xe;
        mass[y][2] = mc;
    }

    int validLines=0;
    for(int y=0; y<image.getHeight(); y++){
        if
        ( 
            mass[y][2] > 0 &&
            Math.abs(((mass[y][0]+mass[y][1])/2)-xStart) <= 50 &&
            mass[y][2] >= (mass[yStart][2] + (y-yStart)*0.3) &&
            mass[y][2] <= (mass[yStart][2] + (y-yStart)*1.5)
        )
        {
            validLines++;
        }
    }

    if(validLines > 100){
        return true;
    }
    return false;
}

private int[] getObjectRect(MarvinImage image, int color){
    int x1=-1;
    int x2=-1;
    int y1=-1;
    int y2=-1;

    for(int y=0; y<image.getHeight(); y++){
        for(int x=0; x<image.getWidth(); x++){
            if(image.getIntColor(x, y) == color){

                if(x1 == -1 || x < x1){
                    x1 = x;
                }
                if(x2 == -1 || x > x2){
                    x2 = x;
                }
                if(y1 == -1 || y < y1){
                    y1 = y;
                }
                if(y2 == -1 || y > y2){
                    y2 = y;
                }
            }
        }
    }

    return new int[]{x1, y1, (x2-x1), (y2-y1)};
}

private int newColor(int color){
    int red = (color & 0x00FF0000) >> 16;
    int green = (color & 0x0000FF00) >> 8;
    int blue = (color & 0x000000FF);

    if(red <= green && red <= blue){
        red+=5;
    }
    else if(green <= red && green <= blue){
        green+=30;
    }
    else{
        blue+=30;
    }

    return 0xFF000000 + (red << 16) + (green << 8) + blue;
}

public static void main(String[] args) {
    new ChristmasTree();
}
}

The advantage of this approach is that it will probably work with images containing other luminous objects, since it analyzes the object's shape.

Merry Christmas!


EDIT NOTE 2

There has been some discussion about the similarity between the output images of this solution and some of the others. In fact, they are very similar. But this approach does not just segment objects; it also analyzes the object's shape in some sense, and it can handle multiple luminous objects in the same scene. In fact, the Christmas tree does not need to be the brightest object. I am only bringing this up to enrich the discussion: there is a bias in the samples such that just looking for the brightest object will find the trees. But do we really want to stop the discussion at this point? How far is the computer from really recognizing an object that resembles a Christmas tree? Let's try to close that gap.

Below is a result presented just to elucidate this point:

input image

enter image description here

output

enter image description here


Answer 2

Here is my simple and dumb solution. It is based on the assumption that the tree will be the brightest and biggest thing in the picture.

//g++ -Wall -pedantic -ansi -O2 -pipe -s -o christmas_tree christmas_tree.cpp `pkg-config --cflags --libs opencv`
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>

using namespace cv;
using namespace std;

int main(int argc,char *argv[])
{
    Mat original,tmp,tmp1;
    vector <vector<Point> > contours;
    Moments m;
    Rect boundrect;
    Point2f center;
    double radius, max_area=0,tmp_area=0;
    unsigned int j, k;
    int i;

    for(i = 1; i < argc; ++i)
    {
        original = imread(argv[i]);
        if(original.empty())
        {
            cerr << "Error"<<endl;
            return -1;
        }

        GaussianBlur(original, tmp, Size(3, 3), 0, 0, BORDER_DEFAULT);
        erode(tmp, tmp, Mat(), Point(-1, -1), 10);
        cvtColor(tmp, tmp, CV_BGR2HSV);
        inRange(tmp, Scalar(0, 0, 0), Scalar(180, 255, 200), tmp);

        dilate(original, tmp1, Mat(), Point(-1, -1), 15);
        cvtColor(tmp1, tmp1, CV_BGR2HLS);
        inRange(tmp1, Scalar(0, 185, 0), Scalar(180, 255, 255), tmp1);
        dilate(tmp1, tmp1, Mat(), Point(-1, -1), 10);

        bitwise_and(tmp, tmp1, tmp1);

        findContours(tmp1, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
        max_area = 0;
        j = 0;
        for(k = 0; k < contours.size(); k++)
        {
            tmp_area = contourArea(contours[k]);
            if(tmp_area > max_area)
            {
                max_area = tmp_area;
                j = k;
            }
        }
        tmp1 = Mat::zeros(original.size(),CV_8U);
        approxPolyDP(contours[j], contours[j], 30, true);
        drawContours(tmp1, contours, j, Scalar(255,255,255), CV_FILLED);

        m = moments(contours[j]);
        boundrect = boundingRect(contours[j]);
        center = Point2f(m.m10/m.m00, m.m01/m.m00);
        radius = (center.y - (boundrect.tl().y))/4.0*3.0;
        Rect heightrect(center.x-original.cols/5, boundrect.tl().y, original.cols/5*2, boundrect.size().height);

        tmp = Mat::zeros(original.size(), CV_8U);
        rectangle(tmp, heightrect, Scalar(255, 255, 255), -1);
        circle(tmp, center, radius, Scalar(255, 255, 255), -1);

        bitwise_and(tmp, tmp1, tmp1);

        findContours(tmp1, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
        max_area = 0;
        j = 0;
        for(k = 0; k < contours.size(); k++)
        {
            tmp_area = contourArea(contours[k]);
            if(tmp_area > max_area)
            {
                max_area = tmp_area;
                j = k;
            }
        }

        approxPolyDP(contours[j], contours[j], 30, true);
        convexHull(contours[j], contours[j]);

        drawContours(original, contours, j, Scalar(0, 0, 255), 3);

        namedWindow(argv[i], CV_WINDOW_NORMAL|CV_WINDOW_KEEPRATIO|CV_GUI_EXPANDED);
        imshow(argv[i], original);

        waitKey(0);
        destroyWindow(argv[i]);
    }

    return 0;
}

The first step is to detect the brightest pixels in the picture, but we have to distinguish between the tree itself and the snow that reflects its light. Here we try to exclude the snow by applying a really simple filter on the color codes:

GaussianBlur(original, tmp, Size(3, 3), 0, 0, BORDER_DEFAULT);
erode(tmp, tmp, Mat(), Point(-1, -1), 10);
cvtColor(tmp, tmp, CV_BGR2HSV);
inRange(tmp, Scalar(0, 0, 0), Scalar(180, 255, 200), tmp);

Then we find every "bright" pixel:

dilate(original, tmp1, Mat(), Point(-1, -1), 15);
cvtColor(tmp1, tmp1, CV_BGR2HLS);
inRange(tmp1, Scalar(0, 185, 0), Scalar(180, 255, 255), tmp1);
dilate(tmp1, tmp1, Mat(), Point(-1, -1), 10);

Finally we join the two results:

bitwise_and(tmp, tmp1, tmp1);

Now we look for the biggest bright object:

findContours(tmp1, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
max_area = 0;
j = 0;
for(k = 0; k < contours.size(); k++)
{
    tmp_area = contourArea(contours[k]);
    if(tmp_area > max_area)
    {
        max_area = tmp_area;
        j = k;
    }
}
tmp1 = Mat::zeros(original.size(),CV_8U);
approxPolyDP(contours[j], contours[j], 30, true);
drawContours(tmp1, contours, j, Scalar(255,255,255), CV_FILLED);

Now we are almost done, but there are still some imperfections due to the snow. To cut them off, we build a mask using a circle and a rectangle that approximate the shape of a tree, and delete the unwanted pieces:

m = moments(contours[j]);
boundrect = boundingRect(contours[j]);
center = Point2f(m.m10/m.m00, m.m01/m.m00);
radius = (center.y - (boundrect.tl().y))/4.0*3.0;
Rect heightrect(center.x-original.cols/5, boundrect.tl().y, original.cols/5*2, boundrect.size().height);

tmp = Mat::zeros(original.size(), CV_8U);
rectangle(tmp, heightrect, Scalar(255, 255, 255), -1);
circle(tmp, center, radius, Scalar(255, 255, 255), -1);

bitwise_and(tmp, tmp1, tmp1);

The last step is to find the contour of our tree and draw it on the original picture.

findContours(tmp1, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
max_area = 0;
j = 0;
for(k = 0; k < contours.size(); k++)
{
    tmp_area = contourArea(contours[k]);
    if(tmp_area > max_area)
    {
        max_area = tmp_area;
        j = k;
    }
}

approxPolyDP(contours[j], contours[j], 30, true);
convexHull(contours[j], contours[j]);

drawContours(original, contours, j, Scalar(0, 0, 255), 3);

I'm sorry, but at the moment I have a bad connection, so it is not possible for me to upload pictures. I'll try to do it later.

Merry Christmas.

EDIT:

Here are some pictures of the final output:


Answer 3

I wrote the code in Matlab R2007a. I used k-means to roughly extract the Christmas tree. I will show my intermediate results with only one image, and the final results with all six.

First, I mapped the RGB space onto the Lab space, which enhances the contrast of red in its b channel:

colorTransform = makecform('srgb2lab');
I = applycform(I, colorTransform);
L = double(I(:,:,1));
a = double(I(:,:,2));
b = double(I(:,:,3));

enter image description here

Besides the feature in color space, I also used a texture feature based on the neighborhood rather than on each pixel itself. Here I linearly combined the intensities from the 3 original channels (R, G, B). The reason I formatted it this way is that the Christmas trees in the pictures all have red lights on them, and sometimes green or blue illumination as well:

R=double(Irgb(:,:,1));
G=double(Irgb(:,:,2));
B=double(Irgb(:,:,3));
I0 = (3*R + max(G,B)-min(G,B))/2;

enter image description here

I applied a 3x3 local binary pattern on I0, using the center pixel as the threshold, and obtained a contrast measure by computing the difference between the mean pixel intensity above the threshold and the mean intensity below it:

I0_copy = zeros(size(I0));
for i = 2 : size(I0,1) - 1
    for j = 2 : size(I0,2) - 1
        tmp = I0(i-1:i+1,j-1:j+1) >= I0(i,j);
        I0_copy(i,j) = mean(mean(tmp.*I0(i-1:i+1,j-1:j+1))) - ...
            mean(mean(~tmp.*I0(i-1:i+1,j-1:j+1))); % Contrast
    end
end

enter image description here

Since I have 4 features in total, I chose K=5 for the clustering. The code for k-means is shown below (it is from Dr. Andrew Ng's machine learning course; I took the course before and wrote the code myself for the programming assignment):

[centroids, idx] = runkMeans(X, initial_centroids, max_iters);
mask=reshape(idx,img_size(1),img_size(2));

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [centroids, idx] = runkMeans(X, initial_centroids, ...
                                  max_iters, plot_progress)
   [m n] = size(X);
   K = size(initial_centroids, 1);
   centroids = initial_centroids;
   previous_centroids = centroids;
   idx = zeros(m, 1);

   for i=1:max_iters    
      % For each example in X, assign it to the closest centroid
      idx = findClosestCentroids(X, centroids);

      % Given the memberships, compute new centroids
      centroids = computeCentroids(X, idx, K);

   end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function idx = findClosestCentroids(X, centroids)
   K = size(centroids, 1);
   idx = zeros(size(X,1), 1);
   for xi = 1:size(X,1)
      x = X(xi, :);
      % Find closest centroid for x.
      best = Inf;
      for mui = 1:K
        mu = centroids(mui, :);
        d = dot(x - mu, x - mu);
        if d < best
           best = d;
           idx(xi) = mui;
        end
      end
   end 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function centroids = computeCentroids(X, idx, K)
   [m n] = size(X);
   centroids = zeros(K, n);
   for mui = 1:K
      centroids(mui, :) = sum(X(idx == mui, :)) / sum(idx == mui);
   end

Since the program runs very slowly on my computer, I only ran 3 iterations. Normally the stopping criterion is (i) at least 10 iterations, or (ii) no more change in the centroids. In my tests, increasing the iteration count may differentiate the background (sky vs. tree, sky vs. building, etc.) more accurately, but did not show drastic changes in the Christmas tree extraction. Also note that k-means is not immune to the random centroid initialization, so running the program several times and comparing the results is recommended.
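
The stopping criterion (ii) above is not implemented in the snippet; purely as an illustration, a minimal Python sketch of a k-means loop that stops when the centroids stop moving could look like this (the tolerance value is a hypothetical placeholder):

import numpy as np

def kmeans(X, K, max_iters=10, tol=1e-6, seed=0):
    """Plain k-means on an (m, n) feature matrix X with a convergence test."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # Assign each sample to its nearest centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        idx = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster went empty
        new_centroids = np.array([
            X[idx == k].mean(axis=0) if np.any(idx == k) else centroids[k]
            for k in range(K)
        ])
        if np.linalg.norm(new_centroids - centroids) < tol:  # criterion (ii)
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, idx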

After k-means, the labelled region with the maximum intensity of I0 was chosen, and boundary tracing was used to extract its boundary. For me, the last Christmas tree is the hardest one to extract, since the contrast in that picture is not as high as in the first five. Another issue in my method is that I used the bwboundaries function in Matlab to trace the boundary, but sometimes the inner boundaries are also included, as you can observe in the 3rd, 5th and 6th results. The dark side within the Christmas trees not only fails to be clustered with the illuminated side, it also leads to many tiny inner boundary traces (imfill does not improve things very much). All in all, my algorithm still has a lot of room for improvement.
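
A side note of mine: a common remedy for the inner-boundary problem is to fill the holes in the binary mask before tracing. In Python this can be sketched with scipy.ndimage (rough analogues of MATLAB's imfill followed by boundary extraction):

import numpy as np
from scipy import ndimage as ndi

def outer_boundary_mask(mask):
    """Fill holes in a boolean mask, then keep only the outer boundary."""
    filled = ndi.binary_fill_holes(mask)
    eroded = ndi.binary_erosion(filled)
    return filled & ~eroded   # boundary = filled region minus its interior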

Some publications indicate that mean-shift may be more robust than k-means, and many graph-cut based algorithms are also very competitive for segmentation with complicated boundaries. I wrote a mean-shift algorithm myself; it seems to extract the regions without enough light better. But mean-shift over-segments a little, and some merging strategy is needed. It ran even slower than k-means on my computer, and I am afraid I have to give it up. I eagerly look forward to seeing others submit excellent results here with the modern algorithms mentioned above.

Yet I always believe that feature selection is the key component in image segmentation. With a proper feature selection that maximizes the margin between object and background, many segmentation algorithms will definitely work. Different algorithms may improve the result from 1 to 10, but feature selection may improve it from 0 to 1.

Merry Christmas!


Answer 4

This is my final post using traditional image processing approaches...

Here I somehow combine my two other proposals, achieving even better results. As a matter of fact, I cannot see how these results could be any better (especially when you look at the masked images that the method produces).

At the heart of the approach is the combination of three key assumptions:

  1. Images should have high fluctuations in the tree regions
  2. Images should have higher intensity in the tree regions
  3. Background regions should have low intensity and be mostly blue-ish

With these assumptions in mind, the method works as follows (a short Python sketch of the mask combination in steps 2-6 follows the list):

  1. Convert the images to HSV
  2. Filter the V channel with a LoG filter
  3. Apply hard thresholding on the LoG-filtered image to get an 'activity' mask A
  4. Apply hard thresholding to the V channel to get an intensity mask B
  5. Apply H channel thresholding to capture low-intensity blue-ish regions in a background mask C
  6. Combine the masks with AND to get the final mask
  7. Dilate the mask to enlarge regions and connect dispersed pixels
  8. Eliminate small regions and get the final mask, which will eventually represent only the tree
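
Before the full script, here is a compact Python/OpenCV sketch of the mask combination in steps 2-6 (my own illustration, not part of the original answer; the LoG is approximated by a Gaussian blur followed by a Laplacian, and the threshold values mirror those in the MATLAB code below):

import cv2
import numpy as np

def tree_mask(bgr):
    """Combine an 'activity' mask, an intensity mask and a background mask."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h = hsv[:, :, 0].astype(np.float32) * 2.0   # OpenCV hue 0..179 -> degrees
    v = hsv[:, :, 2].astype(np.float32) / 255.0

    # Steps 2-3: LoG response of V, hard-thresholded into 'activity' mask A
    log_v = cv2.Laplacian(cv2.GaussianBlur(v, (5, 5), 0), cv2.CV_32F)
    mask_a = log_v > v.max() / 3.0

    # Step 4: intensity mask B
    mask_b = v > 0.26 * v.max()

    # Step 5: low-intensity blue-ish background mask C (hue ~150-320 degrees)
    mask_c = (h > 150) & (h < 320) & (v < 0.5)

    # Step 6: high activity AND high intensity AND NOT background
    return mask_a & mask_b & ~mask_c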

Here is the code in MATLAB (again, the script loads all jpg images in the current folder and, again, this is far from being an optimized piece of code):

% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;

% initialization
ims=dir('./*.jpg');
imgs={};
images={}; 
blur_images={}; 
log_image={}; 
dilated_image={};
int_image={};
back_image={};
bin_image={};
measurements={};
box={};
num=length(ims);
thres_div = 3;

for i=1:num, 
    % load original image
    imgs{end+1}=imread(ims(i).name);

    % convert to HSV colorspace
    images{end+1}=rgb2hsv(imgs{i});

    % apply laplacian filtering and heuristic hard thresholding
    val_thres = (max(max(images{i}(:,:,3)))/thres_div);
    log_image{end+1} = imfilter( images{i}(:,:,3),fspecial('log')) > val_thres;

    % get the most bright regions of the image
    int_thres = 0.26*max(max( images{i}(:,:,3)));
    int_image{end+1} = images{i}(:,:,3) > int_thres;

    % get the most probable background regions of the image
    back_image{end+1} = images{i}(:,:,1)>(150/360) & images{i}(:,:,1)<(320/360) & images{i}(:,:,3)<0.5;

    % compute the final binary image by combining 
    % high 'activity' with high intensity
    bin_image{end+1} = logical( log_image{i}) & logical( int_image{i}) & ~logical( back_image{i});

    % apply morphological dilation to connect disconnected components
    strel_size = round(0.01*max(size(imgs{i})));        % structuring element for morphological dilation
    dilated_image{end+1} = imdilate( bin_image{i}, strel('disk',strel_size));

    % do some measurements to eliminate small objects
    measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');

    % iterative enlargement of the structuring element for better connectivity
    while length(measurements{i})>14 && strel_size<(min(size(imgs{i}(:,:,1)))/2),
        strel_size = round( 1.5 * strel_size);
        dilated_image{i} = imdilate( bin_image{i}, strel('disk',strel_size));
        measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');
    end

    for m=1:length(measurements{i})
        if measurements{i}(m).Area < 0.05*numel( dilated_image{i})
            dilated_image{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
                round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
        end
    end
    % make sure the dilated image is the same size with the original
    dilated_image{i} = dilated_image{i}(1:size(imgs{i},1),1:size(imgs{i},2));
    % compute the bounding box
    [y,x] = find( dilated_image{i});
    if isempty( y)
        box{end+1}=[];
    else
        box{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
    end
end 

%%% additional code to display things
for i=1:num,
    figure;
    subplot(121);
    colormap gray;
    imshow( imgs{i});
    if ~isempty(box{i})
        hold on;
        rr = rectangle( 'position', box{i});
        set( rr, 'EdgeColor', 'r');
        hold off;
    end
    subplot(122);
    imshow( imgs{i}.*uint8(repmat(dilated_image{i},[1 1 3])));
end

Results

results

High resolution results are still available here!
Even more experiments with additional images can be found here.

This is my final post using the traditional image processing approaches…

Here I somehow combine my two other proposals, achieving even better results. As a matter of fact I cannot see how these results could be better (especially when you look at the masked images that the method produces).

At the heart of the approach is the combination of three key assumptions:

  1. Images should have high fluctuations in the tree regions
  2. Images should have higher intensity in the tree regions
  3. Background regions should have low intensity and be mostly blue-ish

With these assumptions in mind the method works as follows:

  1. Convert the images to HSV
  2. Filter the V channel with a LoG filter
  3. Apply hard thresholding on LoG filtered image to get ‘activity’ mask A
  4. Apply hard thresholding to V channel to get intensity mask B
  5. Apply H channel thresholding to capture low intensity blue-ish regions into background mask C
  6. Combine masks using AND to get the final mask
  7. Dilate the mask to enlarge regions and connect dispersed pixels
  8. Eliminate small regions and get the final mask which will eventually represent only the tree

Here is the code in MATLAB (again, the script loads all jpg images in the current folder and, again, this is far from being an optimized piece of code):

% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;

% initialization
ims=dir('./*.jpg');
imgs={};
images={}; 
blur_images={}; 
log_image={}; 
dilated_image={};
int_image={};
back_image={};
bin_image={};
measurements={};
box={};
num=length(ims);
thres_div = 3;

for i=1:num, 
    % load original image
    imgs{end+1}=imread(ims(i).name);

    % convert to HSV colorspace
    images{end+1}=rgb2hsv(imgs{i});

    % apply laplacian filtering and heuristic hard thresholding
    val_thres = (max(max(images{i}(:,:,3)))/thres_div);
    log_image{end+1} = imfilter( images{i}(:,:,3),fspecial('log')) > val_thres;

    % get the most bright regions of the image
    int_thres = 0.26*max(max( images{i}(:,:,3)));
    int_image{end+1} = images{i}(:,:,3) > int_thres;

    % get the most probable background regions of the image
    back_image{end+1} = images{i}(:,:,1)>(150/360) & images{i}(:,:,1)<(320/360) & images{i}(:,:,3)<0.5;

    % compute the final binary image by combining 
    % high 'activity' with high intensity
    bin_image{end+1} = logical( log_image{i}) & logical( int_image{i}) & ~logical( back_image{i});

    % apply morphological dilation to connect distonnected components
    strel_size = round(0.01*max(size(imgs{i})));        % structuring element for morphological dilation
    dilated_image{end+1} = imdilate( bin_image{i}, strel('disk',strel_size));

    % do some measurements to eliminate small objects
    measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');

    % iterative enlargement of the structuring element for better connectivity
    while length(measurements{i})>14 && strel_size<(min(size(imgs{i}(:,:,1)))/2),
        strel_size = round( 1.5 * strel_size);
        dilated_image{i} = imdilate( bin_image{i}, strel('disk',strel_size));
        measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');
    end

    for m=1:length(measurements{i})
        if measurements{i}(m).Area < 0.05*numel( dilated_image{i})
            dilated_image{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
                round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
        end
    end
    % make sure the dilated image is the same size with the original
    dilated_image{i} = dilated_image{i}(1:size(imgs{i},1),1:size(imgs{i},2));
    % compute the bounding box
    [y,x] = find( dilated_image{i});
    if isempty( y)
        box{end+1}=[];
    else
        box{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
    end
end 

%%% additional code to display things
for i=1:num,
    figure;
    subplot(121);
    colormap gray;
    imshow( imgs{i});
    if ~isempty(box{i})
        hold on;
        rr = rectangle( 'position', box{i});
        set( rr, 'EdgeColor', 'r');
        hold off;
    end
    subplot(122);
    imshow( imgs{i}.*uint8(repmat(dilated_image{i},[1 1 3])));
end

Results

results

High resolution results still available here!
Even more experiments with additional images can be found here.
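
For readers who prefer Python, here is a rough, untested OpenCV/numpy transliteration of the three-mask combination at the heart of this answer; the function name and the rescaled thresholds are my own assumptions (OpenCV stores hue in [0,180) and V in [0,255]), not part of the original MATLAB script:

import cv2
import numpy as np

def tree_mask(bgr):
    # hedged sketch: activity AND intensity AND NOT blue-ish background
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h, v = hsv[..., 0], hsv[..., 2]          # H in [0,180), V in [0,255]

    # 1) 'activity' mask: Laplacian-of-Gaussian response above max(V)/3
    log = cv2.Laplacian(cv2.GaussianBlur(v, (5, 5), 0), cv2.CV_32F)
    activity = log > v.max() / 3.0

    # 2) intensity mask: the brightest parts of the V channel
    intensity = v > 0.26 * v.max()

    # 3) background mask: dark, blue-ish pixels (hue 150-320 deg -> 75-160)
    background = (h > 75) & (h < 160) & (v < 128)

    mask = (activity & intensity & ~background).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    return cv2.dilate(mask, kernel) > 0      # enlarge and connect regions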


回答 5

我的解决步骤:

  1. 获取R通道(从RGB)-我们在此通道上进行的所有操作:

  2. 创建兴趣区(ROI)

    • 最小值为149的阈值R通道(右上图)

    • 扩大结果区域(左中图)

  3. 在计算出的ROI中检测边缘。树有很多边缘(右中图)

    • 膨胀结果

    • 半径较大的腐蚀(左下图)

  4. 选择最大的(按区域)对象-这是结果区域

  5. ConvexHull(树是凸多边形)(右下图)

  6. 边界框(右下图,绿色框)

一步步: 在此处输入图片说明

第一个结果,最简单但并非开源软件:“Adaptive Vision Studio + Adaptive Vision Library”。它不是开源的,但用来做原型确实非常快:

完整的圣诞树检测算法(11个块): AVL解决方案

下一步。我们需要开源解决方案。将AVL滤镜换成OpenCV滤镜:这里我做了一些小改动,例如边缘检测改用cvCanny滤镜;为了尊重ROI,我将区域图像与边缘图像相乘;为了选出最大的元素,我使用了findContours + contourArea。但思路是相同的。

https://www.youtube.com/watch?v=sfjB3MigLH0&index=1&list=UUpSRrkMHNHiLDXgylwhWNQQ

OpenCV解决方案

我现在无法显示具有中间步骤的图像,因为我只能放置2个链接。

好的,现在我们使用开源滤镜,但还不是完全开源。最后一步:移植为C++代码。我使用的是OpenCV 2.4.4版本。

最终C++代码的结果是: 在此处输入图片说明

C++代码也很短:

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/opencv.hpp"
#include <algorithm>
using namespace cv;
using namespace std;   // needed for the unqualified string and vector below

int main()
{

    string images[6] = {"..\\1.png","..\\2.png","..\\3.png","..\\4.png","..\\5.png","..\\6.png"};

    for(int i = 0; i < 6; ++i)
    {
        Mat img, thresholded, tdilated, tmp, tmp1;
        vector<Mat> channels(3);

        img = imread(images[i]);
        split(img, channels);
        threshold( channels[2], thresholded, 149, 255, THRESH_BINARY);                      //prepare ROI - threshold
        dilate( thresholded, tdilated,  getStructuringElement( MORPH_RECT, Size(22,22) ) ); //prepare ROI - dilate
        Canny( channels[2], tmp, 75, 125, 3, true );    //Canny edge detection
        multiply( tmp, tdilated, tmp1 );    // set ROI

        dilate( tmp1, tmp, getStructuringElement( MORPH_RECT, Size(20,16) ) ); // dilate
        erode( tmp, tmp1, getStructuringElement( MORPH_RECT, Size(36,36) ) ); // erode

        vector<vector<Point> > contours, contours1(1);
        vector<Point> convex;
        vector<Vec4i> hierarchy;
        findContours( tmp1, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, Point(0, 0) );

        //get element of maximum area
        //int bestID = std::max_element( contours.begin(), contours.end(), 
        //  []( const vector<Point>& A, const vector<Point>& B ) { return contourArea(A) < contourArea(B); } ) - contours.begin();

        int bestID = 0;
        int bestArea = contourArea( contours[0] );
        for( int i = 1; i < contours.size(); ++i )
        {
            int area = contourArea( contours[i] );
            if( area > bestArea )
            {
                bestArea  = area;
                bestID = i;
            }
        }

        convexHull( contours[bestID], contours1[0] ); 
        drawContours( img, contours1, 0, Scalar( 100, 100, 255 ), img.rows / 100, 8, hierarchy, 0, Point() );

        imshow("image", img );
        waitKey(0);
    }


    return 0;
}

My solution steps:

  1. Get R channel (from RGB) – all operations we make on this channel:

  2. Create Region of Interest (ROI)

    • Threshold R channel with min value 149 (top right image)

    • Dilate result region (middle left image)

  3. Detect edges in the computed ROI. The tree has a lot of edges (middle right image)

    • Dilate result

    • Erode with bigger radius ( bottom left image)

  4. Select the biggest (by area) object – it’s the result region

  5. ConvexHull ( tree is convex polygon ) ( bottom right image )

  6. Bounding box (bottom right image, green box)

Step by step: enter image description here

The first result – the simplest, but not in open-source software – "Adaptive Vision Studio + Adaptive Vision Library". This is not open source, but it is really fast to prototype with:

Whole algorithm to detect christmas tree (11 blocks): AVL solution

Next step. We want an open-source solution. Change the AVL filters to OpenCV filters: here I made a few small changes, e.g. edge detection uses the cvCanny filter; to respect the ROI I multiplied the region image with the edges image; to select the biggest element I used findContours + contourArea. But the idea is the same.

https://www.youtube.com/watch?v=sfjB3MigLH0&index=1&list=UUpSRrkMHNHiLDXgylwhWNQQ

OpenCV solution

I can’t show images with intermediate steps now because I can put only 2 links.

Ok, now we use open-source filters, but it's still not entirely open source. Last step: port to C++ code. I used OpenCV version 2.4.4.

The result of the final C++ code is: enter image description here

The C++ code is also quite short:

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/opencv.hpp"
#include <algorithm>
using namespace cv;
using namespace std;   // needed for the unqualified string and vector below

int main()
{

    string images[6] = {"..\\1.png","..\\2.png","..\\3.png","..\\4.png","..\\5.png","..\\6.png"};

    for(int i = 0; i < 6; ++i)
    {
        Mat img, thresholded, tdilated, tmp, tmp1;
        vector<Mat> channels(3);

        img = imread(images[i]);
        split(img, channels);
        threshold( channels[2], thresholded, 149, 255, THRESH_BINARY);                      //prepare ROI - threshold
        dilate( thresholded, tdilated,  getStructuringElement( MORPH_RECT, Size(22,22) ) ); //prepare ROI - dilate
        Canny( channels[2], tmp, 75, 125, 3, true );    //Canny edge detection
        multiply( tmp, tdilated, tmp1 );    // set ROI

        dilate( tmp1, tmp, getStructuringElement( MORPH_RECT, Size(20,16) ) ); // dilate
        erode( tmp, tmp1, getStructuringElement( MORPH_RECT, Size(36,36) ) ); // erode

        vector<vector<Point> > contours, contours1(1);
        vector<Point> convex;
        vector<Vec4i> hierarchy;
        findContours( tmp1, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, Point(0, 0) );

        //get element of maximum area
        //int bestID = std::max_element( contours.begin(), contours.end(), 
        //  []( const vector<Point>& A, const vector<Point>& B ) { return contourArea(A) < contourArea(B); } ) - contours.begin();

        int bestID = 0;
        int bestArea = contourArea( contours[0] );
        for( int i = 1; i < contours.size(); ++i )
        {
            int area = contourArea( contours[i] );
            if( area > bestArea )
            {
                bestArea  = area;
                bestID = i;
            }
        }

        convexHull( contours[bestID], contours1[0] ); 
        drawContours( img, contours1, 0, Scalar( 100, 100, 255 ), img.rows / 100, 8, hierarchy, 0, Point() );

        imshow("image", img );
        waitKey(0);
    }


    return 0;
}
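
As a side note for Python users: the manual maximum-area loop above (and the commented-out std::max_element lambda) collapses to a one-liner in OpenCV's Python bindings. A hedged sketch, with binary standing in for the eroded ROI-edge image of the pipeline:

import cv2

def biggest_convex_region(binary):
    # OpenCV 4.x returns (contours, hierarchy); 3.x prepends the image
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    best = max(contours, key=cv2.contourArea)   # largest blob = tree candidate
    return cv2.convexHull(best)                 # the tree is roughly convex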

回答 6

…另一种老式解决方案-完全基于HSV处理

  1. 将图像转换为HSV色彩空间
  2. 根据HSV中的启发式方法创建蒙版(请参见下文)
  3. 对蒙版进行形态学扩张以连接断开的区域
  4. 丢弃小区域和水平块(记住树是垂直块)
  5. 计算边界框

关于HSV处理中的启发式规则:

  1. 色调(H)在210-320度之间的所有内容都作为蓝-品红色丢弃,它们应当属于背景或不相关的区域
  2. 明度(V)低于40%的所有内容也被丢弃,因为太暗而不相关

当然,可以尝试许多其他可能性来微调这种方法。

这是实现此技巧的MATLAB代码(警告:代码远未优化!!!我使用了不推荐用于MATLAB编程的技术,只是为了能够跟踪过程中的任何内容,因此可以对其进行极大地优化):

% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;

% initialization
ims=dir('./*.jpg');
num=length(ims);

imgs={};
hsvs={}; 
masks={};
dilated_images={};
measurements={};
boxs={};

for i=1:num, 
    % load original image
    imgs{end+1} = imread(ims(i).name);
    flt_x_size = round(size(imgs{i},2)*0.005);
    flt_y_size = round(size(imgs{i},1)*0.005);
    flt = fspecial( 'average', max( flt_y_size, flt_x_size));
    imgs{i} = imfilter( imgs{i}, flt, 'same');
    % convert to HSV colorspace
    hsvs{end+1} = rgb2hsv(imgs{i});
    % apply a hard thresholding and binary operation to construct the mask
    masks{end+1} = medfilt2( ~(hsvs{i}(:,:,1)>(210/360) & hsvs{i}(:,:,1)<(320/360))&hsvs{i}(:,:,3)>0.4);
    % apply morphological dilation to connect disconnected components
    strel_size = round(0.03*max(size(imgs{i})));        % structuring element for morphological dilation
    dilated_images{end+1} = imdilate( masks{i}, strel('disk',strel_size));
    % do some measurements to eliminate small objects
    measurements{i} = regionprops( dilated_images{i},'Perimeter','Area','BoundingBox'); 
    for m=1:length(measurements{i})
        if (measurements{i}(m).Area < 0.02*numel( dilated_images{i})) || (measurements{i}(m).BoundingBox(3)>1.2*measurements{i}(m).BoundingBox(4))
            dilated_images{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
                round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
        end
    end
    dilated_images{i} = dilated_images{i}(1:size(imgs{i},1),1:size(imgs{i},2));
    % compute the bounding box
    [y,x] = find( dilated_images{i});
    if isempty( y)
        boxs{end+1}=[];
    else
        boxs{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
    end

end 

%%% additional code to display things
for i=1:num,
    figure;
    subplot(121);
    colormap gray;
    imshow( imgs{i});
    if ~isempty(boxs{i})
        hold on;
        rr = rectangle( 'position', boxs{i});
        set( rr, 'EdgeColor', 'r');
        hold off;
    end
    subplot(122);
    imshow( imgs{i}.*uint8(repmat(dilated_images{i},[1 1 3])));
end

结果:

在结果中,我显示了蒙版的图像和边界框。 在此处输入图片说明

…another old fashioned solution – purely based on HSV processing:

  1. Convert images to the HSV colorspace
  2. Create masks according to heuristics in the HSV (see below)
  3. Apply morphological dilation to the mask to connect disconnected areas
  4. Discard small areas and horizontal blocks (remember trees are vertical blocks)
  5. Compute the bounding box

A word on the heuristics in the HSV processing:

  1. everything with Hues (H) between 210 – 320 degrees is discarded as blue-magenta that is supposed to be in the background or in non-relevant areas
  2. everything with Values (V) lower than 40% is also discarded as being too dark to be relevant

Of course one may experiment with numerous other possibilities to fine-tune this approach…

Here is the MATLAB code to do the trick (warning: the code is far from being optimized!!! I used techniques not recommended for MATLAB programming just to be able to track anything in the process-this can be greatly optimized):

% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;

% initialization
ims=dir('./*.jpg');
num=length(ims);

imgs={};
hsvs={}; 
masks={};
dilated_images={};
measurements={};
boxs={};

for i=1:num, 
    % load original image
    imgs{end+1} = imread(ims(i).name);
    flt_x_size = round(size(imgs{i},2)*0.005);
    flt_y_size = round(size(imgs{i},1)*0.005);
    flt = fspecial( 'average', max( flt_y_size, flt_x_size));
    imgs{i} = imfilter( imgs{i}, flt, 'same');
    % convert to HSV colorspace
    hsvs{end+1} = rgb2hsv(imgs{i});
    % apply a hard thresholding and binary operation to construct the mask
    masks{end+1} = medfilt2( ~(hsvs{i}(:,:,1)>(210/360) & hsvs{i}(:,:,1)<(320/360))&hsvs{i}(:,:,3)>0.4);
    % apply morphological dilation to connect disconnected components
    strel_size = round(0.03*max(size(imgs{i})));        % structuring element for morphological dilation
    dilated_images{end+1} = imdilate( masks{i}, strel('disk',strel_size));
    % do some measurements to eliminate small objects
    measurements{i} = regionprops( dilated_images{i},'Perimeter','Area','BoundingBox'); 
    for m=1:length(measurements{i})
        if (measurements{i}(m).Area < 0.02*numel( dilated_images{i})) || (measurements{i}(m).BoundingBox(3)>1.2*measurements{i}(m).BoundingBox(4))
            dilated_images{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
                round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
        end
    end
    dilated_images{i} = dilated_images{i}(1:size(imgs{i},1),1:size(imgs{i},2));
    % compute the bounding box
    [y,x] = find( dilated_images{i});
    if isempty( y)
        boxs{end+1}=[];
    else
        boxs{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
    end

end 

%%% additional code to display things
for i=1:num,
    figure;
    subplot(121);
    colormap gray;
    imshow( imgs{i});
    if ~isempty(boxs{i})
        hold on;
        rr = rectangle( 'position', boxs{i});
        set( rr, 'EdgeColor', 'r');
        hold off;
    end
    subplot(122);
    imshow( imgs{i}.*uint8(repmat(dilated_images{i},[1 1 3])));
end

Results:

In the results I show the masked image and the bounding box. enter image description here
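
The two HSV heuristics are easy to reproduce in Python as well. A minimal numpy/OpenCV sketch of my own (not the author's code), with the 210-320 degree hue band rescaled to OpenCV's 0-179 range:

import cv2
import numpy as np

def foreground_mask(bgr):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h, v = hsv[..., 0], hsv[..., 2]
    blueish = (h > 105) & (h < 160)   # hue 210-320 deg: likely background
    too_dark = v < 0.4 * 255          # value below 40%: too dark to matter
    return ~blueish & ~too_dark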


回答 7

一些老式的图像处理方法…
这个想法基于这样的假设:图像在通常较暗、较平滑的背景(某些情况下为前景)上描绘了点亮的圣诞树。点亮的树区域更“有活力”,强度也更高。
流程如下:

  1. 转换为灰度
  2. 应用LoG过滤以获取最“活跃”的区域
  3. 应用强度阈值以获得最明亮的区域
  4. 结合之前的2个以获得初步的蒙版
  5. 应用形态学扩张来扩大区域并连接相邻的组件
  6. 根据面积大小消除较小的候选区域

您得到的是每个图像的二进制掩码和边界框。

这是使用这种幼稚技术的结果: 在此处输入图片说明

MATLAB代码如下:该代码在包含JPG图像的文件夹上运行,加载所有图像并返回检测到的结果。

% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;

% initialization
ims=dir('./*.jpg');
imgs={};
images={}; 
blur_images={}; 
log_image={}; 
dilated_image={};
int_image={};
bin_image={};
measurements={};
box={};
num=length(ims);
thres_div = 3;

for i=1:num, 
    % load original image
    imgs{end+1}=imread(ims(i).name);

    % convert to grayscale
    images{end+1}=rgb2gray(imgs{i});

    % apply laplacian filtering and heuristic hard thresholding
    val_thres = (max(max(images{i}))/thres_div);
    log_image{end+1} = imfilter( images{i},fspecial('log')) > val_thres;

    % get the most bright regions of the image
    int_thres = 0.26*max(max( images{i}));
    int_image{end+1} = images{i} > int_thres;

    % compute the final binary image by combining 
    % high 'activity' with high intensity
    bin_image{end+1} = log_image{i} .* int_image{i};

    % apply morphological dilation to connect disconnected components
    strel_size = round(0.01*max(size(imgs{i})));        % structuring element for morphological dilation
    dilated_image{end+1} = imdilate( bin_image{i}, strel('disk',strel_size));

    % do some measurements to eliminate small objects
    measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');
    for m=1:length(measurements{i})
        if measurements{i}(m).Area < 0.05*numel( dilated_image{i})
            dilated_image{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
                round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
        end
    end
    % make sure the dilated image is the same size with the original
    dilated_image{i} = dilated_image{i}(1:size(imgs{i},1),1:size(imgs{i},2));
    % compute the bounding box
    [y,x] = find( dilated_image{i});
    if isempty( y)
        box{end+1}=[];
    else
        box{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
    end
end 

%%% additional code to display things
for i=1:num,
    figure;
    subplot(121);
    colormap gray;
    imshow( imgs{i});
    if ~isempty(box{i})
        hold on;
        rr = rectangle( 'position', box{i});
        set( rr, 'EdgeColor', 'r');
        hold off;
    end
    subplot(122);
    imshow( imgs{i}.*uint8(repmat(dilated_image{i},[1 1 3])));
end

Some old-fashioned image processing approach…
The idea is based on the assumption that images depict lighted trees on typically darker and smoother backgrounds (or foregrounds in some cases). The lighted tree area is more “energetic” and has higher intensity.
The process is as follows:

  1. Convert to graylevel
  2. Apply LoG filtering to get the most “active” areas
  3. Apply an intensity threshold to get the brightest areas
  4. Combine the previous 2 to get a preliminary mask
  5. Apply a morphological dilation to enlarge areas and connect neighboring components
  6. Eliminate small candidate areas according to their area size

What you get is a binary mask and a bounding box for each image.

Here are the results using this naive technique: enter image description here

The MATLAB code follows. It runs on a folder with JPG images, loads them all, and returns the detected results.

% clear everything
clear;
pack;
close all;
close all hidden;
drawnow;
clc;

% initialization
ims=dir('./*.jpg');
imgs={};
images={}; 
blur_images={}; 
log_image={}; 
dilated_image={};
int_image={};
bin_image={};
measurements={};
box={};
num=length(ims);
thres_div = 3;

for i=1:num, 
    % load original image
    imgs{end+1}=imread(ims(i).name);

    % convert to grayscale
    images{end+1}=rgb2gray(imgs{i});

    % apply laplacian filtering and heuristic hard thresholding
    val_thres = (max(max(images{i}))/thres_div);
    log_image{end+1} = imfilter( images{i},fspecial('log')) > val_thres;

    % get the most bright regions of the image
    int_thres = 0.26*max(max( images{i}));
    int_image{end+1} = images{i} > int_thres;

    % compute the final binary image by combining 
    % high 'activity' with high intensity
    bin_image{end+1} = log_image{i} .* int_image{i};

    % apply morphological dilation to connect disconnected components
    strel_size = round(0.01*max(size(imgs{i})));        % structuring element for morphological dilation
    dilated_image{end+1} = imdilate( bin_image{i}, strel('disk',strel_size));

    % do some measurements to eliminate small objects
    measurements{i} = regionprops( logical( dilated_image{i}),'Area','BoundingBox');
    for m=1:length(measurements{i})
        if measurements{i}(m).Area < 0.05*numel( dilated_image{i})
            dilated_image{i}( round(measurements{i}(m).BoundingBox(2):measurements{i}(m).BoundingBox(4)+measurements{i}(m).BoundingBox(2)),...
                round(measurements{i}(m).BoundingBox(1):measurements{i}(m).BoundingBox(3)+measurements{i}(m).BoundingBox(1))) = 0;
        end
    end
    % make sure the dilated image is the same size with the original
    dilated_image{i} = dilated_image{i}(1:size(imgs{i},1),1:size(imgs{i},2));
    % compute the bounding box
    [y,x] = find( dilated_image{i});
    if isempty( y)
        box{end+1}=[];
    else
        box{end+1} = [ min(x) min(y) max(x)-min(x)+1 max(y)-min(y)+1];
    end
end 

%%% additional code to display things
for i=1:num,
    figure;
    subplot(121);
    colormap gray;
    imshow( imgs{i});
    if ~isempty(box{i})
        hold on;
        rr = rectangle( 'position', box{i});
        set( rr, 'EdgeColor', 'r');
        hold off;
    end
    subplot(122);
    imshow( imgs{i}.*uint8(repmat(dilated_image{i},[1 1 3])));
end
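
The "eliminate small objects" step (the regionprops loop above) has a direct Python counterpart using connected-component statistics, available in OpenCV 3.0 and later. A hedged sketch with the same 5% area heuristic; the function name is mine:

import cv2
import numpy as np

def drop_small_regions(mask, min_frac=0.05):
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask.astype(np.uint8))
    keep = np.zeros(mask.shape, dtype=bool)
    for lbl in range(1, n):            # label 0 is the background
        if stats[lbl, cv2.CC_STAT_AREA] >= min_frac * mask.size:
            keep |= labels == lbl
    return keep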

回答 8

使用与我所见过的都不同的方法,我创建了一个通过灯光来检测圣诞树的脚本。结果始终是一个对称三角形,必要时还会给出数值,例如树的张角(“肥胖度”)。

显然,此算法最大的威胁是树旁(数量很多时)或树前(更大的问题,有待进一步优化)的灯光。编辑(补充):它做不到的事情:判断图中到底有没有圣诞树、在一幅图像中找到多棵圣诞树、正确检测拉斯维加斯市中心的圣诞树、检测严重弯曲、倒置或被砍倒的圣诞树……;)

不同的阶段是:

  • 计算每个像素的总亮度(R+G+B)
  • 将每个像素周围全部8个相邻像素的该值累加到该像素上
  • 按此值对所有像素排序(最亮优先),我知道,这不算精细……
  • 从最高处开始选出N个,跳过彼此距离太近的
  • 计算这前N个的中位数(得到树的大致中心)
  • 从中位数位置向上,在逐渐加宽的搜索光束中寻找所选最亮灯光中最靠上的一盏(人们往往至少会在树顶放一盏灯)
  • 从那里开始,想象两条线以60度角分别向左下和右下延伸(圣诞树不应该那么胖)
  • 逐渐减小这60度,直到有20%的最亮灯光落在此三角形之外
  • 找到三角形最底部的灯光,它给出树的下水平边界
  • 完成了

标记说明:

  • 树中心的大红十字:N个最亮的灯光的中位数
  • 从该处向上的虚线:寻找树顶的“搜索光束”
  • 较小的红十字:树顶
  • 很小的红叉:所有N个最亮的灯
  • 红色三角形:不言自明!

源代码:

<?php

ini_set('memory_limit', '1024M');

header("Content-type: image/png");

$chosenImage = 6;

switch($chosenImage){
    case 1:
        $inputImage     = imagecreatefromjpeg("nmzwj.jpg");
        break;
    case 2:
        $inputImage     = imagecreatefromjpeg("2y4o5.jpg");
        break;
    case 3:
        $inputImage     = imagecreatefromjpeg("YowlH.jpg");
        break;
    case 4:
        $inputImage     = imagecreatefromjpeg("2K9Ef.jpg");
        break;
    case 5:
        $inputImage     = imagecreatefromjpeg("aVZhC.jpg");
        break;
    case 6:
        $inputImage     = imagecreatefromjpeg("FWhSP.jpg");
        break;
    case 7:
        $inputImage     = imagecreatefromjpeg("roemerberg.jpg");
        break;
    default:
        exit();
}

// Process the loaded image

$topNspots = processImage($inputImage);

imagejpeg($inputImage);
imagedestroy($inputImage);

// Here be functions

function processImage($image) {
    $orange = imagecolorallocate($image, 220, 210, 60);
    $black = imagecolorallocate($image, 0, 0, 0);
    $red = imagecolorallocate($image, 255, 0, 0);

    $maxX = imagesx($image)-1;
    $maxY = imagesy($image)-1;

    // Parameters
    $spread = 1; // Number of pixels to each direction that will be added up
    $topPositions = 80; // Number of (brightest) lights taken into account
    $minLightDistance = round(min(array($maxX, $maxY)) / 30); // Minimum number of pixels between the brightest lights
    $searchYperX = 5; // spread of the "search beam" from the median point to the top

    $renderStage = 3; // 1 to 3; exits the process early


    // STAGE 1
    // Calculate the brightness of each pixel (R+G+B)

    $maxBrightness = 0;
    $stage1array = array();

    for($row = 0; $row <= $maxY; $row++) {

        $stage1array[$row] = array();

        for($col = 0; $col <= $maxX; $col++) {

            $rgb = imagecolorat($image, $col, $row);
            $brightness = getBrightnessFromRgb($rgb);
            $stage1array[$row][$col] = $brightness;

            if($renderStage == 1){
                $brightnessToGrey = round($brightness / 765 * 256);
                $greyRgb = imagecolorallocate($image, $brightnessToGrey, $brightnessToGrey, $brightnessToGrey);
                imagesetpixel($image, $col, $row, $greyRgb);
            }

            if($brightness > $maxBrightness) {
                $maxBrightness = $brightness;
                if($renderStage == 1){
                    imagesetpixel($image, $col, $row, $red);
                }
            }
        }
    }
    if($renderStage == 1) {
        return;
    }


    // STAGE 2
    // Add up brightness of neighbouring pixels

    $stage2array = array();
    $maxStage2 = 0;

    for($row = 0; $row <= $maxY; $row++) {
        $stage2array[$row] = array();

        for($col = 0; $col <= $maxX; $col++) {
            if(!isset($stage2array[$row][$col])) $stage2array[$row][$col] = 0;

            // Look around the current pixel, add brightness
            for($y = $row-$spread; $y <= $row+$spread; $y++) {
                for($x = $col-$spread; $x <= $col+$spread; $x++) {

                    // Don't read values from outside the image
                    if($x >= 0 && $x <= $maxX && $y >= 0 && $y <= $maxY){
                        $stage2array[$row][$col] += $stage1array[$y][$x]+10;
                    }
                }
            }

            $stage2value = $stage2array[$row][$col];
            if($stage2value > $maxStage2) {
                $maxStage2 = $stage2value;
            }
        }
    }

    if($renderStage >= 2){
        // Paint the accumulated light, dimmed by the maximum value from stage 2
        for($row = 0; $row <= $maxY; $row++) {
            for($col = 0; $col <= $maxX; $col++) {
                $brightness = round($stage2array[$row][$col] / $maxStage2 * 255);
                $greyRgb = imagecolorallocate($image, $brightness, $brightness, $brightness);
                imagesetpixel($image, $col, $row, $greyRgb);
            }
        }
    }

    if($renderStage == 2) {
        return;
    }


    // STAGE 3

    // Create a ranking of bright spots (like "Top 20")
    $topN = array();

    for($row = 0; $row <= $maxY; $row++) {
        for($col = 0; $col <= $maxX; $col++) {

            $stage2Brightness = $stage2array[$row][$col];
            $topN[$col.":".$row] = $stage2Brightness;
        }
    }
    arsort($topN);

    $topNused = array();
    $topPositionCountdown = $topPositions;

    if($renderStage == 3){
        foreach ($topN as $key => $val) {
            if($topPositionCountdown <= 0){
                break;
            }

            $position = explode(":", $key);

            foreach($topNused as $usedPosition => $usedValue) {
                $usedPosition = explode(":", $usedPosition);
                $distance = abs($usedPosition[0] - $position[0]) + abs($usedPosition[1] - $position[1]);
                if($distance < $minLightDistance) {
                    continue 2;
                }
            }

            $topNused[$key] = $val;

            paintCrosshair($image, $position[0], $position[1], $red, 2);

            $topPositionCountdown--;

        }
    }


    // STAGE 4
    // Median of all Top N lights
    $topNxValues = array();
    $topNyValues = array();

    foreach ($topNused as $key => $val) {
        $position = explode(":", $key);
        array_push($topNxValues, $position[0]);
        array_push($topNyValues, $position[1]);
    }

    $medianXvalue = round(calculate_median($topNxValues));
    $medianYvalue = round(calculate_median($topNyValues));
    paintCrosshair($image, $medianXvalue, $medianYvalue, $red, 15);


    // STAGE 5
    // Find treetop

    $filename = 'debug.log';
    $handle = fopen($filename, "w");
    fwrite($handle, "\n\n STAGE 5");

    $treetopX = $medianXvalue;
    $treetopY = $medianYvalue;

    $searchXmin = $medianXvalue;
    $searchXmax = $medianXvalue;

    $width = 0;
    for($y = $medianYvalue; $y >= 0; $y--) {
        fwrite($handle, "\nAt y = ".$y);

        if(($y % $searchYperX) == 0) { // Modulo
            $width++;
            $searchXmin = $medianXvalue - $width;
            $searchXmax = $medianXvalue + $width;
            imagesetpixel($image, $searchXmin, $y, $red);
            imagesetpixel($image, $searchXmax, $y, $red);
        }

        foreach ($topNused as $key => $val) {
            $position = explode(":", $key); // "x:y"

            if($position[1] != $y){
                continue;
            }

            if($position[0] >= $searchXmin && $position[0] <= $searchXmax){
                $treetopX = $position[0];
                $treetopY = $y;
            }
        }

    }

    paintCrosshair($image, $treetopX, $treetopY, $red, 5);


    // STAGE 6
    // Find tree sides
    fwrite($handle, "\n\n STAGE 6");

    $treesideAngle = 60; // The extremely "fat" end of a christmas tree
    $treeBottomY = $treetopY;

    $topPositionsExcluded = 0;
    $xymultiplier = 0;
    while(($topPositionsExcluded < ($topPositions / 5)) && $treesideAngle >= 1){
        fwrite($handle, "\n\nWe're at angle ".$treesideAngle);
        $xymultiplier = sin(deg2rad($treesideAngle));
        fwrite($handle, "\nMultiplier: ".$xymultiplier);

        $topPositionsExcluded = 0;
        foreach ($topNused as $key => $val) {
            $position = explode(":", $key);
            fwrite($handle, "\nAt position ".$key);

            if($position[1] > $treeBottomY) {
                $treeBottomY = $position[1];
            }

            // Lights above the tree are outside of it, but don't matter
            if($position[1] < $treetopY){
                $topPositionsExcluded++;
                fwrite($handle, "\nTOO HIGH");
                continue;
            }

            // Top light will generate division by zero
            if($treetopY-$position[1] == 0) {
                fwrite($handle, "\nDIVISION BY ZERO");
                continue;
            }

            // Lights left end right of it are also not inside
            fwrite($handle, "\nLight position factor: ".(abs($treetopX-$position[0]) / abs($treetopY-$position[1])));
            if((abs($treetopX-$position[0]) / abs($treetopY-$position[1])) > $xymultiplier){
                $topPositionsExcluded++;
                fwrite($handle, "\n --- Outside tree ---");
            }
        }

        $treesideAngle--;
    }
    fclose($handle);

    // Paint tree's outline
    $treeHeight = abs($treetopY-$treeBottomY);
    $treeBottomLeft = 0;
    $treeBottomRight = 0;
    $previousState = false; // line has not started; assumes the tree does not "leave"^^

    for($x = 0; $x <= $maxX; $x++){
        if(abs($treetopX-$x) != 0 && abs($treetopX-$x) / $treeHeight > $xymultiplier){
            if($previousState == true){
                $treeBottomRight = $x;
                $previousState = false;
            }
            continue;
        }
        imagesetpixel($image, $x, $treeBottomY, $red);
        if($previousState == false){
            $treeBottomLeft = $x;
            $previousState = true;
        }
    }
    imageline($image, $treeBottomLeft, $treeBottomY, $treetopX, $treetopY, $red);
    imageline($image, $treeBottomRight, $treeBottomY, $treetopX, $treetopY, $red);


    // Print out some parameters

    $string = "Min dist: ".$minLightDistance." | Tree angle: ".$treesideAngle." deg | Tree bottom: ".$treeBottomY;

    $px     = (imagesx($image) - 6.5 * strlen($string)) / 2;
    imagestring($image, 2, $px, 5, $string, $orange);

    return $topN;
}

/**
 * Returns values from 0 to 765
 */
function getBrightnessFromRgb($rgb) {
    $r = ($rgb >> 16) & 0xFF;
    $g = ($rgb >> 8) & 0xFF;
    $b = $rgb & 0xFF;

    return $r+$g+$b;
}

function paintCrosshair($image, $posX, $posY, $color, $size=5) {
    for($x = $posX-$size; $x <= $posX+$size; $x++) {
        if($x>=0 && $x < imagesx($image)){
            imagesetpixel($image, $x, $posY, $color);
        }
    }
    for($y = $posY-$size; $y <= $posY+$size; $y++) {
        if($y>=0 && $y < imagesy($image)){
            imagesetpixel($image, $posX, $y, $color);
        }
    }
}

// From http://www.mdj.us/web-development/php-programming/calculating-the-median-average-values-of-an-array-with-php/
function calculate_median($arr) {
    sort($arr);
    $count = count($arr); //total numbers in array
    $middleval = floor(($count-1)/2); // find the middle value, or the lowest middle value
    if($count % 2) { // odd number, middle is the median
        $median = $arr[$middleval];
    } else { // even number, calculate avg of 2 medians
        $low = $arr[$middleval];
        $high = $arr[$middleval+1];
        $median = (($low+$high)/2);
    }
    return $median;
}


?>

图片: 左上 下中心 左下 右上方 上中 右下

奖励:来自维基百科的一棵德国圣诞树(Weihnachtsbaum),网址:http://commons.wikimedia.org/wiki/File:Weihnachtsbaum_R%C3%B6merberg.jpg 罗默贝格

Using a quite different approach from what I've seen, I created a script that detects christmas trees by their lights. The result is always a symmetrical triangle, and, if needed, numeric values such as the angle ("fatness") of the tree.

The biggest threat to this algorithm obviously is lights next to (in great numbers) or in front of the tree (the greater problem until further optimization). Edit (added): What it can't do: find out whether there's a christmas tree or not, find multiple christmas trees in one image, correctly detect a christmas tree in the middle of Las Vegas, detect christmas trees that are heavily bent, upside-down or chopped down… ;)

The different stages are:

  • Calculate the added brightness (R+G+B) for each pixel
  • For each pixel, add up this value over its 8 neighbouring pixels
  • Rank all pixels by this value (brightest first) – I know, not really subtle…
  • Choose N of these, starting from the top, skipping ones that are too close
  • Calculate the median of these top N (gives us the approximate center of the tree); see the numpy sketch after this list
  • Start from the median position upwards in a widening search beam for the topmost light from the selected brightest ones (people tend to put at least one light at the very top)
  • From there, imagine lines going 60 degrees left and right downwards (christmas trees shouldn’t be that fat)
  • Decrease those 60 degrees until 20% of the brightest lights are outside this triangle
  • Find the light at the very bottom of the triangle, giving you the lower horizontal border of the tree
  • Done
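
As promised in the list, here is a hedged numpy sketch of the core of stages 1 to 4, brightness ranking plus median position; the function name is mine, and it deliberately omits the neighbour summation and the minimum-distance filtering of the real script:

import numpy as np

def approx_tree_center(rgb, top_n=80):
    brightness = rgb.astype(np.float32).sum(axis=2)      # R+G+B per pixel
    idx = np.argsort(brightness.ravel())[::-1][:top_n]   # the N brightest pixels
    ys, xs = np.unravel_index(idx, brightness.shape)
    return int(np.median(xs)), int(np.median(ys))        # the big red cross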

Explanation of the markings:

  • Big red cross in the center of the tree: Median of the top N brightest lights
  • Dotted line from there upwards: “search beam” for the top of the tree
  • Smaller red cross: top of the tree
  • Really small red crosses: All of the top N brightest lights
  • Red triangle: D’uh!

Source code:

<?php

ini_set('memory_limit', '1024M');

header("Content-type: image/png");

$chosenImage = 6;

switch($chosenImage){
    case 1:
        $inputImage     = imagecreatefromjpeg("nmzwj.jpg");
        break;
    case 2:
        $inputImage     = imagecreatefromjpeg("2y4o5.jpg");
        break;
    case 3:
        $inputImage     = imagecreatefromjpeg("YowlH.jpg");
        break;
    case 4:
        $inputImage     = imagecreatefromjpeg("2K9Ef.jpg");
        break;
    case 5:
        $inputImage     = imagecreatefromjpeg("aVZhC.jpg");
        break;
    case 6:
        $inputImage     = imagecreatefromjpeg("FWhSP.jpg");
        break;
    case 7:
        $inputImage     = imagecreatefromjpeg("roemerberg.jpg");
        break;
    default:
        exit();
}

// Process the loaded image

$topNspots = processImage($inputImage);

imagejpeg($inputImage);
imagedestroy($inputImage);

// Here be functions

function processImage($image) {
    $orange = imagecolorallocate($image, 220, 210, 60);
    $black = imagecolorallocate($image, 0, 0, 0);
    $red = imagecolorallocate($image, 255, 0, 0);

    $maxX = imagesx($image)-1;
    $maxY = imagesy($image)-1;

    // Parameters
    $spread = 1; // Number of pixels to each direction that will be added up
    $topPositions = 80; // Number of (brightest) lights taken into account
    $minLightDistance = round(min(array($maxX, $maxY)) / 30); // Minimum number of pixels between the brightest lights
    $searchYperX = 5; // spread of the "search beam" from the median point to the top

    $renderStage = 3; // 1 to 3; exits the process early


    // STAGE 1
    // Calculate the brightness of each pixel (R+G+B)

    $maxBrightness = 0;
    $stage1array = array();

    for($row = 0; $row <= $maxY; $row++) {

        $stage1array[$row] = array();

        for($col = 0; $col <= $maxX; $col++) {

            $rgb = imagecolorat($image, $col, $row);
            $brightness = getBrightnessFromRgb($rgb);
            $stage1array[$row][$col] = $brightness;

            if($renderStage == 1){
                $brightnessToGrey = round($brightness / 765 * 256);
                $greyRgb = imagecolorallocate($image, $brightnessToGrey, $brightnessToGrey, $brightnessToGrey);
                imagesetpixel($image, $col, $row, $greyRgb);
            }

            if($brightness > $maxBrightness) {
                $maxBrightness = $brightness;
                if($renderStage == 1){
                    imagesetpixel($image, $col, $row, $red);
                }
            }
        }
    }
    if($renderStage == 1) {
        return;
    }


    // STAGE 2
    // Add up brightness of neighbouring pixels

    $stage2array = array();
    $maxStage2 = 0;

    for($row = 0; $row <= $maxY; $row++) {
        $stage2array[$row] = array();

        for($col = 0; $col <= $maxX; $col++) {
            if(!isset($stage2array[$row][$col])) $stage2array[$row][$col] = 0;

            // Look around the current pixel, add brightness
            for($y = $row-$spread; $y <= $row+$spread; $y++) {
                for($x = $col-$spread; $x <= $col+$spread; $x++) {

                    // Don't read values from outside the image
                    if($x >= 0 && $x <= $maxX && $y >= 0 && $y <= $maxY){
                        $stage2array[$row][$col] += $stage1array[$y][$x]+10;
                    }
                }
            }

            $stage2value = $stage2array[$row][$col];
            if($stage2value > $maxStage2) {
                $maxStage2 = $stage2value;
            }
        }
    }

    if($renderStage >= 2){
        // Paint the accumulated light, dimmed by the maximum value from stage 2
        for($row = 0; $row <= $maxY; $row++) {
            for($col = 0; $col <= $maxX; $col++) {
                $brightness = round($stage2array[$row][$col] / $maxStage2 * 255);
                $greyRgb = imagecolorallocate($image, $brightness, $brightness, $brightness);
                imagesetpixel($image, $col, $row, $greyRgb);
            }
        }
    }

    if($renderStage == 2) {
        return;
    }


    // STAGE 3

    // Create a ranking of bright spots (like "Top 20")
    $topN = array();

    for($row = 0; $row <= $maxY; $row++) {
        for($col = 0; $col <= $maxX; $col++) {

            $stage2Brightness = $stage2array[$row][$col];
            $topN[$col.":".$row] = $stage2Brightness;
        }
    }
    arsort($topN);

    $topNused = array();
    $topPositionCountdown = $topPositions;

    if($renderStage == 3){
        foreach ($topN as $key => $val) {
            if($topPositionCountdown <= 0){
                break;
            }

            $position = explode(":", $key);

            foreach($topNused as $usedPosition => $usedValue) {
                $usedPosition = explode(":", $usedPosition);
                $distance = abs($usedPosition[0] - $position[0]) + abs($usedPosition[1] - $position[1]);
                if($distance < $minLightDistance) {
                    continue 2;
                }
            }

            $topNused[$key] = $val;

            paintCrosshair($image, $position[0], $position[1], $red, 2);

            $topPositionCountdown--;

        }
    }


    // STAGE 4
    // Median of all Top N lights
    $topNxValues = array();
    $topNyValues = array();

    foreach ($topNused as $key => $val) {
        $position = explode(":", $key);
        array_push($topNxValues, $position[0]);
        array_push($topNyValues, $position[1]);
    }

    $medianXvalue = round(calculate_median($topNxValues));
    $medianYvalue = round(calculate_median($topNyValues));
    paintCrosshair($image, $medianXvalue, $medianYvalue, $red, 15);


    // STAGE 5
    // Find treetop

    $filename = 'debug.log';
    $handle = fopen($filename, "w");
    fwrite($handle, "\n\n STAGE 5");

    $treetopX = $medianXvalue;
    $treetopY = $medianYvalue;

    $searchXmin = $medianXvalue;
    $searchXmax = $medianXvalue;

    $width = 0;
    for($y = $medianYvalue; $y >= 0; $y--) {
        fwrite($handle, "\nAt y = ".$y);

        if(($y % $searchYperX) == 0) { // Modulo
            $width++;
            $searchXmin = $medianXvalue - $width;
            $searchXmax = $medianXvalue + $width;
            imagesetpixel($image, $searchXmin, $y, $red);
            imagesetpixel($image, $searchXmax, $y, $red);
        }

        foreach ($topNused as $key => $val) {
            $position = explode(":", $key); // "x:y"

            if($position[1] != $y){
                continue;
            }

            if($position[0] >= $searchXmin && $position[0] <= $searchXmax){
                $treetopX = $position[0];
                $treetopY = $y;
            }
        }

    }

    paintCrosshair($image, $treetopX, $treetopY, $red, 5);


    // STAGE 6
    // Find tree sides
    fwrite($handle, "\n\n STAGE 6");

    $treesideAngle = 60; // The extremely "fat" end of a christmas tree
    $treeBottomY = $treetopY;

    $topPositionsExcluded = 0;
    $xymultiplier = 0;
    while(($topPositionsExcluded < ($topPositions / 5)) && $treesideAngle >= 1){
        fwrite($handle, "\n\nWe're at angle ".$treesideAngle);
        $xymultiplier = sin(deg2rad($treesideAngle));
        fwrite($handle, "\nMultiplier: ".$xymultiplier);

        $topPositionsExcluded = 0;
        foreach ($topNused as $key => $val) {
            $position = explode(":", $key);
            fwrite($handle, "\nAt position ".$key);

            if($position[1] > $treeBottomY) {
                $treeBottomY = $position[1];
            }

            // Lights above the tree are outside of it, but don't matter
            if($position[1] < $treetopY){
                $topPositionsExcluded++;
                fwrite($handle, "\nTOO HIGH");
                continue;
            }

            // Top light will generate division by zero
            if($treetopY-$position[1] == 0) {
                fwrite($handle, "\nDIVISION BY ZERO");
                continue;
            }

            // Lights left end right of it are also not inside
            fwrite($handle, "\nLight position factor: ".(abs($treetopX-$position[0]) / abs($treetopY-$position[1])));
            if((abs($treetopX-$position[0]) / abs($treetopY-$position[1])) > $xymultiplier){
                $topPositionsExcluded++;
                fwrite($handle, "\n --- Outside tree ---");
            }
        }

        $treesideAngle--;
    }
    fclose($handle);

    // Paint tree's outline
    $treeHeight = abs($treetopY-$treeBottomY);
    $treeBottomLeft = 0;
    $treeBottomRight = 0;
    $previousState = false; // line has not started; assumes the tree does not "leave"^^

    for($x = 0; $x <= $maxX; $x++){
        if(abs($treetopX-$x) != 0 && abs($treetopX-$x) / $treeHeight > $xymultiplier){
            if($previousState == true){
                $treeBottomRight = $x;
                $previousState = false;
            }
            continue;
        }
        imagesetpixel($image, $x, $treeBottomY, $red);
        if($previousState == false){
            $treeBottomLeft = $x;
            $previousState = true;
        }
    }
    imageline($image, $treeBottomLeft, $treeBottomY, $treetopX, $treetopY, $red);
    imageline($image, $treeBottomRight, $treeBottomY, $treetopX, $treetopY, $red);


    // Print out some parameters

    $string = "Min dist: ".$minLightDistance." | Tree angle: ".$treesideAngle." deg | Tree bottom: ".$treeBottomY;

    $px     = (imagesx($image) - 6.5 * strlen($string)) / 2;
    imagestring($image, 2, $px, 5, $string, $orange);

    return $topN;
}

/**
 * Returns values from 0 to 765
 */
function getBrightnessFromRgb($rgb) {
    $r = ($rgb >> 16) & 0xFF;
    $g = ($rgb >> 8) & 0xFF;
    $b = $rgb & 0xFF;

    return $r+$g+$b;
}

function paintCrosshair($image, $posX, $posY, $color, $size=5) {
    for($x = $posX-$size; $x <= $posX+$size; $x++) {
        if($x>=0 && $x < imagesx($image)){
            imagesetpixel($image, $x, $posY, $color);
        }
    }
    for($y = $posY-$size; $y <= $posY+$size; $y++) {
        if($y>=0 && $y < imagesy($image)){
            imagesetpixel($image, $posX, $y, $color);
        }
    }
}

// From http://www.mdj.us/web-development/php-programming/calculating-the-median-average-values-of-an-array-with-php/
function calculate_median($arr) {
    sort($arr);
    $count = count($arr); //total numbers in array
    $middleval = floor(($count-1)/2); // find the middle value, or the lowest middle value
    if($count % 2) { // odd number, middle is the median
        $median = $arr[$middleval];
    } else { // even number, calculate avg of 2 medians
        $low = $arr[$middleval];
        $high = $arr[$middleval+1];
        $median = (($low+$high)/2);
    }
    return $median;
}


?>

Images: Upper left Lower center Lower left Upper right Upper center Lower right

Bonus: A german Weihnachtsbaum, from Wikipedia Römerberg http://commons.wikimedia.org/wiki/File:Weihnachtsbaum_R%C3%B6merberg.jpg


回答 9

我将python与opencv一起使用。

我的算法是这样的:

  1. 首先,它从图像中获取红色通道
  2. 将阈值(最小值200)应用于红色通道
  3. 然后应用形态学梯度,然后执行“闭合”(先扩张,然后进行侵蚀)
  4. 然后,它在平面中找到轮廓,并选择最长的轮廓。

结果:

编码:

import numpy as np
import cv2
import copy


def findTree(image,num):
    im = cv2.imread(image)
    im = cv2.resize(im, (400,250))
    gray = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)
    imf = copy.deepcopy(im)

    b,g,r = cv2.split(im)
    minR = 200
    _,thresh = cv2.threshold(r,minR,255,0)
    kernel = np.ones((25,5))
    dst = cv2.morphologyEx(thresh, cv2.MORPH_GRADIENT, kernel)
    dst = cv2.morphologyEx(dst, cv2.MORPH_CLOSE, kernel)

    contours = cv2.findContours(dst,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)[0]
    cv2.drawContours(im, contours,-1, (0,255,0), 1)

    maxI = 0
    for i in range(len(contours)):
        if len(contours[maxI]) < len(contours[i]):
            maxI = i

    img = copy.deepcopy(r)
    cv2.polylines(img,[contours[maxI]],True,(255,255,255),3)
    imf[:,:,2] = img

    cv2.imshow(str(num), imf)

def main():
    findTree('tree.jpg',1)
    findTree('tree2.jpg',2)
    findTree('tree3.jpg',3)
    findTree('tree4.jpg',4)
    findTree('tree5.jpg',5)
    findTree('tree6.jpg',6)

    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()

如果将内核从(25,5)更改为(10,5),则除左下角以外的所有树都可获得更好的结果, 在此处输入图片说明

我的算法假设这棵树上有灯,在左下方的树中,顶部的灯比其他树的灯少。

I used python with opencv.

My algorithm goes like this:

  1. First it takes the red channel from the image
  2. Apply a threshold (min value 200) to the Red channel
  3. Then apply Morphological Gradient and then do a ‘Closing’ (dilation followed by Erosion)
  4. Then it finds the contours in the plane and it picks the longest contour.

The outcome:

The code:

import numpy as np
import cv2
import copy


def findTree(image,num):
    im = cv2.imread(image)
    im = cv2.resize(im, (400,250))
    gray = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)
    imf = copy.deepcopy(im)

    b,g,r = cv2.split(im)
    minR = 200
    _,thresh = cv2.threshold(r,minR,255,0)
    kernel = np.ones((25,5))
    dst = cv2.morphologyEx(thresh, cv2.MORPH_GRADIENT, kernel)
    dst = cv2.morphologyEx(dst, cv2.MORPH_CLOSE, kernel)

    contours = cv2.findContours(dst,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)[0]
    cv2.drawContours(im, contours,-1, (0,255,0), 1)

    maxI = 0
    for i in range(len(contours)):
        if len(contours[maxI]) < len(contours[i]):
            maxI = i

    img = copy.deepcopy(r)
    cv2.polylines(img,[contours[maxI]],True,(255,255,255),3)
    imf[:,:,2] = img

    cv2.imshow(str(num), imf)

def main():
    findTree('tree.jpg',1)
    findTree('tree2.jpg',2)
    findTree('tree3.jpg',3)
    findTree('tree4.jpg',4)
    findTree('tree5.jpg',5)
    findTree('tree6.jpg',6)

    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()

If I change the kernel from (25,5) to (10,5) I get nicer results on all trees but the bottom left, enter image description here

My algorithm assumes that the tree has lights on it, and in the bottom-left tree the top has less light than the others.


从Python调用C/C++?

问题:从Python调用C/C++?

构建C或C++库的Python绑定最快的方法是什么?

(如果这有影响的话:我用的是Windows。)

What would be the quickest way to construct a Python binding to a C or C++ library?

(I am using Windows if this matters.)


回答 0

您应该看看Boost.Python。以下是他们网站上的简短介绍:

Boost Python库是用于连接Python和C++的框架。它使您可以快速而无缝地将C++的类、函数和对象暴露给Python,反之亦然,而无需使用特殊工具,仅使用C++编译器即可。它旨在非侵入式地包装C++接口,因此您完全不必为了包装而更改C++代码,这使Boost.Python成为将第三方库暴露给Python的理想选择。该库对高级元编程技术的使用简化了用户所见的语法,使包装代码具有一种声明式接口定义语言(IDL)的外观。

You should have a look at Boost.Python. Here is the short introduction taken from their website:

The Boost Python Library is a framework for interfacing Python and C++. It allows you to quickly and seamlessly expose C++ classes, functions and objects to Python, and vice-versa, using no special tools — just your C++ compiler. It is designed to wrap C++ interfaces non-intrusively, so that you should not have to change the C++ code at all in order to wrap it, making Boost.Python ideal for exposing 3rd-party libraries to Python. The library's use of advanced metaprogramming techniques simplifies its syntax for users, so that wrapping code takes on the look of a kind of declarative interface definition language (IDL).


回答 1

ctypes模块是标准库的一部分,因此比swig更稳定、可用范围也更广,而swig过去总是给我带来麻烦。

使用ctypes时,您不需要满足任何针对python的编译期依赖,您的绑定可以在任何带有ctypes的python上运行,而不仅仅是编译时所针对的那一个。

假设您有一个想要调用的简单C++示例类,放在名为foo.cpp的文件中:

#include <iostream>

class Foo{
    public:
        void bar(){
            std::cout << "Hello" << std::endl;
        }
};

由于ctypes只能与C函数交互,您需要提供这些函数,并将它们声明为extern "C":

extern "C" {
    Foo* Foo_new(){ return new Foo(); }
    void Foo_bar(Foo* foo){ foo->bar(); }
}

接下来,您必须将其编译为共享库

g++ -c -fPIC foo.cpp -o foo.o
g++ -shared -Wl,-soname,libfoo.so -o libfoo.so  foo.o

最后,您必须编写python包装器(例如,在fooWrapper.py中)

from ctypes import cdll
lib = cdll.LoadLibrary('./libfoo.so')

class Foo(object):
    def __init__(self):
        self.obj = lib.Foo_new()

    def bar(self):
        lib.Foo_bar(self.obj)

有了这些之后,您就可以像这样调用它:

f = Foo()
f.bar() #and you will see "Hello" on the screen

The ctypes module is part of the standard library, and is therefore more stable and more widely available than swig, which always tended to give me problems.

With ctypes, you don't need to satisfy any compile-time dependency on Python, and your binding will work on any Python that has ctypes, not just the one it was compiled against.

Suppose you have a simple C++ example class you want to talk to in a file called foo.cpp:

#include <iostream>

class Foo{
    public:
        void bar(){
            std::cout << "Hello" << std::endl;
        }
};

Since ctypes can only talk to C functions, you need to provide those functions, declaring them as extern "C":

extern "C" {
    Foo* Foo_new(){ return new Foo(); }
    void Foo_bar(Foo* foo){ foo->bar(); }
}

Next you have to compile this to a shared library

g++ -c -fPIC foo.cpp -o foo.o
g++ -shared -Wl,-soname,libfoo.so -o libfoo.so  foo.o

And finally you have to write your python wrapper (e.g. in fooWrapper.py)

from ctypes import cdll
lib = cdll.LoadLibrary('./libfoo.so')

class Foo(object):
    def __init__(self):
        self.obj = lib.Foo_new()

    def bar(self):
        lib.Foo_bar(self.obj)

Once you have that, you can call it like:

f = Foo()
f.bar() #and you will see "Hello" on the screen
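
One caveat worth adding as an editor's note, not part of the original answer: by default ctypes assumes every foreign function returns a C int, which can truncate the Foo* pointer on 64-bit systems, and the wrapper above never frees the object. A hedged refinement, assuming you also add extern "C" void Foo_delete(Foo* foo){ delete foo; } on the C++ side:

from ctypes import cdll, c_void_p

lib = cdll.LoadLibrary('./libfoo.so')

# declare signatures so the pointer survives on 64-bit platforms
lib.Foo_new.restype = c_void_p
lib.Foo_bar.argtypes = [c_void_p]
lib.Foo_delete.argtypes = [c_void_p]   # assumes the Foo_delete shown above

class Foo(object):
    def __init__(self):
        self.obj = lib.Foo_new()

    def bar(self):
        lib.Foo_bar(self.obj)

    def __del__(self):
        lib.Foo_delete(self.obj)   # free the C++ object with the wrapper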

回答 2

最快的方法是使用SWIG

来自SWIG教程的示例:

/* File : example.c */
int fact(int n) {
    if (n <= 1) return 1;
    else return n*fact(n-1);
}

接口文件:

/* example.i */
%module example
%{
/* Put header files here or function declarations like below */
extern int fact(int n);
%}

extern int fact(int n);

在Unix上构建Python模块:

swig -python example.i
gcc -fPIC -c example.c example_wrap.c -I/usr/local/include/python2.7
gcc -shared example.o example_wrap.o -o _example.so

用法:

>>> import example
>>> example.fact(5)
120

请注意,您必须安装python-dev。另外,在某些系统中,取决于安装方式,python头文件可能位于/usr/include/python2.7。

从教程中:

SWIG是一个相当完整的C++编译器,几乎支持所有语言特性,包括预处理、指针、类、继承,甚至C++模板。SWIG还可以用目标语言将结构和类打包为代理类,以非常自然的方式暴露底层功能。

The quickest way to do this is using SWIG.

Example from SWIG tutorial:

/* File : example.c */
int fact(int n) {
    if (n <= 1) return 1;
    else return n*fact(n-1);
}

Interface file:

/* example.i */
%module example
%{
/* Put header files here or function declarations like below */
extern int fact(int n);
%}

extern int fact(int n);

Building a Python module on Unix:

swig -python example.i
gcc -fPIC -c example.c example_wrap.c -I/usr/local/include/python2.7
gcc -shared example.o example_wrap.o -o _example.so

Usage:

>>> import example
>>> example.fact(5)
120

Note that you have to have python-dev installed. Also, on some systems the Python header files will be in /usr/include/python2.7, depending on the way you installed it.

From the tutorial:

SWIG is a fairly complete C++ compiler with support for nearly every language feature. This includes preprocessing, pointers, classes, inheritance, and even C++ templates. SWIG can also be used to package structures and classes into proxy classes in the target language — exposing the underlying functionality in a very natural manner.
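
A small addition of my own, not from the tutorial: instead of invoking swig and gcc by hand, distutils can drive SWIG itself when the .i file is listed as a source. A hedged setup.py sketch for the same example:

# setup.py -- build with: python setup.py build_ext --inplace
from distutils.core import setup, Extension

setup(
    name='example',
    ext_modules=[Extension('_example', sources=['example.i', 'example.c'])],
    py_modules=['example'],   # the pure-Python wrapper module SWIG generates
)

This also removes the need to hard-code the Python include path in the gcc invocation above.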


回答 3

我从这一页开始了Python <-> C++绑定之旅,目的是桥接高级数据类型(多维STL向量与Python列表):-)

在尝试过基于ctypes和boost.python的解决方案之后(我并不是软件工程师),我发现当需要绑定高级数据类型时它们都很复杂,而对这种情况SWIG要简单得多。

因此本示例使用SWIG,并已在Linux下测试(SWIG在Windows中同样可用且被广泛使用)。

目标是让Python可以使用一个C++函数:它接受2D STL向量形式的矩阵,并返回每一行的平均值(1D STL向量)。

C++代码(“code.cpp”)如下:

#include <vector>
#include "code.h"

using namespace std;

vector<double> average (vector< vector<double> > i_matrix) {

  // Compute average of each row..
  vector <double> averages;
  for (int r = 0; r < i_matrix.size(); r++){
    double rsum = 0.0;
    double ncols= i_matrix[r].size();
    for (int c = 0; c< i_matrix[r].size(); c++){
      rsum += i_matrix[r][c];
    }
    averages.push_back(rsum/ncols);
  }
  return averages;
}

对应的头文件(“code.h”)为:

#ifndef _code
#define _code

#include <vector>

std::vector<double> average (std::vector< std::vector<double> > i_matrix);

#endif

我们首先编译C++代码以生成目标文件:

g++ -c -fPIC code.cpp

然后,我们为C++函数定义一个SWIG接口定义文件(“code.i”)。

%module code
%{
#include "code.h"
%}
%include "std_vector.i"
namespace std {

  /* On a side note, the names VecDouble and VecVecdouble can be changed, but the order of first the inner vector matters! */
  %template(VecDouble) vector<double>;
  %template(VecVecdouble) vector< vector<double> >;
}

%include "code.h"

我们使用SWIG从接口定义文件生成C++接口源代码:

swig -c++ -python code.i

最后,我们编译生成的C++接口源文件,并把所有内容链接在一起,生成一个可由Python直接导入的共享库(前缀“_”很重要):

g++ -c -fPIC code_wrap.cxx  -I/usr/include/python2.7 -I/usr/lib/python2.7
g++ -shared -Wl,-soname,_code.so -o _code.so code.o code_wrap.o

现在,我们可以在Python脚本中使用该函数:

#!/usr/bin/env python

import code
a= [[3,5,7],[8,10,12]]
print a
b = code.average(a)
print "Assignment done"
print a
print b

I started my journey in Python <-> C++ binding from this page, with the objective of linking high-level data types (multidimensional STL vectors with Python lists) :-)

Having tried the solutions based on both ctypes and boost.python (and not being a software engineer), I found them complex when high-level datatype binding is required, while I found SWIG much simpler for such cases.

This example therefore uses SWIG, and it has been tested on Linux (but SWIG is available and widely used on Windows too).

The objective is to make available to Python a C++ function that takes a matrix in the form of a 2D STL vector and returns the average of each row (as a 1D STL vector).

The code in C++ ("code.cpp") is as follows:

#include <vector>
#include "code.h"

using namespace std;

vector<double> average (vector< vector<double> > i_matrix) {

  // Compute average of each row..
  vector <double> averages;
  for (int r = 0; r < i_matrix.size(); r++){
    double rsum = 0.0;
    double ncols= i_matrix[r].size();
    for (int c = 0; c< i_matrix[r].size(); c++){
      rsum += i_matrix[r][c];
    }
    averages.push_back(rsum/ncols);
  }
  return averages;
}

The equivalent header (“code.h”) is:

#ifndef _code
#define _code

#include <vector>

std::vector<double> average (std::vector< std::vector<double> > i_matrix);

#endif

We first compile the C++ code to create an object file:

g++ -c -fPIC code.cpp

We then define a SWIG interface definition file (“code.i”) for our C++ functions.

%module code
%{
#include "code.h"
%}
%include "std_vector.i"
namespace std {

  /* On a side note, the names VecDouble and VecVecdouble can be changed, but the order of first the inner vector matters! */
  %template(VecDouble) vector<double>;
  %template(VecVecdouble) vector< vector<double> >;
}

%include "code.h"

Using SWIG, we generate the C++ interface source code from the SWIG interface definition file:

swig -c++ -python code.i

We finally compile the generated C++ interface source file and link everything together to generate a shared library that is directly importable by Python (the “_” matters):

g++ -c -fPIC code_wrap.cxx  -I/usr/include/python2.7 -I/usr/lib/python2.7
g++ -shared -Wl,-soname,_code.so -o _code.so code.o code_wrap.o

We can now use the function in Python scripts:

#!/usr/bin/env python

import code
a= [[3,5,7],[8,10,12]]
print a
b = code.average(a)
print "Assignment done"
print a
print b

回答 4

还有pybind11,它就像是Boost.Python的轻量级版本,并且与所有现代C++编译器兼容:

https://pybind11.readthedocs.io/en/latest/

There is also pybind11, which is like a lightweight version of Boost.Python and compatible with all modern C++ compilers:

https://pybind11.readthedocs.io/en/latest/


回答 5

检查出pyrexCython。它们是类似于Python的语言,用于C / C ++和Python之间的接口。

Check out pyrex or Cython. They’re Python-like languages for interfacing between C/C++ and Python.


回答 6

对于现代C++,请使用cppyy:http://cppyy.readthedocs.io/en/latest/

它基于Cling,即面向Clang/LLVM的C++解释器。绑定在运行时完成,不需要额外的中间语言。得益于Clang,它支持C++17。

使用pip安装它:

    $ pip install cppyy

对于小型项目,只需加载相关的库和您感兴趣的头文件即可。例如,取本线程中ctypes示例的代码,但拆分为头文件和代码两部分:

    $ cat foo.h
    class Foo {
    public:
        void bar();
    };

    $ cat foo.cpp
    #include "foo.h"
    #include <iostream>

    void Foo::bar() { std::cout << "Hello" << std::endl; }

编译:

    $ g++ -c -fPIC foo.cpp -o foo.o
    $ g++ -shared -Wl,-soname,libfoo.so -o libfoo.so  foo.o

并使用它:

    $ python
    >>> import cppyy
    >>> cppyy.include("foo.h")
    >>> cppyy.load_library("foo")
    >>> from cppyy.gbl import Foo
    >>> f = Foo()
    >>> f.bar()
    Hello
    >>>

大型项目也受到支持:可以自动加载预先准备的反射信息,并提供用于生成这些信息的cmake片段,因此已安装软件包的用户可以简单地运行:

    $ python
    >>> import cppyy
    >>> f = cppyy.gbl.Foo()
    >>> f.bar()
    Hello
    >>>

多亏了LLVM,高级功能才得以实现,例如自动模板实例化。继续示例:

    >>> v = cppyy.gbl.std.vector[cppyy.gbl.Foo]()
    >>> v.push_back(f)
    >>> len(v)
    1
    >>> v[0].bar()
    Hello
    >>>

注意:我是cppyy的作者。

For modern C++, use cppyy: http://cppyy.readthedocs.io/en/latest/

It’s based on Cling, the C++ interpreter for Clang/LLVM. Bindings are at run-time and no additional intermediate language is necessary. Thanks to Clang, it supports C++17.

Install it using pip:

    $ pip install cppyy

For small projects, simply load the relevant library and the headers that you are interested in. E.g. take the code from the ctypes example in this thread, but split into header and code sections:

    $ cat foo.h
    class Foo {
    public:
        void bar();
    };

    $ cat foo.cpp
    #include "foo.h"
    #include <iostream>

    void Foo::bar() { std::cout << "Hello" << std::endl; }

Compile it:

    $ g++ -c -fPIC foo.cpp -o foo.o
    $ g++ -shared -Wl,-soname,libfoo.so -o libfoo.so  foo.o

and use it:

    $ python
    >>> import cppyy
    >>> cppyy.include("foo.h")
    >>> cppyy.load_library("foo")
    >>> from cppyy.gbl import Foo
    >>> f = Foo()
    >>> f.bar()
    Hello
    >>>

Large projects are supported with auto-loading of prepared reflection information and the cmake fragments to create them, so that users of installed packages can simply run:

    $ python
    >>> import cppyy
    >>> f = cppyy.gbl.Foo()
    >>> f.bar()
    Hello
    >>>

Thanks to LLVM, advanced features are possible, such as automatic template instantiation. To continue the example:

    >>> v = cppyy.gbl.std.vector[cppyy.gbl.Foo]()
    >>> v.push_back(f)
    >>> len(v)
    1
    >>> v[0].bar()
    Hello
    >>>

Note: I’m the author of cppyy.


回答 7

本文声称Python是科学家所需的一切,基本上是说:先用Python为一切做原型;当您需要加速某个部分时,再用SWIG把这部分翻译成C。

This paper, claiming Python to be all a scientist needs, basically says: First prototype everything in Python. Then when you need to speed a part up, use SWIG and translate this part to C.


回答 8

我从未使用过它,但听说过关于ctypes的好评。如果您要在C++中使用它,请确保通过extern "C"避开名称修饰。感谢Florian Bösch的评论。

I’ve never used it but I’ve heard good things about ctypes. If you’re trying to use it with C++, be sure to evade name mangling via extern "C". Thanks for the comment, Florian Bösch.
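
To make the extern "C" point concrete, here is a hedged sketch of the Python side, assuming a hypothetical shared library libfoo.so built from C++ that exports extern "C" int add_ints(int, int) (without the extern "C", the mangled C++ symbol name would not be found):

# Hypothetical sketch: calling an extern "C" function from a C++ library.
import ctypes

lib = ctypes.CDLL("./libfoo.so")                       # load the shared library
lib.add_ints.argtypes = (ctypes.c_int, ctypes.c_int)   # declare the signature
lib.add_ints.restype = ctypes.c_int

print(lib.add_ints(2, 3))  # -> 5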


回答 9

我认为cffi for python是一个选择。

目的是从Python调用C代码。您应该能在不学习第三种语言的情况下做到这一点:每种替代方案都要求您学习它们自己的语言(Cython、SWIG)或API(ctypes)。所以我们尝试假定您懂Python和C,并把您需要学习的额外API减到最少。

http://cffi.readthedocs.org/en/release-0.7/

I think cffi for python can be an option.

The goal is to call C code from Python. You should be able to do so without learning a 3rd language: every alternative requires you to learn their own language (Cython, SWIG) or API (ctypes). So we tried to assume that you know Python and C and minimize the extra bits of API that you need to learn.

http://cffi.readthedocs.org/en/release-0.7/
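
For flavor, a minimal ABI-mode sketch (assuming cffi is installed; the library soname is the typical glibc one and may vary by platform):

# cffi ABI mode: declare the C signature, load the library, call it.
from cffi import FFI

ffi = FFI()
ffi.cdef("double sqrt(double x);")     # declaration copied from the C header
libm = ffi.dlopen("libm.so.6")         # typical glibc soname; adjust per platform
print(libm.sqrt(2.0))                  # -> 1.4142135623730951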


回答 10

如果我理解正确,问题是如何从Python调用C函数。那么最好的选择是ctypes(顺便说一句,它在所有Python变体中都可移植)。

>>> from ctypes import *
>>> libc = cdll.msvcrt
>>> print libc.time(None)
1438069008
>>> printf = libc.printf
>>> printf("Hello, %s\n", "World!")
Hello, World!
14
>>> printf("%d bottles of beer\n", 42)
42 bottles of beer
19

有关详细指南,您可能需要参考我的博客文章

The question is how to call a C function from Python, if I understood correctly. Then the best bet are Ctypes (BTW portable across all variants of Python).

>>> from ctypes import *
>>> libc = cdll.msvcrt
>>> print libc.time(None)
1438069008
>>> printf = libc.printf
>>> printf("Hello, %s\n", "World!")
Hello, World!
14
>>> printf("%d bottles of beer\n", 42)
42 bottles of beer
19

For a detailed guide you may want to refer to my blog article.
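
The snippet above is Windows-specific (cdll.msvcrt); on Linux the same idea might look like the following sketch (libc.so.6 is the usual soname, but it can vary by distribution, and Python 3 requires bytes for C strings):

# Linux variant of the ctypes example above.
from ctypes import CDLL, c_long

libc = CDLL("libc.so.6")
libc.time.restype = c_long                  # time_t on typical 64-bit Linux
print(libc.time(None))                      # seconds since the epoch
libc.printf(b"%d bottles of beer\n", 42)    # note: bytes, not str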


回答 11

官方Python文档中有一篇详细介绍了如何用C/C++扩展Python。即使不用SWIG,这也相当简单直接,并且在Windows上也能很好地工作。

One of the official Python documents contains details on extending Python using C/C++. Even without the use of SWIG, it’s quite straightforward and works perfectly well on Windows.


回答 12

除非您预计还要编写Java包装(那种情况下SWIG可能更可取),否则Cython绝对是首选。

我建议使用 runcython命令行实用程序,它使使用Cython的过程非常容易。如果您需要将结构化数据传递给C ++,请查看Google的protobuf库,它非常方便。

这是我使用这两种工具的最小示例:

https://github.com/nicodjimenez/python2cpp

希望它可以是一个有用的起点。

Cython is definitely the way to go, unless you anticipate writing Java wrappers, in which case SWIG may be preferable.

I recommend using the runcython command line utility, it makes the process of using Cython extremely easy. If you need to pass structured data to C++, take a look at Google’s protobuf library, it’s very convenient.

Here is a minimal example I made that uses both tools:

https://github.com/nicodjimenez/python2cpp

Hope it can be a useful starting point.


回答 13

首先,您应该确定自己的具体目的。上面已经提到了关于扩展和嵌入Python解释器的官方Python文档,我再补充一份很好的二进制扩展概述。用例可分为3类:

  • 加速器模块:运行速度比CPython中运行的等效纯Python代码更快。
  • 包装模块:将现有的C接口公开给Python代码。
  • 低级系统访问:访问CPython运行时,操作系统或底层硬件的低级功能。

为了给其他感兴趣的人一个更广阔的视野,并且由于您最初的问题有点含糊(“对C或C++库”),我认为这些信息可能对您有用。在上面的链接中,您可以了解使用二进制扩展的缺点及其替代方案。

除了建议的其他答案外,如果您需要加速器模块,还可以尝试Numba。它的工作原理是“通过在导入时间,运行时或静态(使用附带的pycc工具)使用LLVM编译器基础结构生成优化的机器代码”。

First you should decide what your particular purpose is. The official Python documentation on extending and embedding the Python interpreter was mentioned above; I can add a good overview of binary extensions. The use cases can be divided into 3 categories:

  • accelerator modules: to run faster than the equivalent pure Python code runs in CPython.
  • wrapper modules: to expose existing C interfaces to Python code.
  • low level system access: to access lower level features of the CPython runtime, the operating system, or the underlying hardware.

In order to give some broader perspective for others interested, and since your initial question is a bit vague (“to a C or C++ library”), I think this information might be interesting to you. On the link above you can read about the disadvantages of using binary extensions and their alternatives.

Apart from the other answers suggested, if you want an accelerator module, you can try Numba. It works “by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool)”.
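
As a hedged sketch of the accelerator-module case (assuming Numba is installed; factor_count here is an illustrative function, not part of Numba), a small numeric loop often compiles with a single decorator:

# Numba JIT: the first call triggers compilation, later calls run as machine code.
from numba import njit

@njit
def factor_count(n):
    count = 0
    i = 1
    while i * i <= n:                # trial division up to sqrt(n)
        if n % i == 0:
            count += 1 if i * i == n else 2
        i += 1
    return count

print(factor_count(28))  # divisors 1, 2, 4, 7, 14, 28 -> 6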


回答 14

我喜欢cppyy,它使得用C++代码扩展Python变得非常容易,在需要时能大幅提升性能。

它功能强大,而且坦率地说非常易于使用。

下面是一个示例,演示如何创建numpy数组并将其传给C++中的类成员函数。

import cppyy
import numpy as np

cppyy.add_include_path("include")
cppyy.include('mylib/Buffer.h')

s = cppyy.gbl.buffer.Buffer()
numpy_array = np.empty(32000, np.float64)
s.get_numpy_array(numpy_array.data, numpy_array.size)

在C ++中:

namespace buffer {

struct Buffer {
  void get_numpy_array(double *ad, int size) {
    // fill the array from ad[0..size-1]
  }
};

}  // namespace buffer

您还可以非常轻松地(用CMake)创建一个Python模块,这样就不必每次都重新编译C++代码。

I love cppyy, it makes it very easy to extend Python with C++ code, dramatically increasing performance when needed.

It is powerful and frankly very simple to use.

Here is an example of how you can create a numpy array and pass it to a class member function in C++.

import cppyy
import numpy as np

cppyy.add_include_path("include")
cppyy.include('mylib/Buffer.h')

s = cppyy.gbl.buffer.Buffer()
numpy_array = np.empty(32000, np.float64)
s.get_numpy_array(numpy_array.data, numpy_array.size)

in C++ :

namespace buffer {

struct Buffer {
  void get_numpy_array(double *ad, int size) {
    // fill the array from ad[0..size-1]
  }
};

}  // namespace buffer

You can also create a Python module very easily (with CMake); this way you will avoid recompiling the C++ code every time.


回答 15

pybind11最小可运行示例

pybind11之前在https://stackoverflow.com/a/38542539/895245中提到过,但是我想在这里给出一个具体的用法示例以及有关实现的更多讨论。

总而言之,我强烈推荐pybind11,因为它真的很容易使用:您只需包含一个头文件,pybind11就会用模板魔术检查您想暴露给Python的C++类,并透明地完成这件事。

这种模板魔术的缺点是会拖慢编译:任何使用pybind11的文件都会多出几秒钟的编译时间,例如参见针对此问题所做的调查。PyTorch也有同感。

这是一个最小的可运行示例,使您了解pybind11的出色程度:

class_test.cpp

#include <string>

#include <pybind11/pybind11.h>

struct ClassTest {
    ClassTest(const std::string &name) : name(name) { }
    void setName(const std::string &name_) { name = name_; }
    const std::string &getName() const { return name; }
    std::string name;
};

namespace py = pybind11;

PYBIND11_PLUGIN(class_test) {
    py::module m("my_module", "pybind11 example plugin");
    py::class_<ClassTest>(m, "ClassTest")
        .def(py::init<const std::string &>())
        .def("setName", &ClassTest::setName)
        .def("getName", &ClassTest::getName)
        .def_readwrite("name", &ClassTest::name);
    return m.ptr();
}

class_test_main.py

#!/usr/bin/env python3

import class_test

my_class_test = class_test.ClassTest("abc");
print(my_class_test.getName())
my_class_test.setName("012")
print(my_class_test.getName())
assert(my_class_test.getName() == my_class_test.name)

编译并运行:

#!/usr/bin/env bash
set -eux
g++ `python3-config --cflags` -shared -std=c++11 -fPIC class_test.cpp \
  -o class_test`python3-config --extension-suffix` `python3-config --libs`
./class_test_main.py

此示例说明pybind11如何让您毫不费力地将C++类ClassTest暴露给Python!编译会生成一个名为class_test.cpython-36m-x86_64-linux-gnu.so的文件,class_test_main.py会自动把它当作本地定义的class_test模块的定义来源。

也许只有当您尝试用原生Python API手动做同样的事情时,才能体会到这有多强大,例如参见这个代码量大约多10倍的示例:https://github.com/cirosantilli/python-cheat/blob/4f676f62e87810582ad53b2fb426b74eae52aad5/py_from_c/pure.c 在那个示例中,您可以看到C代码如何痛苦而显式地逐一定义Python类及其包含的所有信息(成员、方法、其他元数据……)。另请参见:

pybind11自称与https://stackoverflow.com/a/145436/895245中提到的Boost.Python类似,但更加精简,因为它摆脱了身处Boost项目之中的臃肿:

pybind11是一个轻量级的纯头文件库,用于在Python中暴露C++类型(反之亦然),主要用于为现有C++代码创建Python绑定。它的目标和语法类似于David Abrahams出色的Boost.Python库:通过编译期内省来推断类型信息,从而把传统扩展模块中的样板代码减到最少。

Boost.Python的主要问题(以及创建这个类似项目的原因)在于Boost。Boost是一套庞大而复杂的实用程序库,几乎能与现存的所有C++编译器配合工作。这种兼容性是有代价的:为了支持最古老、问题最多的编译器样本,必须使用晦涩的模板技巧和变通办法。如今与C++11兼容的编译器已经广泛可用,这套笨重的机制就成了一个过大且不必要的依赖。

可以把这个库看作Boost.Python的一个微型独立版本,剥离了与绑定生成无关的一切。不算注释,核心头文件只需约4K行代码,并且只依赖Python(2.7或3.x,或PyPy2.7 >= 5.7)和C++标准库。如此紧凑的实现得益于一些新的C++11语言特性(具体来说:元组、lambda函数和可变参数模板)。自创建以来,该库已在许多方面超越了Boost.Python,在许多常见情况下让绑定代码大大简化。

pybind11也是当前Microsoft Python C绑定文档中重点介绍的唯一非原生替代方案,见:https://docs.microsoft.com/zh-cn/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019(存档)。

已在Ubuntu 18.04,pybind11 2.0.1,Python 3.6.8,GCC 7.4.0上进行了测试。

pybind11 minimal runnable example

pybind11 was previously mentioned at https://stackoverflow.com/a/38542539/895245 but I would like to give here a concrete usage example and some further discussion about implementation.

All and all, I highly recommend pybind11 because it is really easy to use: you just include a header and then pybind11 uses template magic to inspect the C++ class you want to expose to Python and does that transparently.

The downside of this template magic is that it slows down compilation, immediately adding a few seconds to any file that uses pybind11; see for example the investigation done on this issue. PyTorch agrees.

Here is a minimal runnable example to give you a feel of how awesome pybind11 is:

class_test.cpp

#include <string>

#include <pybind11/pybind11.h>

struct ClassTest {
    ClassTest(const std::string &name) : name(name) { }
    void setName(const std::string &name_) { name = name_; }
    const std::string &getName() const { return name; }
    std::string name;
};

namespace py = pybind11;

PYBIND11_PLUGIN(class_test) {
    py::module m("my_module", "pybind11 example plugin");
    py::class_<ClassTest>(m, "ClassTest")
        .def(py::init<const std::string &>())
        .def("setName", &ClassTest::setName)
        .def("getName", &ClassTest::getName)
        .def_readwrite("name", &ClassTest::name);
    return m.ptr();
}

class_test_main.py

#!/usr/bin/env python3

import class_test

my_class_test = class_test.ClassTest("abc");
print(my_class_test.getName())
my_class_test.setName("012")
print(my_class_test.getName())
assert(my_class_test.getName() == my_class_test.name)

Compile and run:

#!/usr/bin/env bash
set -eux
g++ `python3-config --cflags` -shared -std=c++11 -fPIC class_test.cpp \
  -o class_test`python3-config --extension-suffix` `python3-config --libs`
./class_test_main.py

This example shows how pybind11 allows you to effortlessly expose the ClassTest C++ class to Python! Compilation produces a file named class_test.cpython-36m-x86_64-linux-gnu.so which class_test_main.py automatically picks up as the definition point for the class_test natively defined module.

Perhaps the realization of how awesome this is only sinks in if you try to do the same thing by hand with the native Python API, see for example this example of doing that, which has about 10x more code: https://github.com/cirosantilli/python-cheat/blob/4f676f62e87810582ad53b2fb426b74eae52aad5/py_from_c/pure.c On that example you can see how the C code has to painfully and explicitly define the Python class bit by bit with all the information it contains (members, methods, further metadata…). See also:

pybind11 claims to be similar to Boost.Python which was mentioned at https://stackoverflow.com/a/145436/895245 but more minimal because it is freed from the bloat of being inside the Boost project:

pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code. Its goals and syntax are similar to the excellent Boost.Python library by David Abrahams: to minimize boilerplate code in traditional extension modules by inferring type information using compile-time introspection.

The main issue with Boost.Python—and the reason for creating such a similar project—is Boost. Boost is an enormously large and complex suite of utility libraries that works with almost every C++ compiler in existence. This compatibility has its cost: arcane template tricks and workarounds are necessary to support the oldest and buggiest of compiler specimens. Now that C++11-compatible compilers are widely available, this heavy machinery has become an excessively large and unnecessary dependency.

Think of this library as a tiny self-contained version of Boost.Python with everything stripped away that isn’t relevant for binding generation. Without comments, the core header files only require ~4K lines of code and depend on Python (2.7 or 3.x, or PyPy2.7 >= 5.7) and the C++ standard library. This compact implementation was possible thanks to some of the new C++11 language features (specifically: tuples, lambda functions and variadic templates). Since its creation, this library has grown beyond Boost.Python in many ways, leading to dramatically simpler binding code in many common situations.

pybind11 is also the only non-native alternative highlighted by the current Microsoft Python C binding documentation at: https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019 (archive).

Tested on Ubuntu 18.04, pybind11 2.0.1, Python 3.6.8, GCC 7.4.0.


与Project Euler的速度比较:C,Python,Erlang,Haskell

问题:与Project Euler的速度比较:C,Python,Erlang,Haskell

我把Project Euler的第12题当作编程练习,用来比较我在C、Python、Erlang和Haskell中的实现(肯定不是最优的)。为了获得更长的执行时间,我搜索的是第一个除数个数超过1000的三角形数,而不是原题中所说的500。

结果如下:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

带有PyPy的Python:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

摘要:

  • C:100%
  • Python:692%(使用PyPy时为118%)
  • Erlang:436%(135%感谢RichardC)
  • Haskell:1421%

我猜想C有很大优势,因为它用long做计算,而不像其他三种语言那样使用任意长度的整数。另外,它不需要先加载运行时(其他语言需要吗?)。

问题1: Erlang、Python和Haskell是否会因为使用任意长度的整数而降低速度,还是说只要值小于MAXINT就不会?

问题2: 为什么Haskell这么慢?是否有能松开刹车的编译器标志,还是问题出在我的实现上?(后者很有可能,因为Haskell对我来说就像一本七印封缄的天书。)

问题3: 您能否给我一些提示,如何在不改变确定因数方式的前提下优化这些实现?任何形式的优化都行:更优雅、更快、更符合语言的“原生”风格。

编辑:

问题4: 我的函数式实现是否允许LCO(最后调用优化,又称尾递归消除),从而避免在调用栈上添加不必要的帧?

尽管我不得不承认我的Haskell和Erlang知识非常有限,但我确实试图在四种语言中尽可能地实现相同的算法。


使用的源代码:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

I have taken Problem #12 from Project Euler as a programming exercise and to compare my (surely not optimal) implementations in C, Python, Erlang and Haskell. In order to get some higher execution times, I search for the first triangle number with more than 1000 divisors instead of 500 as stated in the original problem.

The result is the following:

C:

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python:

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

Python with PyPy:

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang:

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell:

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

Summary:

  • C: 100%
  • Python: 692% (118% with PyPy)
  • Erlang: 436% (135% thanks to RichardC)
  • Haskell: 1421%

I suppose that C has a big advantage as it uses long for the calculations and not arbitrary length integers as the other three. Also it doesn’t need to load a runtime first (Do the others?).

Question 1: Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Question 2: Why is Haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as Haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

EDIT:

Question 4: Do my functional implementations permit LCO (last call optimization, a.k.a tail recursion elimination) and hence avoid adding unnecessary frames onto the call stack?

I really tried to implement the same algorithm as similar as possible in the four languages, although I have to admit that my Haskell and Erlang knowledge is very limited.


Source codes used:

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

回答 0

在一台x86_64 Core2 Duo(2.5GHz)机器上使用GHC 7.0.3、gcc 4.4.6、Linux 2.6.29,Haskell用ghc -O2 -fllvm -fforce-recomp编译,C用gcc -O3 -lm编译。

  • 您的C例程运行8.4秒(比您的结果快,原因可能是-O3)
  • Haskell解决方案运行36秒(由于使用了-O2标记)
  • 您的factorCount'代码没有显式类型标注,默认为Integer(感谢Daniel在这里纠正我的误诊!)。用Int给出显式类型签名(这本来就是标准做法)后,时间变为11.1秒
  • 在factorCount'中您不必要地调用了fromIntegral。不过修复后没有任何变化(编译器很聪明,算您走运)。
  • 您在rem更快且足够的地方使用了mod。改掉之后时间变为8.5秒。
  • factorCount'不断传递两个从不改变的参数(number、sqrt)。worker/wrapper变换给了我们:
 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s  

没错,7.95秒。始终比C解决方案快半秒。不带-fllvm标记时我仍然得到8.182秒,因此NCG后端在这种情况下也表现良好。

结论:Haskell很棒。

结果代码

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

编辑:既然已经探索完了,现在来回答这些问题

问题1:Erlang、Python和Haskell是否会因为使用任意长度的整数而降低速度,还是说只要值小于MAXINT就不会?

在Haskell中,使用Integer比Int慢,但慢多少取决于所执行的计算。幸运的是(对64位机器而言)Int就足够了。出于可移植性考虑,您也许应该把我的代码改写为使用Int64或Word64(并不是只有C才有long)。

问题2:为什么haskell这么慢?是否有编译器标志可以使您刹车,或者它是我的实现?(后者很可能因为haskell是一本对我有七个印章的书。)

问题3:能否为我提供一些提示,说明如何在不改变因素确定方式的情况下优化这些实现?以任何方式进行优化:对语言更好,更快,更“原生”。

这就是我上面回答的。答案是

  • 0)使用-O2进行优化
  • 1)尽可能使用快速(特别是:可拆箱的)类型
  • 2)用rem而不是mod(一个经常被遗忘的优化),以及
  • 3)worker/wrapper变换(也许是最常见的优化)。

问题4:我的函数式实现是否允许LCO,从而避免在调用栈上添加不必要的帧?

是的,这不是问题。做得好,很高兴您考虑了这一点。

Using GHC 7.0.3, gcc 4.4.6, Linux 2.6.29 on an x86_64 Core2 Duo (2.5GHz) machine, compiling using ghc -O2 -fllvm -fforce-recomp for Haskell and gcc -O3 -lm for C.

  • Your C routine runs in 8.4 seconds (faster than your run probably because of -O3)
  • The Haskell solution runs in 36 seconds (due to the -O2 flag)
  • Your factorCount' code isn’t explicitly typed and defaulting to Integer (thanks to Daniel for correcting my misdiagnosis here!). Giving an explicit type signature (which is standard practice anyway) using Int and the time changes to 11.1 seconds
  • in factorCount' you have needlessly called fromIntegral. A fix results in no change though (the compiler is smart, lucky for you).
  • You used mod where rem is faster and sufficient. This changes the time to 8.5 seconds.
  • factorCount' is constantly applying two extra arguments that never change (number, sqrt). A worker/wrapper transformation gives us:
 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s  

That’s right, 7.95 seconds. Consistently half a second faster than the C solution. Without the -fllvm flag I’m still getting 8.182 seconds, so the NCG backend is doing well in this case too.

Conclusion: Haskell is awesome.

Resulting Code

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

EDIT: So now that we’ve explored that, lets address the questions

Question 1: Do erlang, python and haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

In Haskell, using Integer is slower than Int but how much slower depends on the computations performed. Luckily (for 64 bit machines) Int is sufficient. For portability sake you should probably rewrite my code to use Int64 or Word64 (C isn’t the only language with a long).

Question 2: Why is haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as haskell is a book with seven seals to me.)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

That was what I answered above. The answer was

  • 0) Use optimization via -O2
  • 1) Use fast (notably: unbox-able) types when possible
  • 2) rem not mod (a frequently forgotten optimization) and
  • 3) worker/wrapper transformation (perhaps the most common optimization).

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

Yes, that wasn’t the issue. Good work and glad you considered this.


回答 1

Erlang实现存在一些问题。作为以下内容的基准,我测得的未经修改的Erlang程序的执行时间为47.6秒,而C代码为12.7秒。

如果要运行计算密集型的Erlang代码,您应该做的第一件事就是使用本机代码。用erlc +native euler12编译后,时间降到41.3秒。然而,对这类代码而言,这比本机编译预期的加速低得多(仅15%),问题在于您使用了-compile(export_all)。这对实验很有用,但所有函数都可能从外部被调用这一事实会使本机编译器变得非常保守。(普通的BEAM仿真器受到的影响没那么大。)用-export([solve/0]).替换该声明可以带来好得多的加速:31.5秒(比基线快了将近35%)。

但是代码本身有一个问题:在factorCount循环的每次迭代中,您都会执行以下测试:

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

C代码不会做这个测试。通常,要在同一段代码的不同实现之间做公平比较可能比较棘手,尤其当算法是数值型的时候,因为您需要确保它们实际上在做同样的事情。某处类型转换导致的一个实现中的轻微舍入误差,可能使它比另一个实现多执行许多次迭代,即使两者最终得到相同的结果。

为了消除这个可能的误差来源(并去掉每次迭代中的额外测试),我参照C代码重写了factorCount函数,如下所示:

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

这次重写、去掉export_all并进行本机编译之后,我得到了以下运行时间:

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

与C代码相比,还算不错:

$ time ./a.out 
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

考虑到Erlang根本不是面向数值代码设计的,在这样的程序上仅比C慢50%已经相当不错了。

最后,关于您的问题:

问题1:Erlang、Python和Haskell是否会因为使用任意长度的整数而降低速度,还是说只要值小于MAXINT就不会?

是的,有一点。在Erlang中,没有办法表示“使用带回绕的32/64位算术”,因此除非编译器能证明整数的某些上界(通常不能),否则它必须检查所有计算,看结果能否放进单个带标记的机器字,还是必须把它们转成堆上分配的大数。即使运行时实际上从未用到大数,这些检查也必须执行。另一方面,这意味着如果您突然给它比以前更大的输入,算法绝不会因为意外的整数回绕而出错。

问题4:我的函数式实现是否允许LCO,从而避免在调用栈上添加不必要的帧?

是的,您的Erlang代码在最后调用优化方面是正确的。

There are some problems with the Erlang implementation. As baseline for the following, my measured execution time for your unmodified Erlang program was 47.6 seconds, compared to 12.7 seconds for the C code.

The first thing you should do if you want to run computationally intensive Erlang code is to use native code. Compiling with erlc +native euler12 got the time down to 41.3 seconds. This is however a much lower speedup (just 15%) than expected from native compilation on this kind of code, and the problem is your use of -compile(export_all). This is useful for experimentation, but the fact that all functions are potentially reachable from the outside causes the native compiler to be very conservative. (The normal BEAM emulator is not that much affected.) Replacing this declaration with -export([solve/0]). gives a much better speedup: 31.5 seconds (almost 35% from the baseline).

But the code itself has a problem: for each iteration in the factorCount loop, you perform this test:

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

The C code doesn’t do this. In general, it can be tricky to make a fair comparison between different implementations of the same code, and in particular if the algorithm is numerical, because you need to be sure that they are actually doing the same thing. A slight rounding error in one implementation due to some typecast somewhere may cause it to do many more iterations than the other even though both eventually reach the same result.

To eliminate this possible error source (and get rid of the extra test in each iteration), I rewrote the factorCount function as follows, closely modelled on the C code:

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

This rewrite, no export_all, and native compilation, gave me the following run time:

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

which is not too bad compared to the C code:

$ time ./a.out 
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

considering that Erlang is not at all geared towards writing numerical code, being only 50% slower than C on a program like this is pretty good.

Finally, regarding your questions:

Question 1: Do erlang, python and haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Yes, somewhat. In Erlang, there is no way of saying “use 32/64-bit arithmetic with wrap-around”, so unless the compiler can prove some bounds on your integers (and it usually can’t), it must check all computations to see if they can fit in a single tagged word or if it has to turn them into heap-allocated bignums. Even if no bignums are ever used in practice at runtime, these checks will have to be performed. On the other hand, that means you know that the algorithm will never fail because of an unexpected integer wraparound if you suddenly give it larger inputs than before.

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

Yes, your Erlang code is correct with respect to last call optimization.


回答 2

关于Python优化,除了使用PyPy(无需改动代码即可获得相当可观的加速)之外,您还可以用PyPy的翻译工具链编译一个符合RPython规范的版本,或者用Cython构建一个扩展模块。在我的测试中,这两者都比C版本快,其中Cython模块几乎快一倍。作为参考,我也附上C和PyPy的基准测试结果:

C(使用gcc -O3 -lm编译)

% time ./euler12-c 
842161320

./euler12-c  11.95s user 0.00s system 99% cpu 11.959 total

PyPy 1.5

% time pypy euler12.py
842161320
pypy euler12.py  16.44s user 0.01s system 99% cpu 16.449 total

RPython(使用最新的PyPy修订版c2f583445aee

% time ./euler12-rpython-c
842161320
./euler12-rpy-c  10.54s user 0.00s system 99% cpu 10.540 total

Cython 0.15

% time python euler12-cython.py
842161320
python euler12-cython.py  6.27s user 0.00s system 99% cpu 6.274 total

RPython版本有几个关键更改。要翻译成独立程序,您需要定义target,在本例中就是main函数。它应当接受sys.argv作为唯一参数,并且需要返回一个int。您可以用translate.py来翻译它:% translate.py euler12-rpython.py 会把它翻译成C并为您编译。

# euler12-rpython.py

import math, sys

def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in xrange(1, isquare + 1):
        if not n % candidate: count += 2
    return count

def main(argv):
    triangle = 1
    index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle
    return 0

if __name__ == '__main__':
    main(sys.argv)

def target(*args):
    return main, None

Cython版本被改写为扩展模块_euler12.pyx,我在一个普通的python文件中导入并调用它。_euler12.pyx本质上与您的版本相同,只是多了一些静态类型声明。setup.py包含构建扩展的常规样板,使用python setup.py build_ext --inplace构建。

# _euler12.pyx
from libc.math cimport sqrt

cdef int factorCount(int n):
    cdef int candidate, isquare, count
    cdef double square
    square = sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in range(1, isquare + 1):
        if not n % candidate: count += 2
    return count

cpdef main():
    cdef int triangle = 1, index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle

# euler12-cython.py
import _euler12
_euler12.main()

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("_euler12", ["_euler12.pyx"])]

setup(
  name = 'Euler12-Cython',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

老实说,我对RPython或Cython的经验很少,并对结果感到惊喜。如果您使用的是CPython,则在Cython扩展模块中编写CPU密集型代码似乎是优化程序的一种简便方法。

In regards to Python optimization, in addition to using PyPy (for pretty impressive speed-ups with zero change to your code), you could use PyPy’s translation toolchain to compile an RPython-compliant version, or Cython to build an extension module, both of which are faster than the C version in my testing, with the Cython module nearly twice as fast. For reference I include C and PyPy benchmark results as well:

C (compiled with gcc -O3 -lm)

% time ./euler12-c 
842161320

./euler12-c  11.95s user 0.00s system 99% cpu 11.959 total

PyPy 1.5

% time pypy euler12.py
842161320
pypy euler12.py  16.44s user 0.01s system 99% cpu 16.449 total

RPython (using latest PyPy revision, c2f583445aee)

% time ./euler12-rpython-c
842161320
./euler12-rpy-c  10.54s user 0.00s system 99% cpu 10.540 total

Cython 0.15

% time python euler12-cython.py
842161320
python euler12-cython.py  6.27s user 0.00s system 99% cpu 6.274 total

The RPython version has a couple of key changes. To translate into a standalone program you need to define your target, which in this case is the main function. It’s expected to accept sys.argv as its only argument, and is required to return an int. You can translate it by using translate.py, % translate.py euler12-rpython.py which translates to C and compiles it for you.

# euler12-rpython.py

import math, sys

def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in xrange(1, isquare + 1):
        if not n % candidate: count += 2
    return count

def main(argv):
    triangle = 1
    index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle
    return 0

if __name__ == '__main__':
    main(sys.argv)

def target(*args):
    return main, None

The Cython version was rewritten as an extension module _euler12.pyx, which I import and call from a normal python file. The _euler12.pyx is essentially the same as your version, with some additional static type declarations. The setup.py has the normal boilerplate to build the extension, using python setup.py build_ext --inplace.

# _euler12.pyx
from libc.math cimport sqrt

cdef int factorCount(int n):
    cdef int candidate, isquare, count
    cdef double square
    square = sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in range(1, isquare + 1):
        if not n % candidate: count += 2
    return count

cpdef main():
    cdef int triangle = 1, index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle

# euler12-cython.py
import _euler12
_euler12.main()

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("_euler12", ["_euler12.pyx"])]

setup(
  name = 'Euler12-Cython',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

I honestly have very little experience with either RPython or Cython, and was pleasantly surprised at the results. If you are using CPython, writing your CPU-intensive bits of code in a Cython extension module seems like a really easy way to optimize your program.


回答 3

问题3:您能否给我一些提示,如何在不改变确定因数方式的前提下优化这些实现?任何形式的优化都行:更优雅、更快、更符合语言的“原生”风格。

C实现是次优的(正如Thomas M. DuBuisson所暗示的):这个版本使用64位整数(即long数据类型)。我稍后会研究汇编清单,但据合理猜测,编译后的代码中发生了一些内存访问,使得使用64位整数明显变慢。要么是这个原因,要么是生成的代码问题(可能是SSE寄存器能装下的64位整数更少,或者把double舍入到64位整数更慢)。

这是修改后的代码(只需把long替换为int;我还显式内联了factorCount,尽管我认为gcc -O3并不需要这样做):

#include <stdio.h>
#include <math.h>

static inline int factorCount(int n)
{
    double square = sqrt (n);
    int isquare = (int)square;
    int count = isquare == square ? -1 : 0;
    int candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    int triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index++;
        triangle += index;
    }
    printf ("%d\n", triangle);
}

运行+计时它给出:

$ gcc -O3 -lm -o euler12 euler12.c; time ./euler12
842161320
./euler12  2.95s user 0.00s system 99% cpu 2.956 total

作为参考,Thomas在前面的答案中的haskell实现给出:

$ ghc -O2 -fllvm -fforce-recomp euler12.hs; time ./euler12                                                                                      [9:40]
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12 ...
842161320
./euler12  9.43s user 0.13s system 99% cpu 9.602 total

结论:无意贬低ghc,它是一个很棒的编译器,但gcc通常生成更快的代码。

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

The C implementation is suboptimal (as hinted at by Thomas M. DuBuisson): the version uses 64-bit integers (i.e. the long datatype). I’ll investigate the assembly listing later, but with an educated guess, there are some memory accesses going on in the compiled code, which make using 64-bit integers significantly slower. It’s that or the generated code (be it the fact that you can fit fewer 64-bit ints in an SSE register, or that rounding a double to a 64-bit integer is slower).

Here is the modified code (simply replace long with int and I explicitly inlined factorCount, although I do not think that this is necessary with gcc -O3):

#include <stdio.h>
#include <math.h>

static inline int factorCount(int n)
{
    double square = sqrt (n);
    int isquare = (int)square;
    int count = isquare == square ? -1 : 0;
    int candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    int triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index++;
        triangle += index;
    }
    printf ("%d\n", triangle);
}

Running + timing it gives:

$ gcc -O3 -lm -o euler12 euler12.c; time ./euler12
842161320
./euler12  2.95s user 0.00s system 99% cpu 2.956 total

For reference, the haskell implementation by Thomas in the earlier answer gives:

$ ghc -O2 -fllvm -fforce-recomp euler12.hs; time ./euler12                                                                                      [9:40]
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12 ...
842161320
./euler12  9.43s user 0.13s system 99% cpu 9.602 total

Conclusion: Taking nothing away from ghc, it’s a great compiler, but gcc normally generates faster code.


回答 4

看一下这个博客。在过去一年左右的时间里,他用Haskell和Python做了几道Project Euler题目,并且总体上发现Haskell要快得多。我认为就这些语言而言,差异更多取决于您的熟练程度和编码风格。

说到Python速度,您使用的是错误的实现!尝试使用PyPy,对于这样的事情,您会发现它的运行速度要快得多。

Take a look at this blog. Over the past year or so he’s done a few of the Project Euler problems in Haskell and Python, and he’s generally found Haskell to be much faster. I think that between those languages it has more to do with your fluency and coding style.

When it comes to Python speed, you’re using the wrong implementation! Try PyPy, and for things like this you’ll find it to be much, much faster.


回答 5

通过使用Haskell软件包中的一些函数,可以大大加快您的Haskell实现。这里我使用了primes,只需'cabal install primes'即可安装 ;)

import Data.Numbers.Primes
import Data.List

triangleNumbers = scanl1 (+) [1..]
nDivisors n = product $ map ((+1) . length) (group (primeFactors n))
answer = head $ filter ((> 500) . nDivisors) triangleNumbers

main :: IO ()
main = putStrLn $ "First triangle number to have over 500 divisors: " ++ (show answer)

时间:

您的原始程序:

PS> measure-command { bin\012_slow.exe }

TotalSeconds      : 16.3807409
TotalMilliseconds : 16380.7409

改进的实施

PS> measure-command { bin\012.exe }

TotalSeconds      : 0.0383436
TotalMilliseconds : 38.3436

如您所见,您的程序跑了16秒,而这个版本在同一台机器上只跑了38毫秒 :)

编译命令:

ghc -O2 012.hs -o bin\012.exe
ghc -O2 012_slow.hs -o bin\012_slow.exe

Your Haskell implementation could be greatly sped up by using some functions from Haskell packages. In this case I used primes, which is just installed with ‘cabal install primes’ ;)

import Data.Numbers.Primes
import Data.List

triangleNumbers = scanl1 (+) [1..]
nDivisors n = product $ map ((+1) . length) (group (primeFactors n))
answer = head $ filter ((> 500) . nDivisors) triangleNumbers

main :: IO ()
main = putStrLn $ "First triangle number to have over 500 divisors: " ++ (show answer)

Timings:

Your original program:

PS> measure-command { bin\012_slow.exe }

TotalSeconds      : 16.3807409
TotalMilliseconds : 16380.7409

Improved implementation

PS> measure-command { bin\012.exe }

TotalSeconds      : 0.0383436
TotalMilliseconds : 38.3436

As you can see, this one runs in 38 milliseconds on the same machine where yours ran in 16 seconds :)

Compilation commands:

ghc -O2 012.hs -o bin\012.exe
ghc -O2 012_slow.hs -o bin\012_slow.exe
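
The dramatic speedup comes from the divisor-count formula rather than from Haskell itself: if n = p1^e1 * ... * pk^ek, then n has (e1+1)*...*(ek+1) divisors, which is what nDivisors computes from primeFactors. A plain Python sketch of the same idea, for comparison with the trial-division versions in this thread (a sketch, not the answer's code):

# Divisor count via prime factorization: d(p1^e1 * ... * pk^ek) = (e1+1)*...*(ek+1).
def num_divisors(n):
    total = 1
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        total *= e + 1
        p += 1
    if n > 1:          # one prime factor larger than sqrt(original n) remains
        total *= 2
    return total

print(num_divisors(76576500))  # the 500-divisor Euler #12 answer -> 576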

回答 6

纯属娱乐。以下是一个更“原生”的Haskell实现:

import Control.Applicative
import Control.Monad
import Data.Either
import Math.NumberTheory.Powers.Squares

isInt :: RealFrac c => c -> Bool
isInt = (==) <$> id <*> fromInteger . round

intSqrt :: (Integral a) => a -> Int
--intSqrt = fromIntegral . floor . sqrt . fromIntegral
intSqrt = fromIntegral . integerSquareRoot'

factorize :: Int -> [Int]
factorize 1 = []
factorize n = first : factorize (quot n first)
  where first = (!! 0) $ [a | a <- [2..intSqrt n], rem n a == 0] ++ [n]

factorize2 :: Int -> [(Int,Int)]
factorize2 = foldl (\ls@((val,freq):xs) y -> if val == y then (val,freq+1):xs else (y,1):ls) [(0,0)] . factorize

numDivisors :: Int -> Int
numDivisors = foldl (\acc (_,y) -> acc * (y+1)) 1 <$> factorize2

nextTriangleNumber :: (Int,Int) -> (Int,Int)
nextTriangleNumber (n,acc) = (n+1,acc+n+1)

forward :: Int -> (Int, Int) -> Either (Int, Int) (Int, Int)
forward k val@(n,acc) = if numDivisors acc > k then Left val else Right (nextTriangleNumber val)

problem12 :: Int -> (Int, Int)
problem12 n = (!!0) . lefts . scanl (>>=) (forward n (1,1)) . repeat . forward $ n

main = do
  let (n,val) = problem12 1000
  print val

使用ghc -O3,这在我的机器(1.73GHz Core i7)上稳定地在0.55-0.58秒内运行。

一个对C版本而言更高效的factorCount函数:

int factorCount (int n)
{
  int count = 1;
  int candidate,tmpCount;
  while (n % 2 == 0) {
    count++;
    n /= 2;
  }
    for (candidate = 3; candidate * candidate <= n; candidate += 2)  /* <= so perfect squares like 9 or 25 are counted correctly */
    if (n % candidate == 0) {
      tmpCount = 1;
      do {
        tmpCount++;
        n /= candidate;
      } while (n % candidate == 0);
       count*=tmpCount;
      }
  if (n > 1)
    count *= 2;
  return count;
}

将main中的long改为int,并使用gcc -O3 -lm编译,这稳定地在0.31-0.35秒内运行。

如果利用第n个三角形数 = n*(n+1)/2,并且n和(n+1)的素因数分解完全不相交这一事实,两个版本都可以跑得更快:把两半各自的因数个数相乘,就能得到整体的因数个数。如下:

int main ()
{
  int triangle = 0,count1,count2 = 1;
  do {
    count1 = count2;
    count2 = ++triangle % 2 == 0 ? factorCount(triangle+1) : factorCount((triangle+1)/2);
  } while (count1*count2 < 1001);
  printf ("%lld\n", ((long long)triangle)*(triangle+1)/2);
}

这会把C代码的运行时间降到0.17-0.19秒,而且它能处理大得多的搜索:寻找超过10000个因数的数在我的机器上大约需要43秒。类似的Haskell提速留给感兴趣的读者。

Just for fun. The following is a more ‘native’ Haskell implementation:

import Control.Applicative
import Control.Monad
import Data.Either
import Math.NumberTheory.Powers.Squares

isInt :: RealFrac c => c -> Bool
isInt = (==) <$> id <*> fromInteger . round

intSqrt :: (Integral a) => a -> Int
--intSqrt = fromIntegral . floor . sqrt . fromIntegral
intSqrt = fromIntegral . integerSquareRoot'

factorize :: Int -> [Int]
factorize 1 = []
factorize n = first : factorize (quot n first)
  where first = (!! 0) $ [a | a <- [2..intSqrt n], rem n a == 0] ++ [n]

factorize2 :: Int -> [(Int,Int)]
factorize2 = foldl (\ls@((val,freq):xs) y -> if val == y then (val,freq+1):xs else (y,1):ls) [(0,0)] . factorize

numDivisors :: Int -> Int
numDivisors = foldl (\acc (_,y) -> acc * (y+1)) 1 <$> factorize2

nextTriangleNumber :: (Int,Int) -> (Int,Int)
nextTriangleNumber (n,acc) = (n+1,acc+n+1)

forward :: Int -> (Int, Int) -> Either (Int, Int) (Int, Int)
forward k val@(n,acc) = if numDivisors acc > k then Left val else Right (nextTriangleNumber val)

problem12 :: Int -> (Int, Int)
problem12 n = (!!0) . lefts . scanl (>>=) (forward n (1,1)) . repeat . forward $ n

main = do
  let (n,val) = problem12 1000
  print val

Using ghc -O3, this consistently runs in 0.55-0.58 seconds on my machine (1.73GHz Core i7).

A more efficient factorCount function for the C version:

int factorCount (int n)
{
  int count = 1;
  int candidate,tmpCount;
  while (n % 2 == 0) {
    count++;
    n /= 2;
  }
    for (candidate = 3; candidate * candidate <= n; candidate += 2)  /* <= so perfect squares like 9 or 25 are counted correctly */
    if (n % candidate == 0) {
      tmpCount = 1;
      do {
        tmpCount++;
        n /= candidate;
      } while (n % candidate == 0);
       count*=tmpCount;
      }
  if (n > 1)
    count *= 2;
  return count;
}

Changing longs to ints in main, using gcc -O3 -lm, this consistently runs in 0.31-0.35 seconds.

Both can be made to run even faster if you take advantage of the fact that the nth triangle number = n*(n+1)/2, and n and (n+1) have completely disparate prime factorizations, so the number of factors of each half can be multiplied to find the number of factors of the whole. The following:

int main ()
{
  int triangle = 0,count1,count2 = 1;
  do {
    count1 = count2;
    count2 = ++triangle % 2 == 0 ? factorCount(triangle+1) : factorCount((triangle+1)/2);
  } while (count1*count2 < 1001);
  printf ("%lld\n", ((long long)triangle)*(triangle+1)/2);
}

will reduce the c code run time to 0.17-0.19 seconds, and it can handle much larger searches — greater than 10000 factors takes about 43 seconds on my machine. I leave a similar haskell speedup to the interested reader.
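
The coprime-halves identity is easy to sanity-check from Python; a small sketch, reusing a num_divisors helper like the one sketched earlier in this thread (an assumption, not the answer's code):

# Verify d(T_n) = d(a) * d(b), where T_n = n*(n+1)/2 and a, b are the coprime halves.
def num_divisors(n):
    total, p = 1, 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        total *= e + 1
        p += 1
    return total * (2 if n > 1 else 1)

for n in range(1, 200):
    a, b = (n // 2, n + 1) if n % 2 == 0 else (n, (n + 1) // 2)
    assert num_divisors(a) * num_divisors(b) == num_divisors(n * (n + 1) // 2)
print("identity holds for n < 200")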


回答 7

问题1:Erlang、Python和Haskell是否会因为使用任意长度的整数而降低速度,还是说只要值小于MAXINT就不会?

这不太可能。关于Erlang和Haskell我说不了太多(好吧,下面也许能说一点Haskell),但我可以指出Python中的许多其他瓶颈。每当程序试图在Python中对某些值执行操作时,它都要验证这些值是否是正确的类型,这会花费一点时间。您的factorCount函数只是反复分配range (1, isquare + 1)这样的列表,而运行时的、malloc式的内存分配比像C那样用计数器迭代一个范围要慢得多。尤其是,factorCount()被调用了很多次,因此分配了很多列表。另外,别忘了Python是解释执行的,而CPython解释器并没有把重点放在优化上。

编辑:哦,我注意到您用的是Python 3,所以range()返回的不是列表而是一个生成器。这样一来,我关于分配列表的观点就只对了一半:该函数只是分配range对象,这些对象虽然效率不高,但不像分配一个包含很多项的列表那么低效。

问题2:为什么Haskell这么慢?是否有能松开刹车的编译器标志,还是问题出在我的实现上?(后者很有可能,因为Haskell对我来说就像一本七印封缄的天书。)

您在用Hugs吗?Hugs是一个相当慢的解释器。如果您在用它,换用GHC也许能得到更好的时间,但我只是在提出假设。好的Haskell编译器在幕后做的那些事情非常迷人,也远超我的理解 :)

问题3:您能否给我一些提示,如何在不改变确定因数方式的前提下优化这些实现?任何形式的优化都行:更优雅、更快、更符合语言的“原生”风格。

我得说您在玩一个无趣的游戏。了解多种语言最棒的地方就是以尽可能不同的方式使用它们 :) 但我跑题了,在这一点上我实在没有什么建议。抱歉,希望这种情况下有人能帮到您 :)

问题4:我的函数式实现是否允许LCO,从而避免在调用栈上添加不必要的帧?

据我所记得,您只需要确保递归调用是返回值之前的最后一个命令。换句话说,下面的函数可以使用这种优化:

def factorial(n, acc=1):
    if n > 1:
        acc = acc * n
        n = n - 1
        return factorial(n, acc)
    else:
        return acc

但是,如果您的函数如下所示,则您将没有这样的优化,因为在递归调用之后有一个运算(乘法):

def factorial2(n):
    if n > 1:
        f = factorial2(n-1)
        return f*n
    else:
        return 1

我把运算拆分到一些局部变量里,以便清楚地看出执行了哪些运算。不过,更常见的写法如下所示,就我要说明的观点而言它们是等效的:

def factorial(n, acc=1):
    if n > 1:
        return factorial(n-1, acc*n)
    else:
        return acc

def factorial2(n):
    if n > 1:
        return n*factorial(n-1)
    else:
        return 1

注意,是否做尾递归由编译器/解释器决定。例如,如果我没记错,Python解释器就不做(我在示例中用Python只是因为它的语法流畅)。无论如何,如果您看到奇怪的东西,比如带两个参数的阶乘函数(而且其中一个参数叫acc、accumulator之类的名字),现在您就知道人们为什么这么写了 :)

Question 1: Do erlang, python and haskell loose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

This is unlikely. I cannot say much about Erlang and Haskell (well, maybe a bit about Haskell below) but I can point a lot of other bottlenecks in Python. Every time the program tries to execute an operation with some values in Python, it should verify whether the values are from the proper type, and it costs a bit of time. Your factorCount function just allocates a list with range (1, isquare + 1) various times, and runtime, malloc-styled memory allocation is way slower than iterating on a range with a counter as you do in C. Notably, the factorCount() is called multiple times and so allocates a lot of lists. Also, let us not forget that Python is interpreted and the CPython interpreter has no great focus on being optimized.

EDIT: oh, well, I note that you are using Python 3 so range() does not return a list, but a generator. In this case, my point about allocating lists is half-wrong: the function just allocates range objects, which are inefficient nonetheless but not as inefficient as allocating a list with a lot of items.

Question 2: Why is haskell so slow? Is there a compiler flag that turns off the brakes or is it my implementation? (The latter is quite probable as haskell is a book with seven seals to me.)

Are you using Hugs? Hugs is a considerably slow interpreter. If you are using it, maybe you can get a better time with GHC – but I am only cogitating hypotheses; the kind of stuff a good Haskell compiler does under the hood is pretty fascinating and way beyond my comprehension :)

Question 3: Can you offer me some hints how to optimize these implementations without changing the way I determine the factors? Optimization in any way: nicer, faster, more “native” to the language.

I’d say you are playing an unfunny game. The best part of knowing various languages is to use them the most different way possible :) But I digress, I just do not have any recommendation for this point. Sorry, I hope someone can help you in this case :)

Question 4: Do my functional implementations permit LCO and hence avoid adding unnecessary frames onto the call stack?

As far as I remember, you just need to make sure that your recursive call is the last command before returning a value. In other words, a function like the one below could use such optimization:

def factorial(n, acc=1):
    if n > 1:
        acc = acc * n
        n = n - 1
        return factorial(n, acc)
    else:
        return acc

However, you would not have such optimization if your function were such as the one below, because there is an operation (multiplication) after the recursive call:

def factorial2(n):
    if n > 1:
        f = factorial2(n-1)
        return f*n
    else:
        return 1

I separated the operations into some local variables to make it clear which operations are executed. However, the most usual is to see these functions as below, and they are equivalent for the point I am making:

def factorial(n, acc=1):
    if n > 1:
        return factorial(n-1, acc*n)
    else:
        return acc

def factorial2(n):
    if n > 1:
        return n*factorial(n-1)
    else:
        return 1

Note that it is up to the compiler/interpreter to decide if it will make tail recursion. For example, the Python interpreter does not do it if I remember well (I used Python in my example only because of its fluent syntax). Anyway, if you find strange stuff such as factorial functions with two parameters (and one of the parameters has names such as acc, accumulator etc.) now you know why people do it :)
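
Since CPython performs no such optimization, deep recursion there eventually raises RecursionError; the usual fix is to rewrite the accumulator form as an explicit loop. A minimal sketch:

# The tail-recursive accumulator pattern, as the loop CPython actually wants.
def factorial_iter(n):
    acc = 1
    while n > 1:
        acc *= n
        n -= 1
    return acc

print(factorial_iter(5))     # -> 120
factorial_iter(3000)         # fine; factorial(3000) would exceed the recursion limit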


回答 8

使用Haskell,您其实不需要显式地用递归来思考。

factorCount number = foldr factorCount' 0 [1..isquare] -
                     (fromEnum $ square == fromIntegral isquare)
    where
      square = sqrt $ fromIntegral number
      isquare = floor square
      factorCount' candidate
        | number `rem` candidate == 0 = (2 +)
        | otherwise = id

triangles :: [Int]
triangles = scanl1 (+) [1,2..]

main = print . head $ dropWhile ((< 1001) . factorCount) triangles

在上面的代码中,我把@Thomas答案中的显式递归换成了常见的列表操作。这段代码做的事情完全相同,而我们无需操心尾递归。在我装有GHC 7.6.2的机器上,它的运行时间(约7.49秒)比@Thomas答案中的版本(约7.04秒)慢约6%,而@Raedwulf的C版本约为3.15秒。看来这一年GHC有所进步。

PS. 我知道这是个老问题,我是在Google搜索时偶然看到它的(现在已经忘了当时在搜什么……)。只是想就LCO的问题发表评论,顺便聊聊我对Haskell的总体感受。我本想在最佳答案下留言,但评论不允许使用代码块。

With Haskell, you really don’t need to think in recursions explicitly.

factorCount number = foldr factorCount' 0 [1..isquare] -
                     (fromEnum $ square == fromIntegral isquare)
    where
      square = sqrt $ fromIntegral number
      isquare = floor square
      factorCount' candidate
        | number `rem` candidate == 0 = (2 +)
        | otherwise = id

triangles :: [Int]
triangles = scanl1 (+) [1,2..]

main = print . head $ dropWhile ((< 1001) . factorCount) triangles

In the above code, I have replaced explicit recursions in @Thomas’ answer with common list operations. The code still does exactly the same thing without us worrying about tail recursion. It runs (~ 7.49s) about 6% slower than the version in @Thomas’ answer (~ 7.04s) on my machine with GHC 7.6.2, while the C version from @Raedwulf runs ~ 3.15s. It seems GHC has improved over the year.

PS. I know it is an old question, and I stumble upon it from google searches (I forgot what I was searching, now…). Just wanted to comment on the question about LCO and express my feelings about Haskell in general. I wanted to comment on the top answer, but comments do not allow code blocks.
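The same shift from explicit recursion to a fold can be expressed in most languages. A rough C++ analogue (my sketch, not from the answer) using std::accumulate over the candidate divisors:

#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

int factorCount(int number) {
    int isquare = static_cast<int>(std::sqrt(static_cast<double>(number)));
    std::vector<int> candidates(isquare);
    std::iota(candidates.begin(), candidates.end(), 1);  // 1..isquare
    // Fold a divisor test over the candidates instead of recursing by hand.
    int count = std::accumulate(candidates.begin(), candidates.end(), 0,
        [number](int acc, int c) { return number % c == 0 ? acc + 2 : acc; });
    if (isquare * isquare == number) --count;  // square root was counted twice
    return count;
}

int main() {
    std::cout << factorCount(28) << "\n";  // 6 divisors: 1,2,4,7,14,28
}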


回答 9

有关C版本的更多数字和说明。显然,这些年来没有人这样做。记住要对这个答案进行投票,这样它才能在每个人的视野和学习中获得领先。

第一步:作者程序的基准

笔记本电脑规格:

  • CPU i3 M380(931 MHz-最大省电模式)
  • 4GB内存
  • Win7 64位
  • Microsoft Visual Studio 2012旗舰版
  • Cygwin与GCC 4.9.3
  • Python 2.7.10

命令:

compiling on VS x64 command prompt > `for /f %f in ('dir /b *.c') do cl /O2 /Ot /Ox %f -o %f_x64_vs2012.exe`
compiling on cygwin with gcc x64   > `for f in ./*.c; do gcc -m64 -O3 $f -o ${f}_x64_gcc.exe ; done`
time (unix tools) using cygwin > `for f in ./*.exe; do  echo "----------"; echo $f ; time $f ; done`

----------
$ time python ./original.py

real    2m17.748s
user    2m15.783s
sys     0m0.093s
----------
$ time ./original_x86_vs2012.exe

real    0m8.377s
user    0m0.015s
sys     0m0.000s
----------
$ time ./original_x64_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./original_x64_gcc.exe

real    0m20.951s
user    0m20.732s
sys     0m0.030s

文件名是: integertype_architecture_compiler.exe

  • 现在,integertype与原始程序相同(稍后会详细介绍)
  • 架构是x86还是x64,具体取决于编译器设置
  • 编译器是gcc还是vs2012

第二步:再次调查,改进并进行基准测试

VS比gcc快250%。两种编译器的速度应该相似。显然,代码或编译器选项出了点问题。让我们调查一下!

第一个关注点是整数类型。转换可能会很昂贵,并且一致性对于更好的代码生成和优化很重要。所有整数应为同一类型。

现在的代码混用int和long,一团糟。我们将改进这一点。该用什么类型?最快的那个。把它们全都测一遍!

----------
$ time ./int_x86_vs2012.exe

real    0m8.440s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int_x64_vs2012.exe

real    0m8.408s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int32_x86_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int32_x64_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x86_vs2012.exe

real    0m18.112s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x64_vs2012.exe

real    0m18.611s
user    0m0.000s
sys     0m0.015s
----------
$ time ./long_x86_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.000s
----------
$ time ./long_x64_vs2012.exe

real    0m8.440s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x86_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x64_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.015s
----------
$ time ./uint64_x86_vs2012.exe

real    0m15.428s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint64_x64_vs2012.exe

real    0m15.725s
user    0m0.015s
sys     0m0.015s
----------
$ time ./int_x64_gcc.exe

real    0m8.531s
user    0m8.329s
sys     0m0.015s
----------
$ time ./int32_x64_gcc.exe

real    0m8.471s
user    0m8.345s
sys     0m0.000s
----------
$ time ./int64_x64_gcc.exe

real    0m20.264s
user    0m20.186s
sys     0m0.015s
----------
$ time ./long_x64_gcc.exe

real    0m20.935s
user    0m20.809s
sys     0m0.015s
----------
$ time ./uint32_x64_gcc.exe

real    0m8.393s
user    0m8.346s
sys     0m0.015s
----------
$ time ./uint64_x64_gcc.exe

real    0m16.973s
user    0m16.879s
sys     0m0.030s

整数类型为int、long、int32_t、uint32_t、int64_t和uint64_t(来自#include <stdint.h>)

在C中有很多整数类型,还有一些带符号/无符号可玩,以及选择编译为x86或x64(不要与实际整数大小混淆)的选择。有很多版本可以编译和运行^^

第三步:了解数字

明确的结论:

  • 32位整数比64位等效项快200%
  • 无符号的64位整数比有符号的64位快25%(不幸的是,我对此没有任何解释)

技巧问题:“ C中int和long的大小是多少?”
正确的答案是:int的大小和C中的long大小没有明确定义!

从C规范:

int至少为16位(在大多数现代平台上为32位)
long至少为32位,且至少与int一样宽

从gcc手册页(-m32和-m64标志):

32位环境将int,long和指针设置为32位,并生成可在任何i386系统上运行的代码。
64位环境将int设置为32位长,将指针设置为64位,并为AMD的x86-64体系结构生成代码。

从MSDN文档(数据类型范围)https://msdn.microsoft.com/zh-cn/library/s3f49ktz%28v=vs.110%29.aspx

int,4字节,也称为signed int
long,4字节,也称为long int和signed long int

结论:吸取的教训

  • 32位整数比64位整数快。

  • 标准整数类型在C和C++中都没有明确定义,它们随编译器和体系结构而变化。需要一致性和可预测性时,请使用#include <stdint.h>中的uint32_t整数族

  • 速度问题解决了。所有其他语言都落后数百个百分点,C和C++再次获胜!他们总是如此。下一个改进将是使用OpenMP的多线程:D

Some more numbers and explanations for the C version. Apparently no one did this during all those years. Remember to upvote this answer so it can get on top for everyone to see and learn.

Step One: Benchmark of the author’s programs

Laptop Specifications:

  • CPU i3 M380 (931 MHz – maximum battery saving mode)
  • 4GB memory
  • Win7 64 bits
  • Microsoft Visual Studio 2012 Ultimate
  • Cygwin with gcc 4.9.3
  • Python 2.7.10

Commands:

compiling on VS x64 command prompt > `for /f %f in ('dir /b *.c') do cl /O2 /Ot /Ox %f -o %f_x64_vs2012.exe`
compiling on cygwin with gcc x64   > `for f in ./*.c; do gcc -m64 -O3 $f -o ${f}_x64_gcc.exe ; done`
time (unix tools) using cygwin > `for f in ./*.exe; do  echo "----------"; echo $f ; time $f ; done`

.

----------
$ time python ./original.py

real    2m17.748s
user    2m15.783s
sys     0m0.093s
----------
$ time ./original_x86_vs2012.exe

real    0m8.377s
user    0m0.015s
sys     0m0.000s
----------
$ time ./original_x64_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./original_x64_gcc.exe

real    0m20.951s
user    0m20.732s
sys     0m0.030s

Filenames are: integertype_architecture_compiler.exe

  • integertype is the same as the original program for now (more on that later)
  • architecture is x86 or x64 depending on the compiler settings
  • compiler is gcc or vs2012

Step Two: Investigate, Improve and Benchmark Again

VS is 250% faster than gcc. The two compilers should give a similar speed. Obviously, something is wrong with the code or the compiler options. Let’s investigate!

The first point of interest is the integer types. Conversions can be expensive and consistency is important for better code generation & optimizations. All integers should be the same type.

It’s a mixed mess of int and long right now. We’re going to improve that. What type to use? The fastest. Gotta benchmark them all!

----------
$ time ./int_x86_vs2012.exe

real    0m8.440s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int_x64_vs2012.exe

real    0m8.408s
user    0m0.016s
sys     0m0.015s
----------
$ time ./int32_x86_vs2012.exe

real    0m8.408s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int32_x64_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x86_vs2012.exe

real    0m18.112s
user    0m0.000s
sys     0m0.015s
----------
$ time ./int64_x64_vs2012.exe

real    0m18.611s
user    0m0.000s
sys     0m0.015s
----------
$ time ./long_x86_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.000s
----------
$ time ./long_x64_vs2012.exe

real    0m8.440s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x86_vs2012.exe

real    0m8.362s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint32_x64_vs2012.exe

real    0m8.393s
user    0m0.015s
sys     0m0.015s
----------
$ time ./uint64_x86_vs2012.exe

real    0m15.428s
user    0m0.000s
sys     0m0.015s
----------
$ time ./uint64_x64_vs2012.exe

real    0m15.725s
user    0m0.015s
sys     0m0.015s
----------
$ time ./int_x64_gcc.exe

real    0m8.531s
user    0m8.329s
sys     0m0.015s
----------
$ time ./int32_x64_gcc.exe

real    0m8.471s
user    0m8.345s
sys     0m0.000s
----------
$ time ./int64_x64_gcc.exe

real    0m20.264s
user    0m20.186s
sys     0m0.015s
----------
$ time ./long_x64_gcc.exe

real    0m20.935s
user    0m20.809s
sys     0m0.015s
----------
$ time ./uint32_x64_gcc.exe

real    0m8.393s
user    0m8.346s
sys     0m0.015s
----------
$ time ./uint64_x64_gcc.exe

real    0m16.973s
user    0m16.879s
sys     0m0.030s

Integer types are int, long, int32_t, uint32_t, int64_t and uint64_t (from #include <stdint.h>)

There are LOTS of integer types in C, plus some signed/unsigned to play with, plus the choice to compile as x86 or x64 (not to be confused with the actual integer size). That is a lot of versions to compile and run ^^
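An alternative to compiling one executable per type is to template the hot loop on the integer type and time each instantiation in one binary. A hedged sketch of mine (it assumes the hot loop is the naive trial-division factor count of the original program, and uses the 1000-divisor target of these benchmarks, so expect several seconds per type):

#include <chrono>
#include <cstdint>
#include <iostream>

// Naive divisor count by trial division up to sqrt(n).
template <typename Int>
int factorCount(Int n) {
    int count = 0;
    for (Int i = 1; i * i <= n; ++i)
        if (n % i == 0) count += (i * i == n) ? 1 : 2;
    return count;
}

template <typename Int>
void bench(const char* name) {
    auto start = std::chrono::steady_clock::now();
    Int triangle = 0, i = 1;
    while (factorCount(triangle) <= 1000) triangle += i++;
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::cout << name << ": " << triangle << " in " << ms << " ms\n";
}

int main() {
    bench<int32_t>("int32_t");
    bench<uint32_t>("uint32_t");
    bench<int64_t>("int64_t");
    bench<uint64_t>("uint64_t");
}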

Step Three: Understanding the Numbers

Definitive conclusions:

  • 32-bit integers are ~200% faster than their 64-bit equivalents
  • unsigned 64-bit integers are ~25% faster than signed 64-bit (unfortunately, I have no explanation for that)

Trick question: “What are the sizes of int and long in C?”
The right answer is: the sizes of int and long in C are not well defined!

From the C spec:

int is at least 16 bits (32 on most modern platforms)
long is at least 32 bits and at least as wide as an int

From the gcc man page (-m32 and -m64 flags):

The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system.
The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD’s x86-64 architecture.

From MSDN documentation (Data Type Ranges) https://msdn.microsoft.com/en-us/library/s3f49ktz%28v=vs.110%29.aspx :

int, 4 bytes, also known as signed int
long, 4 bytes, also known as long int and signed long int

To Conclude This: Lessons Learned

  • 32 bits integers are faster than 64 bits integers.

  • Standard integer types are not well defined in either C or C++; they vary depending on compiler and architecture. When you need consistency and predictability, use the uint32_t integer family from #include <stdint.h>.

  • Speed issue solved. All other languages are hundreds of percent behind; C & C++ win again! They always do. The next improvement will be multithreading using OpenMP :D


回答 10

查看您的Erlang实现。时间包括启动整个虚拟机,运行程序以及停止虚拟机。非常确定,设置和停止erlang vm会花费一些时间。

如果计时是在erlang虚拟机本身内部完成的,则结果将有所不同,因为在这种情况下,我们仅拥有有关程序的实际时间。否则,我相信启动和加载Erlang Vm的过程加上停止它(如您在程序中所述)所花费的总时间都包含在您用于计时的方法的总时间中。程序正在输出。考虑使用erlang计时本身,当我们想在虚拟机本身中计时程序时使用 timer:tc/1 or timer:tc/2 or timer:tc/3。这样,来自erlang的结果将排除启动和停止/杀死/停止虚拟机所花费的时间。那就是我的理由,考虑一下,然后再次尝试基准测试。

我实际上建议我们尝试在那些语言的运行时内对那些具有运行时的语言计时(对于具有运行时的语言),以便获得精确的值。例如,C与Erlang,Python和Haskell一样,没有启动和关闭运行时系统的开销(对此有98%的把握-我可以纠正)。因此(基于此推理),我最后说,该基准对于在运行时系统之上运行的语言而言不够精确/不够合理。让我们再次进行这些更改。

编辑:即使所有语言都有运行时系统,启动每种语言和停止它的开销也会有所不同。所以我建议我们从运行时系统中进行计时(适用于这种语言)。众所周知,Erlang VM在启动时会有相当大的开销!

Looking at your Erlang implementation: the timing includes starting up the entire virtual machine, running your program, and halting the virtual machine. I am pretty sure that setting up and halting the Erlang VM takes some time.

If the timing were done within the Erlang virtual machine itself, the results would be different: in that case we would measure only the program in question. Otherwise, I believe the total time taken to start and load the Erlang VM, plus the time to halt it (as you put it in your program), is included in the total time reported by whatever method you are using to time the program. Consider using Erlang's own timing — timer:tc/1, timer:tc/2 or timer:tc/3 — which we use when we want to time programs within the virtual machine itself. That way, the results from Erlang will exclude the time taken to start and stop/kill/halt the virtual machine. That is my reasoning; think about it, and then try your benchmark again.

I actually suggest that we try to time the program, for languages that have a runtime, within the runtime of those languages in order to get a precise value. C, for example, has no overhead of starting and shutting down a runtime system, as Erlang, Python and Haskell do (98% sure of this – I stand to be corrected). So, based on this reasoning, I conclude that this benchmark was not precise/fair enough for languages running on top of a runtime system. Let's do it again with these changes.

EDIT: besides, even if all the languages had runtime systems, the overhead of starting and halting each would differ. So I suggest we time from within the runtime systems (for the languages to which this applies). The Erlang VM is known to have considerable overhead at start up!
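The point generalizes to any runtime: measure inside the process if you want to exclude startup and teardown. A minimal sketch of my own (not from the answer) of the C++-side equivalent of timer:tc, using <chrono>:

#include <chrono>
#include <iostream>

// Stand-in for the real benchmark body.
long long work() {
    long long s = 0;
    for (long long i = 0; i < 100000000; ++i) s += i;
    return s;
}

int main() {
    auto t0 = std::chrono::steady_clock::now();
    long long r = work();
    auto t1 = std::chrono::steady_clock::now();
    std::cout << r << " computed in "
              << std::chrono::duration<double>(t1 - t0).count()
              << " s (process startup/teardown excluded)\n";
}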


回答 11

问题1:Erlang,Python和Haskell是否由于使用任意长度的整数而导致速度下降,还是只要它们的值小于MAXINT,它们就不会丢失吗?

对于Erlang,可以否定回答第一个问题。通过适当地使用Erlang可以回答最后一个问题,如下所示:

http://bredsaal.dk/learning-erlang-using-projecteuler-net

因为它比您最初的C示例要快,所以我想会有很多问题,因为其他问题已经详细介绍了。

这个Erlang模块在便宜的上网本上执行大约需要5秒钟的时间。它使用erlang中的网络线程模型,从而演示了如何利用事件模型。它可以分布在许多节点上。而且速度很快。不是我的代码。

-module(p12dist).  
-author("Jannich Brendle, jannich@bredsaal.dk, http://blog.bredsaal.dk").  
-compile(export_all).

server() ->  
  server(1).

server(Number) ->  
  receive {getwork, Worker_PID} -> Worker_PID ! {work,Number,Number+100},  
  server(Number+101);  
  {result,T} -> io:format("The result is: ~w.~n", [T]);  
  _ -> server(Number)  
  end.

worker(Server_PID) ->  
  Server_PID ! {getwork, self()},  
  receive {work,Start,End} -> solve(Start,End,Server_PID)  
  end,  
  worker(Server_PID).

start() ->  
  Server_PID = spawn(p12dist, server, []),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]).

solve(N,End,_) when N =:= End -> no_solution;

solve(N,End,Server_PID) ->  
  T=round(N*(N+1)/2),
  case (divisor(T,round(math:sqrt(T))) > 500) of  
    true ->  
      Server_PID ! {result,T};  
    false ->  
      solve(N+1,End,Server_PID)  
  end.

divisors(N) ->  
  divisor(N,round(math:sqrt(N))).

divisor(_,0) -> 1;  
divisor(N,I) ->  
  case (N rem I) =:= 0 of  
  true ->  
    2+divisor(N,I-1);  
  false ->  
    divisor(N,I-1)  
  end.

以下测试在以下环境下进行:Intel(R)Atom(TM)CPU N270 @ 1.60GHz

~$ time erl -noshell -s p12dist start

The result is: 76576500.

^C

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
a

real    0m5.510s
user    0m5.836s
sys 0m0.152s

Question 1: Do Erlang, Python and Haskell lose speed due to using arbitrary length integers or don’t they as long as the values are less than MAXINT?

Question one can be answered in the negative for Erlang. The last question is answered by using Erlang appropriately, as in:

http://bredsaal.dk/learning-erlang-using-projecteuler-net

Since it’s faster than your initial C example, I would guess there are numerous problems as others have already covered in detail.

This Erlang module executes on a cheap netbook in about 5 seconds … It uses the network thread model in Erlang and, as such, demonstrates how to take advantage of the event model. It could be distributed over many nodes. And it’s fast. Not my code.

-module(p12dist).  
-author("Jannich Brendle, jannich@bredsaal.dk, http://blog.bredsaal.dk").  
-compile(export_all).

server() ->  
  server(1).

server(Number) ->  
  receive {getwork, Worker_PID} -> Worker_PID ! {work,Number,Number+100},  
  server(Number+101);  
  {result,T} -> io:format("The result is: ~w.~n", [T]);  
  _ -> server(Number)  
  end.

worker(Server_PID) ->  
  Server_PID ! {getwork, self()},  
  receive {work,Start,End} -> solve(Start,End,Server_PID)  
  end,  
  worker(Server_PID).

start() ->  
  Server_PID = spawn(p12dist, server, []),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]),  
  spawn(p12dist, worker, [Server_PID]).

solve(N,End,_) when N =:= End -> no_solution;

solve(N,End,Server_PID) ->  
  T=round(N*(N+1)/2),
  case (divisor(T,round(math:sqrt(T))) > 500) of  
    true ->  
      Server_PID ! {result,T};  
    false ->  
      solve(N+1,End,Server_PID)  
  end.

divisors(N) ->  
  divisor(N,round(math:sqrt(N))).

divisor(_,0) -> 1;  
divisor(N,I) ->  
  case (N rem I) =:= 0 of  
  true ->  
    2+divisor(N,I-1);  
  false ->  
    divisor(N,I-1)  
  end.

The test below took place on an: Intel(R) Atom(TM) CPU N270 @ 1.60GHz

~$ time erl -noshell -s p12dist start

The result is: 76576500.

^C

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
a

real    0m5.510s
user    0m5.836s
sys 0m0.152s

回答 12

C++11,在我的机器上 < 20ms(Run it here)

我了解您想获得一些技巧来帮助您改进特定语言的知识,但由于这里已对此作了详尽介绍,我想为那些可能看过您问题下的Mathematica评论等内容、并想知道为什么这段代码慢得多的人补充一些上下文。

该答案主要是提供上下文,以希望帮助人们更轻松地评估您的问题/其他答案中的代码。

这段代码只用了几个与所用语言无关的(略显丑陋的)优化,基于以下几点:

  1. 每个三角形数的形式为n(n+1)/2
  2. n和n+1互质
  3. 因数的个数是一个积性函数

#include <iostream>
#include <cmath>
#include <tuple>
#include <chrono>

using namespace std;

// Calculates the divisors of an integer by determining its prime factorisation.

int get_divisors(long long n)
{
    int divisors_count = 1;

    for(long long i = 2;
        i <= sqrt(n);
        /* empty */)
    {
        int divisions = 0;
        while(n % i == 0)
        {
            n /= i;
            divisions++;
        }

        divisors_count *= (divisions + 1);

        //here, we try to iterate more efficiently by skipping
        //obvious non-primes like 4, 6, etc
        if(i == 2)
            i++;
        else
            i += 2;
    }

    if(n != 1) //n is a prime
        return divisors_count * 2;
    else
        return divisors_count;
}

long long euler12()
{
    //n and n + 1
    long long n, n_p_1;

    n = 1; n_p_1 = 2;

    // divisors_x will store either the divisors of x or x/2
    // (the latter iff x is divisible by two)
    long long divisors_n = 1;
    long long divisors_n_p_1 = 2;

    for(;;)
    {
        /* This loop has been unwound, so two iterations are completed at a time
         * n and n + 1 have no prime factors in common and therefore we can
         * calculate their divisors separately
         */

        long long total_divisors;                 //the divisors of the triangle number
                                                  // n(n+1)/2

        //the first (unwound) iteration

        divisors_n_p_1 = get_divisors(n_p_1 / 2); //here n+1 is even and we use (n+1)/2

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1);   //n_p_1 is now odd!

        //now the second (unwound) iteration

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1 / 2);   //n_p_1 is now even!
    }

    return (n * n_p_1) / 2;
}

int main()
{
    for(int i = 0; i < 1000; i++)
    {
        using namespace std::chrono;
        auto start = high_resolution_clock::now();
        auto result = euler12();
        auto end = high_resolution_clock::now();

        double time_elapsed = duration_cast<milliseconds>(end - start).count();

        cout << result << " " << time_elapsed << '\n';
    }
    return 0;
}

我的台式机平均要花19毫秒,而笔记本电脑平均要花80毫秒,这与我在这里看到的大多数其他代码相去甚远。毫无疑问,仍有许多优化措施可用。

C++11, < 20ms for me. Run it here

I understand that you want tips to help improve your language-specific knowledge, but since that has been well covered here, I thought I would add some context for people who may have looked at the Mathematica comment on your question, etc., and wondered why this code was so much slower.

This answer is mainly to provide context to hopefully help people evaluate the code in your question / other answers more easily.

This code uses only a couple of (uglyish) optimisations, unrelated to the language used, based on:

  1. every triangle number is of the form n(n+1)/2
  2. n and n+1 are coprime
  3. the number of divisors is a multiplicative function

#include <iostream>
#include <cmath>
#include <tuple>
#include <chrono>

using namespace std;

// Calculates the divisors of an integer by determining its prime factorisation.

int get_divisors(long long n)
{
    int divisors_count = 1;

    for(long long i = 2;
        i <= sqrt(n);
        /* empty */)
    {
        int divisions = 0;
        while(n % i == 0)
        {
            n /= i;
            divisions++;
        }

        divisors_count *= (divisions + 1);

        //here, we try to iterate more efficiently by skipping
        //obvious non-primes like 4, 6, etc
        if(i == 2)
            i++;
        else
            i += 2;
    }

    if(n != 1) //n is a prime
        return divisors_count * 2;
    else
        return divisors_count;
}

long long euler12()
{
    //n and n + 1
    long long n, n_p_1;

    n = 1; n_p_1 = 2;

    // divisors_x will store either the divisors of x or x/2
    // (the latter iff x is divisible by two)
    long long divisors_n = 1;
    long long divisors_n_p_1 = 2;

    for(;;)
    {
        /* This loop has been unwound, so two iterations are completed at a time
         * n and n + 1 have no prime factors in common and therefore we can
         * calculate their divisors separately
         */

        long long total_divisors;                 //the divisors of the triangle number
                                                  // n(n+1)/2

        //the first (unwound) iteration

        divisors_n_p_1 = get_divisors(n_p_1 / 2); //here n+1 is even and we use (n+1)/2

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1);   //n_p_1 is now odd!

        //now the second (unwound) iteration

        total_divisors =
                  divisors_n
                * divisors_n_p_1;

        if(total_divisors > 1000)
            break;

        //move n and n+1 forward
        n = n_p_1;
        n_p_1 = n + 1;

        //fix the divisors
        divisors_n = divisors_n_p_1;
        divisors_n_p_1 = get_divisors(n_p_1 / 2);   //n_p_1 is now even!
    }

    return (n * n_p_1) / 2;
}

int main()
{
    for(int i = 0; i < 1000; i++)
    {
        using namespace std::chrono;
        auto start = high_resolution_clock::now();
        auto result = euler12();
        auto end = high_resolution_clock::now();

        double time_elapsed = duration_cast<milliseconds>(end - start).count();

        cout << result << " " << time_elapsed << '\n';
    }
    return 0;
}

That takes around 19ms on average for my desktop and 80ms for my laptop, a far cry from most of the other code I’ve seen here. And there are, no doubt, many optimisations still available.
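Point 3 above (the divisor count is multiplicative) is what justifies computing the two coprime halves separately. A quick empirical check of the identity d(a*b) = d(a)*d(b) for coprime a and b, as a small C++17 sketch of my own (std::gcd lives in <numeric>):

#include <iostream>
#include <numeric>  // std::gcd (C++17)

// Naive divisor count.
int d(long long n) {
    int count = 0;
    for (long long i = 1; i * i <= n; ++i)
        if (n % i == 0) count += (i * i == n) ? 1 : 2;
    return count;
}

int main() {
    for (long long a = 1; a <= 200; ++a)
        for (long long b = 1; b <= 200; ++b)
            if (std::gcd(a, b) == 1 && d(a * b) != d(a) * d(b))
                std::cout << "counterexample: " << a << ", " << b << "\n";
    std::cout << "done (no output above means the identity held)\n";
}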


回答 13

尝试Go:

package main

import "fmt"
import "math"

func main() {
    for i := 1; ; i++ {
        n := i * (i + 1) / 2 // i-th triangle number
        m := int(math.Sqrt(float64(n)))
        c := 0
        for f := 1; f < m; f++ {
            if n%f == 0 {
                c++
            }
        }
        c *= 2 // divisors below sqrt(n) pair with divisors above it
        if m*m == n {
            c++ // perfect square: the root pairs with itself
        } else if n%m == 0 {
            c += 2 // m divides n but is not its square root
        }
        if c > 1001 {
            fmt.Println(n)
            break
        }
    }
}

我得到:

原始C版本:9.1690 100%
Go版本:8.2520 111%

但是使用:

package main

import (
    "math"
    "fmt"
 )

// Sieve of Eratosthenes
func PrimesBelow(limit int) []int {
    switch {
        case limit < 2:
            return []int{}
        case limit == 2:
            return []int{2}
    }
    sievebound := (limit - 1) / 2
    sieve := make([]bool, sievebound+1)
    crosslimit := int(math.Sqrt(float64(limit))-1) / 2
    for i := 1; i <= crosslimit; i++ {
        if !sieve[i] {
            for j := 2 * i * (i + 1); j <= sievebound; j += 2*i + 1 {
                sieve[j] = true
            }
        }
    }
    plimit := int(1.3*float64(limit)) / int(math.Log(float64(limit)))
    primes := make([]int, plimit)
    p := 1
    primes[0] = 2
    for i := 1; i <= sievebound; i++ {
        if !sieve[i] {
            primes[p] = 2*i + 1
            p++
            if p >= plimit {
                break
            }
        }
    }
    last := len(primes) - 1
    for i := last; i > 0; i-- {
        if primes[i] != 0 {
            break
        }
        last = i
    }
    return primes[0:last]
}



func main() {
    fmt.Println(p12())
}
// Requires PrimesBelow from utils.go
func p12() int {
    n, dn, cnt := 3, 2, 0
    primearray := PrimesBelow(1000000)
    for cnt <= 1001 {
        n++
        n1 := n
        if n1%2 == 0 {
            n1 /= 2
        }
        dn1 := 1
        for i := 0; i < len(primearray); i++ {
            if primearray[i]*primearray[i] > n1 {
                dn1 *= 2
                break
            }
            exponent := 1
            for n1%primearray[i] == 0 {
                exponent++
                n1 /= primearray[i]
            }
            if exponent > 1 {
                dn1 *= exponent
            }
            if n1 == 1 {
                break
            }
        }
        cnt = dn * dn1
        dn = dn1
    }
    return n * (n - 1) / 2
}

我得到:

原始c版本:9.1690 100%
thaumkid的c版本:0.1060 8650%
第一个Go版本:8.2520 111%
第二个Go版本:0.0230 39865%

我也尝试了Python3.6和pypy3.3-5.5-alpha:

原始C版本:8.629 100%
thaumkid的C版本:0.109 7916%
Python3.6:54.795 16%
pypy3.3-5.5-alpha:13.291 65%

然后用下面的代码我得到:

原始c版本:8.629 100%
thaumkid的c版本:0.109 8650%
Python3.6:1.489 580%
pypy3.3-5.5-alpha:0.582 1483%

def D(N):
    if N == 1: return 1
    sqrtN = int(N ** 0.5)
    nf = 1
    for d in range(2, sqrtN + 1):
        if N % d == 0:
            nf = nf + 1
    return 2 * nf - (1 if sqrtN**2 == N else 0)

L = 1000
Dt, n = 0, 0

while Dt <= L:
    t = n * (n + 1) // 2
    Dt = D(n//2)*D(n+1) if n%2 == 0 else D(n)*D((n+1)//2)
    n = n + 1

print (t)

Trying GO:

package main

import "fmt"
import "math"

func main() {
    for i := 1; ; i++ {
        n := i * (i + 1) / 2 // i-th triangle number
        m := int(math.Sqrt(float64(n)))
        c := 0
        for f := 1; f < m; f++ {
            if n%f == 0 {
                c++
            }
        }
        c *= 2 // divisors below sqrt(n) pair with divisors above it
        if m*m == n {
            c++ // perfect square: the root pairs with itself
        } else if n%m == 0 {
            c += 2 // m divides n but is not its square root
        }
        if c > 1001 {
            fmt.Println(n)
            break
        }
    }
}

I get:

original c version: 9.1690 100%
go: 8.2520 111%

But using:

package main

import (
    "math"
    "fmt"
 )

// Sieve of Eratosthenes
func PrimesBelow(limit int) []int {
    switch {
        case limit < 2:
            return []int{}
        case limit == 2:
            return []int{2}
    }
    sievebound := (limit - 1) / 2
    sieve := make([]bool, sievebound+1)
    crosslimit := int(math.Sqrt(float64(limit))-1) / 2
    for i := 1; i <= crosslimit; i++ {
        if !sieve[i] {
            for j := 2 * i * (i + 1); j <= sievebound; j += 2*i + 1 {
                sieve[j] = true
            }
        }
    }
    plimit := int(1.3*float64(limit)) / int(math.Log(float64(limit)))
    primes := make([]int, plimit)
    p := 1
    primes[0] = 2
    for i := 1; i <= sievebound; i++ {
        if !sieve[i] {
            primes[p] = 2*i + 1
            p++
            if p >= plimit {
                break
            }
        }
    }
    last := len(primes) - 1
    for i := last; i > 0; i-- {
        if primes[i] != 0 {
            break
        }
        last = i
    }
    return primes[0:last]
}



func main() {
    fmt.Println(p12())
}
// Requires PrimesBelow from utils.go
func p12() int {
    n, dn, cnt := 3, 2, 0
    primearray := PrimesBelow(1000000)
    for cnt <= 1001 {
        n++
        n1 := n
        if n1%2 == 0 {
            n1 /= 2
        }
        dn1 := 1
        for i := 0; i < len(primearray); i++ {
            if primearray[i]*primearray[i] > n1 {
                dn1 *= 2
                break
            }
            exponent := 1
            for n1%primearray[i] == 0 {
                exponent++
                n1 /= primearray[i]
            }
            if exponent > 1 {
                dn1 *= exponent
            }
            if n1 == 1 {
                break
            }
        }
        cnt = dn * dn1
        dn = dn1
    }
    return n * (n - 1) / 2
}

I get:

original c version: 9.1690 100%
thaumkid’s c version: 0.1060 8650%
first go version: 8.2520 111%
second go version: 0.0230 39865%

I also tried Python3.6 and pypy3.3-5.5-alpha:

original c version: 8.629 100%
thaumkid’s c version: 0.109 7916%
Python3.6: 54.795 16%
pypy3.3-5.5-alpha: 13.291 65%

and then with following code I got:

original c version: 8.629 100%
thaumkid’s c version: 0.109 8650%
Python3.6: 1.489 580%
pypy3.3-5.5-alpha: 0.582 1483%

def D(N):
    if N == 1: return 1
    sqrtN = int(N ** 0.5)
    nf = 1
    for d in range(2, sqrtN + 1):
        if N % d == 0:
            nf = nf + 1
    return 2 * nf - (1 if sqrtN**2 == N else 0)

L = 1000
Dt, n = 0, 0

while Dt <= L:
    t = n * (n + 1) // 2
    Dt = D(n//2)*D(n+1) if n%2 == 0 else D(n)*D((n+1)//2)
    n = n + 1

print (t)

回答 14

更改: case (divisor(T,round(math:sqrt(T))) > 500) of

至: case (divisor(T,round(math:sqrt(T))) > 1000) of

这将为Erlang多进程示例提供正确的答案。

Change: case (divisor(T,round(math:sqrt(T))) > 500) of

To: case (divisor(T,round(math:sqrt(T))) > 1000) of

This will produce the correct answer for the Erlang multi-process example.


回答 15

我假设只有当所涉及的数有很多小因数时,因数的个数才会很大。因此,我使用了thaumkid出色的算法,但首先使用一个永远不会偏小的因数个数近似值。这很简单:检查不超过29的素因数,然后根据剩余的数计算因数个数的上界。如果该上界足够高,再计算确切的因数个数。

下面的代码并不需要这个假设来保证正确性,但需要它来保证速度。它似乎是可行的;大约每十万个数中只有一个给出的估计值高到需要进行完整检查。

这是代码:

#include <stdint.h>
#include <stdio.h>

// Return at least the number of factors of n.
static uint64_t approxfactorcount (uint64_t n)
{
    uint64_t count = 1, add;

#define CHECK(d)                            \
    do {                                    \
        if (n % d == 0) {                   \
            add = count;                    \
            do { n /= d; count += add; }    \
            while (n % d == 0);             \
        }                                   \
    } while (0)

    CHECK ( 2); CHECK ( 3); CHECK ( 5); CHECK ( 7); CHECK (11); CHECK (13);
    CHECK (17); CHECK (19); CHECK (23); CHECK (29);
    if (n == 1) return count;
    if (n < 1ull * 31 * 31) return count * 2;
    if (n < 1ull * 31 * 31 * 37) return count * 4;
    if (n < 1ull * 31 * 31 * 37 * 37) return count * 8;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41) return count * 16;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43) return count * 32;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47) return count * 64;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53) return count * 128;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59) return count * 256;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61) return count * 512;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67) return count * 1024;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71) return count * 2048;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71 * 73) return count * 4096;
    return count * 1000000;
}

// Return the number of factors of n.
static uint64_t factorcount (uint64_t n)
{
    uint64_t count = 1, add;

    CHECK (2); CHECK (3);

    uint64_t d = 5, inc = 2;
    for (; d*d <= n; d += inc, inc = (6 - inc))
        CHECK (d);

    if (n > 1) count *= 2; // n must be a prime number
    return count;
}

// Prints triangular numbers with record numbers of factors.
static void printrecordnumbers (uint64_t limit)
{
    uint64_t record = 30000;

    uint64_t count1, factor1;
    uint64_t count2 = 1, factor2 = 1;

    for (uint64_t n = 1; n <= limit; ++n)
    {
        factor1 = factor2;
        count1 = count2;

        factor2 = n + 1; if (factor2 % 2 == 0) factor2 /= 2;
        count2 = approxfactorcount (factor2);

        if (count1 * count2 > record)
        {
            uint64_t factors = factorcount (factor1) * factorcount (factor2);
            if (factors > record)
            {
                printf ("%lluth triangular number = %llu has %llu factors\n", n, factor1 * factor2, factors);
                record = factors;
            }
        }
    }
}

这将在约0.7秒内找到第14,753,024个三角形数(有13,824个因数);在34秒内找到第879,207,615个三角形数(有61,440个因数);在10分5秒内找到第12,524,486,975个三角形数(有138,240个因数);在21分25秒内找到第26,467,792,064个三角形数(有172,032个因数)(2.4 GHz Core2 Duo)。因此该代码平均每个数只需要116个处理器周期。最后一个三角形数本身大于2^68,因此

I made the assumption that the number of factors is only large if the numbers involved have many small factors. So I used thaumkid’s excellent algorithm, but first used an approximation to the factor count that is never too small. It’s quite simple: check for prime factors up to 29, then use the remaining number to compute an upper bound for the number of factors. If that bound is high enough, calculate the exact number of factors.

The code below doesn’t need this assumption for correctness, but to be fast. It seems to work; only about one in 100,000 numbers gives an estimate that is high enough to require a full check.

Here’s the code:

#include <stdint.h>
#include <stdio.h>

// Return at least the number of factors of n.
static uint64_t approxfactorcount (uint64_t n)
{
    uint64_t count = 1, add;

#define CHECK(d)                            \
    do {                                    \
        if (n % d == 0) {                   \
            add = count;                    \
            do { n /= d; count += add; }    \
            while (n % d == 0);             \
        }                                   \
    } while (0)

    CHECK ( 2); CHECK ( 3); CHECK ( 5); CHECK ( 7); CHECK (11); CHECK (13);
    CHECK (17); CHECK (19); CHECK (23); CHECK (29);
    if (n == 1) return count;
    if (n < 1ull * 31 * 31) return count * 2;
    if (n < 1ull * 31 * 31 * 37) return count * 4;
    if (n < 1ull * 31 * 31 * 37 * 37) return count * 8;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41) return count * 16;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43) return count * 32;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47) return count * 64;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53) return count * 128;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59) return count * 256;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61) return count * 512;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67) return count * 1024;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71) return count * 2048;
    if (n < 1ull * 31 * 31 * 37 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71 * 73) return count * 4096;
    return count * 1000000;
}

// Return the number of factors of n.
static uint64_t factorcount (uint64_t n)
{
    uint64_t count = 1, add;

    CHECK (2); CHECK (3);

    uint64_t d = 5, inc = 2;
    for (; d*d <= n; d += inc, inc = (6 - inc))
        CHECK (d);

    if (n > 1) count *= 2; // n must be a prime number
    return count;
}

// Prints triangular numbers with record numbers of factors.
static void printrecordnumbers (uint64_t limit)
{
    uint64_t record = 30000;

    uint64_t count1, factor1;
    uint64_t count2 = 1, factor2 = 1;

    for (uint64_t n = 1; n <= limit; ++n)
    {
        factor1 = factor2;
        count1 = count2;

        factor2 = n + 1; if (factor2 % 2 == 0) factor2 /= 2;
        count2 = approxfactorcount (factor2);

        if (count1 * count2 > record)
        {
            uint64_t factors = factorcount (factor1) * factorcount (factor2);
            if (factors > record)
            {
                printf ("%lluth triangular number = %llu has %llu factors\n", n, factor1 * factor2, factors);
                record = factors;
            }
        }
    }
}

This finds the 14,753,024th triangular with 13824 factors in about 0.7 seconds, the 879,207,615th triangular number with 61,440 factors in 34 seconds, the 12,524,486,975th triangular number with 138,240 factors in 10 minutes 5 seconds, and the 26,467,792,064th triangular number with 172,032 factors in 21 minutes 25 seconds (2.4GHz Core2 Duo), so this code takes only 116 processor cycles per number on average. The last triangular number itself is larger than 2^68, so
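To make the two-stage pattern concrete, here is a self-contained miniature of my own (much cruder than the bound above, and not thaumkid’s or the answer’s code): a cheap never-too-small estimate, and the exact count only when the estimate beats the current record:

#include <cstdint>
#include <iostream>

// Exact divisor count by trial division.
static uint64_t exactCount(uint64_t n) {
    uint64_t count = 1;
    for (uint64_t d = 2; d * d <= n; ++d) {
        uint64_t e = 0;
        while (n % d == 0) { n /= d; ++e; }
        count *= e + 1;
    }
    if (n > 1) count *= 2;  // leftover prime factor
    return count;
}

// Cheap upper bound: strip primes below 11 exactly; the remainder m has
// at most k prime factors where m >= 11^k, so d(m) <= 2^k.
static uint64_t boundCount(uint64_t n) {
    uint64_t count = 1;
    for (uint64_t d : {2ULL, 3ULL, 5ULL, 7ULL}) {
        uint64_t e = 0;
        while (n % d == 0) { n /= d; ++e; }
        count *= e + 1;
    }
    if (n == 1) return count;
    uint64_t cap = 2;
    for (uint64_t t = n; t >= 11ULL * 11ULL; t /= 11) cap *= 2;
    return count * cap;
}

int main() {
    uint64_t record = 0, fullChecks = 0;
    for (uint64_t n = 2; n < 2000000; ++n) {
        if (boundCount(n) <= record) continue;  // cheap rejection
        ++fullChecks;                           // rarer exact check
        uint64_t c = exactCount(n);
        if (c > record) record = c;
    }
    std::cout << "max divisor count below 2000000: " << record
              << ", exact checks needed: " << fullChecks << "\n";
}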


回答 16

我将“ Jannich Brendle”版本修改为1000,而不是500。并列出了euler12.bin,euler12.erl,p12dist.erl的结果。两种erl代码都使用“ + native”进行编译。

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$

I modified the “Jannich Brendle” version to use 1000 instead of 500, and list the results for euler12.bin, euler12.erl and p12dist.erl. Both erl codes were compiled with ‘+native’.

zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s p12dist start
The result is: 842161320.

real    0m3.879s
user    0m14.553s
sys     0m0.314s
zhengs-MacBook-Pro:workspace zhengzhibin$ time erl -noshell -s euler12 solve
842161320

real    0m10.125s
user    0m10.078s
sys     0m0.046s
zhengs-MacBook-Pro:workspace zhengzhibin$ time ./euler12.bin 
842161320

real    0m5.370s
user    0m5.328s
sys     0m0.004s
zhengs-MacBook-Pro:workspace zhengzhibin$

回答 17

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square+1;
    long candidate = 2;
    int count = 1;
    while(candidate <= isquare && candidate<=n){
        int c = 1;
        while (n % candidate == 0) {
           c++;
           n /= candidate;
        }
        count *= c;
        candidate++;
    }
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

gcc -lm -Ofast euler.c

time ./a.out

2.79s user 0.00s system 99% cpu 2.794 total

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square+1;
    long candidate = 2;
    int count = 1;
    while(candidate <= isquare && candidate<=n){
        int c = 1;
        while (n % candidate == 0) {
           c++;
           n /= candidate;
        }
        count *= c;
        candidate++;
    }
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

gcc -lm -Ofast euler.c

time ./a.out

2.79s user 0.00s system 99% cpu 2.794 total
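One cheap improvement over incrementing candidate by one, sketched below (my addition; it is the same 6k±1 wheel that the earlier approxfactorcount/factorcount answer uses): after handling 2 and 3, every remaining prime has the form 6k-1 or 6k+1, so the trial divisor can alternate steps of 2 and 4:

#include <cstdio>

// Divisor count with a 6k±1 wheel.
static int factorCount(long n) {
    int count = 1;
    for (long p : {2L, 3L}) {
        int e = 0;
        while (n % p == 0) { n /= p; ++e; }
        count *= e + 1;
    }
    for (long d = 5, step = 2; d * d <= n; d += step, step = 6 - step) {
        int e = 0;
        while (n % d == 0) { n /= d; ++e; }
        count *= e + 1;
    }
    if (n > 1) count *= 2;  // remaining n is prime
    return count;
}

int main(void) {
    long triangle = 1;
    long index = 1;
    while (factorCount(triangle) < 1001) {
        ++index;
        triangle += index;
    }
    printf("%ld\n", triangle);
    return 0;
}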


mal - Make a Lisp(做一个Lisp)

描述

1.Mal是一个受Clojure启发的Lisp解释器

2.MAL是一种学习工具

MAL的每个实现被分成11个增量的、自包含的(且可测试的)步骤,这些步骤演示了Lisp的核心概念。最后一步是能够自托管(运行mal的mal实现)。请参阅make-a-lisp process guide

Make-a-LISP步骤包括:

每个Make-a-LISP步骤都有一个关联的架构图。该步骤的新元素以红色高亮显示。以下是step A

stepA_mal architecture

如果您对创建mal实现感兴趣(或者只是想用mal做点什么),欢迎加入我们的Discord,或加入libera.chat上的#mal频道。除了make-a-lisp process guide之外,还有一个mal/make-a-lisp FAQ,我在其中尝试回答一些常见问题

3.MAL用86种语言实现(91种不同实现,113种运行模式)

语言 创建者
Ada Chris Moore
Ada #2 Nicolas Boulenguez
GNU Awk Miutsuru Kariya
Bash 4 Joel Martin
BASIC(C64和QBASIC) Joel Martin
BBC BASIC V Ben Harris
C Joel Martin
C #2 Duncan Watts
C++ Stephen Thirlwall
C# Joel Martin
ChucK Vasilij Schneidermann
Clojure(Clojure和ClojureScript) Joel Martin
CoffeeScript Joel Martin
Common Lisp Iqbal Ansari
Crystal Linda_pp
D Dov Murik
Dart Harry Terkelsen
Elixir Martin Ek
Elm Jos van Bakel
Emacs Lisp Vasilij Schneidermann
Erlang Nathan Fiedler
ES6(ECMAScript 2015) Joel Martin
F# Peter Stephens
Factor Jordan Lewis
Fantom Dov Murik
Fennel sogaiu
Forth Chris Houser
GNU Guile Mu Lei
GNU Smalltalk Vasilij Schneidermann
Go Joel Martin
Groovy Joel Martin
Haskell Joel Martin
Haxe(Neko、Python、C++和JS) Joel Martin
Hy Joel Martin
Io Dov Murik
Janet sogaiu
Java Joel Martin
Java(Truffle/GraalVM) Matt McGill
JavaScript(Demo) Joel Martin
jq Ali MohammadPur
Julia Joel Martin
Kotlin Javier Fernandez-Ivern
LiveScript Jos van Bakel
Logo Dov Murik
Lua Joel Martin
GNU Make Joel Martin
mal itself Joel Martin
MATLAB(GNU Octave&MATLAB) Joel Martin
miniMAL(RepoDemo) Joel Martin
NASM Ben Dudson
Nim Dennis Felsing
Object Pascal Joel Martin
Objective C Joel Martin
OCaml Chris Houser
Perl Joel Martin
Perl 6 Hinrik Örn Sigurðsson
PHP Joel Martin
Picolisp Vasilij Schneidermann
Pike Dov Murik
PL/pgSQL(PostgreSQL) Joel Martin
PL/SQL(Oracle) Joel Martin
PostScript Joel Martin
PowerShell Joel Martin
Prolog Nicolas Boulenguez
Python(2.x和3.x) Joel Martin
Python #2(3.x) Gavin Lewis
RPython Joel Martin
R Joel Martin
Racket Joel Martin
Rexx Dov Murik
Ruby Joel Martin
Rust Joel Martin
Scala Joel Martin
Scheme (R7RS) Vasilij Schneidermann
Skew Dov Murik
Standard ML Fabian Bergström
Swift 2 Keith Rollin
Swift 3 Joel Martin
Swift 4 陆遥
Swift 5 Oleg Montak
Tcl Dov Murik
TypeScript Masahiro Wakame
Vala Simon Tatham
VHDL Dov Murik
Vimscript Dov Murik
Visual Basic.NET Joel Martin
WebAssembly(WASM) Joel Martin
Wren Dov Murik
XSLT Ali MohammadPur
Yorick Dov Murik
Zig Josh Tobin

演示文稿

Mal第一次出现在2014年Clojure West的闪电演讲中(不幸的是没有视频)。参见Examples/clojurewest2014.mal了解会议上的演示文稿(是的,该演示文稿本身就是一个mal程序)。

在Midwest.io 2015上,乔尔·马丁(Joel Martin)就MAL做了题为“解锁的成就:一条更好的语言学习之路”的演讲(Video、Slides)。

最近,乔尔在LambdaConf 2016大会上发表了题为“用10个增量步骤打造自己的Lisp解释器”的演讲:Part 1、Part 2、Part 3、Part 4、Slides。

构建/运行实现

运行任何给定实现的最简单方法是使用Docker。每个实现都有一个预构建的Docker映像,其中安装了语言依赖项。您可以使用顶层Makefile中的一个便捷目标启动REPL(其中IMPL是实现目录名,stepX是要运行的步骤):

make DOCKERIZE=1 "repl^IMPL^stepX"
    # OR stepA is the default step:
make DOCKERIZE=1 "repl^IMPL"

外部实现

以下实施作为单独的项目进行维护:

HolyC

Rust

  • by Tim Morgan
  • by vi——使用Pest语法,不使用典型的MAL基础设施(cargo化的步骤和内置的转换测试)

q

  • by Ali Mohammad Pur——q实现运行良好,但它需要专有的手动下载,不能Docker化(或集成到mal CI管道中),因此目前它仍然是一个单独的项目

其他MAL项目

  • malc详细说明:MAL(Make A Lisp)编译器。将MAL程序编译成LLVM汇编语言,然后编译成二进制
  • malcc-malcc是MAL语言的增量编译器实现。它使用微型C编译器作为编译器后端,并且完全支持MAL语言,包括宏、尾部调用消除,甚至运行时求值。“I Built a Lisp Compiler”发布有关该过程的帖子
  • frock+Clojure风格的PHP。使用mal/php运行程序
  • flk-无论Bash在哪里都可以运行的LISP
  • glisp详细说明:基于Lisp的自引导图形设计工具。Live Demo

实施详情

Ada

Ada实现是在Debian上使用GNAT4.9开发的。如果您有git、gnat和make(可选)的windows版本,它也可以在windows上编译而不变。没有外部依赖项(未实现ReadLine)

cd impls/ada
make
./stepX_YYY

Ada.2

第二个Ada实现是使用GNAT 8开发的,并与GNU读取线库链接

cd impls/ada
make
./stepX_YYY

GNU awk

Mal的GNU awk实现已经使用GNU awk 4.1.1进行了测试

cd impls/gawk
gawk -O -f stepX_YYY.awk

BASH 4

cd impls/bash
bash stepX_YYY.sh

基本(C64和QBasic)

Basic实现使用一个预处理器,可以生成与C64 Basic(CBM v2)和QBasic兼容的Basic代码。C64模式已使用cbmbasic测试(当前需要打补丁的版本来修复行输入问题),QBasic模式已使用qb64测试

生成C64代码并使用cbmbasic运行:

cd impls/basic
make stepX_YYY.bas
STEP=stepX_YYY ./run

生成QBasic代码并加载到qb64中:

cd impls/basic
make MODE=qbasic stepX_YYY.bas
./qb64 stepX_YYY.bas

感谢Steven Syrek为此实现提供的原始灵感

BBC Basic V

BBC Basic V实现可以在Brandy解释器中运行:

cd impls/bbc-basic
brandy -quit stepX_YYY.bbc

或在RISC OS 3或更高版本下的ARM BBC Basic V中:

*Dir bbc-basic.riscos
*Run setup
*Run stepX_YYY

C

mal的C实现需要以下库(lib和头包):glib、libffi6、libgc以及libedit或GNU readline库

cd impls/c
make
./stepX_YYY

C.2

mal的第二个C实现需要以下库(lib和头包):libedit、libgc、libdl和libffi

cd impls/c.2
make
./stepX_YYY

C++

构建mal的C++实现需要g++-4.9或clang++-3.5和readline兼容库。请参阅cpp/README.md有关更多详细信息,请执行以下操作:

cd impls/cpp
make
    # OR
make CXX=clang++-3.5
./stepX_YYY

C#

mal的C#实现已经在Linux上使用Mono C#编译器(MCS)和Mono运行时(2.10.8.1版)进行了测试。两者都是构建和运行C#实现所必需的

cd impls/cs
make
mono ./stepX_YYY.exe

ChucK

Chuck实现已经使用Chuck 1.3.5.2进行了测试

cd impls/chuck
./run

Clojure

在很大程度上,Clojure实现需要Clojure 1.5,然而,要通过所有测试,则需要Clojure 1.8.0-RC4

cd impls/clojure
lein with-profile +stepX trampoline run

CoffeeScript

sudo npm install -g coffee-script
cd impls/coffee
coffee ./stepX_YYY

通用Lisp

该实现已经在Ubuntu 16.04和Ubuntu 12.04上使用SBCL、CCL、CMUCL、GNU CLISP、ECL和Allegro CL进行了测试,请参阅README了解更多详细信息。如果您安装了上述依赖项,请执行以下操作来运行实现

cd impls/common-lisp
make
./run

Crystal

MAL的Crystal实现已经用Crystal 0.26.1进行了测试

cd impls/crystal
crystal run ./stepX_YYY.cr
    # OR
make   # needed to run tests
./stepX_YYY

D

使用GDC4.8对MAL的D实现进行了测试。它需要GNU读取线库

cd impls/d
make
./stepX_YYY

Dart

Dart实现已使用Dart 1.20进行了测试

cd impls/dart
dart ./stepX_YYY

Emacs Lisp

Emacs Lisp的MAL实现已经使用Emacs 24.3和24.5进行了测试。虽然有非常基本的readline行编辑(<backspace>和C-d可用,C-c取消进程),但建议使用rlwrap

cd impls/elisp
emacs -Q --batch --load stepX_YYY.el
# with full readline support
rlwrap emacs -Q --batch --load stepX_YYY.el

Elixir

MAL的Elixir实现已经使用Elixir 1.0.5进行了测试

cd impls/elixir
mix stepX_YYY
# Or with readline/line editing functionality:
iex -S mix stepX_YYY

Elm

MAL的ELM实现已经用ELM 0.18.0进行了测试

cd impls/elm
make stepX_YYY.js
STEP=stepX_YYY ./run

Erlang

Mal的Erlang实现需要Erlang/OTP R17和rebar来构建

cd impls/erlang
make
    # OR
MAL_STEP=stepX_YYY rebar compile escriptize # build individual step
./stepX_YYY

ES6(ECMAScript 2015)

ES6/ECMAScript 2015实现使用babel编译器生成ES5兼容的JavaScript。生成的代码已经在Node 0.12.4上进行了测试

cd impls/es6
make
node build/stepX_YYY.js

F#

mal的F#实现已经在Linux上使用Mono F#编译器(Fsharpc)和Mono运行时(版本3.12.1)进行了测试。单C#编译器(MCS)也是编译readline依赖项所必需的。所有这些都是构建和运行F#实现所必需的

cd impls/fsharp
make
mono ./stepX_YYY.exe

Factor

MAL的Factor实现已使用Factor 0.97(factorcode.org)进行了测试

cd impls/factor
FACTOR_ROOTS=. factor -run=stepX_YYY

Fantom

MAL的Fantom实现已经用Fantom 1.0.70进行了测试

cd impls/fantom
make lib/fan/stepX_YYY.pod
STEP=stepX_YYY ./run

Fennel

Mal的Fennel实现已经在Lua5.4上使用Fennel版本0.9.1进行了测试

cd impls/fennel
fennel ./stepX_YYY.fnl

Forth

cd impls/forth
gforth stepX_YYY.fs

GNU Guile 2.1+

cd impls/guile
guile -L ./ stepX_YYY.scm

GNU Smalltalk

MAL的Smalltalk实现已经在GNU Smalltalk 3.2.91上进行了测试

cd impls/gnu-smalltalk
./run

Go

MAL的Go实现要求在路径上安装Go。该实现已经在Go 1.3.1上进行了测试

cd impls/go
make
./stepX_YYY

Groovy

mal的Groovy实现需要Groovy才能运行,并且已经使用Groovy 1.8.6进行了测试

cd impls/groovy
make
groovy ./stepX_YYY.groovy

Haskell

Haskell实现需要GHC编译器版本7.10.1或更高版本以及Haskell parsec和readline(或editline)包

cd impls/haskell
make
./stepX_YYY

Haxe(Neko、Python、C++和JavaScript)

Mal的Haxe实现需要编译Haxe3.2版。支持四种不同的Haxe目标:neko、Python、C++和JavaScript

cd impls/haxe
# Neko
make all-neko
neko ./stepX_YYY.n
# Python
make all-python
python3 ./stepX_YYY.py
# C++
make all-cpp
./cpp/stepX_YYY
# JavaScript
make all-js
node ./stepX_YYY.js

Hy

MAL的Hy实现已经用Hy 0.13.0进行了测试

cd impls/hy
./stepX_YYY.hy

IO

已使用IO版本20110905测试了MAL的IO实现

cd impls/io
io ./stepX_YYY.io

Janet

MAL的Janet实现已经使用Janet 1.12.2进行了测试

cd impls/janet
janet ./stepX_YYY.janet

Java 1.7

mal的Java实现需要maven2来构建

cd impls/java
mvn compile
mvn -quiet exec:java -Dexec.mainClass=mal.stepX_YYY
    # OR
mvn -quiet exec:java -Dexec.mainClass=mal.stepX_YYY -Dexec.args="CMDLINE_ARGS"

Java,将Truffle用于GraalVM

这个Java实现可以在OpenJDK上运行,但是多亏了Truffle框架,它在GraalVM上的运行速度可以提高30倍。它已经在OpenJDK 11、GraalVM CE 20.1.0和GraalVM CE 21.1.0上进行了测试

cd impls/java-truffle
./gradlew build
STEP=stepX_YYY ./run

JavaScript/节点

cd impls/js
npm install
node stepX_YYY.js

Julia

Mal的Julia实现需要Julia 0.4

cd impls/julia
julia stepX_YYY.jl

JQ

已针对jq 1.6进行了测试;在IO方面存在大量取巧

cd impls/jq
STEP=stepA_YYY ./run
    # with Debug
DEBUG=true STEP=stepA_YYY ./run

Kotlin

MAL的Kotlin实现已经使用Kotlin 1.0进行了测试

cd impls/kotlin
make
java -jar stepX_YYY.jar

LiveScript

已使用LiveScript 1.5测试了mal的LiveScript实现

cd impls/livescript
make
node_modules/.bin/lsc stepX_YYY.ls

Logo

MAL的Logo实现已经用UCBLogo 6.0进行了测试

cd impls/logo
logo stepX_YYY.lg

Lua

Mal的Lua实现已经使用Lua 5.3.5进行了测试。该实现需要安装luarocks

cd impls/lua
make  # to build and link linenoise.so and rex_pcre.so
./stepX_YYY.lua

mal

运行mal的mal实现,需要运行其他实现之一的stepA,并将要运行的mal步骤作为命令行参数传入

cd impls/IMPL
IMPL_STEPA_CMD ../mal/stepX_YYY.mal

GNU Make 3.81

cd impls/make
make -f stepX_YYY.mk

NASM

MAL的NASM实现是为x86-64 Linux编写的,并且已经在Linux 3.16.0-4-AMD64和NASM版本2.11.05上进行了测试

cd impls/nasm
make
./stepX_YYY

NIM 1.0.4

MAL的NIM实现已经使用NIM 1.0.4进行了测试

cd impls/nim
make
  # OR
nimble build
./stepX_YYY

对象PASCAL

MAL的对象Pascal实现已经使用Free Pascal编译器版本2.6.2和2.6.4在Linux上构建和测试

cd impls/objpascal
make
./stepX_YYY

目标C

Mal的Objective C实现已经在Linux上使用CLANG/LLVM3.6进行了构建和测试。它还使用XCode7在OS X上进行了构建和测试

cd impls/objc
make
./stepX_YYY

OCaml 4.01.0

cd impls/ocaml
make
./stepX_YYY

MATLAB(GNU倍频程和MATLAB)

MATLAB实现已经在GNU Octave 4.2.1上进行了测试。它还在Linux上用MATLAB版本R2014a进行了测试。请注意,matlab是一个商业产品。

cd impls/matlab
./stepX_YYY
octave -q --no-gui --no-history --eval "stepX_YYY();quit;"
matlab -nodisplay -nosplash -nodesktop -nojvm -r "stepX_YYY();quit;"
    # OR with command line arguments
octave -q --no-gui --no-history --eval "stepX_YYY('arg1','arg2');quit;"
matlab -nodisplay -nosplash -nodesktop -nojvm -r "stepX_YYY('arg1','arg2');quit;"

miniMAL

miniMAL是一个用不到1024字节的JavaScript实现的小型Lisp解释器。要运行mal的miniMAL实现,您需要下载/安装miniMAL解释器(这需要Node.js)

cd impls/miniMAL
# Download miniMAL and dependencies
npm install
export PATH=`pwd`/node_modules/minimal-lisp/:$PATH
# Now run mal implementation in miniMAL
miniMAL ./stepX_YYY

Perl 5

Perl 5实现应该使用Perl 5.19.3和更高版本

要获得读取行编辑支持,请从CPAN安装Term::ReadLine::Perl或Term::ReadLine::GNU

cd impls/perl
perl stepX_YYY.pl

Perl 6

Perl6实现在Rakudo Perl6 2016.04上进行了测试

cd impls/perl6
perl6 stepX_YYY.pl

PHP 5.3

mal的PHP实现需要php命令行界面才能运行

cd impls/php
php stepX_YYY.php

Picolisp

Picolisp实现需要libreadline和Picolisp 3.1.11或更高版本

cd impls/picolisp
./run

Pike

Pike实现在Pike8.0上进行了测试

cd impls/pike
pike stepX_YYY.pike

pl/pgSQL(PostgreSQL SQL过程语言)

mal的PL/pgSQL实现需要一个正在运行的PostgreSQL服务器(“kanaka/mal-test-plpgsql”docker映像自动启动PostgreSQL服务器)。该实现连接到PostgreSQL服务器并创建名为“mal”的数据库来存储表和存储过程。包装器脚本使用psql命令连接到服务器,并默认为用户“postgres”,但可以使用PSQL_USER环境变量覆盖该值。可以使用PGPASSWORD环境变量指定密码。该实现已使用PostgreSQL 9.4进行了测试

cd impls/plpgsql
./wrap.sh stepX_YYY.sql
    # OR
PSQL_USER=myuser PGPASSWORD=mypass ./wrap.sh stepX_YYY.sql

PL/SQL(Oracle SQL过程语言)

mal的PL/SQL实现需要一个正在运行的Oracle DB服务器(“kanaka/mal-test-plsql”docker映像自动启动Oracle Express服务器)。该实现连接到Oracle服务器以创建类型、表和存储过程。默认的SQL*Plus登录值(用户名/口令@CONNECT_IDENTIFIER)是“SYSTEM/ORACLE”,但是可以用ORACLE_LOGON环境变量覆盖该值。该实施已使用Oracle Express Edition 11g Release 2进行了测试。请注意,任何SQL*Plus连接警告(用户密码过期等)都会干扰包装脚本与数据库通信的能力

cd impls/plsql
./wrap.sh stepX_YYY.sql
    # OR
ORACLE_LOGON=myuser/mypass@ORCL ./wrap.sh stepX_YYY.sql

PostScript Level 2/3

mal的PostScript实现需要运行Ghostscript。它已经使用Ghostscript 9.10进行了测试

cd impls/ps
gs -q -dNODISPLAY -I./ stepX_YYY.ps

PowerShell

Mal的PowerShell实现需要PowerShell脚本语言。它已经在Linux上使用PowerShell 6.0.0 Alpha 9进行了测试

cd impls/powershell
powershell ./stepX_YYY.ps1

Prolog

Prolog实现使用了一些特定于SWI-Prolog的结构,包括READLINE支持,并且已经在8.2.1版的Debian GNU/Linux上进行了测试

cd impls/prolog
swipl stepX_YYY

Python(2.x和3.x)

cd impls/python
python stepX_YYY.py

Python2(3.x)

第二个Python实现大量使用类型注释并使用Arpeggio解析器库

# Recommended: do these steps in a Python virtual environment.
pip3 install Arpeggio==1.9.0
python3 stepX_YYY.py

RPython

您的路径上必须有rpython(随pypy一起提供)

cd impls/rpython
make        # this takes a very long time
./stepX_YYY

R

MAL的R实现需要R(r-base-core)来运行

cd impls/r
make libs  # to download and build rdyncall
Rscript stepX_YYY.r

Racket(5.3)

Mal的racket实现需要运行racket编译器/解释器

cd impls/racket
./stepX_YYY.rkt

Rexx

Mal的Rexx实现已经使用Regina Rexx 3.6进行了测试

cd impls/rexx
make
rexx -a ./stepX_YYY.rexxpp

Ruby(1.9+)

cd impls/ruby
ruby stepX_YYY.rb

Rust(1.38+)

Mal的Rust实现需要使用Rust编译器和构建工具(Cargo)来构建

cd impls/rust
cargo run --release --bin stepX_YYY

Scala

安装Scala和SBT(http://www.scala-sbt.org/0.13/tutorial/Installing-sbt-on-Linux.html):

cd impls/scala
sbt 'run-main stepX_YYY'
    # OR
sbt compile
scala -classpath target/scala*/classes stepX_YYY

Scheme(R7RS)

MAL的Scheme实现已在Chibi-Scheme 0.7.3、Kawa 2.4、Gauche 0.9.5、Chicken 4.11.0、Sagittarius 0.8.3、Cyclone 0.6.3(Git版本)和Foment 0.4(Git版本)上进行了测试。在弄清楚库是如何加载的并相应调整Makefile和run脚本之后,您应该也能让它在其他符合R7RS标准的实现上运行

cd impls/scheme
make symlinks
# chibi
scheme_MODE=chibi ./run
# kawa
make kawa
scheme_MODE=kawa ./run
# gauche
scheme_MODE=gauche ./run
# chicken
make chicken
scheme_MODE=chicken ./run
# sagittarius
scheme_MODE=sagittarius ./run
# cyclone
make cyclone
scheme_MODE=cyclone ./run
# foment
scheme_MODE=foment ./run

Skew

MAL的Skew实现已经使用Skew 0.7.42进行了测试

cd impls/skew
make
node stepX_YYY.js

标准ML(Poly/ML、MLton、莫斯科ML)

Mal的标准ML实现需要一个SML97实现。Makefile支持Poly/ML、MLton和Moscow ML,并已在Poly/ML 5.8.1、MLton 20210117和Moscow ML 2.10版上进行了测试

cd impls/sml
# Poly/ML
make sml_MODE=polyml
./stepX_YYY
# MLton
make sml_MODE=mlton
./stepX_YYY
# Moscow ML
make sml_MODE=mosml
./stepX_YYY

Swift

MAL的Swift实现需要Swift 2.0编译器(Xcode 7.0)来构建。由于语言和标准库中的更改,旧版本将无法工作

cd impls/swift
make
./stepX_YYY

Swift 3

MAL的Swift 3实现需要Swift 3.0编译器。它已经在Swift 3预览版3上进行了测试

cd impls/swift3
make
./stepX_YYY

Swift 4

MAL的Swift 4实现需要Swift 4.0编译器。它已在Swift 4.2.3版本中进行了测试

cd impls/swift4
make
./stepX_YYY

Swift 5

MAL的Swift 5实现需要Swift 5.0编译器。它已在Swift 5.1.1版本中进行了测试

cd impls/swift5
swift run stepX_YYY

TCL 8.6

Mal的Tcl实现需要运行Tcl 8.6。要获得readline行编辑支持,请安装tclreadline

cd impls/tcl
tclsh ./stepX_YYY.tcl

TypeScript

mal的TypeScript实现需要TypeScript 2.2编译器。它已经在Node.js V6上进行了测试

cd impls/ts
make
node ./stepX_YYY.js

Vala

MAL的Vala实现已经用Vala 0.40.8编译器进行了测试。您需要安装valac和libreadline-dev或同等软件包

cd impls/vala
make
./stepX_YYY

VHDL

用GHDL0.29对mal的vhdl实现进行了测试。

cd impls/vhdl
make
./run_vhdl.sh ./stepX_YYY

Vimscript

Mal的Vimscript实现需要运行Vim 8.0

cd impls/vimscript
./run_vimscript.sh ./stepX_YYY.vim

Visual Basic.NET

Mal的VB.NET实现已经在Linux上使用Mono VB编译器(Vbnc)和Mono运行时(2.10.8.1版)进行了测试。构建和运行VB.NET实现需要两者

cd impls/vb
make
mono ./stepX_YYY.exe

WebAssembly(Wasm)

WebAssembly实现是用Wam(WebAssembly宏语言)编写的,并在几种不同的非Web嵌入环境(运行时)下运行:node、wasmtime、wasmer、lucet、wax、wace、warpy

cd impls/wasm
# node
make wasm_MODE=node
./run.js ./stepX_YYY.wasm
# wasmtime
make wasm_MODE=wasmtime
wasmtime --dir=./ --dir=../ --dir=/ ./stepX_YYY.wasm
# wasmer
make wasm_MODE=wasmer
wasmer run --dir=./ --dir=../ --dir=/ ./stepX_YYY.wasm
# lucet
make wasm_MODE=lucet
lucet-wasi --dir=./:./ --dir=../:../ --dir=/:/ ./stepX_YYY.so
# wax
make wasm_MODE=wax
wax ./stepX_YYY.wasm
# wace
make wasm_MODE=wace_libc
wace ./stepX_YYY.wasm
# warpy
make wasm_MODE=warpy
warpy --argv --memory-pages 256 ./stepX_YYY.wasm

XSLT

mal的XSLT实现是用XSLT3编写的,并在Saxon 9.9.1.6家庭版上进行了测试

cd impls/xslt
STEP=stepX_YY ./run

Wren

MAL的Wren实现在Wren 0.2.0上进行了测试

cd impls/wren
wren ./stepX_YYY.wren

Yorick

MAL的Yorick实现在Yorick 2.2.04上进行了测试

cd impls/yorick
yorick -batch ./stepX_YYY.i

Zig

MAL的Zig实现在Zig0.5上进行了测试

cd impls/zig
zig build stepX_YYY

运行测试

顶层Makefile有许多有用的目标来协助实现、开发和测试。help目标提供目标和选项的列表:

make help

功能测试

tests/目录中几乎有800个通用功能测试(针对所有实现)。每个步骤都有相应的测试文件,其中包含特定于该步骤的测试。runtest.py测试工具会启动一个MAL步骤实现,将测试一次一个地提供给该实现,并将输出/返回值与预期的输出/返回值进行比较

  • 要在所有实现中运行所有测试(请准备等待):

make test
  • 要针对单个实施运行所有测试,请执行以下操作:

make "test^IMPL"

# e.g.
make "test^clojure"
make "test^js"
  • 要对所有实施运行单个步骤的测试,请执行以下操作:

make "test^stepX"

# e.g.
make "test^step2"
make "test^step7"
  • 要针对单个实施运行特定步骤的测试,请执行以下操作:

make "test^IMPL^stepX"

# e.g
make "test^ruby^step3"
make "test^ps^step4"

自托管功能测试

  • 若要在自托管模式下运行功能测试,请指定mal作为测试实现,并使用MAL_IMPL make变量来更改基础主机语言(默认为JavaScript):

make MAL_IMPL=IMPL "test^mal^step2"

# e.g.
make "test^mal^step2"   # js is default
make MAL_IMPL=ruby "test^mal^step2"
make MAL_IMPL=python "test^mal^step2"

启动REPL

  • 要在特定步骤中启动实施的REPL,请执行以下操作:

make "repl^IMPL^stepX"

# e.g
make "repl^ruby^step3"
make "repl^ps^step4"
  • 如果省略该步骤,则默认使用stepA:

make "repl^IMPL"

# e.g
make "repl^ruby"
make "repl^ps"
  • 若要启动自托管实现的REPL,请指定mal作为REPL实现,并使用MAL_IMPL make变量来更改基础主机语言(默认为JavaScript):

make MAL_IMPL=IMPL "repl^mal^stepX"

# e.g.
make "repl^mal^step2"   # js is default
make MAL_IMPL=ruby "repl^mal^step2"
make MAL_IMPL=python "repl^mal"

性能测试

警告:这些性能测试在统计上既不有效,也不全面;运行时性能不是mal的主要目标。如果你从这些性能测试中得出任何严肃的结论,那么请联系我,了解堪萨斯州一些令人惊叹的海滨房产,我愿意以低价卖给你

  • 要针对单个实施运行性能测试,请执行以下操作:

make "perf^IMPL"

# e.g.
make "perf^js"
  • 要对所有实施运行性能测试,请执行以下操作:

make "perf"

正在生成语言统计信息

  • 要报告单个实施的行和字节统计信息,请执行以下操作:

make "stats^IMPL"

# e.g.
make "stats^js"

Docker测试

每个实现目录都包含一个Dockerfile,用于创建包含该实现所有依赖项的Docker映像。此外,顶层Makefile支持在Docker容器中运行测试目标(以及perf、stats、repl等),方法是在make命令行上传递“DOCKERIZE=1”。例如:

make DOCKERIZE=1 "test^js^step3"

现有实现已经构建了Docker映像,并将其推送到Docker注册表。但是,如果您希望在本地构建或重建Docker映像,顶层Makefile提供了相应的规则:

make "docker-build^IMPL"

注意事项

  • Docker镜像被命名为“kanaka/mal-test-IMPL”
  • 基于JVM的语言实现(Groovy、Java、Clojure、Scala):您可能需要先手动运行一次make DOCKERIZE=1 "repl^IMPL",然后才能运行测试,因为需要下载运行时依赖项以避免测试超时。这些依赖项被下载到/mal目录中的点文件中,因此它们会在多次运行之间保持不变

许可证

MAL(make-a-lisp)是根据MPL 2.0(Mozilla Public License 2.0)许可的。有关更多详细信息,请参阅LICENSE.txt

为什么在C++中从stdin读取行比Python慢得多?

问题:为什么在C++中从stdin读取行比Python慢得多?

我想比较使用Python和C ++从stdin读取的字符串输入的行数,并且震惊地看到我的C ++代码运行速度比等效的Python代码慢一个数量级。由于我的C ++生锈,而且我还不是专家Pythonista,因此请告诉我我做错了什么还是误解了什么。


(TLDR回答:加入语句cin.sync_with_stdio(false),或者直接改用fgets。

TLDR结果:一直滚动到我的问题的底部,然后查看表格。)


C ++代码:

#include <iostream>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    };

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp

等同于Python:

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()

for line in  sys.stdin:
    count += 1

delta_sec = int(time.time() - start_time)
if delta_sec >= 0:
    lines_per_sec = int(round(count/delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
       lines_per_sec))

这是我的结果:

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889

$cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000

我应该注意,我在Mac OS X v10.6.8(Snow Leopard)和Linux 2.6.32(Red Hat Linux 6.2)下都尝试过。前者是MacBook Pro,后者是非常强大的服务器,并不是说这太相关了。

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP:   Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in  1 seconds. LPS: 5570000

微小的基准附录和总结

为了完整起见,我认为我将使用原始(已同步)C ++代码更新同一框上同一文件的读取速度。同样,这是针对快速磁盘上的100M行文件。这是比较,有几种解决方案/方法:

Implementation      Lines per second
python (default)           3,571,428
cin (default/naive)          819,672
cin (no sync)             12,500,000
fgets                     14,285,714
wc (not fair comparison)  54,644,808

I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I’m not yet an expert Pythonista, please tell me if I’m doing something wrong or if I’m misunderstanding something.


(TLDR answer: include the statement: cin.sync_with_stdio(false) or just use fgets instead.

TLDR results: scroll all the way down to the bottom of my question and look at the table.)


C++ code:

#include <iostream>
#include <string>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    };

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp

Python Equivalent:

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()

for line in sys.stdin:
    count += 1

delta_sec = int(time.time() - start)  # 'start' is defined above; 'start_time' was a typo
if delta_sec > 0:  # guard against dividing by zero on very fast runs
    lines_per_sec = int(round(count/delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
        lines_per_sec))

Here are my results:

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889

$cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000

I should note that I tried this both under Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro, and the latter is a very beefy server, not that this is too pertinent.

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP:   Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in  1 seconds. LPS: 5570000

Tiny benchmark addendum and recap

For completeness, I thought I’d update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M line file on a fast disk. Here’s the comparison, with several solutions/approaches:

Implementation      Lines per second
python (default)           3,571,428
cin (default/naive)          819,672
cin (no sync)             12,500,000
fgets                     14,285,714
wc (not fair comparison)  54,644,808

Answer 0

By default, cin is synchronized with stdio, which causes it to avoid any input buffering. If you add this to the top of your main, you should see much better performance:

std::ios_base::sync_with_stdio(false);

Normally, when an input stream is buffered, instead of reading one character at a time, the stream will be read in larger chunks. This reduces the number of system calls, which are typically relatively expensive. However, since the FILE* based stdio and iostreams often have separate implementations and therefore separate buffers, this could lead to a problem if both were used together. For example:

int myvalue1;
cin >> myvalue1;
int myvalue2;
scanf("%d",&myvalue2);

If more input was read by cin than it actually needed, then the second integer value wouldn’t be available for the scanf function, which has its own independent buffer. This would lead to unexpected results.

To avoid this, by default, streams are synchronized with stdio. One common way to achieve this is to have cin read each character one at a time as needed using stdio functions. Unfortunately, this introduces a lot of overhead. For small amounts of input, this isn’t a big problem, but when you are reading millions of lines, the performance penalty is significant.

Fortunately, the library designers decided that you should also be able to disable this feature to get improved performance if you knew what you were doing, so they provided the sync_with_stdio method.
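
For reference, here is a minimal sketch (not from the original answer) of the line-counting loop with synchronization disabled; untying cin from cout is another common tweak:

#include <iostream>
#include <string>

int main() {
    std::ios_base::sync_with_stdio(false);  // stop syncing iostreams with C stdio
    std::cin.tie(nullptr);                  // don't flush cout before every read

    std::string input_line;
    long line_count = 0;
    while (std::getline(std::cin, input_line))
        ++line_count;
    std::cerr << "Read " << line_count << " lines" << std::endl;
    return 0;
}

Using getline in the loop condition also sidesteps the off-by-one issue discussed in a later answer.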


Answer 1

Just out of curiosity I’ve taken a look at what happens under the hood, and I’ve used dtruss/strace on each test.

C++

./a.out < in
Saw 6512403 lines in 8 seconds.  Crunch speed: 814050

syscalls sudo dtruss -c ./a.out < in

CALL                                        COUNT
__mac_syscall                                   1
<snip>
open                                            6
pread                                           8
mprotect                                       17
mmap                                           22
stat64                                         30
read_nocancel                               25958

Python

./a.py < in
Read 6512402 lines in 1 seconds. LPS: 6512402

syscalls sudo dtruss -c ./a.py < in

CALL                                        COUNT
__mac_syscall                                   1
<snip>
open                                            5
pread                                           8
mprotect                                       17
mmap                                           21
stat64                                         29

Answer 2

I’m a few years behind here, but:

In ‘Edit 4/5/6’ of the original post, you are using the construction:

$ /usr/bin/time cat big_file | program_to_benchmark

This is wrong in a couple of different ways:

  1. You’re actually timing the execution of `cat`, not your benchmark. The ‘user’ and ‘sys’ CPU usage displayed by `time` are those of `cat`, not your benchmarked program. Even worse, the ‘real’ time is also not necessarily accurate. Depending on the implementation of `cat` and of pipelines in your local OS, it is possible that `cat` writes a final giant buffer and exits long before the reader process finishes its work.

  2. Use of `cat` is unnecessary and in fact counterproductive; you’re adding moving parts. If you were on a sufficiently old system (i.e. with a single CPU and — in certain generations of computers — I/O faster than CPU) — the mere fact that `cat` was running could substantially color the results. You are also subject to whatever input and output buffering and other processing `cat` may do. (This would likely earn you a ‘Useless Use Of Cat’ award if I were Randal Schwartz.)

A better construction would be:

$ /usr/bin/time program_to_benchmark < big_file

In this statement it is the shell which opens big_file, passing it to your program (well, actually to `time` which then executes your program as a subprocess) as an already-open file descriptor. 100% of the file reading is strictly the responsibility of the program you’re trying to benchmark. This gets you a real reading of its performance without spurious complications.

I will mention two possible, but actually wrong, ‘fixes’ which could also be considered (but I ‘number’ them differently as these are not things which were wrong in the original post):

A. You could ‘fix’ this by timing only your program:

$ cat big_file | /usr/bin/time program_to_benchmark

B. or by timing the entire pipeline:

$ /usr/bin/time sh -c 'cat big_file | program_to_benchmark'

These are wrong for the same reasons as #2: they’re still using `cat` unnecessarily. I mention them for a few reasons:

  • they’re more ‘natural’ for people who aren’t entirely comfortable with the I/O redirection facilities of the POSIX shell

  • there may be cases where `cat` is needed (e.g.: the file to be read requires some sort of privilege to access, and you do not want to grant that privilege to the program to be benchmarked: `sudo cat /dev/sda | /usr/bin/time my_compression_test --no-output`)

  • in practice, on modern machines, the added `cat` in the pipeline is probably of no real consequence

But I say that last thing with some hesitation. If we examine the last result in ‘Edit 5’ —

$ /usr/bin/time cat temp_big_file | wc -l
0.01user 1.34system 0:01.83elapsed 74%CPU ...

— this claims that `cat` consumed 74% of the CPU during the test; and indeed 1.34/1.83 is approximately 74%. Perhaps a run of:

$ /usr/bin/time wc -l < temp_big_file

would have taken only the remaining .49 seconds! Probably not: `cat` here had to pay for the read() system calls (or equivalent) which transferred the file from ‘disk’ (actually buffer cache), as well as the pipe writes to deliver them to `wc`. The correct test would still have had to do those read() calls; only the write-to-pipe and read-from-pipe calls would have been saved, and those should be pretty cheap.

Still, I predict you would be able to measure the difference between `cat file | wc -l` and `wc -l < file` and find a noticeable (2-digit percentage) difference. Each of the slower tests will have paid a similar penalty in absolute time; which would however amount to a smaller fraction of its larger total time.

In fact I did some quick tests with a 1.5 gigabyte file of garbage, on a Linux 3.13 (Ubuntu 14.04) system, obtaining these results (these are actually ‘best of 3’ results; after priming the cache, of course):

$ time wc -l < /tmp/junk
real 0.280s user 0.156s sys 0.124s (total cpu 0.280s)
$ time cat /tmp/junk | wc -l
real 0.407s user 0.157s sys 0.618s (total cpu 0.775s)
$ time sh -c 'cat /tmp/junk | wc -l'
real 0.411s user 0.118s sys 0.660s (total cpu 0.778s)

Notice that the two pipeline results claim to have taken more CPU time (user+sys) than real wall-clock time. This is because I’m using the shell (bash)’s built-in ‘time’ command, which is cognizant of the pipeline; and I’m on a multi-core machine where separate processes in a pipeline can use separate cores, accumulating CPU time faster than realtime. Using /usr/bin/time I see smaller CPU time than realtime — showing that it can only time the single pipeline element passed to it on its command line. Also, the shell’s output gives milliseconds while /usr/bin/time only gives hundredths of a second.

So at the efficiency level of `wc -l`, the `cat` makes a huge difference: 409 / 283 = 1.453 or 45.3% more realtime, and 775 / 280 = 2.768, or a whopping 177% more CPU used! On my random it-was-there-at-the-time test box.

I should add that there is at least one other significant difference between these styles of testing, and I can’t say whether it is a benefit or fault; you have to decide this yourself:

When you run `cat big_file | /usr/bin/time my_program`, your program is receiving input from a pipe, at precisely the pace sent by `cat`, and in chunks no larger than written by `cat`.

When you run `/usr/bin/time my_program < big_file`, your program receives an open file descriptor to the actual file. Your program — or in many cases the I/O libraries of the language in which it was written — may take different actions when presented with a file descriptor referencing a regular file. It may use mmap(2) to map the input file into its address space, instead of using explicit read(2) system calls. These differences could have a far larger effect on your benchmark results than the small cost of running the `cat` binary.

Of course it is an interesting benchmark result if the same program performs significantly differently between the two cases. It shows that, indeed, the program or its I/O libraries are doing something interesting, like using mmap(). So in practice it might be good to run the benchmarks both ways; perhaps discounting the `cat` result by some small factor to “forgive” the cost of running `cat` itself.


Answer 3

I reproduced the original result on my computer using g++ on a Mac.

Adding the following statements to the C++ version just before the while loop brings it inline with the Python version:

std::ios_base::sync_with_stdio(false);
char buffer[1048576];
std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

sync_with_stdio improved speed to 2 seconds, and setting a larger buffer brought it down to 1 second.
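
Putting those statements in context, here is a sketch of where they sit relative to the question's loop (the 1 MiB buffer size follows the answer; note that pubsetbuf must be called before the first read for the new buffer to take effect, and the exact effect is implementation-defined):

#include <iostream>
#include <string>

int main() {
    std::ios_base::sync_with_stdio(false);
    char buffer[1048576];  // 1 MiB stream buffer, as in the answer
    std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

    std::string input_line;
    long line_count = 0;
    while (std::getline(std::cin, input_line))
        ++line_count;
    std::cerr << "Read " << line_count << " lines" << std::endl;
    return 0;
}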


Answer 4

getline, stream operators, scanf, can be convenient if you don’t care about file loading time or if you are loading small text files. But, if the performance is something you care about, you should really just buffer the entire file into memory (assuming it will fit).

Here’s an example:

//open file in binary mode
std::fstream file( filename, std::ios::in|::std::ios::binary );
if( !file ) return NULL;

//read the size...
file.seekg(0, std::ios::end);
size_t length = (size_t)file.tellg();
file.seekg(0, std::ios::beg);

//read into memory buffer, then close it.
char *filebuf = new char[length+1];
file.read(filebuf, length);
filebuf[length] = '\0'; //make it null-terminated
file.close();

If you want, you can wrap a stream around that buffer for more convenient access like this:

std::istrstream header(&filebuf[0], length);

Also, if you are in control of the file, consider using a flat binary data format instead of text. It’s more reliable to read and write because you don’t have to deal with all the ambiguities of whitespace. It’s also smaller and much faster to parse.
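
As a side note, std::istrstream (from <strstream>) has long been deprecated; a sketch of the same buffer-the-whole-file idea using std::string and std::istringstream instead (filename here is a placeholder):

#include <fstream>
#include <sstream>
#include <string>

std::string read_whole_file(const std::string& filename) {
    std::ifstream file(filename, std::ios::binary);
    std::ostringstream contents;
    contents << file.rdbuf();  // one buffered copy of the entire file
    return contents.str();
}

// Then wrap a stream around the buffer for convenient access:
// std::istringstream header(read_whole_file("input.txt"));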


Answer 5

The following code was faster for me than the other code posted here so far: (Visual Studio 2013, 64-bit, 500 MB file with line length uniformly in [0, 1000)).

const int buffer_size = 500 * 1024;  // Too large/small buffer is not good.
std::vector<char> buffer(buffer_size);
int size;
while ((size = fread(buffer.data(), sizeof(char), buffer_size, stdin)) > 0) {
    line_count += count_if(buffer.begin(), buffer.begin() + size, [](char ch) { return ch == '\n'; });
}

It beats all my Python attempts by more than a factor 2.
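
The fragment above assumes a surrounding main and a few headers; a self-contained version might look like the following sketch (std::count replaces the count_if-with-lambda, which does the same thing here):

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int buffer_size = 500 * 1024;  // Too large/small buffer is not good.
    std::vector<char> buffer(buffer_size);
    long line_count = 0;
    size_t size;
    while ((size = std::fread(buffer.data(), sizeof(char), buffer_size, stdin)) > 0)
        line_count += std::count(buffer.begin(), buffer.begin() + size, '\n');
    std::printf("Read %ld lines\n", line_count);
    return 0;
}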


Answer 6

By the way, the reason the line count for the C++ version is one greater than the count for the Python version is that the eof flag only gets set when an attempt is made to read beyond eof. So the correct loop would be:

while (cin) {
    getline(cin, input_line);

    if (!cin.eof())
        line_count++;
};

Answer 7

In your second example (with scanf()) reason why this is still slower might be because scanf(“%s”) parses string and looks for any space char (space, tab, newline).

Also, yes, CPython does some caching to avoid harddisk reads.


Answer 8

A first element of an answer: <iostream> is slow. Damn slow. I get a huge performance boost with scanf as in the below, but it is still two times slower than Python.

#include <iostream>
#include <time.h>
#include <cstdio>

using namespace std;

int main() {
    char buffer[10000];
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    int read = 1;
    while(read > 0) {
        read = scanf("%s", buffer);
        line_count++;
    };
    sec = (int) time(NULL) - start;
    line_count--;
    cerr << "Saw " << line_count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = line_count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } 
    else
        cerr << endl;
    return 0;
}

Answer 9

Well, I see that in your second solution you switched from cin to scanf, which was the first suggestion I was going to make you (cin is sloooooooooooow). Now, if you switch from scanf to fgets, you would see another boost in performance: fgets is the fastest C++ function for string input.

BTW, didn’t know about that sync thing, nice. But you should still try fgets.
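
For completeness, a minimal fgets-based line counter (a sketch, not from the answer; counting '\n' characters keeps the count correct even when a line is longer than the buffer):

#include <cstdio>

int main() {
    char buffer[65536];
    long line_count = 0;
    while (std::fgets(buffer, sizeof(buffer), stdin) != nullptr) {
        for (const char* p = buffer; *p != '\0'; ++p)
            if (*p == '\n')
                ++line_count;  // one newline per completed line
    }
    std::printf("Read %ld lines\n", line_count);
    return 0;
}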


Borg - Deduplicating backup program with compression and authenticated encryption

BorgBackup Basic Usage

More screencasts: installation, advanced usage

What is BorgBackup?

BorgBackup (short: Borg) is a deduplicating backup program. Optionally, it supports compression and authenticated encryption.

The main goal of Borg is to provide an efficient and secure way to back up data. The data deduplication technique used makes Borg suitable for daily backups, since only the changes are stored. The authenticated encryption technique makes it suitable for backups to targets that are not fully trusted.

See the installation manual or, if you have already downloaded Borg, docs/installation.rst to get started with Borg. There is also offline documentation available, in multiple formats.

Main features

Space efficient storage
Deduplication based on content-defined chunking is used to reduce the number of bytes stored: each file is split into a number of variable length chunks, and only chunks that have never been seen before are added to the repository (see the toy sketch after this feature list).

A chunk is considered a duplicate if its id_hash value is identical. A cryptographically strong hash or MAC function is used as the id_hash, e.g. (HMAC-)SHA256.

For deduplication, all the chunks in the same repository are considered, no matter whether they come from different machines, from previous backups, from the same backup, or even from the same single file.

Compared to other deduplication approaches, this method does not depend on:

  • file/directory names staying the same: you can move your data around without hurting deduplication, even between machines sharing a repo
  • complete files or timestamps staying the same: if a big file changes a little, only a few new chunks need to be stored - this is great for VMs or raw disks
  • the absolute position of a data chunk inside a file: data may get shifted and will still be found by the deduplication algorithm
Speed
  • performance-critical code (chunking, compression, encryption) is implemented in C/Cython
  • local caching of files/chunks index data
  • quick detection of unmodified files
Data encryption
All data can be protected using 256-bit AES encryption; data integrity and authenticity are verified using HMAC-SHA256. Data is encrypted client-side.
Obfuscation
Optionally, Borg can actively obfuscate e.g. the size of files/chunks to make fingerprinting attacks harder.
Compression
All data can be optionally compressed:

  • lz4 (super fast, low compression)
  • zstd (wide range from high speed and low compression to high compression and lower speed)
  • zlib (medium speed and compression)
  • lzma (low speed, high compression)
Off-site backups
Borg can store data on any remote host accessible over SSH. If Borg is installed on the remote host, big performance gains can be achieved compared to using a network filesystem (sshfs, nfs, ...).
Backups mountable as filesystems
Backup archives are mountable as userspace filesystems for easy interactive backup examination and restores (e.g. by using a regular file manager).
Easy installation on multiple platforms
We offer single-file binaries that do not require installing anything - you can just run them on these platforms:

  • Linux
  • Mac OS X
  • FreeBSD
  • OpenBSD and NetBSD (no xattrs/ACLs support or binaries yet)
  • Cygwin (experimental, no binaries yet)
  • Linux Subsystem of Windows 10 (experimental)
Free and Open Source Software
  • security and functionality can be audited independently
  • licensed under the BSD (3-clause) license, see License for the complete license
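
To illustrate the deduplication idea only (a toy sketch, not Borg's actual code or data layout): a chunk is written to the repository only if its id_hash has not been seen before.

#include <set>
#include <string>
#include <vector>

std::set<std::string> seen_ids;  // id_hash values already in the repository

// id_hash would be a strong hash/MAC of the chunk's content, e.g. HMAC-SHA256.
bool store_chunk(const std::vector<char>& chunk, const std::string& id_hash) {
    if (!seen_ids.insert(id_hash).second)
        return false;  // duplicate chunk: store nothing
    // ... otherwise write chunk to the repository, keyed by id_hash ...
    return true;
}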

Easy to use

Initialize a new backup repository (see borg init --help for encryption options):

$ borg init -e repokey /path/to/repo

Create a backup archive:

$ borg create /path/to/repo::Saturday1 ~/Documents

Now doing another backup, just to show off the great deduplication:

$ borg create -v --stats /path/to/repo::Saturday2 ~/Documents
-----------------------------------------------------------------------------
Archive name: Saturday2
Archive fingerprint: 622b7c53c...
Time (start): Sat, 2016-02-27 14:48:13
Time (end):   Sat, 2016-02-27 14:48:14
Duration: 0.88 seconds
Number of files: 163
-----------------------------------------------------------------------------
               Original size      Compressed size    Deduplicated size
This archive:        6.85 MB              6.85 MB             30.79 kB  <-- !
All archives:       13.69 MB             13.71 MB              6.88 MB

               Unique chunks         Total chunks
Chunk index:             167                  330
-----------------------------------------------------------------------------
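
Archives in the repository can then be listed, and files restored from an archive (a minimal usage sketch; the archive name follows the examples above):

$ borg list /path/to/repo
$ borg extract /path/to/repo::Saturday2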

For a graphical frontend, refer to our complementary project BorgWeb.

Helping, donations and bounties, becoming a patron

Your help is always welcome!

Spread the word, give feedback, help with documentation, testing, or development.

You can also give monetary support to the project; see here for details:

https://www.borgbackup.org/support/fund.html

Links

Compatibility notes

Expect that we will break compatibility repeatedly when the major release number changes (like when going from 0.x.y to 1.0.0 or from 1.x.y to 2.0.0).

Unreleased development versions have unknown compatibility properties.

This is software in development; decide yourself whether it fits your needs.

Security issues should be reported to the Security contact (or see docs/support.rst in the source distribution).


Graal - GraalVM: Run Programs Faster Anywhere 🚀

GraalVM is a universal virtual machine for running applications written in JavaScript, Python, Ruby, R, JVM-based languages like Java, Scala, Clojure, and Kotlin, and LLVM-based languages such as C and C++.

The project website at https://www.graalvm.org describes how to get started, how to stay connected, and how to contribute.

Repository structure

The GraalVM main source repository includes the components listed below. The documentation for each component includes developer instructions for that component.

  • GraalVM SDK contains long-term supported GraalVM APIs
  • GraalVM compiler is written in Java, supports both dynamic and static compilation, and can integrate with the Java HotSpot VM or run standalone
  • Truffle is the language implementation framework for creating languages and tools for GraalVM
  • Tools contains a set of tools for GraalVM languages implemented with the instrumentation framework
  • Substrate VM is a framework that allows ahead-of-time (AOT) compilation of Java applications, under a closed-world assumption, into executable images or shared objects
  • Sulong is an engine for running LLVM bitcode on GraalVM
  • GraalWasm is an engine for running WebAssembly programs on GraalVM
  • TRegex is an implementation of regular expressions which leverages GraalVM for efficient compilation of automata
  • VM includes the components to build a modular GraalVM image
  • VS Code provides extensions to Visual Studio Code that support development of polyglot applications using GraalVM

Getting support

Related repositories

GraalVM allows running the following languages, which are developed and tested with the GraalVM core in related repositories. These are:

License

Each GraalVM component is licensed:

Flatbuffers - FlatBuffers: Memory Efficient Serialization Library

FlatBuffers is a cross-platform serialization library architected for maximum memory efficiency. It allows you to directly access serialized data without parsing/unpacking it first, while still having great forwards/backwards compatibility.

Go to our landing page to browse our documentation.

Supported operating systems

  • Windows
  • MacOS X
  • Linux
  • Android
  • And any others with a recent C++ compiler

Supported programming languages

  • C++
  • C#
  • C
  • Go
  • Java
  • JavaScript
  • PHP
  • Python
  • Rust

...and more in progress.

Contribution

To contribute to this project, see CONTRIBUTING.

Security

Please see our Security Policy for reporting vulnerabilities.

Licensing

FlatBuffers is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.