Question: A clean, lightweight alternative to Python's Twisted? [closed]

A (long) while ago I wrote a web spider that I multithreaded so that requests could run concurrently. That was in my Python youth, in the days before I knew about the GIL and the associated woes it creates for multithreaded code (i.e., most of the time stuff just ends up serialized!)…

I’d like to rework this code to make it more robust and perform better. There are basically two ways I could do this: I could use the new multiprocessing module in 2.6+ or I could go for a reactor / event-based model of some sort. I would rather do the latter since it’s far simpler and less error-prone.

So the question relates to what framework would be best suited to my needs. The following is a list of the options I know about so far:

  • Twisted: The granddaddy of Python reactor frameworks: seems complex and a bit bloated however. Steep learning curve for a small task.
  • Eventlet: From the guys at Linden Lab. A Greenlet-based framework that’s geared towards these kinds of tasks. I had a look at the code though and it’s not too pretty: not PEP 8 compliant, scattered with prints (why do people do this in a framework!?), and the API seems a little inconsistent.
  • PyEv: Immature, doesn’t seem to be anyone using it right now though it is based on libevent so it’s got a solid backend.
  • asyncore: From the stdlib: über low-level, seems like a lot of legwork involved just to get something off the ground.
  • tornado: Though this is a server-oriented product designed to serve dynamic websites, it does feature an async HTTP client and a simple ioloop. Looks like it could get the job done, but that’s not what it was intended for. [edit: doesn’t run on Windows unfortunately, which counts it out for me; it’s a requirement for me to support this lame platform]

Is there anything I have missed at all? Surely there must be a library out there that fits the sweet-spot of a simplified async networking library!

[edit: big thanks to intgr for his pointer to this page. If you scroll to the bottom you will see there is a really nice list of projects that aim to tackle this task in one way or another. It actually seems that things have indeed moved on since the inception of Twisted: people now seem to favour a coroutine-based solution rather than a traditional reactor / callback-oriented one. The benefit of this approach is clearer, more direct code: I’ve certainly found in the past, especially when working with boost.asio in C++, that callback-based code can lead to designs that are hard to follow and relatively obscure to the untrained eye. Using coroutines allows you to write code that looks a little more synchronous, at least. I guess now my task is to work out which one of these many libraries I like the look of and give it a go! Glad I asked now…]
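
To make the callback-versus-coroutine contrast concrete, here is a minimal sketch using the standard library's asyncio module (which postdates this question and is not one of the libraries discussed in this thread). The names fake_fetch, crawl_with_callbacks, run_callback_version and crawl_with_coroutines are made up for illustration, and fake_fetch only sleeps instead of doing real network I/O.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for an HTTP request; no real I/O happens here.
    await asyncio.sleep(0.01)
    return "body of " + url


def crawl_with_callbacks(loop, urls, on_done):
    """Callback style: the logic is split across nested handler functions."""
    results = {}

    def make_handler(url):
        def handle(task):
            results[url] = task.result()
            if len(results) == len(urls):
                on_done(results)  # fire once every fetch has reported back
        return handle

    for url in urls:
        task = loop.create_task(fake_fetch(url))
        task.add_done_callback(make_handler(url))


async def run_callback_version(urls):
    # Small adapter so the callback version can be awaited in this demo.
    loop = asyncio.get_running_loop()
    finished = loop.create_future()
    crawl_with_callbacks(loop, urls, finished.set_result)
    return await finished


async def crawl_with_coroutines(urls):
    """Coroutine style: the same work reads top to bottom."""
    bodies = await asyncio.gather(*(fake_fetch(u) for u in urls))
    return dict(zip(urls, bodies))


if __name__ == "__main__":
    print(asyncio.run(crawl_with_coroutines(["a", "b"])))
    print(asyncio.run(run_callback_version(["a", "b"])))
```

Both versions do the same thing; the coroutine one simply keeps the "fetch everything, then collect" flow in a single function.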

[edit: perhaps of interest to anyone who followed or stumbled on this question or cares about this topic in any sense: I found a really great writeup of the current state of the available tools for this job]


Answer 0

I liked the concurrence Python module, which relies on either Stackless Python microthreads or Greenlets for lightweight threading. All blocking network I/O is transparently made asynchronous through a single libevent loop, so it should be nearly as efficient as a real asynchronous server.

I suppose it’s similar to Eventlet in this way.

The downside is that its API is quite different from Python’s socket/threading modules; you need to rewrite a fair bit of your application (or write a compatibility shim layer).

Edit: It seems that there’s also cogen, which is similar, but uses Python 2.5’s enhanced generators for its coroutines, instead of Greenlets. This makes it more portable than concurrence and other alternatives. Network I/O is done directly with epoll/kqueue/iocp.
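
As a rough illustration of the generator-as-coroutine idea, here is a tiny round-robin scheduler built on plain generators. This is not cogen's API; worker and run_round_robin are hypothetical names, and the bare yield marks the point where real code would wait on a socket.

```python
from collections import deque


def worker(name, steps, log):
    # A coroutine-like task: do a bit of work, then yield control.
    for i in range(steps):
        log.append((name, i))
        yield  # real code would wait for I/O readiness here


def run_round_robin(tasks):
    # Resume each task in turn until all of them have finished.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task is done; do not reschedule it
        queue.append(task)


if __name__ == "__main__":
    log = []
    run_round_robin([worker("a", 2, log), worker("b", 3, log)])
    print(log)
```

Because this relies only on language features, it runs on any Python implementation, which is the portability argument made above.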


Answer 1

Twisted is complex, you’re right about that. Twisted is not bloated.

If you take a look here: http://twistedmatrix.com/trac/browser/trunk/twisted you’ll find an organized, comprehensive, and very well tested suite of many protocols of the internet, as well as helper code to write and deploy very sophisticated network applications. I wouldn’t confuse bloat with comprehensiveness.

It’s well known that the Twisted documentation isn’t the most user-friendly at first glance, and I believe this turns away an unfortunate number of people. But Twisted is amazing (IMHO) if you put in the time. I did and it proved to be worth it, and I’d recommend that others try the same.


Answer 2

gevent is eventlet cleaned up.

API-wise it follows the same conventions as the standard library (in particular, threading and multiprocessing modules) where it makes sense. So you have familiar things like Queue and Event to work with.

It only supports libevent (update: libev since 1.0) as its reactor implementation but takes full advantage of it, featuring a fast WSGI server based on libevent-http and resolving DNS queries through libevent-dns as opposed to using a thread pool like most other libraries do. (update: since 1.0, c-ares is used to make async DNS queries; a thread pool is also an option.)

Like eventlet, it makes the callbacks and Deferreds unnecessary by using greenlets.

Check out the examples: concurrent download of multiple urls, long polling webchat.


Answer 3

A really interesting comparison of such frameworks was compiled by Nicholas Piël on his blog: it’s well worth a read!


Answer 4

None of these solutions will avoid the fact that the GIL prevents CPU parallelism; they are just better ways of getting the I/O parallelism that you already have with threads. If you think you can do better I/O, by all means pursue one of these, but if your bottleneck is in processing the results, nothing here will help except for the multiprocessing module.
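
A quick way to see that threads already overlap blocking I/O despite the GIL: the sketch below fakes a download with time.sleep (which releases the GIL, much like a blocking socket read) and times the same batch with one worker and with four. fake_download and timed_fetch_all are hypothetical names, and the 0.2 second delay is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, delay=0.2):
    # Stand-in for a blocking network call; sleeping releases the GIL.
    time.sleep(delay)
    return url


def timed_fetch_all(urls, workers):
    # Run every fake download on a pool and report the wall-clock time.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_download, urls))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["u1", "u2", "u3", "u4"]
    _, serial = timed_fetch_all(urls, workers=1)
    _, threaded = timed_fetch_all(urls, workers=4)
    print("one worker: %.2fs, four workers: %.2fs" % (serial, threaded))
```

If fake_download were replaced by a CPU-heavy pure-Python function, the four-worker run would not be expected to be faster, which is the case where multiprocessing comes in.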


Answer 5

I wouldn’t go as far as to call Twisted bloated, but it is difficult to wrap your head around. I avoided really settling in and learning it for quite a while, as I always wanted something a little easier for ‘small tasks’.

However, now that I have worked with it some more I have to say having all the batteries included is VERY nice.

All the other async libraries I’ve worked with end up being way less mature than they even appear. Twisted’s event loop is solid.

I’m not quite sure how to solve the steep Twisted learning curve. It might help if someone would fork it and clean a few things up, like removing all the backwards-compatibility cruft and the dead projects. But that’s the nature of mature software, I guess.


Answer 6

Kamaelia hasn’t been mentioned yet. Its concurrency model is based on wiring together components with message passing between inboxes and outboxes. Here’s a brief overview.
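
To give a feel for the inbox/outbox style of wiring, here is a toy sketch in plain Python. It does not use Kamaelia's actual API; Component, Producer, Upper, Collector and run_pipeline are all hypothetical names invented for this example.

```python
from collections import deque


class Component:
    """A unit of work that only talks to the world via its inbox and outbox."""

    def __init__(self):
        self.inbox = deque()
        self.outbox = deque()

    def step(self):
        raise NotImplementedError


class Producer(Component):
    def __init__(self, items):
        super().__init__()
        self.pending = deque(items)

    def step(self):
        # Emit one item per step.
        if self.pending:
            self.outbox.append(self.pending.popleft())


class Upper(Component):
    def step(self):
        # Transform every message currently waiting in the inbox.
        while self.inbox:
            self.outbox.append(self.inbox.popleft().upper())


class Collector(Component):
    def __init__(self):
        super().__init__()
        self.seen = []

    def step(self):
        while self.inbox:
            self.seen.append(self.inbox.popleft())


def run_pipeline(components, rounds):
    # Step each component, then deliver outbox messages to the next inbox.
    for _ in range(rounds):
        for comp in components:
            comp.step()
        for src, dst in zip(components, components[1:]):
            while src.outbox:
                dst.inbox.append(src.outbox.popleft())


if __name__ == "__main__":
    sink = Collector()
    run_pipeline([Producer(["a", "b"]), Upper(), sink], rounds=5)
    print(sink.seen)
```

The components never call each other directly; the wiring in run_pipeline is the only place that knows who is connected to whom.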


Answer 7

I’ve started to use Twisted for some things. The beauty of it is almost that it’s “bloated.” There are connectors for just about any of the main protocols out there. You can have a Jabber bot that will take commands and post to an IRC server, email them to someone, run a command, read from an NNTP server, and monitor a web page for changes. The bad news is that it can do all of that, and that can make things overly complex for simple tasks like the one the OP explained. The advantage of Python, though, is that you only include what you need. So while the download may be 20 MB, you may only include 2 MB of libraries (which is still a lot). My biggest complaint with Twisted is that although they include examples, for anything beyond a basic TCP server you’re on your own.

While not a python solution, I’ve seen node.js gain a lot more traction as of late. In fact I’ve considered looking into it for smaller projects but I just cringe when I hear javascript :)


Answer 8

There is a good book on the subject: “Twisted Network Programming Essentials”, by Abe Fettig. The examples show how to write very Pythonic code and, to me personally, do not look like they are built on a bloated framework. Look at the solutions in the book; if they aren’t clean, then I don’t know what clean means.

My only open question is the same one I have with other frameworks, like Ruby: does it scale up? I would hate to commit a client to a framework that is going to have scalability problems.


Answer 9

Whizzer is a tiny asynchronous socket framework that uses pyev. It’s very fast, primarily because of pyev. It attempts to provide an interface similar to Twisted’s, with some slight changes.


Answer 10

Also try Syncless. It’s coroutine-based (so it’s similar to Concurrence, Eventlet and gevent). It implements drop-in non-blocking replacements for socket.socket, socket.gethostbyname (etc.), ssl.SSLSocket, time.sleep and select.select. It’s fast. It needs Stackless Python and libevent. It contains a mandatory Python extension written in C (Pyrex/Cython).


Answer 11

I can confirm the goodness of Syncless. It can use libev (the newer, cleaner and better-performing alternative to libevent). A while ago it didn’t have as much support as libevent, but now the development process is more advanced and Syncless is very useful.


Answer 12

If you just want a simplified, lightweight HTTP request library, then I find Unirest really good.


Answer 13

You are welcome to have a look at PyWorks, which takes quite a different approach. It lets object instances run in their own thread and makes function calls to that object async.

Just let a class inherit from Task instead of object and it is async; all method calls are Proxies. Return values (if you need them) are Future proxies.

res = obj.method( args )
# code continues here without waiting for method to finish
do_something_else( )
print "Result = %d" % res # Code will block here, if res not calculated yet

PyWorks can be found on http://bitbucket.org/raindog/pyworks
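
For readers who have not met futures before, the same "call now, block only when the value is needed" pattern can be sketched with the standard library's concurrent.futures. This is not PyWorks' API: slow_square and demo are hypothetical names, and unlike the transparent proxies described above, a stdlib Future needs an explicit .result() call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    # Stand-in for a method that takes a while to finish.
    time.sleep(0.05)
    return n * n


def demo():
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_square, 7)  # returns immediately
        other = sum(range(10))                # keeps running meanwhile
        result = future.result()              # blocks here if not done yet
    return other, result


if __name__ == "__main__":
    print("other = %d, result = %d" % demo())
```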

