Python Tutorial: How to Run Scrapy from a Python Script

I am new to Scrapy and I am looking for a way to run it from a Python script. I found two sources that explain this:

http://tryolabs.com/Blog/2011/09/27/calling-scrapy-python-script/

http://snipplr.com/view/67006/using-scrapy-from-a-script/

I don't know where I should put the spider code and how to call it from the main function. Please help. Here is the sample code:

    # This snippet can be used to run scrapy spiders independent of scrapyd
    # or the scrapy command line tool and use it from a script.
    #
    # The multiprocessing library is used in order to work around a bug in
    # Twisted, in which you cannot restart an already running reactor or in
    # this case a scrapy instance.
    #
    # [Here](http://groups.google.com/group/scrapy-users/browse_thread/thread/f332fc5b749d401a)
    # is the mailing-list discussion for this snippet.

    #!/usr/bin/python
    import os
    os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'project.settings')  # Must be at the top before other imports

    from scrapy import log, signals, project
    from scrapy.xlib.pydispatch import dispatcher
    from scrapy.conf import settings
    from scrapy.crawler import CrawlerProcess
    from multiprocessing import Process, Queue

    class CrawlerScript():

        def __init__(self):
            self.crawler = CrawlerProcess(settings)
            if not hasattr(project, 'crawler'):
                self.crawler.install()
            self.crawler.configure()
            self.items = []
            dispatcher.connect(self._item_passed, signals.item_passed)

        def _item_passed(self, item):
            self.items.append(item)

        def _crawl(self, queue, spider_name):
            spider = self.crawler.spiders.create(spider_name)
            if spider:
                self.crawler.queue.append_spider(spider)
            self.crawler.start()
            self.crawler.stop()
            queue.put(self.items)

        def crawl(self, spider):
            queue = Queue()
            p = Process(target=self._crawl, args=(queue, spider,))
            p.start()
            p.join()
            return queue.get(True)

    # Usage
    if __name__ == "__main__":
        log.start()

        """
        This example runs spider1 and then spider2 three times.
        """
        items = list()
        crawler = CrawlerScript()
        items.append(crawler.crawl('spider1'))
        for i in range(3):
            items.append(crawler.crawl('spider2'))
        print items

    # Snippet imported from snippets.scrapy.org (which no longer works)
    # author: joehillen
    # date  : Oct 24, 2010
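Note that this snippet was written for a very old Scrapy release: scrapy.conf, scrapy.project, scrapy.xlib.pydispatch, and the log module have all since been removed. On any recent Scrapy version, the usual way to run a spider from a script is scrapy.crawler.CrawlerProcess, which starts and stops the Twisted reactor for you. The following is a minimal sketch under that assumption; the spider class MySpider, its name, and its start URL are illustrative placeholders, not part of the original question:

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class MySpider(scrapy.Spider):
        # Hypothetical spider, used only for illustration.
        name = 'myspider'
        start_urls = ['http://example.com']

        def parse(self, response):
            # Yield one item per page; adjust the selector to your target site.
            yield {'title': response.css('title::text').get()}

    if __name__ == '__main__':
        # CrawlerProcess manages the Twisted reactor itself, so no
        # multiprocessing workaround is needed for a normal run.
        process = CrawlerProcess(settings={'LOG_LEVEL': 'INFO'})
        process.crawl(MySpider)
        process.start()  # blocks until the crawl finishes

To run several spiders sequentially, call process.crawl() once per spider before the single process.start() call; since the reactor is started only once, the restart bug that the old snippet works around never comes up.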

Thank you!
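The old snippet's item-collection trick, connecting a handler to the item_passed signal via pydispatch, also has a modern counterpart: the signal is now called item_scraped and is connected through the crawler's own signal manager. Below is a sketch under the same assumptions, again with a hypothetical MySpider:

    import scrapy
    from scrapy import signals
    from scrapy.crawler import CrawlerProcess

    class MySpider(scrapy.Spider):
        # Same hypothetical spider as in the previous sketch.
        name = 'myspider'
        start_urls = ['http://example.com']

        def parse(self, response):
            yield {'title': response.css('title::text').get()}

    if __name__ == '__main__':
        items = []

        def collect_item(item, response, spider):
            # Mirrors the old _item_passed handler: stash each scraped item.
            items.append(item)

        process = CrawlerProcess(settings={'LOG_LEVEL': 'INFO'})
        crawler = process.create_crawler(MySpider)
        crawler.signals.connect(collect_item, signal=signals.item_scraped)
        process.crawl(crawler)
        process.start()  # blocks until the crawl finishes
        print(items)     # plays the role of queue.get(True) in the old code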
