Set up a scheduled job?

I’ve been working on a web app using Django, and I’m curious if there is a way to schedule a job to run periodically.

Basically I just want to run through the database and make some calculations/updates on an automatic, regular basis, but I can’t seem to find any documentation on doing this.

Does anyone know how to set this up?

To clarify: I know I can set up a cron job to do this, but I’m curious if there is some feature in Django that provides this functionality. I’d like people to be able to deploy this app themselves without having to do much config (preferably zero).

I’ve considered triggering these actions “retroactively” by simply checking if a job should have been run since the last time a request was sent to the site, but I’m hoping for something a bit cleaner.


Answer 0

One solution that I have employed is to do this:

1) Create a custom management command, e.g.

python manage.py my_cool_command

2) Use cron (on Linux) or at (on Windows) to run my command at the required times.

This is a simple solution that doesn’t require installing a heavy AMQP stack. However, there are nice advantages to using something like Celery, mentioned in the other answers. In particular, with Celery it is nice not to have to spread your application logic out into crontab files. Still, the cron solution works quite nicely for small to medium-sized applications where you don’t want a lot of external dependencies.

EDIT:

In later versions of Windows, the at command is deprecated (Windows 8, Server 2012 and above). You can use schtasks.exe for the same purpose.

UPDATE: This is the new link to the Django documentation on writing custom management commands.


Answer 1

Celery is a distributed task queue, built on AMQP (RabbitMQ). It also handles periodic tasks in a cron-like fashion (see periodic tasks). Depending on your app, it might be worth a gander.

Celery is pretty easy to set up with Django (docs), and periodic tasks will actually skip missed tasks in case of downtime. Celery also has built-in retry mechanisms, in case a task fails.
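Celery’s own beat scheduler is not reproduced here. As a library-independent illustration of what "skipping missed tasks after downtime" means, the sketch below (function names are hypothetical) computes the next run time strictly after now, staying aligned to the original schedule instead of replaying every run that fell due while the process was down.

```python
from datetime import datetime, timedelta


def next_run_skipping_missed(last_run: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the first scheduled time strictly after `now`.

    Slots are last_run + k * interval; slots that passed while the process
    was down are skipped instead of being run one after another.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < last_run:
        return last_run + interval
    # Whole intervals elapsed since the last run (timedelta // timedelta -> int).
    elapsed = (now - last_run) // interval
    return last_run + (elapsed + 1) * interval


def missed_runs(last_run: datetime, interval: timedelta, now: datetime) -> int:
    """Number of scheduled slots that fell due between last_run and now."""
    if now < last_run:
        return 0
    return (now - last_run) // interval


last = datetime(2024, 1, 1, 0, 0)
hourly = timedelta(hours=1)
# Process was down from 00:00 until 03:30, so the 01:00, 02:00 and 03:00 runs were missed.
print(missed_runs(last, hourly, datetime(2024, 1, 1, 3, 30)))
print(next_run_skipping_missed(last, hourly, datetime(2024, 1, 1, 3, 30)))
```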


Answer 2

We’ve open-sourced what I think is a structured app that Brian’s solution above alludes to. We would love any and all feedback!

https://github.com/tivix/django-cron

It comes with one management command:

./manage.py runcrons

That does the job. Each cron is modeled as a class (so it’s all OO), each cron can run at a different frequency, and we make sure the same cron type doesn’t run in parallel (in case a cron itself takes longer to run than its frequency!)
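A minimal sketch of the pattern described above: one class per job with its own frequency, plus a guard against overlapping runs of the same job type. This is not django-cron’s actual API; CronJobBase, run_every and run_if_due are hypothetical names. A threading.Lock only prevents overlap inside one process; across processes a shared lock (for example in the database or cache) would be needed, which is not shown.

```python
import threading
from datetime import datetime, timedelta


class CronJobBase:
    """Hypothetical base class: subclass once per job type."""

    run_every = timedelta(minutes=5)  # default frequency; override per job

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Each job type (subclass) gets its own lock and last-run timestamp.
        cls._lock = threading.Lock()
        cls.last_run = None

    def do(self):
        raise NotImplementedError

    def run_if_due(self, now: datetime) -> bool:
        """Run the job if it is due and not already running; return True if it ran."""
        cls = type(self)
        # Non-blocking acquire: if a previous run of this job type still holds
        # the lock, skip rather than run the same job twice in parallel.
        if not cls._lock.acquire(blocking=False):
            return False
        try:
            if cls.last_run is not None and now - cls.last_run < cls.run_every:
                return False
            cls.last_run = now
            self.do()
            return True
        finally:
            cls._lock.release()


class RecalculateStats(CronJobBase):
    """Example job (hypothetical): runs at most once per hour."""

    run_every = timedelta(hours=1)
    runs = 0

    def do(self):
        type(self).runs += 1
```

A single cron entry (like runcrons above) would then call run_if_due(now) on every registered job each time it fires; each job decides for itself whether it is due.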


Answer 3

If you’re using a standard POSIX OS, you use cron.

If you’re using Windows, you use at.

Write a Django management command to

  1. Figure out what platform they’re on.

  2. Either execute the appropriate “AT” command for your users, or update the crontab for your users.
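A sketch of step 2, assuming the goal is simply to build the entry that runs a management command hourly. The function name and the hourly schedule are illustrative assumptions, and on recent Windows versions it uses schtasks rather than the deprecated at. It only builds the command; actually installing it (running schtasks, or editing the user’s crontab) is left to the caller, and the quoting rules for /TR are worth checking against the schtasks documentation.

```python
import shlex
import sys


def build_schedule_command(manage_py, command, platform=None):
    """Build an hourly schedule entry for `python manage.py <command>`.

    Returns an argv list for schtasks on Windows, or a crontab line elsewhere.
    Nothing is executed or installed here.
    """
    platform = platform or sys.platform
    python = "python"  # assumption: the right interpreter is on PATH
    if platform.startswith("win"):
        # schtasks replaces the deprecated `at` on Windows 8 / Server 2012+.
        return [
            "schtasks", "/Create",
            "/SC", "HOURLY",
            "/TN", f"django_{command}",
            "/TR", f'{python} "{manage_py}" {command}',
        ]
    # crontab fields: minute hour day-of-month month day-of-week -> minute 0 of every hour.
    return f"0 * * * * {python} {shlex.quote(manage_py)} {shlex.quote(command)}"


print(build_schedule_command("/srv/app/manage.py", "my_cool_command", platform="linux"))
```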


Answer 4

Interesting new pluggable Django app: django-chronograph

You only have to add one cron entry which acts as a timer, and you have a very nice Django admin interface into the scripts to run.


Answer 5

Look at Django Poor Man’s Cron, a Django app that makes use of spambots, search engine indexing robots and the like to run scheduled tasks at approximately regular intervals.

See: http://code.google.com/p/django-poormanscron/


Answer 6

Brian Neal’s suggestion of running management commands via cron works well, but if you’re looking for something a little more robust (yet not as elaborate as Celery) I’d look into a library like Kronos:

# app/cron.py

import kronos

@kronos.register('0 * * * *')
def task():
    pass
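'0 * * * *' is a standard five-field cron expression (minute, hour, day of month, month, day of week), so the task above fires at minute 0 of every hour. The hypothetical matcher below shows how such an expression is read. It supports only * and single numbers (no ranges, lists or steps), and it ANDs all fields, ignoring cron’s special OR rule when both day-of-month and day-of-week are restricted.

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    # Only "*" and plain integers are supported in this sketch.
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` matches a simplified 5-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: Monday = 0
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(dom, when.day)
        and _field_matches(month, when.month)
        and _field_matches(dow, cron_dow)
    )


print(cron_matches("0 * * * *", datetime(2024, 1, 1, 10, 0)))  # True
print(cron_matches("0 * * * *", datetime(2024, 1, 1, 10, 1)))  # False
```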

Answer 7

RabbitMQ and Celery have more features and task handling capabilities than Cron. If task failure isn’t an issue, and you think you will handle broken tasks in the next call, then Cron is sufficient.

Celery & AMQP will let you handle the broken task, and it will get executed again by another worker (Celery workers listen for the next task to work on), until the task’s max_retries attribute is reached. You can even invoke tasks on failure, like logging the failure, or sending an email to the admin once the max_retries has been reached.

And you can distribute Celery and AMQP servers when you need to scale your application.
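The retry behaviour described above can be illustrated without Celery. The sketch below is not Celery’s API (run_with_retries and on_failure are hypothetical names). Here max_retries counts retries after the first attempt, so the task is tried at most max_retries + 1 times, and there is no delay between attempts, whereas a real task queue would normally wait before retrying (Celery has a configurable retry delay).

```python
def run_with_retries(task, max_retries=3, on_failure=None):
    """Call task(); on an exception, retry up to max_retries more times.

    If every attempt fails, call on_failure(exc) (for example to log the error
    or email an admin) and re-raise the last exception.
    """
    attempt = 0
    while True:
        try:
            return task()
        except Exception as exc:
            attempt += 1
            if attempt > max_retries:
                if on_failure is not None:
                    on_failure(exc)
                raise
```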


Answer 8

I had exactly the same requirement a while ago, and ended up solving it using APScheduler (User Guide)

It makes scheduling jobs super simple, and keeps it independent from request-based execution of code. Following is a simple example.

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
job = None

def tick():
    print('One tick!')

def start_job():
    global job
    # Run tick() every hour on the scheduler's background thread.
    job = scheduler.add_job(tick, 'interval', seconds=3600)
    # start() raises if the scheduler is already running, so check first
    # instead of silently swallowing every exception with a bare except.
    if not scheduler.running:
        scheduler.start()

Hope this helps somebody!


Answer 9

I personally use cron, but the Jobs Scheduling parts of django-extensions look interesting.


Answer 10

Although not part of Django, Airflow is a more recent project (as of 2016) that is useful for task management.

Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. A web-based UI provides the developer with a range of options for managing and viewing these pipelines.

Airflow is written in Python and is built using Flask.

Airflow was created by Maxime Beauchemin at Airbnb and open-sourced in the spring of 2015. It joined the Apache Software Foundation’s incubation program in the winter of 2016. Here is the Git project page and some additional background information.


Answer 11

Put the following at the top of your cron.py file:

#!/usr/bin/python
import os, sys
sys.path.append('/path/to/') # the parent directory of the project
sys.path.append('/path/to/project') # these lines only needed if not on path
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproj.settings'

# Django 1.7+ also needs the app registry loaded before models are imported
import django
django.setup()

# imports and code below

Answer 12

I just thought about this rather simple solution:

  1. Define a view function do_work(req, param) like you would with any other view, with URL mapping, returning an HttpResponse and so on.
  2. Set up a cron job with your timing preferences (or use AT or Scheduled Tasks on Windows) which runs curl http://localhost/your/mapped/url?param=value.

You can pass parameters simply by adding them to the URL.

Tell me what you guys think.

[Update] I’m now using the runjobs command from django-extensions instead of curl.

My cron looks something like this:

@hourly python /path/to/project/manage.py runjobs hourly

… and so on for daily, monthly, etc. You can also set it up to run a specific job.

I find it more manageable and cleaner. It doesn’t require mapping a URL to a view. Just define your job class and crontab and you’re set.


Answer 13

After this block of code, I can write anything just as I would in my views.py :)

#######################################
import os,sys
sys.path.append('/home/administrator/development/store')
os.environ['DJANGO_SETTINGS_MODULE']='store.settings'
# setup_environ only exists in old Django (deprecated in 1.4, removed in 1.6);
# on modern Django, replace the next three lines with: import django; django.setup()
from django.core.management import setup_environ
from store import settings
setup_environ(settings)
#######################################

from http://www.cotellese.net/2007/09/27/running-external-scripts-against-django-models/


Answer 14

You should definitely check out django-q! It requires no additional configuration and has quite possibly everything needed to handle any production issues on commercial projects.

It’s actively developed and integrates very well with Django, the Django ORM, MongoDB and Redis. Here is my configuration:

# django-q
# -------------------------------------------------------------------------
# See: http://django-q.readthedocs.io/en/latest/configure.html
Q_CLUSTER = {
    # Match recommended settings from docs.
    'name': 'DjangoORM',
    'workers': 4,
    'queue_limit': 50,
    'bulk': 10,
    'orm': 'default',

    # Custom Settings
    # ---------------
    # Limit the amount of successful tasks saved to Django.
    'save_limit': 10000,

    # See https://github.com/Koed00/django-q/issues/110.
    'catch_up': False,

    # Number of seconds a worker can spend on a task before it's terminated.
    'timeout': 60 * 5,

    # Number of seconds a broker will wait for a cluster to finish a task before presenting it again. This needs to be
    # longer than `timeout`, otherwise the same task will be processed multiple times.
    'retry': 60 * 6,

    # Whether to force all async() calls to be run with sync=True (making them synchronous).
    'sync': False,

    # Redirect worker exceptions directly to Sentry error reporter.
    # RAVEN_CONFIG is assumed to be defined earlier in settings.py.
    'error_reporter': {
        'sentry': RAVEN_CONFIG,
    },
}

Answer 15

Django APScheduler for Scheduler Jobs. Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please.

Note: I’m the author of this library.

Install APScheduler

pip install apscheduler

Define the function to call

file name: scheduler_jobs.py

def FirstCronTest():
    print("")
    print("I am executed..!")

Configuring the scheduler

Create an execute.py file and add the following code:

from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()

Then schedule your functions; here, the job functions live in scheduler_jobs:

import scheduler_jobs 

scheduler.add_job(scheduler_jobs.FirstCronTest, 'interval', seconds=10)
scheduler.start()

Link the File for Execution

Now add the following line at the bottom of your urls.py file:

import execute

Answer 16

I had a problem similar to yours today.

I didn’t want it handled by the server through cron (and most of the libraries were just cron helpers in the end).

So I’ve created a scheduling module and attached it to the init.

It’s not the best approach, but it helps me to have all the code in a single place and with its execution related to the main app.


Answer 17

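Yes, the methods above are great, and I tried some of them. In the end, I found a method like this:

    from threading import Timer

    INTERVAL = 60  # seconds between runs (example value)

    def sync():

        # do something...

        # Schedule the next run; each run re-arms the timer.
        sync_timer = Timer(INTERVAL, sync)
        sync_timer.start()

It works like recursion: each call schedules the next one.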

Ok, I hope this method can meet your requirement. :)
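The Timer approach above re-arms itself forever and cannot be stopped. A stoppable variant, sketched below with a hypothetical RepeatingTimer class, uses a single background thread and Event.wait() so that stop() takes effect immediately. Either way, the state lives in one process: under a server that runs several worker processes, each process would run its own copy of the job.

```python
import threading


class RepeatingTimer:
    """Call `func` every `interval` seconds on one background thread until stop()."""

    def __init__(self, interval, func):
        self.interval = interval
        self.func = func
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout (time to run func) and True as
        # soon as stop() sets the event, which ends the loop without a full wait.
        while not self._stopped.wait(self.interval):
            self.func()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stopped.set()
        self._thread.join()
```

Usage would be along the lines of creating RepeatingTimer(3600, sync) once at startup and calling stop() on shutdown.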


Answer 18

A more modern solution (compared to Celery) is Django Q: https://django-q.readthedocs.io/en/latest/index.html

It has great documentation and is easy to grok. Windows support is lacking, because Windows does not support process forking. But it works fine if you create your dev environment using the Windows Subsystem for Linux.


Answer 19

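I use Celery for my periodic tasks. First you need to install it as follows:

pip install django-celery

Don’t forget to register django-celery in your settings, and then you could do something like this:

# Note: celery.decorators and celery.task come from older Celery releases and are
# not available in Celery 5; newer versions configure app.conf.beat_schedule instead.
from celery.decorators import periodic_task
from celery.task.schedules import crontab
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

# minute=0, hour=0 fires at midnight (hour="23" would fire at 11 pm).
@periodic_task(run_every=crontab(minute="0", hour="0"))
def do_every_midnight():
    # your code
    logger.info("Running the midnight task")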

Answer 20

I’m not sure whether this will be useful to anyone, but since I had to let other users of the system schedule jobs without giving them access to the actual server’s (Windows) Task Scheduler, I created this reusable app.

Please note that users have access to a shared folder on the server where they can create the required command/task/.bat files. These tasks can then be scheduled using this app.

App name is Django_Windows_Scheduler


Answer 21

If you want something more reliable than Celery, try TaskHawk which is built on top of AWS SQS/SNS.

Refer: http://taskhawk.readthedocs.io


Answer 22

For simple dockerized projects, I couldn’t really see any existing answer that fit.

So I wrote a very barebones solution that needs no external libraries or triggers and runs on its own. No external OS cron is needed, so it should work in every environment.

It works by adding a middleware: middleware.py

import threading

def should_run(name, seconds_interval):
    from application.models import CronJob
    from django.utils.timezone import now

    try:
        c = CronJob.objects.get(name=name)
    except CronJob.DoesNotExist:
        CronJob(name=name, last_ran=now()).save()
        return True

    if (now() - c.last_ran).total_seconds() >= seconds_interval:
        c.last_ran = now()
        c.save()
        return True

    return False


class CronTask:
    def __init__(self, name, seconds_interval, function):
        self.name = name
        self.seconds_interval = seconds_interval
        self.function = function


def cron_worker(*_):
    if not should_run("main", 60):
        return

    # customize this part:
    from application.models import Event
    tasks = [
        CronTask("events", 60 * 30, Event.clean_stale_objects),
        # ...
    ]

    for task in tasks:
        if should_run(task.name, task.seconds_interval):
            task.function()


def cron_middleware(get_response):

    def middleware(request):
        response = get_response(request)
        threading.Thread(target=cron_worker).start()
        return response

    return middleware

models/cron.py:

from django.db import models


class CronJob(models.Model):
    name = models.CharField(max_length=10, primary_key=True)
    last_ran = models.DateTimeField()

settings.py:

MIDDLEWARE = [
    ...
    'application.middleware.cron_middleware',
    ...
]
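One caveat with the middleware above: should_run() reads and then saves the row, so two requests arriving at the same moment can both see an expired timestamp and both run the task. The in-memory sketch below (IntervalGate is a hypothetical name) shows the check-and-update done atomically under a lock, with an injectable clock so it can be tested. It is per-process only; in the database-backed version, an atomic conditional UPDATE would have to play the role of the lock (an assumption, not shown).

```python
import threading
import time


class IntervalGate:
    """In-memory stand-in for the CronJob table (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_ran = {}
        self._lock = threading.Lock()

    def should_run(self, name, seconds_interval):
        """Return True at most once per interval per name, even across threads."""
        now = self._clock()
        # Read and update under one lock so concurrent callers cannot both win.
        with self._lock:
            last = self._last_ran.get(name)
            if last is None or now - last >= seconds_interval:
                self._last_ran[name] = now
                return True
            return False
```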

Answer 23

A simple way is to write a custom management command (see the Django documentation) and execute it with a cron job on Linux. However, I would highly recommend using a message broker like RabbitMQ coupled with Celery. Maybe you can have a look at this tutorial.


Does Django scale? [closed]

I’m building a web application with Django. The reasons I chose Django were:

  • I wanted to work with free/open-source tools.
  • I like Python and feel it’s a long-term language, whereas regarding Ruby I wasn’t sure, and PHP seemed like a huge hassle to learn.
  • I’m building a prototype for an idea and wasn’t thinking too much about the future. Development speed was the main factor, and I already knew Python.
  • I knew the migration to Google App Engine would be easier should I choose to do so in the future.
  • I heard Django was “nice”.

Now that I’m getting closer to publishing my work, I’m starting to be concerned about scale. The only information I found about the scaling capabilities of Django is provided by the Django team (I’m not saying anything to disregard them, but this is clearly not objective information…).

My questions:

  • What’s the “largest” site that’s built on Django today? (I measure size mostly by user traffic)
  • Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?
  • Could a site like Stack Overflow run on Django?

Answer 0

  1. “What are the largest sites built on Django today?”

    There isn’t any single place that collects information about traffic on Django built sites, so I’ll have to take a stab at it using data from various locations. First, we have a list of Django sites on the front page of the main Django project page and then a list of Django built sites at djangosites.org. Going through the lists and picking some that I know have decent traffic we see:

  2. “Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?”

    Yes, see above.

  3. “Could a site like Stack Overflow run on Django?”

    My gut feeling is yes but, as others answered and Mike Malone mentions in his presentation, database design is critical. Strong proof might also be found at www.cnprog.com if we can find any reliable traffic stats. Anyway, it’s not just something that will happen by throwing together a bunch of Django models :)

There are, of course, many more sites and bloggers of interest, but I have got to stop somewhere!


Blog post about Using Django to build high-traffic site michaelmoore.com described as a top 10,000 website. Quantcast stats and compete.com stats.


(*) The author of the edit, including such reference, used to work as outsourced developer in that project.


Answer 1

We’re doing load testing now. We think we can support 240 concurrent requests (a sustained rate of 120 hits per second 24×7) without any significant degradation in the server performance. That would be 432,000 hits per hour. Response times aren’t small (our transactions are large) but there’s no degradation from our baseline performance as the load increases.

We’re using Apache front-ending Django and MySQL. The OS is 64-bit Red Hat Enterprise Linux (RHEL). We use mod_wsgi in daemon mode for Django. We’ve done no cache or database optimization other than to accept the defaults.

We’re all in one VM on a 64-bit Dell with (I think) 32 GB of RAM.

Since performance is almost the same for 20 or 200 concurrent users, we don’t need to spend huge amounts of time “tweaking”. Instead we simply need to keep our base performance up through ordinary SSL performance improvements, ordinary database design and implementation (indexing, etc.), ordinary firewall performance improvements, etc.

What we do measure is our load test laptops struggling under the insane workload of 15 processes running 16 concurrent threads of requests.
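The 432,000 figure follows from 120 hits/s × 3,600 s/h. If, as an assumption, "240 concurrent requests" means 240 requests in flight at once, Little’s law (L = λW) also implies an average response time of about 240 / 120 = 2 s, which fits the remark that response times aren’t small. A quick check:

```python
def hits_per_period(rate_per_second, seconds):
    """Total hits at a constant rate over a period."""
    return rate_per_second * seconds


def mean_response_time(in_flight, throughput_per_second):
    """Little's law L = lambda * W, solved for W (seconds)."""
    return in_flight / throughput_per_second


RATE = 120        # sustained hits per second, from the answer
CONCURRENT = 240  # concurrent requests, from the answer

per_hour = hits_per_period(RATE, 3600)
per_day = hits_per_period(RATE, 24 * 3600)
w = mean_response_time(CONCURRENT, RATE)
print(f"{per_hour:,} hits/hour, {per_day:,} hits/day, about {w:.1f} s per request")
```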


Answer 2

Not sure about the number of daily visits but here are a few examples of large Django sites:

Here is a link to list of high traffic Django sites on Quora.


Answer 3

What’s the “largest” site that’s built on Django today? (I measure size mostly by user traffic)

In the US, it was Mahalo. I’m told they handle roughly 10 million uniques a month. Now, in 2019, Mahalo is powered by Ruby on Rails.

Abroad, the Globo network (a network of news, sports, and entertainment sites in Brazil); Alexa ranks them in the top 100 globally (around 80th currently).

Other notable Django users include PBS, National Geographic, Discovery, NASA (actually a number of different divisions within NASA), and the Library of Congress.

Can Django deal with 100k users daily, each visiting the site for a couple of hours?

Yes — but only if you’ve written your application right, and if you’ve got enough hardware. Django’s not a magic bullet.
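To turn "100k users daily, each visiting for a couple of hours" into something hardware can be sized against, a back-of-envelope estimate helps. The sketch assumes visits are spread evenly over 24 hours and that an active user makes one request per minute (both hypothetical numbers); real traffic peaks well above the average.

```python
def average_concurrent_users(daily_users, hours_per_visit, hours_per_day=24):
    """Average number of users online at once, assuming visits are spread evenly."""
    return daily_users * hours_per_visit / hours_per_day


def average_request_rate(concurrent_users, requests_per_user_per_minute):
    """Requests per second generated by that many active users."""
    return concurrent_users * requests_per_user_per_minute / 60


users_online = average_concurrent_users(100_000, 2)  # about 8,333 users online at once
rps = average_request_rate(users_online, 1)          # assumed 1 request/min each
print(f"{users_online:,.0f} concurrent users, {rps:.0f} requests/s on average")
```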

Could a site like StackOverflow run on Django?

Yes (but see above).

Technology-wise, easily: see soclone for one attempt. Traffic-wise, Compete pegs StackOverflow at under 1 million uniques per month. I can name at least a dozen Django sites with more traffic than SO.


Answer 4

Scaling web apps is not about web frameworks or languages; it’s about your architecture. It’s about how you handle your browser cache and your database cache, how you use non-standard persistence providers (like CouchDB), how well tuned your database is, and a lot of other stuff…


Answer 5

Playing devil’s advocate a little bit:

You should check the DjangoCon 2008 Keynote, delivered by Cal Henderson, titled “Why I hate Django” where he pretty much goes over everything Django is missing that you might want to do in a high traffic website. At the end of the day you have to take this all with an open mind because it is perfectly possible to write Django apps that scale, but I thought it was a good presentation and relevant to your question.


Answer 6

The largest Django site I know of is the Washington Post, which would certainly indicate that it can scale well.

Good design decisions probably have a bigger performance impact than anything else. Twitter is often cited as a site which embodies the performance issues with another dynamic interpreted language based web framework, Ruby on Rails – yet Twitter engineers have stated that the framework isn’t as much an issue as some of the database design choices they made early on.

Django works very nicely with memcached and provides some classes for managing the cache, which is where you would resolve the majority of your performance issues. What you deliver on the wire is in reality almost more important than your backend; using a tool like YSlow is critical for a high-performance web application. You can always throw more hardware at your backend, but you can’t change your users’ bandwidth.
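The idea behind caching expensive work can be shown with a small, hypothetical time-to-live decorator. In a real Django deployment, Django’s cache framework with a memcached backend plays this role; this sketch is not its API. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from functools import wraps


def cached(ttl_seconds, clock=time.monotonic):
    """Cache a function's result per positional-argument tuple for ttl_seconds."""

    def decorator(func):
        store = {}  # args -> (timestamp, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh entry: skip the expensive call
            value = func(*args)
            store[args] = (now, value)
            return value

        return wrapper

    return decorator
```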


Answer 7

I was at the EuroDjangoCon conference the other week, and this was the subject of a couple of talks – including from the founders of what was the largest Django-based site, Pownce (slides from one talk here). The main message is that it’s not Django you have to worry about, but things like proper caching, load balancing, database optimisation, etc.

Django actually has hooks for most of those things – caching, in particular, is made very easy.


Answer 8

I’m sure you’re looking for a more solid answer, but the most obvious objective validation I can think of is that Google pushes Django for use with its App Engine framework. If anybody knows about and deals with scalability on a regular basis, it’s Google. From what I’ve read, the most limiting factor seems to be the database back-end, which is why Google uses their own…


Answer 9

As stated in the High Performance Django book; see also this from Cal Henderson.

The relevant details are quoted below:

It’s not uncommon to hear people say “Django doesn’t scale”. Depending on how you look at it, the statement is either completely true or patently false. Django, on its own, doesn’t scale.

The same can be said of Ruby on Rails, Flask, PHP, or any other language used by a database-driven dynamic website.

The good news, however, is that Django interacts beautifully with a suite of caching and load balancing tools that will allow it to scale to as much traffic as you can throw at it.

Contrary to what you may have read online, it can do so without replacing core components often labeled as “too slow” such as the database ORM or the template layer.

Disqus serves over 8 billion page views per month. Those are some huge numbers.

These teams have proven Django most certainly does scale. Our experience here at Lincoln Loop backs it up.

We’ve built big Django sites capable of spending the day on the Reddit homepage without breaking a sweat.

Django’s scaling success stories are almost too numerous to list at this point.

It backs Disqus, Instagram, and Pinterest. Want some more proof? Instagram was able to sustain over 30 million users on Django with only 3 engineers (2 of which had no back-end development experience).


Answer 10

Today we use many web apps and sites for our needs, and most of them are highly useful. Here are some that are built with Python or Django.

Washington Post

The Washington Post's website is a hugely popular online news source that accompanies its daily paper. Its huge volume of views and traffic is easily handled by the Django web framework. Washington Post: 52.2 million unique visitors (March 2015).

NASA

The National Aeronautics and Space Administration's official website is the place to find news, pictures, and videos about its ongoing space exploration. This Django website easily handles large volumes of views and traffic: 2 million visitors monthly.

The Guardian

The Guardian is a British news and media website owned by the Guardian Media Group. It contains nearly all of the content of the newspapers The Guardian and The Observer. This huge volume of data is handled by Django. The Guardian (commenting system): 41.6 million unique visitors (October 2014).

YouTube

We all know YouTube as the place to upload cat videos and fails. As one of the most popular websites in existence, it provides us with endless hours of video entertainment. The Python programming language powers it and the features we love.

Dropbox

Dropbox started the online document-storage revolution that has become part of daily life. We now store almost everything in the cloud. Dropbox allows us to store, sync, and share almost anything using the power of Python.

Survey Monkey

Survey Monkey is the largest online survey company. They can handle over one million responses every day on their rewritten Python website.

Quora

Quora is the number one place online to ask a question and receive answers from a community of individuals. On their Python website, questions are answered, edited, and organized by these community members.

Bitly

Most of the code for Bitly's URL-shortening service and analytics is built with Python. Their service can handle hundreds of millions of events per day.

Reddit

Reddit is known as the front page of the internet. It is the place online to find information or entertainment based on thousands of different categories. Posts and links are user generated and are promoted to the top through votes. Many of Reddit’s capabilities rely on Python for their functionality.

Hipmunk

Hipmunk is an online consumer travel site that compares the top travel sites to find you the best deals. This Python website’s tools allow you to find the cheapest hotels and flights for your destination.

For more, see: 25 of the most popular Python and Django websites; What are some well-known sites running on Django?


Answer 11

I think we might as well add Instagram, Apple's App of the Year for 2011, to the list of sites that use Django intensively.


Answer 12

Yes, it can. Whether it's Django with Python or Ruby on Rails, it will still scale.

There are a few different techniques. First, caching is not scaling. You could have several application servers balanced with nginx at the front, in addition to hardware balancer(s). To scale the database side, you can go pretty far with read slaves in MySQL / PostgreSQL if you go the RDBMS way.
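The read-replica approach mentioned above maps onto Django's database routers. Below is a minimal sketch. It assumes `DATABASES` defines the primary under the alias `default` and two replicas under the hypothetical aliases `replica1` and `replica2`, with replication itself handled by MySQL or PostgreSQL. A router is a plain class, so it can be exercised without Django.

```python
import random


class PrimaryReplicaRouter:
    """Route reads to read replicas and writes to the primary database.

    The aliases below are hypothetical and must match keys in settings.DATABASES.
    """

    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread read queries across the replicas at random.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # Every write goes to the primary, which replicates to the others.
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All aliases hold the same data, so relations across them are allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only run migrations on the primary; replicas receive schema changes
        # through replication.
        return db == self.primary
```

To enable the router, add it to settings with something like `DATABASE_ROUTERS = ["myproject.routers.PrimaryReplicaRouter"]`. The module path there is hypothetical. Because of replication lag, a read made immediately after a write may reach a replica that hasn't caught up yet. Such reads can be pinned to the primary with `.using("default")`.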

Some good examples of heavy traffic websites in Django could be:

  • Pownce when they were still there.
  • Disqus (generic shared comments manager)
  • All the newspaper related websites: Washington Post and others.

You can feel safe.


Answer 13

Here’s a list of some relatively high-profile things built in Django:

  1. The Guardian’s “Investigate your MP’s expenses” app

  2. Politifact.com (here's a blog post about the (positive) experience). The site won a Pulitzer.

  3. NY Times’ Represent app

  4. EveryBlock

  5. Peter Harkins, one of the programmers over at WaPo, lists all the stuff they’ve built with Django on his blog

  6. It’s a little old, but someone from the LA Times gave a basic overview of why they went with Django.

  7. The Onion's AV Club was recently moved (from Drupal, I think) to Django.

I imagine a number of these sites get well over 100k hits per day. Django can certainly do 100k hits/day and more, but YMMV in getting your particular site there, depending on what you're building.

There are caching options at the Django level (for example, caching querysets and views in memcached can work wonders) and beyond it (upstream caches like Squid). Database server specifications will also be a factor (and usually the place to splurge), as will how well you've tuned the database. Don't assume, for example, that Django is going to set up indexes properly. Don't assume that the default PostgreSQL or MySQL configuration is the right one.
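Caching querysets follows the cache-aside pattern: look the result up by key, and run the expensive query only on a miss. The sketch below uses a tiny in-process TTL cache as a stand-in for memcached, so it runs on its own. `TTLCache` and `get_or_compute` are hypothetical names. In a real Django project, the low-level cache API's `cache.get_or_set()` covers the same pattern.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Minimal in-process stand-in for a memcached-style cache with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: evict it and treat it as a miss
            return default
        return value

    def set(self, key: str, value: Any, timeout: float) -> None:
        self._data[key] = (self._clock() + timeout, value)


_MISSING = object()  # sentinel, so that a cached None still counts as a hit


def get_or_compute(
    cache: TTLCache, key: str, compute: Callable[[], Any], timeout: float = 300
) -> Any:
    """Cache-aside: return the cached value, or compute, store, and return it."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = compute()  # e.g. evaluating an expensive queryset
        cache.set(key, value, timeout)
    return value
```

When the value comes from a queryset, wrap it in `list(...)` inside `compute`. This forces the query to run once, so the cache stores rows rather than an unevaluated query.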

Furthermore, you always have the option of having multiple application servers running Django if that is the slow point, with a software or hardware load balancer in front.
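Several Django application servers can sit behind a load balancer because each server is interchangeable. The sketch below is a toy model of round-robin balancing, the default strategy in nginx's `upstream` module. It hands out hypothetical backend addresses in turn and skips any that are marked as down. A real balancer also handles health checks, timeouts, and retries.

```python
import itertools
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any that are marked down."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.backends: List[str] = list(backends)
        if not self.backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(self.backends)
        self.down: Set[str] = set()

    def pick(self) -> str:
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"])
print([lb.pick() for _ in range(4)])
# ['10.0.0.1:8000', '10.0.0.2:8000', '10.0.0.3:8000', '10.0.0.1:8000']
lb.down.add("10.0.0.2:8000")
print([lb.pick() for _ in range(2)])
# ['10.0.0.3:8000', '10.0.0.1:8000']
```

This only works if no request depends on which server handled the previous one. Sessions and uploaded files therefore need to live in shared storage rather than on a single application server.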

Finally, are you serving static content on the same server as Django? Are you using Apache or something like nginx or lighttpd? Can you afford to use a CDN for static content? These are things to think about, but it’s all very speculative. 100k hits/day isn’t the only variable: how much do you want to spend? How much expertise do you have managing all these components? How much time do you have to pull it all together?


Answer 14

The developer advocate for YouTube gave a talk about scaling Python at PyCon 2012, which is also relevant to scaling Django.

YouTube has more than a billion users, and YouTube is built on Python.


Answer 15

I have been using Django for over a year now, and I am very impressed with how it manages to combine modularity, scalability, and speed of development. As with any technology, it comes with a learning curve. However, this learning curve is made a lot less steep by the excellent documentation from the Django community. Django has been able to handle everything I have thrown at it really well, and it looks like it will scale well into the future.

BidRodeo Penny Auctions is a moderately sized Django powered website. It is a very dynamic website and does handle a good number of page views a day.


Answer 16

Note that if you're expecting 100K users per day who are active for hours at a time (meaning a maximum of 20K+ concurrent users), you're going to need A LOT of servers. SO has ~15,000 registered users, and most of them are probably not active daily. While the bulk of traffic comes from unregistered users, I'm guessing that very few of them stay on the site for more than a couple of minutes (i.e. they follow Google search results, then leave).

For that volume, expect at least 30 servers … which is still a rather heavy 1,000 concurrent users per server.
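The concurrency figures can be sanity-checked with Little's law: average concurrency = arrival rate × time each visitor stays. The sketch below assumes visits are spread evenly over 24 hours, which understates peak load. It takes the answer's figure of about 1,000 concurrent users per server as a given, not as a measured number.

```python
import math


def average_concurrency(daily_visitors: float, hours_per_visit: float) -> float:
    """Little's law, L = lambda * W, with arrivals spread evenly over a day."""
    arrivals_per_hour = daily_visitors / 24  # lambda, visitors per hour
    return arrivals_per_hour * hours_per_visit  # W, visit length in hours


def servers_needed(concurrent_users: float, users_per_server: float) -> int:
    """Round up, since a fraction of a server still means a whole server."""
    return math.ceil(concurrent_users / users_per_server)


print(round(average_concurrency(100_000, 2)))    # 8333 with two-hour visits
print(round(average_concurrency(100_000, 4.8)))  # 20000 with 4.8-hour visits
print(servers_needed(20_000, 1_000))  # 20
print(servers_needed(30_000, 1_000))  # 30
```

Under these assumptions, reaching 20K concurrent users requires either average visits of about 4.8 hours or traffic peaks well above the daily average. The 30-server estimate corresponds to roughly 30K concurrent users at 1,000 per server.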


Answer 17

What's the “largest” site that's built on Django today? (I measure size mostly by user traffic)
Pinterest, disqus.com.
More here: https://www.shuup.com/en/blog/25-of-the-most-popular-python-and-django-websites/

Can Django deal with 100,000 users daily, each visiting the site for a couple of hours?
Yes, but with proper architecture, database design, caching, load balancers, and multiple servers or nodes.

Could a site like Stack Overflow run on Django?
Yes; just follow the answer to the second question.


Answer 18

Another example is rasp.yandex.ru, a Russian transport timetable service. Its traffic meets your requirements.


Answer 19

If you have a site with some static content, then putting a Varnish server in front will dramatically increase your performance. Even a single box can then easily spit out 100 Mbit/s of traffic.

Note that with dynamic content, using something like Varnish becomes a lot more tricky.
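Part of what makes dynamic content trickier is that a shared cache must first decide which requests are safe to serve from cache at all. The sketch below is loosely modeled on Varnish's built-in policy. Under that policy, only `GET`/`HEAD` requests are looked up, requests carrying a `Cookie` or `Authorization` header go straight to the backend, and the cache key is built from host and URL. The sketch is an illustration in Python, not Varnish configuration, and the function name is hypothetical.

```python
from typing import Mapping, Optional, Tuple


def shared_cache_key(
    method: str, url: str, headers: Mapping[str, str]
) -> Optional[Tuple[str, str]]:
    """Return a cache key for a request, or None if it must bypass the cache."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Only safe, idempotent reads are candidates for caching.
    if method.upper() not in ("GET", "HEAD"):
        return None
    # Cookies or credentials usually mean a personalised response (e.g. a
    # logged-in Django session), which must not be shared between users.
    if "cookie" in lowered or "authorization" in lowered:
        return None
    # Otherwise key on host + URL, so identical anonymous requests share one entry.
    return (lowered.get("host", ""), url)
```

Under a policy like this, static assets and anonymous pages cache well. Pages for logged-in Django users, which carry a `sessionid` cookie, always fall through to the application unless cookie handling is customised.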


Answer 20

My experience with Django is minimal, but I do remember that The Django Book has a chapter where they interview people running some of the larger Django applications. Here is a link. I guess it could provide some insights.

It says curse.com is one of the largest Django applications with around 60-90 million page views in a month.


Answer 21

I develop high-traffic sites using Django for the national broadcaster in Ireland. It works well for us. Developing a high-performance site is about more than just choosing a framework. A framework is only one part of a system, and a system is only as strong as its weakest link. Using the latest framework 'X' won't solve your performance issues if the problem is slow database queries or a badly configured server or network.


Answer 22

Even though there have been a lot of great answers here, I feel like pointing out something nobody has emphasized:

It depends on the application

If your application is light on writes, meaning you read a lot more data from the DB than you write, then scaling Django should be fairly trivial. Heck, it comes with some fairly decent output/view caching straight out of the box. Make use of that with, say, Redis as the cache provider, put a load balancer in front, spin up n instances, and you should be able to deal with a VERY large amount of traffic.
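For a read-heavy site, the out-of-the-box output caching described above could be wired up as Django's site-wide cache on top of Redis. This is a minimal sketch. It assumes Django 4.0 or later, which has the built-in `RedisCache` backend, and the `redis` client package. The Redis URL and the 600-second lifetime are placeholder values.

```python
# settings.py (excerpt): a sketch of Django's site-wide cache backed by Redis.
# Assumes Django >= 4.0 and the redis package; URL and lifetime are placeholders.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}

MIDDLEWARE = [
    # Must come first: middleware runs in reverse on the response, so this runs
    # last and stores the finished response in the cache.
    "django.middleware.cache.UpdateCacheMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of the project's middleware goes here ...
    # Must come last: it runs at the end of request processing and returns a
    # cached page, if one exists, before the view is called.
    "django.middleware.cache.FetchFromCacheMiddleware",
]

CACHE_MIDDLEWARE_ALIAS = "default"
CACHE_MIDDLEWARE_SECONDS = 600  # how long each page stays cached
CACHE_MIDDLEWARE_KEY_PREFIX = ""  # set per site if several sites share one Redis
```

Because every instance behind the load balancer points at the same Redis server, a page rendered by one instance can be served from cache by any of the others.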

Now, what if you have to do thousands of complex writes a second? Different story. Is Django going to be a bad choice? Well, not necessarily; it depends on how you architect your solution, and also on what your requirements are.

Just my two cents :-)


Answer 23

You can definitely run a high-traffic site in Django. Check out this pre-Django 1.0 but still relevant post here: http://menendez.com/blog/launching-high-performance-django-site/


Answer 24

Check out this micro news aggregator called EveryBlock.

It’s entirely written in Django. In fact they are the people who developed the Django framework itself.


Answer 25

The question is not whether Django can scale.

The right approach is to understand which network design patterns and tools to put around your Django/Symfony/Rails project so that it scales well.

Some ideas:

  • Multiplexing.
  • Reverse proxy, e.g. Nginx, Varnish.
  • Sessions in an in-memory cache, e.g. Redis.
  • Clustering of your project and database for load balancing and fault tolerance, e.g. Docker.
  • Using a third party to store assets, e.g. Amazon S3.

Hope it helps a bit. This is my small contribution.


Answer 26

If you want to use open source, there are many options for you. But Python is the best among them, as it has many libraries and a super awesome community. Here are a few points that might change your mind:

  • Python is very good, but it is an interpreted language, which makes it slow. However, many accelerators and caching services partly solve this problem.

  • If you are thinking about rapid development, then Ruby on Rails is the best of all. The main motto of this (RoR) framework is to give developers a comfortable experience. If you compare Ruby and Python, both have nearly the same syntax.

  • Google App Engine is a very good service, but it binds you to its scope, so you don't get the chance to experiment with new things. Instead, you can use the Digital Ocean cloud, which charges only $5/month for its simplest droplet. Heroku is another free service where you can deploy your product.

  • Yes! Yes! What you heard is totally correct but here are some examples which are using other technologies

    • Rails: GitHub, Twitter (previously), Shopify, Airbnb, Slideshare, Heroku, etc.
    • PHP: Facebook, Wikipedia, Flickr, Yahoo, Tumblr, Mailchimp, etc.

In conclusion, a framework or language won't do everything for you. Better architecture, design, and strategy will give you a scalable website. Instagram is the biggest example: a small team managing a huge amount of data. Here is a blog post about its architecture; it's a must-read.


Answer 27

I don’t think the issue is really about Django scaling.

I really suggest you look into your architecture; that's what will help you with your scaling needs. If you get that wrong, it doesn't matter how well Django performs. Performance != Scale. You can have a system that has amazing performance but does not scale, and vice versa.

Is your application database-bound? If it is, then your scale issues lie there as well. How are you planning on interacting with the database from Django? What happens when your database cannot process requests as fast as Django accepts them? What happens when your data outgrows one physical machine? You need to account for how you plan on dealing with those circumstances.

Moreover, what happens when your traffic outgrows one app server? How you handle sessions in this case can be tricky; more often than not, you will probably need a shared-nothing architecture. Again, that depends on your application.
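One common way to keep Django app servers shared-nothing is to store sessions outside the individual processes, so any server behind the load balancer can serve any user. The fragment below is a minimal sketch. It assumes Django 4.0 or later (for the built-in `RedisCache` backend), `django.contrib.sessions` in `INSTALLED_APPS`, and a Redis server at a placeholder address. `cached_db` is Django's write-through session engine.

```python
# settings.py (excerpt): a sketch of sessions that don't live on any one app server.
# Assumes Django >= 4.0 and a reachable Redis instance; the address is a placeholder.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://10.0.0.5:6379",
    }
}

# Write-through: each session write goes to both the cache and the database, and
# reads hit the database only when the session is not already in the cache.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
SESSION_CACHE_ALIAS = "default"
```

The pure cache engine, `django.contrib.sessions.backends.cache`, avoids the database write but loses sessions if the cache is flushed or evicts them. `cached_db` trades some write cost for durability.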

In short, the language is not what determines scale. A language is responsible for performance (again, depending on your application, different languages perform differently). It is your design and architecture that make scaling a reality.

I hope it helps. I would be glad to help further if you have questions.


Answer 28

Once your site/application starts growing, you need to spread the tasks evenly. In short, optimize every aspect, including DBs, files, images, CSS, etc., and balance the load across several other resources. Or you make some more room for it to grow. Technologies like CDNs and the cloud are a must for huge sites. Just developing and tweaking an application won't give you 100% satisfaction; other components also play an important role.