Tag archive: mod-python

Reducing Django memory usage. Low-hanging fruit?

Question: Reducing Django memory usage. Low-hanging fruit?

My memory usage increases over time and restarting Django is not kind to users.

I am unsure how to go about profiling the memory usage but some tips on how to start measuring would be useful.

I have a feeling that there are some simple steps that could produce big gains. Ensuring that DEBUG is set to False is an obvious biggie.

Can anyone suggest others? How much improvement would caching bring on low-traffic sites?

In this case I’m running under Apache 2.x with mod_python. I’ve heard mod_wsgi is a bit leaner but it would be tricky to switch at this stage unless I know the gains would be significant.

Edit: Thanks for the tips so far. Any suggestions on how to discover what’s using up the memory? Are there any guides to Python memory profiling?

Also, as mentioned, there are a few things that will make it tricky to switch to mod_wsgi, so I’d like to have some idea of the gains I could expect before ploughing forwards in that direction.

Edit: Carl posted a slightly more detailed reply here that is worth reading: Django Deployment: Cutting Apache’s Overhead

Edit: Graham Dumpleton’s article is the best I’ve found on the MPM and mod_wsgi related stuff. I am rather disappointed that no-one could provide any info on debugging the memory usage in the app itself though.

Final Edit: Well I have been discussing this with Webfaction to see if they could assist with recompiling Apache and this is their word on the matter:

“I really don’t think that you will get much of a benefit by switching to an MPM Worker + mod_wsgi setup. I estimate that you might be able to save around 20MB, but probably not much more than that.”

So! This brings me back to my original question (which I am still none the wiser about). How does one go about identifying where the problem lies? It’s a well-known maxim that you don’t optimize without first measuring to see where you need to optimize, but there is very little in the way of tutorials on measuring Python memory usage and none at all specific to Django.

Thanks for everyone’s assistance but I think this question is still open!

Another final edit ;-)

I asked this on the django-users list and got some very helpful replies.

Honestly the last update ever!

This was just released. Could be the best solution yet: Profiling Django object size and memory usage with Pympler


Answer 0

Make sure you are not keeping global references to data. That prevents the python garbage collector from releasing the memory.

Don’t use mod_python. It loads an interpreter inside Apache. If you need to use Apache, use mod_wsgi instead. It is not tricky to switch; it is very easy. mod_wsgi is far easier to configure for Django than brain-dead mod_python.

If you can remove Apache from your requirements, that would be even better for your memory usage. Spawning seems to be the new fast, scalable way to run Python web applications.

EDIT: I don’t see how switching to mod_wsgi could be “tricky”. It should be a very easy task. Please elaborate on the problem you are having with the switch.


Answer 1

If you are running under mod_wsgi (and presumably under Spawning as well, since it is WSGI compliant), you can use Dozer to look at your memory usage.

Under mod_wsgi just add this at the bottom of your WSGI script:

from dozer import Dozer
application = Dozer(application)

Then point your browser at http://domain/_dozer/index to see a list of all your memory allocations.

I’ll also just add my voice of support for mod_wsgi. It makes a world of difference in terms of performance and memory usage over mod_python. Graham Dumpleton’s support for mod_wsgi is outstanding, both in terms of active development and in helping people on the mailing list to optimize their installations. David Cramer at curse.com has posted some charts (which I can’t seem to find now, unfortunately) showing the drastic reduction in CPU and memory usage after they switched to mod_wsgi on that high-traffic site. Several of the Django devs have switched. Seriously, it’s a no-brainer :)


Answer 2

These are the Python memory profiler solutions I’m aware of (not Django related):

Disclaimer: I have a stake in the latter.

The individual project’s documentation should give you an idea of how to use these tools to analyze memory behavior of Python applications.
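As a neutral starting point that does not depend on any third-party project, here is a minimal sketch using the standard library's `tracemalloc` module. This is an assumption on my part: it requires Python 3.4 or later and is not one of the tools this answer refers to. The `allocate_some_memory` function is a hypothetical workload standing in for a view or request handler.

```python
import tracemalloc


def allocate_some_memory(n):
    """Hypothetical workload standing in for a view or request handler."""
    return [str(i) * 10 for i in range(n)]


tracemalloc.start()
before = tracemalloc.take_snapshot()

data = allocate_some_memory(50000)

after = tracemalloc.take_snapshot()

# Compare the two snapshots, grouped by source line.
top_stats = after.compare_to(before, "lineno")

# Show the source lines whose allocations changed the most.
for stat in top_stats[:5]:
    print(stat)

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print("currently traced: %d bytes, peak: %d bytes" % (current, peak))
```

Taking one snapshot before and one after a batch of requests, then comparing them, is usually enough to see which lines of code account for the growth.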

The following is a nice “war story” that also gives some helpful pointers:


Answer 3

Additionally, check that you are not using any known leakers. MySQLdb is known to leak enormous amounts of memory with Django due to a bug in Unicode handling. Other than that, Django Debug Toolbar might help you track down the hogs.


Answer 4

In addition to not keeping around global references to large data objects, try to avoid loading large datasets into memory at all wherever possible.
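A minimal sketch of the difference between loading a dataset eagerly and streaming it follows. The `fetch_rows` generator is a hypothetical stand-in for a database cursor or similar source; nothing here is Django-specific.

```python
import sys


def fetch_rows(n):
    """Hypothetical data source that yields one row at a time."""
    for i in range(n):
        yield {"id": i, "amount": i % 7}


def total_eager(n):
    rows = list(fetch_rows(n))  # materialises every row at once
    # getsizeof counts only the list's own pointer array, not the rows.
    return sum(r["amount"] for r in rows), sys.getsizeof(rows)


def total_streaming(n):
    total = 0
    for row in fetch_rows(n):  # only one row is alive at a time
        total += row["amount"]
    return total


eager_total, list_bytes = total_eager(100000)
streaming_total = total_streaming(100000)

print("same result:", eager_total == streaming_total)
print("list container alone used %d bytes" % list_bytes)
```

Both functions compute the same total, but the streaming version never holds more than one row. In Django, `QuerySet.iterator()` serves a similar purpose for database results.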

Switch to mod_wsgi in daemon mode, and use Apache’s worker MPM instead of prefork. This latter step can allow you to serve many more concurrent users with much less memory overhead.


Answer 5

Webfaction actually has some tips for keeping Django memory usage down.

The major points:

  • Make sure DEBUG is set to False (you already know that).
  • Use “ServerLimit” in your Apache config
  • Check that no big objects are being loaded into memory
  • Consider serving static content from a separate process or server.
  • Use “MaxRequestsPerChild” in your Apache config
  • Find out and understand how much memory you’re using
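For the last point in the list, one rough way to see how much memory a process has used from inside Python is the standard library's `resource` module. This is a sketch under assumptions: it works on Unix only, and the helper name `peak_rss_bytes` is made up for illustration. It reports the peak resident set size, which never decreases, so it shows growth but not releases.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


before = peak_rss_bytes()
blob = bytearray(50 * 1024 * 1024)  # allocate roughly 50 MB
after = peak_rss_bytes()

print("peak RSS before: %.1f MB" % (before / 1e6))
print("peak RSS after:  %.1f MB" % (after / 1e6))
```

Logging this value at the end of each request, together with the request path, gives a crude picture of which requests push the process size up.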

Answer 6

Another plus for mod_wsgi: set a maximum-requests parameter in your WSGIDaemonProcess directive and mod_wsgi will restart the daemon process every so often. There should be no visible effect for the user, other than a slow page load the first time a fresh process is hit, as it’ll be loading Django and your application code into memory.

But even if you do have memory leaks, that should keep the process size from getting too large, without having to interrupt service to your users.


Answer 7

Here is the script I use for mod_wsgi (called wsgi.py, and put in the root of my Django project):

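import os
import sys
from os import path

import django.core.handlers.wsgi

# Discard anything written to stdout/stderr; use the logging module instead.
sys.stdout = open('/dev/null', 'a+')
sys.stderr = open('/dev/null', 'a+')

# Make the directory that contains the project importable.
sys.path.append(path.join(path.dirname(__file__), '..'))

# Tell Django which settings module to load, then build the WSGI callable.
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproject.settings'
application = django.core.handlers.wsgi.WSGIHandler()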

Adjust myproject.settings and the path as needed. I redirect all output to /dev/null since mod_wsgi by default prevents printing. Use logging instead.

For apache:

<VirtualHost *>
   ServerName myhost.com

   ErrorLog /var/log/apache2/error-myhost.log
   CustomLog /var/log/apache2/access-myhost.log common

   DocumentRoot "/var/www"

   WSGIScriptAlias / /path/to/my/wsgi.py

</VirtualHost>

Hopefully this should at least help you set up mod_wsgi so you can see if it makes a difference.


Answer 8

Caches: make sure they’re being flushed. It’s easy for something to land in a cache but never be GC’d because of the cache reference.

Swig’d code: make sure any memory management is being done correctly. It’s really easy to miss this in Python, especially with third-party libraries.

Monitoring: If you can, get data about memory usage and hits. Usually you’ll see a correlation between a certain type of request and memory usage.
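On the caches point above, one way to guarantee that an in-process cache cannot grow without bound is to cap its size and evict the least recently used entry. This is a minimal sketch: the `BoundedCache` class is a hypothetical illustration, not part of Django or any library mentioned here.

```python
from collections import OrderedDict


class BoundedCache:
    """Tiny LRU cache: holds at most max_entries items."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        return default

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict the oldest entries once the cap is exceeded.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def __len__(self):
        return len(self._data)


cache = BoundedCache(max_entries=3)
for i in range(10):
    cache.set("page-%d" % i, "rendered %d" % i)

print("entries kept:", len(cache))
```

Once an entry is evicted, the cache no longer references it, so the garbage collector is free to reclaim it.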


Answer 9

We stumbled over a bug in Django with big sitemaps (10,000 items). It seems Django tries to load them all into memory when generating the sitemap: http://code.djangoproject.com/ticket/11572. This effectively kills the Apache process when Google pays a visit to the site.