标签归档:django-custom-manager

Django动态模型字段

问题:Django动态模型字段

我正在开发一个多租户应用程序,其中一些用户可以定义自己的数据字段(通过管理员)以收集表单中的其他数据并报告数据。后一点使得JSONField不是一个很好的选择,所以我有以下解决方案:

class CustomDataField(models.Model):
    """
    Abstract specification for arbitrary data fields.
    Not used for holding data itself, but metadata about the fields.
    """
    site = models.ForeignKey(Site, default=settings.SITE_ID)
    name = models.CharField(max_length=64)

    class Meta:
        abstract = True

class CustomDataValue(models.Model):
    """
    Abstract specification for arbitrary data.
    """
    value = models.CharField(max_length=1024)

    class Meta:
        abstract = True

请注意,CustomDataField如何具有指向站点的ForeignKey-每个站点将具有一组不同的自定义数据字段,但是使用相同的数据库。然后可以将各种具体的数据字段定义为:

class UserCustomDataField(CustomDataField):
    pass

class UserCustomDataValue(CustomDataValue):
    custom_field = models.ForeignKey(UserCustomDataField)
    user = models.ForeignKey(User, related_name='custom_data')

    class Meta:
        unique_together=(('user','custom_field'),)

这导致以下用途:

custom_field = UserCustomDataField.objects.create(name='zodiac', site=my_site) #probably created in the admin
user = User.objects.create(username='foo')
user_sign = UserCustomDataValue(custom_field=custom_field, user=user, data='Libra')
user.custom_data.add(user_sign) #actually, what does this even do?

但这感觉很笨拙,尤其是在需要手动创建相关数据并将其与具体模型关联的情况下。有没有更好的方法?

已被优先丢弃的选项:

  • 自定义SQL可以即时修改表。一方面是因为它无法扩展,另一方面是因为它太过分了。
  • 无模式的解决方案,例如NoSQL。我没有反对他们的想法,但他们仍然不合适。最终,这个数据类型化,并使用第三方报告应用的可能性是存在的。
  • 上面列出的JSONField,因为它不适用于查询。

I’m working on a multi-tenanted application in which some users can define their own data fields (via the admin) to collect additional data in forms and report on the data. The latter bit makes JSONField not a great option, so instead I have the following solution:

class CustomDataField(models.Model):
    """
    Abstract specification for arbitrary data fields.
    Not used for holding data itself, but metadata about the fields.
    """
    site = models.ForeignKey(Site, default=settings.SITE_ID)
    name = models.CharField(max_length=64)

    class Meta:
        abstract = True

class CustomDataValue(models.Model):
    """
    Abstract specification for arbitrary data.
    """
    value = models.CharField(max_length=1024)

    class Meta:
        abstract = True

Note how CustomDataField has a ForeignKey to Site – each Site will have a different set of custom data fields, but use the same database. Then the various concrete data fields can be defined as:

class UserCustomDataField(CustomDataField):
    pass

class UserCustomDataValue(CustomDataValue):
    custom_field = models.ForeignKey(UserCustomDataField)
    user = models.ForeignKey(User, related_name='custom_data')

    class Meta:
        unique_together=(('user','custom_field'),)

This leads to the following use:

custom_field = UserCustomDataField.objects.create(name='zodiac', site=my_site) #probably created in the admin
user = User.objects.create(username='foo')
user_sign = UserCustomDataValue(custom_field=custom_field, user=user, data='Libra')
user.custom_data.add(user_sign) #actually, what does this even do?

But this feels very clunky, particularly with the need to manually create the related data and associate it with the concrete model. Is there a better approach?

Options that have been pre-emptively discarded:

  • Custom SQL to modify tables on-the-fly. Partly because this won’t scale and partly because it’s too much of a hack.
  • Schema-less solutions like NoSQL. I have nothing against them, but they’re still not a good fit. Ultimately this data is typed, and the possibility exists of using a third-party reporting application.
  • JSONField, as listed above, as it’s not going to work well with queries.

回答 0

到目前为止,有四种可用的方法,其中两种需要特定的存储后端:

  1. Django-eav(不再提供原始软件包,但有一些繁荣的fork

    该解决方案基于实体属性值数据模型,实质上,它使用多个表来存储对象的动态属性。关于此解决方案的重要之处在于:

    • 使用几个纯净而简单的Django模型来表示动态字段,这使得它易于理解并且与数据库无关。
    • 允许您使用以下简单命令将动态属性存储有效地附加/分离到Django模型:

      eav.unregister(Encounter)
      eav.register(Patient)
    • 与Django admin很好地集成 ;

    • 同时真正强大。

    缺点:

    • 不太有效。这更多地是对EAV模式本身的一种批评,该模式要求将数据从列格式手动合并到模型中的一组键值对。
    • 难以维护。维护数据完整性需要多列唯一键约束,这在某些数据库上可能效率不高。
    • 您将需要选择其中一个分支,因为不再维护官方软件包,也没有明确的领导者。

    用法非常简单:

    import eav
    from app.models import Patient, Encounter
    
    eav.register(Encounter)
    eav.register(Patient)
    Attribute.objects.create(name='age', datatype=Attribute.TYPE_INT)
    Attribute.objects.create(name='height', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='weight', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='city', datatype=Attribute.TYPE_TEXT)
    Attribute.objects.create(name='country', datatype=Attribute.TYPE_TEXT)
    
    self.yes = EnumValue.objects.create(value='yes')
    self.no = EnumValue.objects.create(value='no')
    self.unkown = EnumValue.objects.create(value='unkown')
    ynu = EnumGroup.objects.create(name='Yes / No / Unknown')
    ynu.enums.add(self.yes)
    ynu.enums.add(self.no)
    ynu.enums.add(self.unkown)
    
    Attribute.objects.create(name='fever', datatype=Attribute.TYPE_ENUM,\
                                           enum_group=ynu)
    
    # When you register a model within EAV,
    # you can access all of EAV attributes:
    
    Patient.objects.create(name='Bob', eav__age=12,
                               eav__fever=no, eav__city='New York',
                               eav__country='USA')
    # You can filter queries based on their EAV fields:
    
    query1 = Patient.objects.filter(Q(eav__city__contains='Y'))
    query2 = Q(eav__city__contains='Y') |  Q(eav__fever=no)
  2. PostgreSQL中的Hstore,JSON或JSONB字段

    PostgreSQL支持几种更复杂的数据类型。大多数组件都通过第三方程序包得到支持,但是近年来Django已将它们引入django.contrib.postgres.fields中。

    HStoreField

    Django-hstore最初是第三方软件包,但是Django 1.8将HStoreField作为内置组件以及其他几种PostgreSQL支持的字段类型添加了。

    从某种意义上说,这种方法是好的,它可以让您充分利用两个领域:动态字段和关系数据库。但是,hstore 并不是理想的性能选择,特别是如果您最终要在一个字段中存储数千个项目时。它还仅支持值字符串。

    #app/models.py
    from django.contrib.postgres.fields import HStoreField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = models.HStoreField(db_index=True)

    在Django的shell中,您可以像这样使用它:

    >>> instance = Something.objects.create(
                     name='something',
                     data={'a': '1', 'b': '2'}
               )
    >>> instance.data['a']
    '1'        
    >>> empty = Something.objects.create(name='empty')
    >>> empty.data
    {}
    >>> empty.data['a'] = '1'
    >>> empty.save()
    >>> Something.objects.get(name='something').data['a']
    '1'

    您可以针对hstore字段发出索引查询:

    # equivalence
    Something.objects.filter(data={'a': '1', 'b': '2'})
    
    # subset by key/value mapping
    Something.objects.filter(data__a='1')
    
    # subset by list of keys
    Something.objects.filter(data__has_keys=['a', 'b'])
    
    # subset by single key
    Something.objects.filter(data__has_key='a')    

    JSONField

    JSON / JSONB字段支持任何JSON可编码的数据类型,不仅是键/值对,而且比Hstore更快,并且(对于JSONB)更紧凑。一些软件包实现了JSON / JSONB字段,包括django-pgfields,但是从Django 1.9开始,JSONField是使用JSONB进行存储的内置方法。 JSONFieldHStoreField相似,并且在使用大字典时可能会表现更好。它还支持字符串以外的类型,例如整数,布尔值和嵌套字典。

    #app/models.py
    from django.contrib.postgres.fields import JSONField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = JSONField(db_index=True)

    在外壳中创建:

    >>> instance = Something.objects.create(
                     name='something',
                     data={'a': 1, 'b': 2, 'nested': {'c':3}}
               )

    索引查询与HStoreField几乎相同,除了可以嵌套。复杂索引可能需要手动创建(或脚本迁移)。

    >>> Something.objects.filter(data__a=1)
    >>> Something.objects.filter(data__nested__c=3)
    >>> Something.objects.filter(data__has_key='a')
  3. Django MongoDB

    或其他NoSQL Django改编版-借助它们,您可以拥有完全动态的模型。

    NoSQL Django库很棒,但是请记住它们不是100%与Django兼容的,例如,要从标准Django 迁移到Django-nonrel,您将需要用ListField替换ManyToMany 。

    看看这个Django MongoDB示例:

    from djangotoolbox.fields import DictField
    
    class Image(models.Model):
        exif = DictField()
    ...
    
    >>> image = Image.objects.create(exif=get_exif_data(...))
    >>> image.exif
    {u'camera_model' : 'Spamcams 4242', 'exposure_time' : 0.3, ...}

    您甚至可以创建任何Django模型的嵌入式列表

    class Container(models.Model):
        stuff = ListField(EmbeddedModelField())
    
    class FooModel(models.Model):
        foo = models.IntegerField()
    
    class BarModel(models.Model):
        bar = models.CharField()
    ...
    
    >>> Container.objects.create(
        stuff=[FooModel(foo=42), BarModel(bar='spam')]
    )
  4. Django-mutant:基于syncdb和South-hooks的动态模型

    Django-mutant实现了完全动态的外键和m2m字段。灵感来自于Will Hardy和Michael Hall 令人难以置信但有些骇人听闻的解决方案。

    所有这些都基于Django South hooks,根据Will Hardy在DjangoCon 2011上的演讲 (观看!)仍然很健壮并已在生产中进行了测试(相关源代码)。

    首先实现这一点的迈克尔·霍尔

    是的,这是神奇的事情,通过这些方法,您可以使用任何关系数据库后端来实现完全动态的Django应用程序,模型和字段。但是要花多少钱呢?大量使用会损害应用的稳定性吗?这些是要考虑的问题。您需要确保保持适当的锁定,以允许同时进行数据库更改请求。

    如果使用的是Michael Halls lib,则代码将如下所示:

    from dynamo import models
    
    test_app, created = models.DynamicApp.objects.get_or_create(
                          name='dynamo'
                        )
    test, created = models.DynamicModel.objects.get_or_create(
                      name='Test',
                      verbose_name='Test Model',
                      app=test_app
                   )
    foo, created = models.DynamicModelField.objects.get_or_create(
                      name = 'foo',
                      verbose_name = 'Foo Field',
                      model = test,
                      field_type = 'dynamiccharfield',
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = 'Test field for Foo',
                   )
    bar, created = models.DynamicModelField.objects.get_or_create(
                      name = 'bar',
                      verbose_name = 'Bar Field',
                      model = test,
                      field_type = 'dynamicintegerfield',
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = 'Test field for Bar',
                   )

As of today, there are four available approaches, two of them requiring a certain storage backend:

  1. Django-eav (the original package is no longer mantained but has some thriving forks)

    This solution is based on Entity Attribute Value data model, essentially, it uses several tables to store dynamic attributes of objects. Great parts about this solution is that it:

    • uses several pure and simple Django models to represent dynamic fields, which makes it simple to understand and database-agnostic;
    • allows you to effectively attach/detach dynamic attribute storage to Django model with simple commands like:

      eav.unregister(Encounter)
      eav.register(Patient)
      
    • Nicely integrates with Django admin;

    • At the same time being really powerful.

    Downsides:

    • Not very efficient. This is more of a criticism of the EAV pattern itself, which requires manually merging the data from a column format to a set of key-value pairs in the model.
    • Harder to maintain. Maintaining data integrity requires a multi-column unique key constraint, which may be inefficient on some databases.
    • You will need to select one of the forks, since the official package is no longer maintained and there is no clear leader.

    The usage is pretty straightforward:

    import eav
    from app.models import Patient, Encounter
    
    eav.register(Encounter)
    eav.register(Patient)
    Attribute.objects.create(name='age', datatype=Attribute.TYPE_INT)
    Attribute.objects.create(name='height', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='weight', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='city', datatype=Attribute.TYPE_TEXT)
    Attribute.objects.create(name='country', datatype=Attribute.TYPE_TEXT)
    
    self.yes = EnumValue.objects.create(value='yes')
    self.no = EnumValue.objects.create(value='no')
    self.unkown = EnumValue.objects.create(value='unkown')
    ynu = EnumGroup.objects.create(name='Yes / No / Unknown')
    ynu.enums.add(self.yes)
    ynu.enums.add(self.no)
    ynu.enums.add(self.unkown)
    
    Attribute.objects.create(name='fever', datatype=Attribute.TYPE_ENUM,\
                                           enum_group=ynu)
    
    # When you register a model within EAV,
    # you can access all of EAV attributes:
    
    Patient.objects.create(name='Bob', eav__age=12,
                               eav__fever=no, eav__city='New York',
                               eav__country='USA')
    # You can filter queries based on their EAV fields:
    
    query1 = Patient.objects.filter(Q(eav__city__contains='Y'))
    query2 = Q(eav__city__contains='Y') |  Q(eav__fever=no)
    
  2. Hstore, JSON or JSONB fields in PostgreSQL

    PostgreSQL supports several more complex data types. Most are supported via third-party packages, but in recent years Django has adopted them into django.contrib.postgres.fields.

    HStoreField:

    Django-hstore was originally a third-party package, but Django 1.8 added HStoreField as a built-in, along with several other PostgreSQL-supported field types.

    This approach is good in a sense that it lets you have the best of both worlds: dynamic fields and relational database. However, hstore is not ideal performance-wise, especially if you are going to end up storing thousands of items in one field. It also only supports strings for values.

    #app/models.py
    from django.contrib.postgres.fields import HStoreField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = models.HStoreField(db_index=True)
    

    In Django’s shell you can use it like this:

    >>> instance = Something.objects.create(
                     name='something',
                     data={'a': '1', 'b': '2'}
               )
    >>> instance.data['a']
    '1'        
    >>> empty = Something.objects.create(name='empty')
    >>> empty.data
    {}
    >>> empty.data['a'] = '1'
    >>> empty.save()
    >>> Something.objects.get(name='something').data['a']
    '1'
    

    You can issue indexed queries against hstore fields:

    # equivalence
    Something.objects.filter(data={'a': '1', 'b': '2'})
    
    # subset by key/value mapping
    Something.objects.filter(data__a='1')
    
    # subset by list of keys
    Something.objects.filter(data__has_keys=['a', 'b'])
    
    # subset by single key
    Something.objects.filter(data__has_key='a')    
    

    JSONField:

    JSON/JSONB fields support any JSON-encodable data type, not just key/value pairs, but also tend to be faster and (for JSONB) more compact than Hstore. Several packages implement JSON/JSONB fields including django-pgfields, but as of Django 1.9, JSONField is a built-in using JSONB for storage. JSONField is similar to HStoreField, and may perform better with large dictionaries. It also supports types other than strings, such as integers, booleans and nested dictionaries.

    #app/models.py
    from django.contrib.postgres.fields import JSONField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = JSONField(db_index=True)
    

    Creating in the shell:

    >>> instance = Something.objects.create(
                     name='something',
                     data={'a': 1, 'b': 2, 'nested': {'c':3}}
               )
    

    Indexed queries are nearly identical to HStoreField, except nesting is possible. Complex indexes may require manually creation (or a scripted migration).

    >>> Something.objects.filter(data__a=1)
    >>> Something.objects.filter(data__nested__c=3)
    >>> Something.objects.filter(data__has_key='a')
    
  3. Django MongoDB

    Or other NoSQL Django adaptations — with them you can have fully dynamic models.

    NoSQL Django libraries are great, but keep in mind that they are not 100% the Django-compatible, for example, to migrate to Django-nonrel from standard Django you will need to replace ManyToMany with ListField among other things.

    Checkout this Django MongoDB example:

    from djangotoolbox.fields import DictField
    
    class Image(models.Model):
        exif = DictField()
    ...
    
    >>> image = Image.objects.create(exif=get_exif_data(...))
    >>> image.exif
    {u'camera_model' : 'Spamcams 4242', 'exposure_time' : 0.3, ...}
    

    You can even create embedded lists of any Django models:

    class Container(models.Model):
        stuff = ListField(EmbeddedModelField())
    
    class FooModel(models.Model):
        foo = models.IntegerField()
    
    class BarModel(models.Model):
        bar = models.CharField()
    ...
    
    >>> Container.objects.create(
        stuff=[FooModel(foo=42), BarModel(bar='spam')]
    )
    
  4. Django-mutant: Dynamic models based on syncdb and South-hooks

    Django-mutant implements fully dynamic Foreign Key and m2m fields. And is inspired by incredible but somewhat hackish solutions by Will Hardy and Michael Hall.

    All of these are based on Django South hooks, which, according to Will Hardy’s talk at DjangoCon 2011 (watch it!) are nevertheless robust and tested in production (relevant source code).

    First to implement this was Michael Hall.

    Yes, this is magic, with these approaches you can achieve fully dynamic Django apps, models and fields with any relational database backend. But at what cost? Will stability of application suffer upon heavy use? These are the questions to be considered. You need to be sure to maintain a proper lock in order to allow simultaneous database altering requests.

    If you are using Michael Halls lib, your code will look like this:

    from dynamo import models
    
    test_app, created = models.DynamicApp.objects.get_or_create(
                          name='dynamo'
                        )
    test, created = models.DynamicModel.objects.get_or_create(
                      name='Test',
                      verbose_name='Test Model',
                      app=test_app
                   )
    foo, created = models.DynamicModelField.objects.get_or_create(
                      name = 'foo',
                      verbose_name = 'Foo Field',
                      model = test,
                      field_type = 'dynamiccharfield',
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = 'Test field for Foo',
                   )
    bar, created = models.DynamicModelField.objects.get_or_create(
                      name = 'bar',
                      verbose_name = 'Bar Field',
                      model = test,
                      field_type = 'dynamicintegerfield',
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = 'Test field for Bar',
                   )
    

回答 1

我一直在努力推动django-dynamo的构想。该项目仍未记录在案,但您可以在https://github.com/charettes/django-mutant中阅读代码

实际上FK和M2M字段(请参阅contrib.related)也可以工作,甚至可以为自己的自定义字段定义包装器。

还支持模型选项,例如unique_together和ordering以及Model基类,因此您可以将模型代理,抽象或混合作为子类。

我实际上正在研究一种非内存锁定机制,以确保可以在多个django运行实例之间共享模型定义,同时防止使用过时的定义。

该项目仍处于Alpha状态,但这是我的一个项目的基础技术,因此我必须将其投入生产准备。大型计划还支持django-nonrel,因此我们可以利用mongodb驱动程序。

I’ve been working on pushing the django-dynamo idea further. The project is still undocumented but you can read the code at https://github.com/charettes/django-mutant.

Actually FK and M2M fields (see contrib.related) also work and it’s even possible to define wrapper for your own custom fields.

There’s also support for model options such as unique_together and ordering plus Model bases so you can subclass model proxy, abstract or mixins.

I’m actually working on a not in-memory lock mechanism to make sure model definitions can be shared accross multiple django running instances while preventing them using obsolete definition.

The project is still very alpha but it’s a cornerstone technology for one of my project so I’ll have to take it to production ready. The big plan is supporting django-nonrel also so we can leverage the mongodb driver.


回答 2

进一步的研究表明,这是实体属性值设计模式的一种特殊情况,该模式已通过几个软件包为Django实现。

首先,在PyPi上有一个原始的eav-django项目。

其次,第一个项目的最新分支是django-eav,它主要是一个重构,允许将EAV与django自己的模型或第三方应用程序中的模型一起使用。

Further research reveals that this is a somewhat special case of Entity Attribute Value design pattern, which has been implemented for Django by a couple of packages.

First, there’s the original eav-django project, which is on PyPi.

Second, there’s a more recent fork of the first project, django-eav which is primarily a refactor to allow use of EAV with django’s own models or models in third-party apps.