使用Django 1.7加载初始数据和数据迁移

问题:使用Django 1.7加载初始数据和数据迁移

我最近从Django 1.6切换到1.7,并且开始使用迁移功能(我从未使用过South)。

在1.7之前,我曾经用fixture/initial_data.json文件加载初始数据,该文件是用python manage.py syncdb命令加载的(在创建数据库时)。

现在,我开始使用迁移,并且不赞成使用此行为:

如果应用程序使用迁移,则不会自动加载固定装置。由于Django 2.0中的应用程序需要迁移,因此该行为被视为已弃用。如果要加载应用程序的初始数据,请考虑在数据迁移中进行。(https://docs.djangoproject.com/zh-CN/1.7/howto/initial-data/#automatically-loading-initial-data-fixtures

官方文件并没有对如何做一个明显的例子,所以我的问题是:

使用数据迁移导入此类初始数据的最佳方法是什么:

  1. 通过多次调用编写Python代码mymodel.create(...)
  2. 使用或编写Django函数(如调用loaddata)从JSON固定文件加载数据。

我更喜欢第二种选择。

我不想使用South,因为Django现在似乎可以本地使用。

I recently switched from Django 1.6 to 1.7, and I began using migrations (I never used South).

Before 1.7, I used to load initial data with a fixture/initial_data.json file, which was loaded with the python manage.py syncdb command (when creating the database).

Now, I started using migrations, and this behavior is deprecated :

If an application uses migrations, there is no automatic loading of fixtures. Since migrations will be required for applications in Django 2.0, this behavior is considered deprecated. If you want to load initial data for an app, consider doing it in a data migration. (https://docs.djangoproject.com/en/1.7/howto/initial-data/#automatically-loading-initial-data-fixtures)

The official documentation does not have a clear example on how to do it, so my question is :

What is the best way to import such initial data using data migrations :

  1. Write Python code with multiple calls to mymodel.create(...),
  2. Use or write a Django function (like calling loaddata) to load data from a JSON fixture file.

I prefer the second option.

I don’t want to use South, as Django seems to be able to do it natively now.


回答 0

更新:有关此解决方案可能导致的问题,请参见下面的@GwynBleidD注释,有关对将来的模型更改更持久的方法,请参见下面的@Rockallite答案。


假设您有一个夹具文件 <yourapp>/fixtures/initial_data.json

  1. 创建您的空迁移:

    在Django 1.7中:

    python manage.py makemigrations --empty <yourapp>

    在Django 1.8+中,您可以提供一个名称:

    python manage.py makemigrations --empty <yourapp> --name load_intial_data
  2. 编辑您的迁移文件 <yourapp>/migrations/0002_auto_xxx.py

    2.1。自定义实现,受Django’ loaddata(初始答案)启发:

    import os
    from sys import path
    from django.core import serializers
    
    fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../fixtures'))
    fixture_filename = 'initial_data.json'
    
    def load_fixture(apps, schema_editor):
        fixture_file = os.path.join(fixture_dir, fixture_filename)
    
        fixture = open(fixture_file, 'rb')
        objects = serializers.deserialize('json', fixture, ignorenonexistent=True)
        for obj in objects:
            obj.save()
        fixture.close()
    
    def unload_fixture(apps, schema_editor):
        "Brutally deleting all entries for this model..."
    
        MyModel = apps.get_model("yourapp", "ModelName")
        MyModel.objects.all().delete()
    
    class Migration(migrations.Migration):  
    
        dependencies = [
            ('yourapp', '0001_initial'),
        ]
    
        operations = [
            migrations.RunPython(load_fixture, reverse_code=unload_fixture),
        ]

    2.2。一个更简单的解决方案load_fixture(根据@juliocesar的建议):

    from django.core.management import call_command
    
    fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../fixtures'))
    fixture_filename = 'initial_data.json'
    
    def load_fixture(apps, schema_editor):
        fixture_file = os.path.join(fixture_dir, fixture_filename)
        call_command('loaddata', fixture_file) 

    如果要使用自定义目录,则很有用。

    2.3。最简单的:调用loaddataapp_label从将加载器具<yourapp>fixtures目录自动:

    from django.core.management import call_command
    
    fixture = 'initial_data'
    
    def load_fixture(apps, schema_editor):
        call_command('loaddata', fixture, app_label='yourapp') 

    如果您未指定app_label,loaddata会尝试fixture所有应用程序的夹具目录(您可能不想要)中加载文件名。

  3. 运行

    python manage.py migrate <yourapp>

Update: See @GwynBleidD’s comment below for the problems this solution can cause, and see @Rockallite’s answer below for an approach that’s more durable to future model changes.


Assuming you have a fixture file in <yourapp>/fixtures/initial_data.json

  1. Create your empty migration:

    In Django 1.7:

    python manage.py makemigrations --empty <yourapp>
    

    In Django 1.8+, you can provide a name:

    python manage.py makemigrations --empty <yourapp> --name load_intial_data
    
  2. Edit your migration file <yourapp>/migrations/0002_auto_xxx.py

    2.1. Custom implementation, inspired by Django’ loaddata (initial answer):

    import os
    from sys import path
    from django.core import serializers
    
    fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../fixtures'))
    fixture_filename = 'initial_data.json'
    
    def load_fixture(apps, schema_editor):
        fixture_file = os.path.join(fixture_dir, fixture_filename)
    
        fixture = open(fixture_file, 'rb')
        objects = serializers.deserialize('json', fixture, ignorenonexistent=True)
        for obj in objects:
            obj.save()
        fixture.close()
    
    def unload_fixture(apps, schema_editor):
        "Brutally deleting all entries for this model..."
    
        MyModel = apps.get_model("yourapp", "ModelName")
        MyModel.objects.all().delete()
    
    class Migration(migrations.Migration):  
    
        dependencies = [
            ('yourapp', '0001_initial'),
        ]
    
        operations = [
            migrations.RunPython(load_fixture, reverse_code=unload_fixture),
        ]
    

    2.2. A simpler solution for load_fixture (per @juliocesar’s suggestion):

    from django.core.management import call_command
    
    fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '../fixtures'))
    fixture_filename = 'initial_data.json'
    
    def load_fixture(apps, schema_editor):
        fixture_file = os.path.join(fixture_dir, fixture_filename)
        call_command('loaddata', fixture_file) 
    

    Useful if you want to use a custom directory.

    2.3. Simplest: calling loaddata with app_label will load fixtures from the <yourapp>‘s fixtures dir automatically :

    from django.core.management import call_command
    
    fixture = 'initial_data'
    
    def load_fixture(apps, schema_editor):
        call_command('loaddata', fixture, app_label='yourapp') 
    

    If you don’t specify app_label, loaddata will try to load fixture filename from all apps fixtures directories (which you probably don’t want).

  3. Run it

    python manage.py migrate <yourapp>
    

回答 1

精简版

您应使用loaddata的数据迁移直接管理命令。

# Bad example for a data migration
from django.db import migrations
from django.core.management import call_command


def load_fixture(apps, schema_editor):
    # No, it's wrong. DON'T DO THIS!
    call_command('loaddata', 'your_data.json', app_label='yourapp')


class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]

长版

loaddata利用利用django.core.serializers.python.Deserializer最新模型反序列化迁移中的历史数据。那是不正确的行为。

例如,假设有一个数据迁移,该数据迁移利用loaddata管理命令从固定装置加载数据,并且该数据迁移已应用于您的开发环境。

以后,您决定将新的必填字段添加到相应的模型中,这样就可以对更新后的模型进行新迁移(并可能在./manage.py makemigrations提示您时向新字段提供一次性值)。

您运行下一个迁移,一切顺利。

最后,开发完Django应用程序,然后将其部署在生产服务器上。现在是时候在生产环境上从头开始运行整个迁移了。

但是,数据迁移失败。这是因为来自loaddata命令的反序列化模型(代表当前代码)无法与添加的新必填字段的空数据一起保存。原始灯具缺少必要的数据!

但是,即使使用新字段所需的数据更新了灯具,数据迁移仍然会失败。在运行数据迁移时,尚未应用将相应列添加到数据库的下一次迁移。您无法将数据保存到不存在的列中!

结论:在数据迁移中,该loaddata命令引入了模型与数据库之间潜在的不一致。您绝对应该在数据迁移中直接使用它。

解决方案

loaddata命令依赖于django.core.serializers.python._get_model功能以从固定装置中获取相应的模型,该装置将返回模型的最新版本。我们需要对其进行Monkey修补,以便获得历史模型。

(以下代码适用于Django 1.8.x)

# Good example for a data migration
from django.db import migrations
from django.core.serializers import base, python
from django.core.management import call_command


def load_fixture(apps, schema_editor):
    # Save the old _get_model() function
    old_get_model = python._get_model

    # Define new _get_model() function here, which utilizes the apps argument to
    # get the historical version of a model. This piece of code is directly stolen
    # from django.core.serializers.python._get_model, unchanged. However, here it
    # has a different context, specifically, the apps variable.
    def _get_model(model_identifier):
        try:
            return apps.get_model(model_identifier)
        except (LookupError, TypeError):
            raise base.DeserializationError("Invalid model identifier: '%s'" % model_identifier)

    # Replace the _get_model() function on the module, so loaddata can utilize it.
    python._get_model = _get_model

    try:
        # Call loaddata command
        call_command('loaddata', 'your_data.json', app_label='yourapp')
    finally:
        # Restore old _get_model() function
        python._get_model = old_get_model


class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]

Short version

You should NOT use loaddata management command directly in a data migration.

# Bad example for a data migration
from django.db import migrations
from django.core.management import call_command


def load_fixture(apps, schema_editor):
    # No, it's wrong. DON'T DO THIS!
    call_command('loaddata', 'your_data.json', app_label='yourapp')


class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]

Long version

loaddata utilizes django.core.serializers.python.Deserializer which uses the most up-to-date models to deserialize historical data in a migration. That’s incorrect behavior.

For example, supposed that there is a data migration which utilizes loaddata management command to load data from a fixture, and it’s already applied on your development environment.

Later, you decide to add a new required field to the corresponding model, so you do it and make a new migration against your updated model (and possibly provide a one-off value to the new field when ./manage.py makemigrations prompts you).

You run the next migration, and all is well.

Finally, you’re done developing your Django application, and you deploy it on the production server. Now it’s time for you to run the whole migrations from scratch on the production environment.

However, the data migration fails. That’s because the deserialized model from loaddata command, which represents the current code, can’t be saved with empty data for the new required field you added. The original fixture lacks necessary data for it!

But even if you update the fixture with required data for the new field, the data migration still fails. When the data migration is running, the next migration which adds the corresponding column to the database, is not applied yet. You can’t save data to a column which does not exist!

Conclusion: in a data migration, the loaddata command introduces potential inconsistency between the model and the database. You should definitely NOT use it directly in a data migration.

The Solution

loaddata command relies on django.core.serializers.python._get_model function to get the corresponding model from a fixture, which will return the most up-to-date version of a model. We need to monkey-patch it so it gets the historical model.

(The following code works for Django 1.8.x)

# Good example for a data migration
from django.db import migrations
from django.core.serializers import base, python
from django.core.management import call_command


def load_fixture(apps, schema_editor):
    # Save the old _get_model() function
    old_get_model = python._get_model

    # Define new _get_model() function here, which utilizes the apps argument to
    # get the historical version of a model. This piece of code is directly stolen
    # from django.core.serializers.python._get_model, unchanged. However, here it
    # has a different context, specifically, the apps variable.
    def _get_model(model_identifier):
        try:
            return apps.get_model(model_identifier)
        except (LookupError, TypeError):
            raise base.DeserializationError("Invalid model identifier: '%s'" % model_identifier)

    # Replace the _get_model() function on the module, so loaddata can utilize it.
    python._get_model = _get_model

    try:
        # Call loaddata command
        call_command('loaddata', 'your_data.json', app_label='yourapp')
    finally:
        # Restore old _get_model() function
        python._get_model = old_get_model


class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]

回答 2

受一些评论(即n__o的评论)的启发,以及我initial_data.*在多个应用程序中散布了许多文件这一事实,我决定创建一个Django应用程序,以方便创建这些数据迁移。

使用Django的迁移夹具,你可以简单地运行下面的管理命令,它会通过所有搜索你INSTALLED_APPSinitial_data.*文件,并把它们变成数据迁移。

./manage.py create_initial_data_fixtures
Migrations for 'eggs':
  0002_auto_20150107_0817.py:
Migrations for 'sausage':
  Ignoring 'initial_data.yaml' - migration already exists.
Migrations for 'foo':
  Ignoring 'initial_data.yaml' - not migrated.

看到 安装/使用说明, django-migration-fixture

Inspired by some of the comments (namely n__o’s) and the fact that I have a lot of initial_data.* files spread out over multiple apps I decided to create a Django app that would facilitate the creation of these data migrations.

Using django-migration-fixture you can simply run the following management command and it will search through all your INSTALLED_APPS for initial_data.* files and turn them into data migrations.

./manage.py create_initial_data_fixtures
Migrations for 'eggs':
  0002_auto_20150107_0817.py:
Migrations for 'sausage':
  Ignoring 'initial_data.yaml' - migration already exists.
Migrations for 'foo':
  Ignoring 'initial_data.yaml' - not migrated.

See django-migration-fixture for install/usage instructions.


回答 3

为了给您的数据库一些初始数据,编写一个数据迁移。 在数据迁移中,使用RunPython函数加载数据。

不要编写任何loaddata命令,因为这种方式已被弃用。

您的数据迁移将仅运行一次。迁移是迁移的有序序列。运行003_xxxx.py迁移时,django迁移会在数据库中写入该应用已迁移到该版本(003)的信息,并将仅运行以下迁移。

In order to give your database some initial data, write a data migration. In the data migration, use the RunPython function to load your data.

Don’t write any loaddata command as this way is deprecated.

Your data migrations will be run only once. The migrations are an ordered sequence of migrations. When the 003_xxxx.py migrations is run, django migrations writes in the database that this app is migrated until this one (003), and will run the following migrations only.


回答 4

不幸的是,上面介绍的解决方案对我不起作用。我发现每次更改模型时都必须更新固定装置。理想情况下,我会写数据迁移来类似地修改创建的数据和夹具加载的数据。

为了方便起见,我编写了一个快速功能,它将在fixtures当前应用程序的目录中查找并加载夹具。将此功能放入与迁移中的字段匹配的模型历史记录中的迁移中。

The solutions presented above didn’t work for me unfortunately. I found that every time I change my models I have to update my fixtures. Ideally I would instead write data migrations to modify created data and fixture-loaded data similarly.

To facilitate this I wrote a quick function which will look in the fixtures directory of the current app and load a fixture. Put this function into a migration in the point of the model history that matches the fields in the migration.


回答 5

我认为固定装置有点不好。如果您的数据库经常更改,那么使其保持最新状态将很快成为噩梦。实际上,这不仅是我的观点,在《 Django的两个独家报道》一书中,它的解释要好得多。

相反,我将编写一个Python文件来提供初始设置。如果您还需要其他东西,我建议您去看看工厂男孩

如果需要迁移某些数据,则应使用数据迁移

还有关于使用固定装置的“燃烧固定装置,使用模型工厂”

In my opinion fixtures are a bit bad. If your database changes frequently, keeping them up-to-date will came a nightmare soon. Actually, it’s not only my opinion, in the book “Two Scoops of Django” it’s explained much better.

Instead I’ll write a Python file to provide initial setup. If you need something more I’ll suggest you look at Factory boy.

If you need to migrate some data you should use data migrations.

There’s also “Burn Your Fixtures, Use Model Factories” about using fixtures.


回答 6

在Django 2.1上,我想用初始数据加载某些模型(例如国家名称)。

但是我希望这种情况在执行初始迁移后立即自动发生。

因此,我认为拥有一个 sql/在每个应用程序中需要加载初始数据文件夹。

然后,在该sql/文件夹中,我将包含.sql带有所需DML的文件,以将初始数据加载到相应的模型中,例如:

INSERT INTO appName_modelName(fieldName)
VALUES
    ("country 1"),
    ("country 2"),
    ("country 3"),
    ("country 4");

为了更具描述性,这是包含sql/文件夹的应用程序的外观:

另外,我发现某些情况下需要按sql特定顺序执行脚本。因此,我决定为文件名加上一个连续的数字,如上图所示。

然后,我需要一种方法,SQLs可以自动在任何应用程序文件夹中加载可用的文件python manage.py migrate

因此,我创建了另一个名为的应用程序initial_data_migrations,然后将该应用程序添加到INSTALLED_APPSin settings.py文件列表中。然后,我在migrations里面创建了一个文件夹,并添加了一个名为run_sql_scripts.py实际上是自定义迁移)的文件。如下图所示:

我创建run_sql_scripts.py了它,以便它负责运行sql每个应用程序中可用的所有脚本。然后当有人跑步时将其解雇python manage.py migrate。此自定义migration还会将涉及的应用程序添加为依赖项,这样,它sql仅在所需的应用程序执行了0001_initial.py迁移之后才尝试运行语句(我们不想尝试针对不存在的表运行SQL语句)。

这是该脚本的来源:

import os
import itertools

from django.db import migrations
from YourDjangoProjectName.settings import BASE_DIR, INSTALLED_APPS

SQL_FOLDER = "/sql/"

APP_SQL_FOLDERS = [
    (os.path.join(BASE_DIR, app + SQL_FOLDER), app) for app in INSTALLED_APPS
    if os.path.isdir(os.path.join(BASE_DIR, app + SQL_FOLDER))
]

SQL_FILES = [
    sorted([path + file for file in os.listdir(path) if file.lower().endswith('.sql')])
    for path, app in APP_SQL_FOLDERS
]


def load_file(path):
    with open(path, 'r') as f:
        return f.read()


class Migration(migrations.Migration):

    dependencies = [
        (app, '__first__') for path, app in APP_SQL_FOLDERS
    ]

    operations = [
        migrations.RunSQL(load_file(f)) for f in list(itertools.chain.from_iterable(SQL_FILES))
    ]

我希望有人觉得这有帮助,对我来说效果很好!如果您有任何疑问,请告诉我。

注意:这可能不是最好的解决方案,因为我刚刚开始使用django,但是由于我在使用django进行搜索时没有找到太多信息,因此仍想与大家共享此“操作方法”。

On Django 2.1, I wanted to load some models (Like country names for example) with initial data.

But I wanted this to happen automatically right after the execution of initial migrations.

So I thought that it would be great to have an sql/ folder inside each application that required initial data to be loaded.

Then within that sql/ folder I would have .sql files with the required DMLs to load the initial data into the corresponding models, for example:

INSERT INTO appName_modelName(fieldName)
VALUES
    ("country 1"),
    ("country 2"),
    ("country 3"),
    ("country 4");

To be more descriptive, this is how an app containing an sql/ folder would look:

Also I found some cases where I needed the sql scripts to be executed in a specific order. So I decided to prefix the file names with a consecutive number as seen in the image above.

Then I needed a way to load any SQLs available inside any application folder automatically by doing python manage.py migrate.

So I created another application named initial_data_migrations and then I added this app to the list of INSTALLED_APPS in settings.py file. Then I created a migrations folder inside and added a file called run_sql_scripts.py (Which actually is a custom migration). As seen in the image below:

I created run_sql_scripts.py so that it takes care of running all sql scripts available within each application. This one is then fired when someone runs python manage.py migrate. This custom migration also adds the involved applications as dependencies, that way it attempts to run the sql statements only after the required applications have executed their 0001_initial.py migrations (We don’t want to attempt running a SQL statement against a non-existent table).

Here is the source of that script:

import os
import itertools

from django.db import migrations
from YourDjangoProjectName.settings import BASE_DIR, INSTALLED_APPS

SQL_FOLDER = "/sql/"

APP_SQL_FOLDERS = [
    (os.path.join(BASE_DIR, app + SQL_FOLDER), app) for app in INSTALLED_APPS
    if os.path.isdir(os.path.join(BASE_DIR, app + SQL_FOLDER))
]

SQL_FILES = [
    sorted([path + file for file in os.listdir(path) if file.lower().endswith('.sql')])
    for path, app in APP_SQL_FOLDERS
]


def load_file(path):
    with open(path, 'r') as f:
        return f.read()


class Migration(migrations.Migration):

    dependencies = [
        (app, '__first__') for path, app in APP_SQL_FOLDERS
    ]

    operations = [
        migrations.RunSQL(load_file(f)) for f in list(itertools.chain.from_iterable(SQL_FILES))
    ]

I hope someone finds this helpful, it worked just fine for me!. If you have any questions please let me know.

NOTE: This might not be the best solution since I’m just getting started with django, however still wanted to share this “How-to” with you all since I didn’t find much information while googling about this.