问题:为Python项目构建Docker映像时如何避免重新安装软件包?
我的Dockerfile就像
FROM my/base
ADD . /srv
RUN pip install -r requirements.txt
RUN python setup.py install
ENTRYPOINT ["run_server"]
每次构建新映像时,都必须重新安装依赖项,这在我所在的地区可能非常慢。
我想到的cache
已安装软件包的一种方法是my/base
用新的映像覆盖该映像,如下所示:
docker build -t new_image_1 .
docker tag new_image_1 my/base
因此,下一次我使用此Dockerfile进行构建时,我/基础已经安装了一些软件包。
但是此解决方案有两个问题:
- 并非总是可以覆盖基本图像
- 随着新图像的分层,基础图像变得越来越大
那么,我可以使用哪种更好的解决方案来解决此问题?
编辑##:
有关我机器上的docker的一些信息:
☁ test docker version
Client version: 1.1.2
Client API version: 1.13
Go version (client): go1.2.1
Git commit (client): d84a070
Server version: 1.1.2
Server API version: 1.13
Go version (server): go1.2.1
Git commit (server): d84a070
☁ test docker info
Containers: 0
Images: 56
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Dirs: 56
Execution Driver: native-0.2
Kernel Version: 3.13.0-29-generic
WARNING: No swap limit support
回答 0
尝试构建一个如下所示的Dockerfile:
FROM my/base
WORKDIR /srv
ADD ./requirements.txt /srv/requirements.txt
RUN pip install -r requirements.txt
ADD . /srv
RUN python setup.py install
ENTRYPOINT ["run_server"]
Docker将在pip安装期间使用缓存,只要您不对进行任何更改requirements.txt
,无论位于的其他代码文件是否.
已更改。这是一个例子。
这是一个简单的Hello, World!
程序:
$ tree
.
├── Dockerfile
├── requirements.txt
└── run.py
0 directories, 3 file
# Dockerfile
FROM dockerfile/python
WORKDIR /srv
ADD ./requirements.txt /srv/requirements.txt
RUN pip install -r requirements.txt
ADD . /srv
CMD python /srv/run.py
# requirements.txt
pytest==2.3.4
# run.py
print("Hello, World")
docker build的输出:
Step 1 : WORKDIR /srv
---> Running in 22d725d22e10
---> 55768a00fd94
Removing intermediate container 22d725d22e10
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> 968a7c3a4483
Removing intermediate container 5f4e01f290fd
Step 3 : RUN pip install -r requirements.txt
---> Running in 08188205e92b
Downloading/unpacking pytest==2.3.4 (from -r requirements.txt (line 1))
Running setup.py (path:/tmp/pip_build_root/pytest/setup.py) egg_info for package pytest
....
Cleaning up...
---> bf5c154b87c9
Removing intermediate container 08188205e92b
Step 4 : ADD . /srv
---> 3002a3a67e72
Removing intermediate container 83defd1851d0
Step 5 : CMD python /srv/run.py
---> Running in 11e69b887341
---> 5c0e7e3726d6
Removing intermediate container 11e69b887341
Successfully built 5c0e7e3726d6
让我们修改一下run.py
:
# run.py
print("Hello, Python")
尝试再次构建,下面是输出:
Sending build context to Docker daemon 5.12 kB
Sending build context to Docker daemon
Step 0 : FROM dockerfile/python
---> f86d6993fc7b
Step 1 : WORKDIR /srv
---> Using cache
---> 55768a00fd94
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> Using cache
---> 968a7c3a4483
Step 3 : RUN pip install -r requirements.txt
---> Using cache
---> bf5c154b87c9
Step 4 : ADD . /srv
---> 9cc7508034d6
Removing intermediate container 0d7cf71eb05e
Step 5 : CMD python /srv/run.py
---> Running in f25c21135010
---> 4ffab7bc66c7
Removing intermediate container f25c21135010
Successfully built 4ffab7bc66c7
如上所示,这次docker在构建期间使用了缓存。现在,让我们更新requirements.txt
:
# requirements.txt
pytest==2.3.4
ipython
以下是docker build的输出:
Sending build context to Docker daemon 5.12 kB
Sending build context to Docker daemon
Step 0 : FROM dockerfile/python
---> f86d6993fc7b
Step 1 : WORKDIR /srv
---> Using cache
---> 55768a00fd94
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> b6c19f0643b5
Removing intermediate container a4d9cb37dff0
Step 3 : RUN pip install -r requirements.txt
---> Running in 4b7a85a64c33
Downloading/unpacking pytest==2.3.4 (from -r requirements.txt (line 1))
Running setup.py (path:/tmp/pip_build_root/pytest/setup.py) egg_info for package pytest
Downloading/unpacking ipython (from -r requirements.txt (line 2))
Downloading/unpacking py>=1.4.12 (from pytest==2.3.4->-r requirements.txt (line 1))
Running setup.py (path:/tmp/pip_build_root/py/setup.py) egg_info for package py
Installing collected packages: pytest, ipython, py
Running setup.py install for pytest
Installing py.test script to /usr/local/bin
Installing py.test-2.7 script to /usr/local/bin
Running setup.py install for py
Successfully installed pytest ipython py
Cleaning up...
---> 23a1af3df8ed
Removing intermediate container 4b7a85a64c33
Step 4 : ADD . /srv
---> d8ae270eca35
Removing intermediate container 7f003ebc3179
Step 5 : CMD python /srv/run.py
---> Running in 510359cf9e12
---> e42fc9121a77
Removing intermediate container 510359cf9e12
Successfully built e42fc9121a77
请注意,在pip安装期间docker如何不使用缓存。如果它不起作用,请检查您的码头工人版本。
Client version: 1.1.2
Client API version: 1.13
Go version (client): go1.2.1
Git commit (client): d84a070
Server version: 1.1.2
Server API version: 1.13
Go version (server): go1.2.1
Git commit (server): d84a070
回答 1
为了最大程度地减少网络活动,您可以指向pip
主机上的缓存目录。
在主机的pip缓存目录绑定已安装到容器的pip缓存目录中的情况下运行docker容器。docker run
命令应如下所示:
docker run -v $HOME/.cache/pip-docker/:/root/.cache/pip image_1
然后在您的Dockerfile中,将您的需求作为ENTRYPOINT
语句(或CMD
语句)的一部分而不是RUN
命令来安装。这很重要,因为(如注释中所指出),在映像构建期间(RUN
执行语句时)该安装不可用。Docker文件应如下所示:
FROM my/base
ADD . /srv
ENTRYPOINT ["sh", "-c", "pip install -r requirements.txt && python setup.py install && run_server"]
回答 2
我知道这个问题已经有一些流行的答案。但是,有一种更新的方式可以为程序包管理器缓存文件。我认为,当BuildKit变得更加标准时,这可能是一个很好的答案。
从Docker 18.09开始,对BuildKit进行了实验性支持。BuildKit在Dockerfile中增加了对一些新功能的支持,包括对将外部卷安装到RUN
步骤中的实验性支持。这使我们可以为诸如之类的东西创建缓存$HOME/.cache/pip/
。
我们将使用以下requirements.txt
文件作为示例:
Click==7.0
Django==2.2.3
django-appconf==1.0.3
django-compressor==2.3
django-debug-toolbar==2.0
django-filter==2.2.0
django-reversion==3.0.4
django-rq==2.1.0
pytz==2019.1
rcssmin==1.0.6
redis==3.3.4
rjsmin==1.1.0
rq==1.1.0
six==1.12.0
sqlparse==0.3.0
Python的典型示例Dockerfile
如下所示:
FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN pip install -r requirements.txt
COPY . /usr/src/app
使用DOCKER_BUILDKIT
环境变量启用BuildKit后,我们可以pip
在大约65秒内构建未缓存的步骤:
$ export DOCKER_BUILDKIT=1
$ docker build -t test .
[+] Building 65.6s (10/10) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load metadata for docker.io/library/python:3.7 0.5s
=> CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092 0.0s
=> [internal] load build context 0.6s
=> => transferring context: 899.99kB 0.6s
=> CACHED [internal] helper image for file operations 0.0s
=> [2/4] COPY requirements.txt /usr/src/app/ 0.5s
=> [3/4] RUN pip install -r requirements.txt 61.3s
=> [4/4] COPY . /usr/src/app 1.3s
=> exporting to image 1.2s
=> => exporting layers 1.2s
=> => writing image sha256:d66a2720e81530029bf1c2cb98fb3aee0cffc2f4ea2aa2a0760a30fb718d7f83 0.0s
=> => naming to docker.io/library/test 0.0s
现在,让我们添加实验性标头并修改RUN
缓存Python包的步骤:
# syntax=docker/dockerfile:experimental
FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
COPY . /usr/src/app
继续并立即进行另一个构建。它应该花费相同的时间。但这一次是在新的缓存安装中缓存Python软件包:
$ docker build -t pythontest .
[+] Building 60.3s (14/14) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> resolve image config for docker.io/docker/dockerfile:experimental 0.5s
=> CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:9022e911101f01b2854c7a4b2c77f524b998891941da55208e71c0335e6e82c3 0.0s
=> [internal] load .dockerignore 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load metadata for docker.io/library/python:3.7 0.5s
=> CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092 0.0s
=> [internal] load build context 0.7s
=> => transferring context: 899.99kB 0.6s
=> CACHED [internal] helper image for file operations 0.0s
=> [2/4] COPY requirements.txt /usr/src/app/ 0.6s
=> [3/4] RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt 53.3s
=> [4/4] COPY . /usr/src/app 2.6s
=> exporting to image 1.2s
=> => exporting layers 1.2s
=> => writing image sha256:0b035548712c1c9e1c80d4a86169c5c1f9e94437e124ea09e90aea82f45c2afc 0.0s
=> => naming to docker.io/library/test 0.0s
大约60秒。类似于我们的第一个版本。
对进行一些小的更改requirements.txt
(例如在两个软件包之间添加新行)以强制高速缓存无效并再次运行:
$ docker build -t pythontest .
[+] Building 15.9s (14/14) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> resolve image config for docker.io/docker/dockerfile:experimental 1.1s
=> CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:9022e911101f01b2854c7a4b2c77f524b998891941da55208e71c0335e6e82c3 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load .dockerignore 0.0s
=> [internal] load metadata for docker.io/library/python:3.7 0.5s
=> CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092 0.0s
=> CACHED [internal] helper image for file operations 0.0s
=> [internal] load build context 0.7s
=> => transferring context: 899.99kB 0.7s
=> [2/4] COPY requirements.txt /usr/src/app/ 0.6s
=> [3/4] RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt 8.8s
=> [4/4] COPY . /usr/src/app 2.1s
=> exporting to image 1.1s
=> => exporting layers 1.1s
=> => writing image sha256:fc84cd45482a70e8de48bfd6489e5421532c2dd02aaa3e1e49a290a3dfb9df7c 0.0s
=> => naming to docker.io/library/test 0.0s
仅约16秒!
由于不再下载所有Python软件包,因此我们获得了这种加速。它们由程序包管理器缓存(pip
在这种情况下),并存储在缓存卷装载中。卷安装已提供给运行步骤,以便pip
可以重复使用我们已经下载的软件包。这发生在任何Docker层缓存之外。
大的时候收益应该更好requirements.txt
。
笔记:
- 这是实验性的Dockerfile语法,应这样对待。您现在可能不想在生产中使用此版本。
目前,BuildKit的东西在Docker Compose或其他直接使用Docker API的工具下不起作用。从1.25.0开始,Docker Compose中已对此提供支持。请参阅如何使用docker-compose启用BuildKit?- 目前没有任何直接接口可用于管理缓存。当您执行一次操作时,它将被清除
docker system prune -a
。
希望这些功能可以使其融入Docker进行构建,并且BuildKit将成为默认设置。如果/当发生这种情况时,我将尝试更新此答案。
回答 3
我发现更好的方法是将Python site-packages目录添加为卷。
services:
web:
build: .
command: python manage.py runserver 0.0.0.0:8000
volumes:
- .:/code
- /usr/local/lib/python2.7/site-packages/
这样,我可以只安装新的库而不必进行完全重建。
编辑:忽略此答案,以上jkukul的答案为我工作。我的意图是缓存site-packages文件夹。那看起来应该更像:
volumes:
- .:/code
- ./cached-packages:/usr/local/lib/python2.7/site-packages/
缓存下载文件夹虽然更干净。这也可以缓存轮子,因此可以正确地完成任务。