如何腌制或存储Jupyter(IPython)笔记本会话以供以后使用

问题:如何腌制或存储Jupyter(IPython)笔记本会话以供以后使用

假设我正在Jupyter / Ipython笔记本中进行较大的数据分析,并且完成了大量耗时的计算。然后,由于某种原因,我必须关闭jupyter本地服务器I,但是我想稍后再进行分析,而不必再次进行所有耗时的计算。


我想什么爱做的是pickle或存储整个Jupyter会话(所有大熊猫dataframes,np.arrays,变量,…),所以我可以放心地关闭服务器知道我可以在完全相同的状态返回到我的会话之前。

从技术上讲甚至可行吗?有没有我忽略的内置功能?


编辑:根据这个答案,有一种%store 魔法应该是“轻型泡菜”。但是,您必须像这样手动存储变量:

#inside a ipython/nb session
foo = "A dummy string"
%store foo
关闭种子,重新启动内核#r
%store -r foo进行刷新
print(foo) # "A dummy string"

这与我想要的功能相当接近,但是由于必须手动执行并且无法区分不同的会话,因此它的用处不大。

Let’s say I am doing a larger data analysis in Jupyter/Ipython notebook with lots of time consuming computations done. Then, for some reason, I have to shut down the jupyter local server I, but I would like to return to doing the analysis later, without having to go through all the time-consuming computations again.


What I would like love to do is pickle or store the whole Jupyter session (all pandas dataframes, np.arrays, variables, …) so I can safely shut down the server knowing I can return to my session in exactly the same state as before.

Is it even technically possible? Is there a built-in functionality I overlooked?


EDIT: based on this answer there is a %store magic which should be “lightweight pickle”. However you have to store the variables manually like so:

#inside a ipython/nb session
foo = "A dummy string"
%store foo
closing seesion, restarting kernel
%store -r foo # r for refresh
print(foo) # "A dummy string"

which is fairly close to what I would want, but having to do it manually and being unable to distinguish between different sessions makes it less useful.


回答 0

我认为迪尔很好地回答了您的问题。

pip install dill

保存笔记本会话:

import dill
dill.dump_session('notebook_env.db')

恢复笔记本会话:

import dill
dill.load_session('notebook_env.db')

资源

I think Dill answers your question well.

pip install dill

Save a Notebook session:

import dill
dill.dump_session('notebook_env.db')

Restore a Notebook session:

import dill
dill.load_session('notebook_env.db')

Source


回答 1

(我宁愿评论而不愿将其作为实际答案,但我需要更多的声誉才能发表评论。)

您可以系统地存储大多数类似数据的变量。我通常要做的是将所有数据帧,数组等存储在pandas.HDFStore中。在笔记本的开头,声明

backup = pd.HDFStore('backup.h5')

然后在产生它们时存储任何新变量

backup['var1'] = var1

最后,可能是一个好主意

backup.close()

在关闭服务器之前。下次您想继续使用笔记本时:

backup = pd.HDFStore('backup.h5')
var1 = backup['var1']

说实话,我也更喜欢ipython Notebook中的内置功能。您不能以这种方式保存所有内容(例如,对象,连接),并且很难用大量样板代码来组织笔记本。

(I’d rather comment than offer this as an actual answer, but I need more reputation to comment.)

You can store most data-like variables in a systematic way. What I usually do is store all dataframes, arrays, etc. in pandas.HDFStore. At the beginning of the notebook, declare

backup = pd.HDFStore('backup.h5')

and then store any new variables as you produce them

backup['var1'] = var1

At the end, probably a good idea to do

backup.close()

before turning off the server. The next time you want to continue with the notebook:

backup = pd.HDFStore('backup.h5')
var1 = backup['var1']

Truth be told, I’d prefer built-in functionality in ipython notebook, too. You can’t save everything this way (e.g. objects, connections), and it’s hard to keep the notebook organized with so much boilerplate codes.


回答 2

这个问题涉及到:如何在IPython Notebook中进行缓存?

为了保存单个单元的结果,缓存魔术派上了用场。

%%cache longcalc.pkl var1 var2 var3
var1 = longcalculation()
....

重新运行笔记本时,将从高速缓存中加载此单元格的内容。

这不能完全回答您的问题,但是对于所有冗长的计算结果都可以快速恢复的情况,这可能就足够了。这对我来说是一个可行的解决方案,同时点击了笔记本顶部的“ run-all”按钮。

缓存魔法救不了整个笔记本的状态还没有。据我所知,还没有其他系统可以恢复“笔记本”。这将需要保存python内核的所有历史记录。加载笔记本电脑并连接到内核后,应加载此信息。

This question is related to: How to cache in IPython Notebook?

To save the results of individual cells, the caching magic comes in handy.

%%cache longcalc.pkl var1 var2 var3
var1 = longcalculation()
....

When rerunning the notebook, the contents of this cell is loaded from the cache.

This is not exactly answering your question, but it might be enough to when the results of all the lengthy calculations are recovered fast. This in combination of hitting the run-all button on top of the notebook is for me a workable solution.

The cache magic cannot save the state of a whole notebook yet. To my knowledge there is no other system yet to resume a “notebook”. This would require to save all the history of the python kernel. After loading the notebook, and connecting to a kernel, this information should be loaded.