Hi everyone. I’ve faced a problem and need your help.
Let’s imagine we’ve created a script that parses data from a website, and it takes about 10 minutes to run. You’ve written this parsing code into a cell, stored the output in a variable, and started to analyze, correct and clean the data using that variable.
Question: when you close the service (Jupyter Notebook) and come back later, how do you restore the state of the variable so that you don’t have to go back to the first stage, the 10-minute scrape?
Two ideas, one using a database and one using files on disk:
- database: for the fetching part, if it uses `requests`, `requests-cache` is wonderful (see the first sketch after this list).
  - import it, run one line, and then all the `requests.get`s end up in a sqlite database in the current working directory.
  - as long as nothing about a request changes, it will return the cached response and not touch the internet (it even works offline)
  - it caches based on the whole request, so you can force a fresh fetch of just some requests by adding an extra `?` param
- pile of files: for the general case, organize tasks with `doit`, and have them leave stuff on disk (see the second sketch after this list)
  - it’s like GNU `make` (which is also excellent and worth learning), but in Python
  - with small tasks that have good `targets` and `file_dep`, it can be really good at not doing rework: consider the classic scraping problem of search results
    - do one task that requests the first page, which includes how many pages there are
    - do one task per page of results
    - do one task per result
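A minimal sketch of the `requests-cache` idea, assuming the package is installed (`pip install requests-cache`); the cache name and URLs are placeholders:

```python
import requests
import requests_cache

# One line of setup: from here on, every requests.get() is read from /
# written to scrape_cache.sqlite in the current working directory.
requests_cache.install_cache("scrape_cache")

resp = requests.get("https://example.com/search?page=1")  # hits the network
resp = requests.get("https://example.com/search?page=1")  # served from sqlite
print(resp.from_cache)  # True on the second call

# The cache key is the whole request, so an extra query param forces a
# fresh fetch of just this one request:
resp = requests.get("https://example.com/search?page=1&cachebust=2")
```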
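And a sketch of the per-page `doit` idea as a `dodo.py` file; the URL, the file layout and the hard-coded page count (which in real code you would parse out of the first page) are assumptions for illustration:

```python
# dodo.py
import pathlib
import requests

BASE_URL = "https://example.com/search"  # placeholder
NUM_PAGES = 5  # placeholder: really this comes from parsing raw/page1.html

def fetch(url, target):
    path = pathlib.Path(target)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(requests.get(url).text)

def task_fetch_page():
    """One doit sub-task per page of results, each with its own target file."""
    for page in range(1, NUM_PAGES + 1):
        target = f"raw/page{page}.html"
        yield {
            "name": f"page{page}",
            "actions": [(fetch, [f"{BASE_URL}?page={page}", target])],
            "targets": [target],
            # no file_dep, so the missing-target check is what triggers a re-run
            "uptodate": [True],
        }
```

Running `doit` in that directory should then fetch only the pages that aren’t already sitting in `raw/`.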
In general, once past fetching, the more the whole pipeline caches its steps along the way, the happier it will be when things fail. So:
- `requests.get(url)` → `raw/{some-id}.html`
- `raw/{some-id}.html` → `parsed/{some-id}.json`
- `parsed/{some-id}.json` → `report/{some-id}.html`
With this approach, the work can be scaled across multiple processes/computers, partially invalidated, etc.
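Sketched as `doit` tasks for a single hypothetical id, so each step re-runs only when its input changed or its output is missing (the URL, the ids and the “parsing” are placeholders):

```python
import json
import pathlib
import requests

RAW, PARSED, REPORT = "raw/item-1.html", "parsed/item-1.json", "report/item-1.html"

def _write(path, text):
    p = pathlib.Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(text)

def fetch():
    _write(RAW, requests.get("https://example.com/item/1").text)  # placeholder URL

def parse():
    # stand-in for real parsing (BeautifulSoup, lxml, ...)
    _write(PARSED, json.dumps({"chars": len(pathlib.Path(RAW).read_text())}))

def report():
    chars = json.loads(pathlib.Path(PARSED).read_text())["chars"]
    _write(REPORT, f"<p>item-1 has {chars} characters</p>")

def task_fetch():
    # no file_dep: the missing-target check decides whether to fetch again
    return {"actions": [fetch], "targets": [RAW], "uptodate": [True]}

def task_parse():
    # re-runs only when the raw file changes or the parsed file is missing
    return {"actions": [parse], "file_dep": [RAW], "targets": [PARSED]}

def task_report():
    return {"actions": [report], "file_dep": [PARSED], "targets": [REPORT]}
```

A crash halfway through reporting then costs a re-run of only the report step, not the 10-minute fetch.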
There are much heavier-weight systems like airflow, luigi and dagster, but each of those is basically a whole ecosystem that requires learning.