Install pip package such as backtrader in the jupyter/datascience-notebook image

Hi,

Sorry for a dumb question but what’s the best way to install a pip package such as backtrader?
I have been adding my own packages in the jupyter/datascience-notebook docker image using the Dockerfile like this:

RUN mamba install --quiet --yes \
    'ta-lib' && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

But other packages are not available in conda, so I have to use pip, and I don’t want to blindly use the following:

RUN pip install --no-cache-dir backtrader

Any hints would be appreciated.

thanks

you can create an environment.yml such as:

channels:
  - conda-forge
dependencies:
  - ta-lib
  - pip
  - pip:
    - backtrader
    # others

and mamba env update -p ${NB_PYTHON_PREFIX} -f environment.yml to do it all in one step. I suppose you could use shell redirection to write it directly to a file within the docker container, but just using COPY might be more maintainable.

you’ll usually benefit from sourcing as many (especially binary) dependencies as possible from mamba, which can be a bit trial-and-error.
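
For what it's worth, the shell-redirection variant mentioned above could look roughly like the sketch below. It writes the same example environment.yml with a here-document; the `env_file` variable and the `/tmp` location are only illustrative assumptions, and COPY remains the more maintainable route.

```shell
# Write the environment file with a here-document. The quoted 'EOF'
# stops the shell from expanding anything inside the YAML.
# env_file is a hypothetical name; the path is an assumption, not something the image requires.
env_file="${TMPDIR:-/tmp}/environment.yml"

cat > "${env_file}" <<'EOF'
channels:
  - conda-forge
dependencies:
  - ta-lib
  - pip
  - pip:
    - backtrader
EOF

# Print the file so the build log records exactly what was written.
cat "${env_file}"
```

Inside a Dockerfile the same commands would sit in a RUN step, ahead of the mamba call.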

thank you for your message. I am a bit confused again, though: where do I reference the environment.yml?

This is how I added some of the packages:

FROM jupyter/datascience-notebook

USER root

RUN apt-get update && \
    apt-get install libpq-dev  -y && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

USER ${NB_UID}

# R packages including IRKernel which gets installed globally.
# Ref: https://github.com/jupyter/docker-stacks/tree/main/datascience-notebook
RUN mamba install --quiet --yes \
    'r-getpass' \
    'r-rpostgresql' && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

# Install new packages in the default python3 environment
# Ref: https://github.com/jupyter/docker-stacks/tree/main/scipy-notebook
RUN mamba install --quiet --yes \
    'spacy' \
    'ta-lib' && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

# Install backtrader using pip because it's not available in conda
RUN pip install --no-cache-dir backtrader

Is this OK, or could I improve it to save some space?

thanks

Can recommend putting everything in one environment.yml next to the Dockerfile so it’s just doing one solve.

COPY environment.yml /tmp/

RUN mamba env update --prefix ${NB_PYTHON_PREFIX} --file /tmp/environment.yml \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

Brilliant, thanks!

finally, what would the YAML look like for the packages in my previous message?

thanks again

i’ve already pretty much written the whole thing for you at this point.

just copy and paste the package names into something like the environment.yml in my first reply (#2 in this thread, by bollwyvl).

Hi there,

Unfortunately, I am getting the following error:

mamba update: error: argument -p/--prefix: expected one argument
The command '/bin/bash -o pipefail -c mamba env update --prefix ${NB_PYTHON_PREFIX} --file /tmp/environment.yml     mamba clean --all -f -y &&     fix-permissions "${CONDA_DIR}" &&     fix-permissions "/home/${NB_USER}"' returned a non-zero code: 2

Here is my environment.yml:

channels:
  - conda-forge
dependencies:
  - r-getpass
  - r-rpostgresql
  - ta-lib
  - spacy
  - mplfinance
  - pandas-ta
  - bt
  - pip
  - pip:
    - backtrader
    - backtesting
    - bta-lib
    - backtrader[plotting]

And here is the Dockerfile:

FROM jupyter/datascience-notebook

USER root

RUN apt-get update && \
    apt-get install libpq-dev  -y && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

USER ${NB_UID}

COPY environment.yml /tmp/

RUN mamba env update -p ${NB_PYTHON_PREFIX} -f /tmp/environment.yml \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"

Yep, rando typo from me in a narrow web form.

RUN mamba env update ... /tmp/environment.yml && \
#                                              ^
#                                 missing this |

A typographic preference of mine is to escape the line and put any operators/shell on the next line, so more like:

RUN mamba env update ... /tmp/environment.yml \
    && mamba clean ....
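
To see why the missing operator matters, here is a small stand-alone sketch that uses plain `echo` in place of mamba (the variable names are just for illustration). Without `&&`, the backslash glues both lines into a single command, so the second command name is passed as extra arguments to the first.

```shell
# Without '&&': the backslash joins both lines into ONE command,
# so "echo second" becomes two more arguments to the first echo.
merged=$(echo first \
    echo second)

# With a leading '&&': two separate commands run one after the other.
chained=$(echo first \
    && echo second)

printf 'merged:  %s\n' "${merged}"
printf 'chained: %s\n' "${chained}"
```

In the failing RUN step the same thing happened: `mamba clean --all -f -y` was handed to `mamba env update` as additional arguments.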

nope, that doesn’t fix it.
I am getting this args error:

mamba update: error: argument -p/--prefix: expected one argument

welp, maybe toss a RUN env above that to see what environment variables are known. But looking at the upstream it might just be that the env is already updated, and/or using base (shiver) to install user packages next to conda and mamba

Anyhow, some additional defensive shell techniques:

  • lead with set -eux (stops at the first failing command, and also when a variable is undefined, which appears to be the case here)
  • quote paths, especially those that are partially constructed from env vars

RUN set -eux \
  && mamba env update --file "/tmp/environment.yml" ...
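
A rough sketch of what those two tips protect against, assuming NB_PYTHON_PREFIX really is undefined in this image (that is a guess from the error message, not something confirmed here). `count_args` is a hypothetical helper that only reports how many arguments a command would receive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments it was given.
count_args() { echo "$#"; }

# Assumption: the variable is not defined in the build environment.
unset NB_PYTHON_PREFIX

# Unquoted: the empty expansion disappears, so --prefix is followed
# directly by --file and the parser sees no value for it.
unquoted=$(count_args --prefix ${NB_PYTHON_PREFIX} --file /tmp/environment.yml)

# Quoted: an empty string is still passed as the value of --prefix.
quoted=$(count_args --prefix "${NB_PYTHON_PREFIX}" --file /tmp/environment.yml)

# With 'set -u', touching the undefined variable aborts the (sub)shell.
if (set -u; : "${NB_PYTHON_PREFIX}") 2>/dev/null; then
  strict="ok"
else
  strict="failed"
fi

echo "unquoted=${unquoted} quoted=${quoted} strict=${strict}"
```

The unquoted case matches the "expected one argument" error: the option is there, but its value has silently vanished.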

thanks, that seems to work, but it gives rise to one final problem.
When I install spacy, it also needs a model downloaded separately, e.g. en_core_web_sm. This model is vital for text processing.
According to the docs, you first install spacy and then install the model using:

python -m spacy download en_core_web_sm

When I do:

RUN python -m spacy download en_core_web_sm

it gives the error:

/opt/conda/bin/python: No module named spacy
The command '/bin/bash -o pipefail -c python -m spacy download en_core_web_sm

So spacy is installed in the previous step, but then I get the error message "No module named spacy"?

Any final thoughts?

Most of the spacy models are available from conda-forge, so you can just add spacy-model-en_core_web_sm to your environment.yml.

I am losing my mind now. I added the spacy model to the environment.yml as you suggested, but now I am getting this error:

docker build -t fuse/own-ds-notebook .
Sending build context to Docker daemon   5.12kB
Step 1/6 : FROM jupyter/datascience-notebook
 ---> a65e5e20a596
Step 2/6 : USER root
 ---> Using cache
 ---> 089541889e49
Step 3/6 : RUN apt-get update &&     apt-get install libpq-dev  -y &&     apt-get clean && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> b177cb046add
Step 4/6 : USER ${NB_UID}
 ---> Using cache
 ---> 89703ecc4f36
Step 5/6 : COPY environment.yml /tmp/
 ---> Using cache
 ---> 74130e093ff0
Step 6/6 : RUN set -eux &&     mamba env update --file "/tmp/environment.yml" &&     mamba clean --all -f -y &&     fix-permissions "${CONDA_DIR}" &&     fix-permissions "/home/${NB_USER}"
 ---> Running in 91e784e2b3d6
+ mamba env update --file /tmp/environment.yml

CondaEnvException: Unable to determine environment

Please re-run this command with one of the following options:

* Provide an environment name via --name or -n
* Re-run this command inside an activated conda environment.

The command '/bin/bash -o pipefail -c set -eux &&     mamba env update --file "/tmp/environment.yml" &&     mamba clean --all -f -y &&     fix-permissions "${CONDA_DIR}" &&     fix-permissions "/home/${NB_USER}"' returned a non-zero code: 1

:alien:

You may need to activate the environment first:

RUN set -eux \
    && source activate \
    && mamba install # ...

finally, it’s working!! thank you so much for your patience and your help!
Massively appreciated! :slight_smile: