Hi,
Sorry for a dumb question, but what’s the best way to install a pip package such as backtrader?
I have been adding my own packages to the jupyter/datascience-notebook Docker image using the Dockerfile, like this:
RUN mamba install --quiet --yes \
    'ta-lib' && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"
But there are other packages that are not available in conda, so I have to use pip, and I don’t wish to blindly use the following:
RUN pip install --no-cache-dir backtrader
Any hints would be appreciated.
Thanks
You can create an environment.yml such as:
channels:
  - conda-forge
dependencies:
  - ta-lib
  - pip
  - pip:
      - backtrader
      # others
and run
mamba env update -p ${NB_PYTHON_PREFIX} -f environment.yml
to do it all in one step. I suppose you could use shell redirection to write it directly to a file within the Docker container, but just using COPY might be more maintainable.
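For reference, the shell-redirection variant mentioned above could look roughly like this. It is a minimal sketch in plain shell, not a tested Dockerfile step: the `workdir` and `env_file` names are hypothetical, and the file contents are just the example environment.yml from this reply.

```shell
# Hypothetical scratch location for the demo; in the image this would be
# a fixed path such as /tmp/environment.yml.
workdir=$(mktemp -d)
env_file="${workdir}/environment.yml"

# Quoting the delimiter ('EOF') stops the shell from expanding anything
# inside the heredoc, so the YAML is written exactly as typed.
cat > "${env_file}" <<'EOF'
channels:
  - conda-forge
dependencies:
  - ta-lib
  - pip
  - pip:
      - backtrader
EOF

# Show what was written.
cat "${env_file}"
```

Keeping the file next to the Dockerfile and using COPY avoids having to escape a multi-line heredoc inside a RUN instruction, which is the maintainability point made above.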
You’ll usually benefit from sourcing as many dependencies as possible (especially binary ones) from mamba, though finding out what is available there can be a bit trial-and-error.
Thank you for your message. I am a bit confused again. Where do I reference the environment.yml?
This is how I added some of the packages:
FROM jupyter/datascience-notebook
USER root
RUN apt-get update && \
    apt-get install libpq-dev -y && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
USER ${NB_UID}
# R packages including IRKernel which gets installed globally.
# Ref: https://github.com/jupyter/docker-stacks/tree/main/datascience-notebook
RUN mamba install --quiet --yes \
    'r-getpass' \
    'r-rpostgresql' && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"
# Install new packages in the default python3 environment
# Ref: https://github.com/jupyter/docker-stacks/tree/main/scipy-notebook
RUN mamba install --quiet --yes \
    'spacy' \
    'ta-lib' && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"
# Install backtrader using pip because it's not available in conda
RUN pip install --no-cache-dir backtrader
Is this OK, or could I improve it to save some space?
Thanks
I can recommend putting everything in one environment.yml next to the Dockerfile, so it’s just doing one solve:
COPY environment.yml /tmp/
RUN mamba env update --prefix ${NB_PYTHON_PREFIX} --file /tmp/environment.yml \
mamba clean --all -f -y && \
fix-permissions "${CONDA_DIR}" && \
fix-permissions "/home/${NB_USER}"
Brilliant, thanks!
Finally, what would the YAML look like for the packages in my previous message?
Thanks again
I’ve already pretty much written the whole thing for you at this point.
Just copy and paste the package names into something like the environment.yml in my earlier reply in this thread (Install pip package such as backtrader in the jupyter/datascience-notebook image - #2 by bollwyvl).
Hi there,
Unfortunately, I am getting the following error:
mamba update: error: argument -p/--prefix: expected one argument
The command '/bin/bash -o pipefail -c mamba env update --prefix ${NB_PYTHON_PREFIX} --file /tmp/environment.yml mamba clean --all -f -y && fix-permissions "${CONDA_DIR}" && fix-permissions "/home/${NB_USER}"' returned a non-zero code: 2
Here is my environment.yml:
channels:
  - conda-forge
dependencies:
  - r-getpass
  - r-rpostgresql
  - ta-lib
  - spacy
  - mplfinance
  - pandas-ta
  - bt
  - pip
  - pip:
      - backtrader
      - backtesting
      - bta-lib
      - backtrader[plotting]
And here is the Dockerfile:
FROM jupyter/datascience-notebook
USER root
RUN apt-get update && \
apt-get install libpq-dev -y && \
apt-get clean && rm -rf /var/lib/apt/lists/*
USER ${NB_UID}
COPY environment.yml /tmp/
RUN mamba env update -p ${NB_PYTHON_PREFIX} -f /tmp/environment.yml \
mamba clean --all -f -y && \
fix-permissions "${CONDA_DIR}" && \
fix-permissions "/home/${NB_USER}"
Yep, rando typo from me in a narrow web form.
RUN mamba env update ... /tmp/environment.yml && \
#                                             ^
#                                missing this |
A typographic preference of mine is to escape the line and put any operators/shell on the next line, so more like:
RUN mamba env update ... /tmp/environment.yml \
    && mamba clean ....
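To see why the missing `&&` matters, here is a small plain-shell sketch. The `count_args` and `clean` functions are hypothetical stand-ins (for `mamba env update` and `mamba clean`), used only to make the effect visible without installing anything.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

# Hypothetical no-op standing in for the follow-up command.
clean() {
  :
}

# Without '&&', the trailing backslash joins both lines into ONE command,
# so 'clean' and '--all' become extra arguments of the first command.
joined=$(count_args update --file /tmp/environment.yml \
  clean --all)

# With '&&', the second line runs as a separate command.
separate=$(count_args update --file /tmp/environment.yml \
  && clean --all)

echo "joined=${joined} separate=${separate}"
```

The first call receives five arguments and the second only three, which is why the original RUN line passed `mamba clean --all -f -y` to `mamba env update` instead of running it afterwards.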
Regarding --prefix in bollwyvl’s command: nope, that doesn’t fix it.
I am still getting this argument error:
mamba update: error: argument -p/--prefix: expected one argument
Welp, maybe toss a RUN env above that to see what environment variables are known. But looking at the upstream, it might just be that the env is already updated, and/or it is using base (shiver) to install user packages next to conda and mamba.
Anyhow, some additional defensive shell techniques:
- lead with set -eux (fails harder, and also fails when a variable is undefined, which appears to be the case here)
- quote paths, but especially those that are partially constructed from env vars
RUN set -eux \
    && mamba env update --file "/tmp/environment.yml" ...
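The two techniques above can be illustrated with a short plain-shell sketch. It assumes a hypothetical variable `DEMO_PREFIX` that is not defined (standing in for an undefined variable such as the prefix in the failing command) and a hypothetical `count_args` helper; nothing here calls mamba.

```shell
# Hypothetical stand-in for an environment variable the image does not define.
unset DEMO_PREFIX

# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

# Unquoted and unset: the expansion vanishes, so '-p' is followed directly
# by '-f', which an option parser reports as a missing argument.
unquoted=$(count_args -p ${DEMO_PREFIX} -f /tmp/environment.yml)

# Quoted: an empty string is still passed as its own argument.
quoted=$(count_args -p "${DEMO_PREFIX}" -f /tmp/environment.yml)

# With 'set -u', expanding the unset variable aborts the child shell
# before the echo runs, so the mistake is caught early.
if sh -c 'set -eu; echo "prefix=${DEMO_PREFIX}"' 2>/dev/null; then
  strict="passed"
else
  strict="failed"
fi

echo "unquoted=${unquoted} quoted=${quoted} strict=${strict}"
```

This prints `unquoted=3 quoted=4 strict=failed`: the unquoted form silently drops the argument, the quoted form keeps an (empty) argument in place, and `set -u` turns the undefined variable into an immediate error.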
Thanks, that seems to work, but it gives rise to a final problem.
When I install spacy, it then also needs a model to be downloaded separately, e.g. en_core_web_sm. This model is vital for text processing.
According to the docs, you first install spacy and then install the model using:
python -m spacy download en_core_web_sm
When I do:
RUN python -m spacy download en_core_web_sm
it gives the error:
/opt/conda/bin/python: No module named spacy
The command '/bin/bash -o pipefail -c python -m spacy download en_core_web_sm
So in the previous step we install spacy, but then I get the error message "No module named spacy"?
Any final thoughts?
Most of the spacy models are available from conda-forge, so you can just add spacy-model-en_core_web_sm to your environment.yml.
I am losing my mind now. I added the spacy-model package to the environment.yml as you suggested, but I am getting this error now:
docker build -t fuse/own-ds-notebook .
Sending build context to Docker daemon 5.12kB
Step 1/6 : FROM jupyter/datascience-notebook
---> a65e5e20a596
Step 2/6 : USER root
---> Using cache
---> 089541889e49
Step 3/6 : RUN apt-get update && apt-get install libpq-dev -y && apt-get clean && rm -rf /var/lib/apt/lists/*
---> Using cache
---> b177cb046add
Step 4/6 : USER ${NB_UID}
---> Using cache
---> 89703ecc4f36
Step 5/6 : COPY environment.yml /tmp/
---> Using cache
---> 74130e093ff0
Step 6/6 : RUN set -eux && mamba env update --file "/tmp/environment.yml" && mamba clean --all -f -y && fix-permissions "${CONDA_DIR}" && fix-permissions "/home/${NB_USER}"
---> Running in 91e784e2b3d6
+ mamba env update --file /tmp/environment.yml
CondaEnvException: Unable to determine environment
Please re-run this command with one of the following options:
* Provide an environment name via --name or -n
* Re-run this command inside an activated conda environment.
The command '/bin/bash -o pipefail -c set -eux && mamba env update --file "/tmp/environment.yml" && mamba clean --all -f -y && fix-permissions "${CONDA_DIR}" && fix-permissions "/home/${NB_USER}"' returned a non-zero code: 1
You may need to activate the environment first:
RUN set -eux \
    && source activate \
    && mamba install # ...
Finally, it’s working! Thank you so much for your patience and your help!
Massively appreciated!