Install R packages in JH Kubernetes deployment

I have two urgent questions, as I need to support R users soon.

  • How do I bake R libraries into the single-user image via the Dockerfile?
    • I am using the following Dockerfile; when building the image, the R package installation line takes so long that I have to interrupt it. Does it look fine?
    • # Make sure to match your JupyterHub application version
      FROM quay.io/jupyter/datascience-notebook:hub-5.2.1
      
      USER root
      
      # Install OS packages, dependencies, packages, ...
      # For example:
      #RUN pip install jupyter-ai[all]
      RUN pip install minio
      RUN pip install pandas
      RUN pip install duckdb
      RUN pip install xgboost
      RUN pip install prophet
      RUN pip install plotly
      RUN pip install polars
      
      ### Install R packages
      RUN R -e "install.packages(c('tidyverse','data.table','janitor','ggplot2','plotly','gganimate','caret','mlr3','xgboost','glmnet','torch','keras','tidyquant','lubridate','knitr','shiny'), repos='https://cloud.r-project.org')"
      
      
      
      USER jovyan
      
  • How to install R libraries in the notebook itself?

Looking forward

long time in the R packages installation

Please have a look at the upstream Dockerfile. Wherever possible, docker-stacks uses the mamba package manager to provision a conda environment, which is activated by default. This directly supports the conda(-forge) ecosystem and, indirectly, PyPI via an extra file, environment.yml; using that file in a container has advantages (better tooling support, e.g. renovate) and disadvantages (another COPY in the Dockerfile).
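If you go the environment.yml route, a minimal sketch might look like this (the file name, package selection, and pip section are illustrative, not a tested recipe):

```yaml
# environment.yml — conda-forge packages, plus PyPI-only extras under pip:
name: base
channels:
  - conda-forge
dependencies:
  - r-tidyverse
  - polars
  - pip
  - pip:
      - minio
```

with the extra COPY in the Dockerfile:

```dockerfile
# Copy the spec in and update the default environment with it
COPY --chown=${NB_UID}:${NB_GID} environment.yml /tmp/
RUN mamba env update --name base --file /tmp/environment.yml && \
    mamba clean --all -f -y
```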

Using the equivalent of sudo pip and sudo R to install packages may have unintended side effects on the packages already installed with mamba, which are accessible to (and changeable by) the user.

Consider the following:

  • don’t drop to root
  • use mamba install and clean up after yourself (it will cache a couple hundred megabytes of intermediate data)
USER ${NB_UID}

RUN mamba install --yes \
  minio \
  pandas \
  duckdb \
  xgboost \
  prophet \
  plotly \
  polars \
  r-tidyverse \
  r-data.table \
  r-janitor \
  r-ggplot2 \
  r-plotly \
  r-gganimate \
  r-caret \
  r-mlr3 \
  r-xgboost \
  r-glmnet \
  r-torch \
  r-keras \
  r-tidyquant \
  r-lubridate \
  r-knitr \
  r-shiny \
&& mamba clean --all -f -y \
&& fix-permissions "${CONDA_DIR}" \
&& fix-permissions "/home/${NB_USER}"

The above is not tested, but the package list does in fact solve.


I’m back with another question: how do I bake one or more additional Python environments into the image? I need to support users with different Python versions, and when a user creates a new environment using conda, the pod does not start again after being culled for idleness.

Well, when users create new conda environments, they create them in their own persistent storage. In that case, you can use nb_conda_kernels, which picks up kernels installed inside conda environments at runtime and adds them to the Launcher.
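One caveat worth noting (a general nb_conda_kernels requirement, not specific to your setup): it only discovers environments that contain a kernel package, e.g. ipykernel for Python or r-irkernel for R. So users would create their environments along these lines (the env name and Python version are illustrative):

```shell
# Create a user environment with a kernel package so the Launcher can find it
conda create --yes --name py310 python=3.10 ipykernel
```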

thanks!
should I bake nb_conda_kernels into the image? Can I install it simply with mamba?

Also, is it normal that the pod does not start again (if the user has created a new env)?
Looking forward!

Yes, you can install it in the image and configure it appropriately based on the path where persistent storage is mounted for users.
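As a sketch of what that could look like in the image (the envs path is an assumption about where the user volume is mounted in your deployment; adjust accordingly):

```dockerfile
# Bake in the kernel-discovery extension
RUN mamba install --yes nb_conda_kernels && \
    mamba clean --all -f -y

# Have conda create user environments under the persistent home,
# so they survive pod restarts (assumes /home/jovyan is the user volume)
RUN conda config --system --append envs_dirs "/home/${NB_USER}/.conda/envs"
```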

Without the single-user server logs, it is hard to say what is going wrong!