Install R packages in JH Kubernetes deployment

I have two urgent questions, as I need to support R users soon.

  • How do I bake R libraries into the single-user image via its Dockerfile?
    • I am using the following Dockerfile. When building the image, the R package installation step takes so long that I have to interrupt it. Does it look fine?
      # Make sure to match your JupyterHub application version
      FROM quay.io/jupyter/datascience-notebook:hub-5.2.1
      
      USER root
      
      # Install OS packages, dependencies, packages, ...
      # For example:
      #RUN pip install jupyter-ai[all]
      RUN pip install minio
      RUN pip install pandas
      RUN pip install duckdb
      RUN pip install xgboost
      RUN pip install prophet
      RUN pip install plotly
      RUN pip install polars
      
      ### Install R packages
      RUN R -e "install.packages(c('tidyverse','data.table','janitor','ggplot2','plotly','gganimate','caret','mlr3','xgboost','glmnet','torch','keras','tidyquant','lubridate','knitr','shiny'), repos='https://cloud.r-project.org')"
      
      
      
      USER jovyan
      
  • How do I install R libraries from within the notebook itself?

Looking forward to your replies.

long time in the R packages installation

Please have a look at the upstream Dockerfile. Wherever possible, docker-stacks uses the mamba package manager to provision a virtual environment, which is activated by default. This directly supports the conda(-forge) ecosystem and, indirectly, PyPI via an extra file, environment.yml. Using that file in a container has advantages (better tooling support, e.g. Renovate) and disadvantages (another COPY in the Dockerfile).
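As a rough illustration of the environment.yml route, the sketch below copies a file into the image and applies it to the default environment. It is untested and rests on several assumptions: the file name and the /tmp path are arbitrary, the package split between conda-forge and pip in the commented example is only illustrative, and it assumes the mamba shipped in the image supports `env update`.

```dockerfile
# Sketch only. Assumes an environment.yml next to this Dockerfile, for example:
#
#   channels:
#     - conda-forge
#   dependencies:
#     - duckdb
#     - r-tidyverse
#     - pip
#     - pip:
#         - minio
#
FROM quay.io/jupyter/datascience-notebook:hub-5.2.1

USER ${NB_UID}

# The extra COPY mentioned above; /tmp/environment.yml is an arbitrary location.
COPY --chown=${NB_UID}:${NB_GID} environment.yml /tmp/environment.yml

# Apply the file to the default (base) environment, then drop the caches.
RUN mamba env update --name base --file /tmp/environment.yml \
 && mamba clean --all -f -y \
 && fix-permissions "${CONDA_DIR}" \
 && fix-permissions "/home/${NB_USER}"
```

Keeping the package list in a separate file is what lets tools such as Renovate track versions, at the cost of the additional layer from COPY.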

Using the equivalent of sudo pip and sudo R to install packages may have unintended side effects on the packages already installed with mamba, which are accessible to (and changeable by) the user.

Consider the following:

  • don’t switch to root
  • use mamba install and clean up after yourself (it will otherwise cache a couple hundred megabytes of intermediate data)

USER ${NB_UID}

RUN mamba install --yes \
  minio \
  pandas \
  duckdb \
  xgboost \
  prophet \
  plotly \
  polars \
  r-tidyverse \
  r-data.table \
  r-janitor \
  r-ggplot2 \
  r-plotly \
  r-gganimate \
  r-caret \
  r-mlr3 \
  r-xgboost \
  r-glmnet \
  r-torch \
  r-keras \
  r-tidyquant \
  r-lubridate \
  r-knitr \
  r-shiny \
&& mamba clean --all -f -y \
&& fix-permissions "${CONDA_DIR}" \
&& fix-permissions "/home/${NB_USER}"

The above is not tested, but the list of packages does in fact solve.
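Since the snippet is untested, one option is to add a smoke-test step after the install so the build fails early if a package cannot be loaded. This is only a sketch: it checks a subset of the packages, and it deliberately leaves out r-torch and r-keras, which may need additional backends at runtime.

```dockerfile
# Optional smoke test: fail the build if any of these packages cannot be loaded.
# Only a subset is checked here; extend the lists as needed.
RUN Rscript -e 'library(tidyverse); library(data.table); library(shiny)' \
 && python -c 'import minio, pandas, duckdb, polars'
```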
