I have two urgent questions, as I need to support R users soon.
How do I bake R libraries into the single-user image via the Dockerfile?
I am using the following Dockerfile; building the image takes so long on the R package installation line that I have to interrupt it. Does it look fine?
# Make sure to match your JupyterHub application version
FROM quay.io/jupyter/datascience-notebook:hub-5.2.1
USER root
# Install OS packages, dependencies, packages, ...
# For example:
#RUN pip install jupyter-ai[all]
RUN pip install --no-cache-dir minio pandas duckdb xgboost prophet plotly polars
### Install R packages
RUN R -e "install.packages(c('tidyverse','data.table','janitor','ggplot2','plotly','gganimate','caret','mlr3','xgboost','glmnet','torch','keras','tidyquant','lubridate','knitr','shiny'), repos='https://cloud.r-project.org')"
USER jovyan
How do I install R libraries from within the notebook itself?
Please have a look at the upstream Dockerfile. Wherever possible, docker-stacks uses the mamba package manager to provision a virtual environment, which is activated by default. This directly supports the conda(-forge) ecosystem and, indirectly, PyPI via an extra file, environment.yml; using such a file in a container has advantages (better tooling support, e.g. renovate) and disadvantages (another COPY in the Dockerfile).
Using the equivalent of sudo pip or sudo R to install packages may have unintended side effects on packages already installed with mamba, which are accessible to (and changeable by) the user.
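As for your second question: in the docker-stacks images the conda environment is normally writable by the notebook user, so installing from a running R notebook usually needs no root at all. A minimal sketch (package name is just an example; the repo URL matches the one in your Dockerfile):

```r
# Run inside an R notebook cell: install into the first writable
# library path (in docker-stacks images the conda env is typically
# owned by the jovyan user, so this works without sudo)
install.packages("janitor", repos = "https://cloud.r-project.org")

# Check which library it landed in, then load it
.libPaths()
library(janitor)
```

Note that packages installed this way live only in that user's (or that container's) environment and disappear when the container is rebuilt; anything needed by all users still belongs in the image.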
Consider the following:
don’t drop to root
use mamba install and clean up after yourself (the cache can hold a couple hundred megabytes of intermediate data)
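Putting those two points together, a sketch of what that could look like (untested; the r-* package names are the conda-forge equivalents of the CRAN names, and the fix-permissions helper ships in the docker-stacks images):

```dockerfile
FROM quay.io/jupyter/datascience-notebook:hub-5.2.1

# Stay as the default notebook user; no USER root needed.
# conda-forge publishes most CRAN packages under an r- prefix,
# as prebuilt binaries, which is much faster than compiling
# from source with install.packages().
RUN mamba install --yes \
      'r-tidyverse' \
      'r-data.table' \
      'r-janitor' \
      'r-caret' \
      'r-xgboost' \
      'r-glmnet' \
      'r-shiny' && \
    # Drop the package cache so it does not bloat the image layer
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"
```

Chaining the install, clean, and permission fixes in a single RUN keeps the cached downloads out of the final layer; a mamba clean in a later RUN would not shrink the earlier one.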