Delta-sharing library not working with singleuser image anaconda

alison · November 15, 2021, 9:03am

Hi,

I have a question about the anaconda distribution in the jupyterhub Docker image.

I’m trying to use the new delta-sharing library created by Databricks.

If I use the anaconda distribution in the jupyterhub/singleuser Docker image and run “delta_sharing.load_as_pandas(table_url)”, the pyarrow library it uses throws a FileNotFoundError.
If I don’t use the singleuser image and instead build my own Python (configure, make, make install), the same “delta_sharing.load_as_pandas(table_url)” command works.
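A quick way to compare the two setups is to print which interpreter is running and which versions of the relevant packages it sees. This is a minimal sketch, not part of the original report: the helper name environment_report is hypothetical, the package list is a guess at what matters here, and it assumes Python 3.8+ for importlib.metadata.

```python
import sys
from importlib import metadata


def environment_report(packages=("delta-sharing", "pyarrow", "pandas", "fsspec")):
    """Return the interpreter path/version and installed versions of the given packages."""
    report = {
        "executable": sys.executable,        # which python binary is actually running
        "python": sys.version.split()[0],    # e.g. "3.8.12"
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed for this interpreter
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it in both containers and diffing the output should show whether the two Pythons resolve different pyarrow or pandas versions.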

So my possible questions are:

What can I add to your jupyterhub/singleuser anaconda distribution for it to work with the delta-sharing library?
If it’s not possible to change/fix the anaconda distribution, how can I change the jupyterhub/singleuser image to point to my configured Python version and still work?

Thank you very much

manics · November 15, 2021, 11:13am

Hi! Can you show us how you’re installing delta-sharing in your Docker image? Do you have a link to your Dockerfile?

alison · November 15, 2021, 11:40am

Hi Manics,

In both cases I get the ‘delta-sharing’ library from our artifactory using the following:

RUN echo "[global]" > /tmp/pip.conf && \
    echo "index-url = https://${artifactory_username}:${artifactory_password}@artifactory.cib.echonet/artifactory/api/pypi/pypi/simple" >> /tmp/pip.conf && \
    echo "trusted-host = artifactory.cib.echonet" >> /tmp/pip.conf

and then I do pip install delta-sharing

The difference is how python was built

In one case I don’t build anything and just use a predefined jupyterhub singleuser Docker image. I’ve tried both of the following (both use anaconda) and haven’t seen any difference: I get the same error.

FROM artifactory.cib.echonet/jupyterhub/singleuser:1.2.0
FROM artifactory.cib.echonet/jupyterhub/k8s-singleuser-sample:0.11.1

In the other case I build Python myself, using the following in a Dockerfile based on a CentOS image:

#####################

ARG artifactory_username
ARG artifactory_password
ARG artifactory_apikey

ARG python_version="3.7.1"
ARG python_dir="/apps/python"
ARG python_dist_file="Python-${python_version}.tgz"

ENV LD_LIBRARY_PATH=/usr/local/lib:/usr/local/include

RUN yum -y group install "Development Tools"
RUN yum -y install zlib-devel
RUN yum -y install libffi-devel
RUN yum -y install openssl-devel
RUN yum -y install libsqlite3x-devel
RUN yum -y install bzip2-devel
RUN yum -y install xz-devel

RUN curl "https://artifactory.cib.echonet/artifactory/external-generic-local/python/python/python/linux/${python_dist_file}" \
    -o "/tmp/${python_dist_file}" -u ${artifactory_username}:${artifactory_password} && \
    mkdir -p ${python_dir} && \
    tar -xzvf /tmp/${python_dist_file} -C ${python_dir} && \
    rm "/tmp/${python_dist_file}"

RUN cd ${python_dir}/Python-${python_version} && \
    ./configure --with-openssl="/usr" --enable-loadable-sqlite-extensions && \
    make && make install

RUN echo "[global]" >> /etc/pip.conf && \
    echo "index-url = https://${artifactory_username}:${artifactory_apikey}@artifactory.cib.echonet/artifactory/api/pypi/pypi/simple" >> /etc/pip.conf && \
    echo "trusted-host = artifactory.cib.echonet" >> /etc/pip.conf

####################

Do I maybe need to add the delta-sharing library with ‘conda install’ rather than ‘pip install’ when I use the anaconda distribution in the singleuser image case?

I’ve tried ‘delta_sharing.load_as_spark(table_url)’ (the Spark way) and that works with both Python installations. It’s only load_as_pandas() that I’m having issues with.
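Since only the pandas path fails, it may help to record exactly which path the FileNotFoundError complains about. The wrapper below is a hedged sketch: call_and_describe_missing_file is a hypothetical helper, and it only relies on the standard filename attribute of OSError, which can be None if the code raising the error did not set it.

```python
import traceback


def call_and_describe_missing_file(func, *args, **kwargs):
    """Call func; on FileNotFoundError return the details instead of raising."""
    try:
        return {"ok": True, "result": func(*args, **kwargs)}
    except FileNotFoundError as exc:
        return {
            "ok": False,
            "filename": exc.filename,               # path reported by the error, may be None
            "message": str(exc),
            "traceback": traceback.format_exc(),    # full stack, shows which library raised
        }
```

In the failing image one would call it as call_and_describe_missing_file(delta_sharing.load_as_pandas, table_url) and post the "filename" and "traceback" values.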

Thank you very much,

Alison

bollwyvl · November 15, 2021, 2:45pm

delta-sharing looks pretty simple to build… but has a lot of not-so-trivial dependencies, many of which won’t be (up-to-date) in the anaconda distribution. Mixing pip install and conda install is relatively benign when done as a one-shot in a container… but you never really know.
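To see how mixed an environment already is, one option is to group installed packages by the tool recorded in each package's INSTALLER file. This is a rough sketch, assuming Python 3.8+; packages_by_installer is a hypothetical name, and while pip writes "pip" there, conda-built packages only usually record "conda", so treat the result as a hint rather than proof.

```python
from collections import defaultdict
from importlib import metadata


def packages_by_installer():
    """Group installed distribution names by the tool named in their INSTALLER file."""
    groups = defaultdict(list)
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
        except Exception:
            name = None  # skip distributions with unreadable metadata
        if not name:
            continue
        # INSTALLER is optional; fall back to "unknown" when missing or empty
        installer = (dist.read_text("INSTALLER") or "").strip() or "unknown"
        groups[installer].append(name)
    return {tool: sorted(names) for tool, names in groups.items()}


if __name__ == "__main__":
    for tool, names in packages_by_installer().items():
        print(f"{tool}: {len(names)} packages")
```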

Getting it on conda-forge (the community-led upstream of the anaconda distribution) would likely give a tested, compatible, continuously updated solution. I’ve opened up this pull request to kick the tires on it. Feel free to weigh in there!

More broadly: conda-forge’s Miniforge (or Mambaforge) can be a better fit for containerization for size/reproducibility purposes, as it encourages you to only bring what you need (e.g. not a compiler) and document exactly what goes in… in this case, having a pip stanza in an environment.yml is a good way to provide a more complete picture.
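As an illustration of the pip stanza idea, here is a small sketch that renders such an environment.yml as text. Everything specific in it is an assumption for the example: the helper name render_environment_yml, the environment name, the python=3.9 pin, and the choice of which packages come from conda versus pip.

```python
def render_environment_yml(name, conda_packages, pip_packages):
    """Return the text of an environment.yml with a nested pip section."""
    lines = [f"name: {name}", "channels:", "  - conda-forge", "dependencies:"]
    lines += [f"  - {pkg}" for pkg in conda_packages]
    if pip_packages:
        if "pip" not in conda_packages:
            lines.append("  - pip")  # pip itself should be listed as a conda dependency
        lines.append("  - pip:")     # nested list of packages installed by pip
        lines += [f"      - {pkg}" for pkg in pip_packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical contents: pyarrow and pandas from conda-forge, delta-sharing from pip
    print(render_environment_yml("hub-user", ["python=3.9", "pyarrow", "pandas"], ["delta-sharing"]))
```

The printed text can be saved as environment.yml and passed to conda env create -f environment.yml, which keeps the conda and pip parts of the environment documented in one file.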

Also, IANAL, but: depending on your company size, anaconda stock prices, and the phases of the moon, etc. you may be in violation of the ToS for the anaconda distribution. This covers not only distribution, but also just “commercial activity.” The packages and installers created by conda-forge are, however, definitely not encumbered, hence we have all but shifted to them in various Jupyter projects.

alison · November 15, 2021, 6:07pm

Thank you very much bollwyvl

So if I’ve understood correctly:

As jupyterhub uses the Anaconda distribution, I need to install the ‘delta-sharing’ library via ‘conda’ rather than ‘pip’.

Would it be possible to run JupyterHub without Anaconda? All the singleuser images seem to use Anaconda.

Thank you very much,

Alison

bollwyvl · November 15, 2021, 6:20pm

> As jupyterhub uses the Anaconda distribution I need to install the ‘delta-sharing’ library via ‘conda’ rather than ‘pip’

Yes, conda install -c conda-forge delta-sharing-python will work within the hour.

> Would it be possible to run Jupyterhub without Anaconda

It is, of course, but the convenience of having pre-built binaries, especially for nasty things like GDAL, generally leads people to rely on conda for at least the base level of python, as the system package managers generally lag for what data scientists demand.

The biggest win, though, for conda in a container is the ability to add advanced technology at run time as a non-root user.

alison · November 15, 2021, 6:28pm

Thank you very much bollwyvl !

I can see the file is here → Files :: Anaconda.org

I will try this via conda install

Alison
