Why is the pyspark module installed in a browser-launched notebook (python3) but not in the python3 terminal console?

m1p1h · November 21, 2023, 4:04pm

I'm using the pyspark-notebook Docker image. I can open a notebook via the browser and see that the pyspark module is installed (help('modules')). But when I try to use what appears to be the same Python distribution (judging by the kernelspec) from the container command line, pyspark doesn't appear to be installed.

Aside from installing it as part of the build, am I missing something about how Jupyter prepares the environment when notebooks are launched from the browser?

I can run pyspark from the command line, but ultimately I'm trying to run a PySpark-based notebook from the command line using papermill.

minrk · November 24, 2023, 8:35am

Jupyter doesn't really prepare environments at all; it runs in the environment you've given it.

When a package is importable in one place but not another, it's almost always because the two are running in different Python environments, which you can confirm by comparing the value of sys.prefix in each. You can also check the value of sys.path, which lists the directories import searches for packages.
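A minimal sketch of that check: run the same snippet in a notebook cell and in `python3` from the container terminal, then compare the output. The helper name `describe_environment` is hypothetical; it only uses `sys` and `importlib.util.find_spec` from the standard library.

```python
import importlib.util
import sys


def describe_environment(module_name="pyspark"):
    """Summarize which interpreter is running and where `module_name` would be imported from."""
    spec = importlib.util.find_spec(module_name)  # None if the module cannot be found
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module_origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    print("sys.path:")
    for entry in sys.path:
        print("  ", entry)
```

If `prefix` matches in both places but `module_origin` is only set in the notebook, the difference is in `sys.path` (for example an environment variable such as PYTHONPATH), not in which Python is running.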

If you’re running pyspark from its own command-line entrypoint and it’s not actually installed as a Python package, one of the things the pyspark script does is add itself to sys.path so you don’t really have to install it to use it. This results in exactly the kind of confusion you describe.
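One way to tell "installed as a package" apart from "merely reachable on sys.path" is to ask for the package's distribution metadata, which pip writes on install but a path injection does not. A sketch under that assumption; the function name and its three status strings are hypothetical, and the distribution name happens to match the import name for pyspark but not for every package.

```python
import importlib.metadata
import importlib.util


def how_is_it_available(import_name, dist_name=None):
    """Classify how `import_name` is reachable from the current interpreter.

    Returns "missing", "on-path-only" (importable but with no installed
    distribution metadata, e.g. added via sys.path or PYTHONPATH; standard
    library modules also land here), or "installed".
    """
    if importlib.util.find_spec(import_name) is None:
        return "missing"
    try:
        # Looks up pip/installer metadata, not the import system.
        importlib.metadata.version(dist_name or import_name)
    except importlib.metadata.PackageNotFoundError:
        return "on-path-only"
    return "installed"


if __name__ == "__main__":
    print("pyspark:", how_is_it_available("pyspark"))
```

A result of "on-path-only" inside the notebook would match the situation described above: pyspark is importable there without being installed into that environment.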

A long time ago, I made the tiny findspark package, which does exactly this but the other way around (it looks for Spark and adds it to sys.path), so that you can get similar behavior in a plain Python session. For personal use, though, you can generally replace it with a single call to sys.path.extend, with the right paths for your environment, before importing pyspark.
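A minimal sketch of the sys.path.extend approach, assuming the standard Spark layout ($SPARK_HOME/python plus a bundled py4j-*-src.zip under python/lib). The fallback /usr/local/spark is an assumption about the pyspark-notebook image, so check SPARK_HOME in your own container; the helper name `spark_python_paths` is hypothetical.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the paths a Spark distribution needs on sys.path for `import pyspark`.

    Assumes the usual layout: $SPARK_HOME/python plus a py4j source zip
    under $SPARK_HOME/python/lib.
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


# Assumed default location; adjust for your environment.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
sys.path.extend(p for p in spark_python_paths(spark_home) if p not in sys.path)

# import pyspark  # should now resolve if spark_home points at a real Spark install
```

If you would rather use findspark, the equivalent is `import findspark` followed by `findspark.init()` before `import pyspark`.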
