I am using a Jupyter Notebook which is provided by an AWS managed service called EMR Studio. My understanding of how these notebooks work is that they are hosted on EC2 instances that I provision as part of my EMR cluster. Specifically with the PySpark kernel using the task nodes.
Currently when I run the command
sc.list_packages() I see that pip is at version 9.0.1 whereas if I SSH onto the master node and run
pip list I see that pip is at version 20.2.2. I have issues running the command
sc.install_pypi_package() due to the lowered pip version in the Notebook.
In the notebook cell if I run
import pip then
pip I see that the module is located at
<module 'pip' from '/mnt1/yarn/usercache/<LIVY_IMPERSONATION_ROLE>/appcache/application_1652110228490_0001/container_1652110228490_0001_01_000001/tmp/1652113783466-0/lib/python3.7/site-packages/pip/__init__.py'>
I am assuming this is most likely within a virtualenv of some sort running as an application on the task node? I am unsure of this and I have no concrete evidence of how the virtualenv is provisioned if there is one.
If I run
sc.list_packages() I see pip at a version of 20.2.2 which is what I am looking to initially start off with. The module path is the same as previously mentioned.
How can I get pip 20.2.2 in the virtualenv instead of pip 9.0.1?
If I import a package like numpy I see that the module is located at a different location from where pip is. Any reason for this?
<module 'numpy' from '/usr/local/lib64/python3.7/site-packages/numpy/__init__.py'>
As for pip 9.0.1 the only reference I can find at the moment is in
/lib/python2.7/site-packages/virtualenv_support/pip-9.0.1-py2.py3-none-any.whl . One directory outside of this I see a file called
virtualenv-15.1.0-py2.7.egg-info which if I
cat the file states that it upgrades to pip 9.0.1. I have tried to remove the pip 9.0.1 wheel file and replaced it with a pip 20.2.2 wheel which caused issues with the PySpark kernel being able to provision properly. There is also a
virtualenv.py file which does reference a
__version__ = "15.1.0" .