Unable to get PySpark kernel to use an updated pip version

I am using a Jupyter Notebook provided by EMR Studio, an AWS managed service. My understanding is that these notebooks run on the EC2 instances I provision as part of my EMR cluster, and that the PySpark kernel specifically runs on the task nodes.

Currently, when I run sc.list_packages() in the notebook, I see that pip is at version 9.0.1, whereas if I SSH onto the master node and run pip list, I see that pip is at version 20.2.2. I have issues running sc.install_pypi_package() because of the older pip version in the notebook.

If I run import pip in a notebook cell and then evaluate pip, I see that the module is located at:

<module 'pip' from '/mnt1/yarn/usercache/<LIVY_IMPERSONATION_ROLE>/appcache/application_1652110228490_0001/container_1652110228490_0001_01_000001/tmp/1652113783466-0/lib/python3.7/site-packages/pip/__init__.py'> 

I assume this is a virtualenv of some sort, created for the application running on the task node. I am unsure of this, though, and I have no concrete evidence of how the virtualenv is provisioned, if there is one.
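One way I could check the virtualenv assumption from inside a notebook cell is to compare the interpreter prefixes. This is only a minimal sketch: describe_interpreter is a name I made up for illustration, and it relies on legacy virtualenv releases (before 20, which would include 15.1.0) setting sys.real_prefix, while venv and newer virtualenv releases make sys.base_prefix differ from sys.prefix. It reports on whichever Python process executes the cell, which I assume is the Spark driver.

```python
import sys


def describe_interpreter():
    """Collect the prefixes that hint at whether Python runs inside a virtualenv."""
    # Legacy virtualenv (< 20) sets sys.real_prefix; it is absent otherwise.
    real_prefix = getattr(sys, "real_prefix", None)
    # venv and virtualenv >= 20 make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": base_prefix,
        "real_prefix": real_prefix,
        "in_virtualenv": real_prefix is not None or base_prefix != sys.prefix,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If in_virtualenv comes back True and prefix points under the YARN container's tmp directory, that would support the idea that a per-application virtualenv is being created.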

If I run sc.uninstall_package('pip') and then sc.list_packages(), I see pip at version 20.2.2, which is the version I want to start off with. The module path is the same as the one shown above.

How can I get pip 20.2.2 in the virtualenv instead of pip 9.0.1?

If I import a package like numpy, I see that the module is located somewhere different from pip. Is there a reason for this?

<module 'numpy' from '/usr/local/lib64/python3.7/site-packages/numpy/__init__.py'>
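To see why the two modules resolve to different directories, it may help to check which sys.path entry each one comes from. The sketch below uses only the standard library; module_origin and owning_path_entry are hypothetical helper names, and my guess (not confirmed) is that the environment was created with access to the system site-packages, so anything not installed in the environment itself falls through to /usr/local/lib64.

```python
import importlib.util
import os
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def owning_path_entry(name):
    """Return the longest sys.path entry that contains the module's file, if any."""
    origin = module_origin(name)
    if origin is None:
        return None
    # os.path.join(p, "") appends a trailing separator so prefixes match whole directories.
    matches = [p for p in sys.path if p and origin.startswith(os.path.join(p, ""))]
    return max(matches, key=len) if matches else None


# Print sys.path in lookup order, then where each module of interest is resolved.
for index, entry in enumerate(sys.path):
    print(index, entry)
for name in ("pip", "numpy"):
    print(name, "->", module_origin(name), "via", owning_path_entry(name))
```

Entries earlier in sys.path take precedence, so if the environment's site-packages is listed before /usr/local/lib64/python3.7/site-packages, pip would be picked up from the environment while numpy, which is not installed there, comes from the system location.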

As for pip 9.0.1, the only reference I can find at the moment is /lib/python2.7/site-packages/virtualenv_support/pip-9.0.1-py2.py3-none-any.whl. One directory up from there I see a file called virtualenv-15.1.0-py2.7.egg-info, which, when I cat it, states that it upgrades to pip 9.0.1. I tried removing the pip 9.0.1 wheel and replacing it with a pip 20.2.2 wheel, but that prevented the PySpark kernel from provisioning properly. There is also a virtualenv.py file that contains __version__ = "15.1.0".
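For reference, this is roughly how I would list the wheels bundled in that virtualenv_support directory on a node. It is a sketch only: bundled_wheels is a hypothetical helper, the directory is the one quoted above, and it assumes standard wheel filenames of the form name-version-python-abi-platform.whl.

```python
import glob
import os


def bundled_wheels(directory):
    """Return sorted (name, version) pairs parsed from the wheel filenames in a directory."""
    found = []
    for path in glob.glob(os.path.join(directory, "*.whl")):
        parts = os.path.basename(path)[: -len(".whl")].split("-")
        if len(parts) >= 2:
            # The first two fields of a wheel filename are the distribution name and version.
            found.append((parts[0], parts[1]))
    return sorted(found)


# Directory taken from the question; yields nothing if it does not exist on this machine.
for name, version in bundled_wheels("/lib/python2.7/site-packages/virtualenv_support"):
    print(name, version)
```

Running this on the master node and on a task node would show whether both carry the same bundled pip wheel.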