Session 1 - April 19
Dear rubber duck
Have you ever thought that it was helpful to speak to someone about something, even though the other person did not say much? I don’t have anyone around to be that person right now, and I don’t own a rubber duck, so I figured I’ll write to you in this forum!
It is my hope that by documenting this process I may provide some insight into the general process of contributing to open source projects.
Defined my goal: to make mybinder.org / repo2docker support pipfiles
I’m starting out on a journey to solve a problem that I really want solved. I want to make mybinder.org able to understand how to use the Python package dependency files named `Pipfile` and `Pipfile.lock`. I want this as I’ve found myself twice or more in a situation where I was about to suggest that the authors of a repo with Jupyter notebooks also add a MyBinder.org badge, only to spot the pipfiles.
MyBinder.org currently understands how to use the `environment.yml` and `requirements.txt` Python package dependency files, but not the pipfiles. Getting MyBinder.org to support these pipfiles is really a question of making repo2docker support them though, so that is where I’ll work: towards repo2docker!
This work will be an attempt to close issue #174! (ping: @yuvipanda @minrk @choldgraf @jezcope @jzf2101 @trallard @Madhu94 @betatim)
Found CONTRIBUTING.md
I’ve already gotten started and read the README.md file of repo2docker, but there was nothing on how to get started with a contribution in the file itself. But, I spotted the CONTRIBUTING.md file! I read it through and picked up how to set up a local development environment.
Read more documentation
But, as repo2docker is quite new to me, I figured I’d avoid a past mistake: running into issues I could have avoided by simply reading a bit of the documentation ahead of time.
What I learned
Repo2Docker will inspect a git repository and ask its buildpacks, in a specific order, if they can figure out how to create a Dockerfile for the repository; this is the detection phase. We need to add code that detects `Pipfile` or `Pipfile.lock`, either in an existing buildpack or in a new one.
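As a hedged sketch of that detection step (the function name and shape here are my own, not repo2docker’s actual API), detecting the pipfiles could be as simple as:

```python
import os

def detect_pipfiles(repo_dir):
    # Hypothetical helper: True if the repository root carries
    # either a Pipfile or a Pipfile.lock.
    return any(
        os.path.exists(os.path.join(repo_dir, name))
        for name in ("Pipfile", "Pipfile.lock")
    )
```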
Questions!
At this point, I’d better write down some of the questions I’ve ended up with before I lose track of them. I’d love to get your help with input on them!
Question 1: Should I add a buildpack or augment one?
Hmmm… I think I should add one, but I’m a bit confused… I saw fewer buildpacks than expected in the repository code base; one was named conda, but none seemed associated with `requirements.txt`. Perhaps it’s part of the `conda` buildpack? Hmmm…
OK - Session 2: I’m quite confident I should augment the logic in the PythonBuildPack now.
Question 2: If we add a buildpack, it should be put in the ordered list for the detection phase, but at what position would make sense?
Hmmm… I think this is a question that, along with Q1, could be answered by those who have already contributed a lot to the project, if I ask them.
OK - Session 2: No longer a relevant question due to not adding a buildpack.
Question 3: What makes sense when finding the various combinations of `Pipfile.lock` and `Pipfile`?
OK - Session 1: Oh, I think I got this one myself after simply writing it down! I think if we find either one of these, we will let `pipenv install` do the job for us! I think `pipenv install` will use the lock file if there is one, or use the less tightly pinned packages from the `Pipfile` if there is no `Pipfile.lock` to be found. So `pipenv install` will solve the logic for us; we just need to find either one of these files, I think.
OK Correction - Session 3: `pipenv install` will work on the `Pipfile` while `pipenv sync` will work on the `Pipfile.lock`. So let’s prioritize the locked file and the sync command, and fall back to the install command if there is no lock file.
OK Correction - Session 3: I use the `pipenv install` command no matter what, in order to be able to use the `--system` flag that isn’t available for the `pipenv sync` command. The install command can accomplish the same thing if passed two additional parameters, `--ignore-pipfile` and `--deploy`, after a `Pipfile.lock` has been created if there was none.
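The Session 3 correction above could be sketched like this (a hypothetical helper of my own, not repo2docker code; the flags reflect my reading of the pipenv docs):

```python
import os

def pipenv_install_commands(repo_dir):
    """Always use `pipenv install` so --system is available: lock
    first if no Pipfile.lock exists, then install strictly from the
    lock via --ignore-pipfile, with --deploy to fail on a stale lock."""
    cmds = []
    if not os.path.exists(os.path.join(repo_dir, "Pipfile.lock")):
        cmds.append("pipenv lock")  # create Pipfile.lock from the Pipfile
    cmds.append("pipenv install --system --dev --ignore-pipfile --deploy")
    return cmds
```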
Question 4: What should we do if we find a combination of environment.yml / requirements.txt / Pipfile?
Hmmm… I think this relates closely to Q2.
OK - Session 2: We should only care about `environment.yml`, but if there is no such file, and a `requirements.txt` is found alongside a `Pipfile` or a `Pipfile.lock`, then we should ignore `requirements.txt`, I think.
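The precedence decided here can be sketched as a small helper (names are my own, for illustration only):

```python
import os

def preferred_dependency_source(repo_dir):
    """Hypothetical sketch of the precedence above: environment.yml
    wins, then Pipfile/Pipfile.lock, then requirements.txt."""
    def has(name):
        return os.path.exists(os.path.join(repo_dir, name))

    if has("environment.yml"):
        return "environment.yml"
    if has("Pipfile.lock") or has("Pipfile"):
        return "pipenv"
    if has("requirements.txt"):
        return "requirements.txt"
    return None
```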
Session 2 - April 20
I’ve set up a developer environment and solved a minor challenge along the way, which I documented in a post below as something to fix at some point. For now though, I want to progress towards the goal and not get stuck, so I wrote it down and continued.
I’m looking into the source code, trying to understand how things work as best I can. I realized I needed a better understanding of the buildpacks in place, so I’m starting to write down an overview of them. Perhaps I can then answer Q1: should I add a new buildpack or augment one?
Overview of the detect() function of the buildpacks
The ordering of the buildpacks’ detect functionality goes as follows:
- LegacyBinderDockerBuildPack: will detect a Dockerfile with a `FROM andrewosh/binder-base` statement.
- DockerBuildPack, inherits from BuildPack: will detect a Dockerfile.
- JuliaProjectTomlBuildPack, inherits from PythonBuildPack: will detect either `Project.toml` or `JuliaProject.toml`.
- JuliaRequireBuildPack, inherits from PythonBuildPack: will detect a `REQUIRE` file, and requires a `Project.toml` to not be found.
- NixBuildPack, inherits from BaseImage > BuildPack: will detect a `default.nix` file.
- RBuildPack, inherits from PythonBuildPack.
- CondaBuildPack, inherits from BaseImage > BuildPack: detects `environment.yml`.
- PythonBuildPack, inherits from CondaBuildPack: detects python in `runtime.txt`, `setup.py` in the root folder, and `requirements.txt`.
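The ordered detection described above boils down to a first-match-wins loop; here is a much simplified sketch (my own stand-in classes, not the real repo2docker ones):

```python
class BuildPack:
    # Minimal stand-in for illustration only.
    def detect(self, repo_dir):
        return False

def choose_buildpack(repo_dir, ordered_buildpacks, default=None):
    # The first buildpack whose detect() accepts the repository wins;
    # otherwise fall back to a default.
    for bp in ordered_buildpacks:
        if bp.detect(repo_dir):
            return bp
    return default
```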
Hmmmm, I’m leaning towards the idea of augmenting the PythonBuildPack; I think pipenv files compete with requirements.txt files, and the PythonBuildPack is the one working with those.
I learned about the test setup
The tests folder contained a `conftest.py` file that had a useful docstring!
Each directory that has a script named ‘verify’ is considered
a test. jupyter-repo2docker is run on that directory,
and then ./verify is run inside the built container. It should
return a non-zero exit code for the test to be considered a
failure.
That is excellent! I figure, why not start out by creating some tests? That way I’d define the functionality I want to achieve, and I can communicate it to the maintainers of the repo through concrete code as well!
Test 1 - Stub done: I want `Pipfile` or `Pipfile.lock` to take precedence over a `requirements.txt` file.
Test 2 - Stub done: I want an `environment.yml` to take precedence over a `Pipfile` or `Pipfile.lock`, for the same reasons I imagine they take precedence over a `requirements.txt` file. I imagine conda can install more than pip/pipenv can, so we should not limit ourselves.
Test 3 - Stub done: I want `Pipfile` or `Pipfile.lock` to take the same kind of precedence as `requirements.txt` over `setup.py`. Oh… I learned now that `setup.py` is installed after `requirements.txt` anyhow. I also found no test associated with `setup.py`. Let’s make this test anyhow at some point, where `setup.py` is verified to be installed after the `pipenv` installation.
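The core of a verify script for stubs like Test 1 could be a check along these lines (a sketch; which packages to probe for is up to each test fixture):

```python
import importlib.util

def verify_installed(expected_present, expected_absent):
    """Return True if every name in expected_present is importable
    and nothing in expected_absent is; a verify script would exit
    non-zero when this returns False."""
    present_ok = all(
        importlib.util.find_spec(name) is not None
        for name in expected_present
    )
    absent_ok = all(
        importlib.util.find_spec(name) is None
        for name in expected_absent
    )
    return present_ok and absent_ok
```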
I made a [WIP] PR
I submitted a [WIP] PR to jupyter/repo2docker! See #649.
Session 3 - April 21
I got a basic idea of how things work, and I have created some tests to make succeed, while all the other tests should still not fail in the process.
I first ran a single test to verify I could do that.
# run a specific test and get lots of output
pipenv run pytest -s tests/venv/pipfile/environment-yml/
It worked out great, and I could see clearly that a Dockerfile was created, built, and tested. This takes quite a while, so I decided to run all the tests in order to cache a lot of the work.
# let's run all tests to cache a lot of work for the future
pipenv run pytest
Questions!
Question 5: Should we use `pipenv install --dev` or `pipenv install` by default?
Hmmm… I think the `--dev` flag should currently be added, but I’m not sure.
OK: I decided to use the `--dev` flag.
Question 6: `pipenv install` will do nothing for us unless we enter the environment as well, I think, hmmm… One could also make `pipenv install` install things without a virtual environment, as the Dockerfile kind of is one anyhow, and it would reduce potential complexity down the road, I think. Okay, so the question becomes: should I install a virtual environment and enter it with `pipenv shell`, or should I make `pipenv install` install things directly, which I think we can make it do, but I don’t know right now how.
Hmmm… I’ll look at and learn from how things have been done for other buildpacks, such as the `conda` and `python` (also referred to in tests as `venv`) buildpacks.
OK: In this repo2docker code section I notice the answer is probably to use a specific `pip` binary to do the install, or at least I realize we should avoid the complexity of doing `pipenv shell` or similar to enter an environment.
OK Correction: Apparently entering `pipenv shell` wasn’t easy from a Dockerfile, so by using `--system` and `--python` we install things directly.
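So the assembled command would look something like this (a sketch; `KERNEL_PYTHON_PREFIX` is the environment variable I saw in the repo2docker scripts, and the flag combination is my own reading of the pipenv docs):

```python
# Build the install command the way an assemble script might:
# --system skips pipenv's virtualenv, and --python picks the
# interpreter whose environment the packages land in.
kernel_python = "${KERNEL_PYTHON_PREFIX}/bin/python"
install_cmd = "pipenv install --dev --system --python {}".format(kernel_python)
```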
Question 7: How do I make `pipenv` install not into a virtual environment, but instead use a specific `pip` binary to install things?
Hmmm… I should read up a lot on the command line options for `pipenv`.
Hmmm… Multiple options show up on how to do this.
- Generate a requirements.txt: I could let `pipenv` generate a `requirements.txt` file and use the pre-existing system within repo2docker to manage those. I would need to look out for all interactions with such a file though. I’m specifically cautious not to overlook something relating to how Python versions are managed; I recall reading some code about extraction of a Python version.
- Specify a python executable: What would it mean to use the `--python` flag?
- Specify, eh… --system? What would it mean to use the `--system` flag? I don’t think this is relevant to us. I think this influences whether packages are installed at the user or the system level, but I’m not especially confident about these aspects.
I’ll need to decide on option 1 or 2, I think.
Hmmm… If we generate a requirements.txt file, we may deviate from expected behavior where `.env` files are loaded, and perhaps also something relating to the Python version. Perhaps we need to choose option 2 to do this properly, because with option 1 that won’t happen.
OK: I’m going with option 2. This may be less straightforward, but it should be the most robust solution long term, I think.
OK Correction: I’m going with option 3. See Question 6’s final entry.
Question 8: How is the `python_version()` function used, and should I adjust something based on introducing `Pipfile`s, which potentially involve specifications of Python versions? I also know that `requirements.txt` can include `python=3.7` statements etc. How is that different from using `runtime.txt` for repo2docker etc.?
Question 9: I notice that a Pipfile can explicitly install a package with `setup.py`. So, should we really have logic that installs `setup.py` after what was installed with the `Pipfile`?
OK: I decided to enforce the logic that if you have a `setup.py` file and also a `Pipfile`, the `Pipfile` needs to have imported the local package like this, where the local `dummy` package is installed:

```
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]
there = "*"
dummy = {path=".", editable=true}
```
Question 10: I found `$NB_PYTHON_PREFIX` and `$KERNEL_PYTHON_PREFIX` within the code, and now understand that the Python environment that starts up the notebook server is one environment, and the actual environment of the Python kernel used within it will, or at least can, be another one. In the scripts I’ve seen `pip` invoked in three different ways, and I’m now lost. What are the differences between the three pips?
- `${KERNEL_PYTHON_PREFIX}/bin/pip`
- `${NB_PYTHON_PREFIX}/bin/pip`
- `pip`
Hmmm… Is the third option simply the same as one of the others? Where should `pipenv` be installed?
Question 11: Why are we installing this version of pip?
```python
elif os.path.exists(requirements_file):
    assemble_scripts.append((
        '${NB_USER}',
        'pip install "pip<19" && ' +
        '{} install --no-cache-dir -r "{}"'.format(pip, requirements_file)
    ))
```
Question 12: In which of the two or three Python environments does it make sense for me to install `pipenv`?
Question 13: If we use `--system` and not `pipenv shell` etc., we won’t get the benefit of the `.env` file being loaded, right? Perhaps we can do an additional plug for this? See: Automatic loading of .env.
Session summary
I worked a lot with defining the tests, and struggled a while with the `setup.py` tests as I got very confused about being able to import a local package even though it wasn’t installed. It was importable because it was locally available, but it did not really get installed along with its dependencies etc. So when I figured out I could check whether one of its dependencies got installed as well, things turned around.
I also spent a lot of time figuring out how to actually do the `pipenv install` part and get packages to be detected in the right environment. Now everything seems to work though; I added commits up to 3397068 in #649!
I think the key part that remains relates to Python versions.
Session 4 - Evening April 21
The goal is to start learning about pinning Python versions. I added a test to install Python 3.5 to get started. I quickly concluded that the test failed, and I got warnings about not having Python 3.5 etc. But I remember reading that if we have PyEnv installed, things may be managed for us. So I set out to install that and see what happens.
- Installing PyEnv isn’t trivial: GitHub - pyenv/pyenv: Simple Python version management
- We need various apt-get dependencies: Common build problems · pyenv/pyenv Wiki · GitHub
Questions!
Question 14: Where should we install `pyenv`?
Hmmm… Various files have been put in `/tmp`, I’ve noticed.
Question 15: What apt-get packages are already installed, and which need adding?
Question 16: Where should I install these apt-get build dependencies for PyEnv?
Question 17: Should I use `pyenv`, or resort to overriding `python_version()` instead?
Hmmm… For now, after realizing the effort of getting `pyenv` installed, I’ll try overriding `python_version()` in a similar way to how the CondaBuildPack does it: it inspects the `environment.yml` file and chooses a Python version based on that.
Question 17: When overriding the python_version() function that normally inspects `runtime.txt`, one may wonder what makes most sense to do when both a `runtime.txt` and a `Pipfile` with, for example, `python_version = "3.5"` declared in it are defined. Should I prioritize one or the other?
Hmmm… I’m leaning towards wanting to override `runtime.txt` with the python_version specified in the Pipfile; I’d like to scream some feedback to the user about this though…
Hmmm… For now I’ll go with ignoring `runtime.txt` entirely if there is a `Pipfile` or `Pipfile.lock`; it is simple.
OK: I went with giving priority to `Pipfile.lock`, then `Pipfile`, then `runtime.txt`.
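That priority order could be sketched like this (a hypothetical helper, not the real override; it assumes `Pipfile.lock` carries the version under `_meta.requires`, `Pipfile` declares it as `python_version = "…"`, and `runtime.txt` uses the `python-X.Y` form):

```python
import json
import os
import re

def pick_python_version(repo_dir):
    """Hypothetical sketch of the precedence decided above:
    Pipfile.lock, then Pipfile, then runtime.txt."""
    lock = os.path.join(repo_dir, "Pipfile.lock")
    if os.path.exists(lock):
        with open(lock) as f:
            requires = json.load(f).get("_meta", {}).get("requires", {})
        if "python_version" in requires:
            return requires["python_version"]
    pipfile = os.path.join(repo_dir, "Pipfile")
    if os.path.exists(pipfile):
        with open(pipfile) as f:
            match = re.search(r'python_version\s*=\s*"([^"]+)"', f.read())
        if match:
            return match.group(1)
    runtime = os.path.join(repo_dir, "runtime.txt")
    if os.path.exists(runtime):
        with open(runtime) as f:
            match = re.match(r"python-(\d+(?:\.\d+)*)", f.read().strip())
        if match:
            return match.group(1)
    return ""
```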
Question 18: What makes things work with py36 but not py35?

```
Step 40/47 : RUN ${KERNEL_PYTHON_PREFIX}/bin/pipenv lock --python ${KERNEL_PYTHON_PREFIX}/bin/python
 ---> Running in c7ae30385795
Creating a virtualenv for this project…
Pipfile: /home/erik/Pipfile
Using /srv/conda/bin/python (3.5.5) to create virtualenv…
⠙ Creating virtual environment...Already using interpreter /srv/conda/bin/python
Using base prefix '/srv/conda'
New python executable in /home/erik/.local/share/virtualenvs/erik-zof0I2Qp/bin/python
ERROR: The executable /home/erik/.local/share/virtualenvs/erik-zof0I2Qp/bin/python is not functioning
ERROR: It thinks sys.prefix is '/home/erik' (should be '/home/erik/.local/share/virtualenvs/erik-zof0I2Qp')
ERROR: virtualenv is not compatible with this system or executable
✘ Failed creating virtual environment
[pipenv.exceptions.VirtualenvCreationException]: File "/srv/conda/lib/python3.5/site-packages/pipenv/vendor/click/decorators.py", line 17, in new_func
[pipenv.exceptions.VirtualenvCreationException]: return f(get_current_context(), *args, **kwargs)
[pipenv.exceptions.VirtualenvCreationException]: File "/srv/conda/lib/python3.5/site-packages/pipenv/cli/command.py", line 319, in lock
[pipenv.exceptions.VirtualenvCreationException]: ensure_project(three=state.three, python=state.python, pypi_mirror=state.pypi_mirror)
[pipenv.exceptions.VirtualenvCreationException]: File "/srv/conda/lib/python3.5/site-packages/pipenv/core.py", line 574, in ensure_project
[pipenv.exceptions.VirtualenvCreationException]: pypi_mirror=pypi_mirror,
[pipenv.exceptions.VirtualenvCreationException]: File "/srv/conda/lib/python3.5/site-packages/pipenv/core.py", line 506, in ensure_virtualenv
[pipenv.exceptions.VirtualenvCreationException]: python=python, site_packages=site_packages, pypi_mirror=pypi_mirror
[pipenv.exceptions.VirtualenvCreationException]: File "/srv/conda/lib/python3.5/site-packages/pipenv/core.py", line 935, in do_create_virtualenv
[pipenv.exceptions.VirtualenvCreationException]: extra=[crayons.blue("{0}".format(c.err)),]
/home/erik/.local/share/virtualenvs/erik-zof0I2Qp/bin/python: error while loading shared libraries: libpython3.5m.so.1.0: cannot open shared object file: No such file or directory
Failed to create virtual environment.
```
Question 19: Oh… I think I have a bug introduced by not specifying where my Pipfile resides when I do `pipenv lock` and `pipenv install`, because of the `binder` folder. I’d better ensure I specify the file explicitly.
OK: I could confirm that was the case: I added a test that failed as expected. Then I added a commit to fix it, and problem solved!
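One way to point pipenv at the right file explicitly is the `PIPENV_PIPFILE` environment variable; here is a hedged sketch (the helper name and the `binder_dir` parameter are my own, for illustration):

```python
import os

def pipenv_env(repo_dir, binder_dir=""):
    """Hypothetical sketch: build an environment for running pipenv
    where PIPENV_PIPFILE points at the Pipfile explicitly, so a
    Pipfile living inside a binder/ folder is still found."""
    pipfile = os.path.join(repo_dir, binder_dir, "Pipfile")
    env = dict(os.environ)
    env["PIPENV_PIPFILE"] = pipfile
    return env
```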