Jupyter newbie - Pre-reqs for Python based Jupyter notebooks

Hi,
I am a moderate user of RStudio on Mac and well-versed in Python. I would like to start using Jupyter. Can you please help me with the following questions?

  1. An R notebook is basically a Markdown document that supports R. Is this true of .ipynb also? Or, is it a mix of Python code and Markdown?
  2. Some of the GitHub repos that host the .ipynb seem to ‘launch’ the notebook also. So, is an .ipynb an executable?
  3. Given an .ipynb, can I see it in action - the parts that do not generate graphs - with a CLI?
  4. Do I need Python 3 for Jupyter? Should pip be aliased to Python 3?
  5. Why does the installation refer to conda? Are conda and Jupyter two different ways to build and work with .ipynb?

I’ll try to help with some of these:

  1. Actually, notebook files are stored as JSON. They support any language for which there is a kernel, including R. (An example of R in Jupyter is here if you pick Jupyter+R to launch.) The kernel is specified towards the end of the JSON file. Within the notebook, individual cells can be code, Markdown, or raw text; the first two are the ones you will use most. The cells are stored as JSON in the .ipynb file, and when you open the notebook in Jupyter, each cell is handled according to its type.
  2. I suspect that when you say some of the GitHub repos launch the notebook, you are referring to those that use MyBinder.org to launch a remote JupyterHub session.
  3. This is the part I am least sure of, as I don’t normally have occasion to do this beyond debugging. (The visual debugger is very cool, see here.) I suspect you want to look inside what is going on in the kernel, sort of like this. I know there is also kernelspec; maybe that is related to what you need?
    But maybe you are also asking whether you can run it from the command line? You can, either by executing it directly as a notebook with nbconvert or by converting it to a script with jupytext, see here.
  4. I believe current Jupyter may require Python 3. However, you can still run Python 2 as a kernel. Examples here and [here](https://github.com/binder-examples/python2_with_3). I’ve seen different pips aliased to Python 2 and 3 depending on what you need. At this point, unless you need Python 2, just use Python 3 and then you don’t have to worry.
  5. Conda and Jupyter aren’t two different ways to build and work with .ipynb files. Jupyter is needed to work with Jupyter notebooks actively as notebooks. Conda is a popular package management and environment-building system; one of the many things it provides is a Python distribution, which is why it is highly recommended. See here for another take on that. You could do without it, or try another package manager or Python distribution; Enthought’s Canopy used to be a popular way to go.
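
Since an .ipynb file is plain JSON, the Python standard library is enough to look inside one. The sketch below parses a small, hand-written notebook in the nbformat 4 layout (the cell contents are made up; `ir` is the kernel name that IRkernel registers for R) and reports the kernel plus a count of each cell type. For real work, the `nbformat` package reads and validates notebooks properly.

```python
import json
from collections import Counter


def summarize_notebook(text: str) -> dict:
    """Parse .ipynb text (plain JSON) and report format version, kernel, and cell types."""
    nb = json.loads(text)
    kernel = nb.get("metadata", {}).get("kernelspec", {}).get("name")
    cell_types = Counter(cell["cell_type"] for cell in nb["cells"])
    return {"nbformat": nb["nbformat"], "kernel": kernel, "cell_types": dict(cell_types)}


# Hypothetical minimal notebook in nbformat 4 layout. With a real file you would
# pass open("example.ipynb", encoding="utf-8").read() instead.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Sensor data\n"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["x <- c(3, 5, 7)\n", "mean(x)"]},
        {"cell_type": "raw", "metadata": {}, "source": ["passed through as-is"]},
    ],
    # The kernel lives in the notebook-level metadata, after the cells.
    "metadata": {"kernelspec": {"name": "ir", "display_name": "R", "language": "R"}},
    "nbformat": 4,
    "nbformat_minor": 4,
}

print(summarize_notebook(json.dumps(example, indent=1)))
```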

Jupyter is also a larger ecosystem than the Jupyter notebook server; there is JupyterHub, etc. In a lot of ways, JupyterLab is the next evolution of the Jupyter notebook interface, and if you use RStudio a lot, it will feel more familiar to you in some ways. A few months ago I made a page here, aimed at convincing the folks running a group that trains biologists in computing; it features an awesome image made by others and links to more about the ecosystem.

Thanks for your detailed response.

  1. So, .ipynb is JSON. Got it.
  2. Consider this .ipynb file. It also shows the values for Out and images of plots, which gave me the impression that it was being executed. However, looking into the .ipynb file, it seems the code output and images are stored in the notebook itself. I will look into MyBinder, thanks.
  3. Yes; something like this - How to run R scripts from the command line.
  4. Migrating my old Mac to Python 3 is something I am procrastinating about; if Jupyter can work with Python 2, then I will let it be. :smile:
  5. I guess I will get my hands dirty with Jupyter and Python first and then progress to package management.
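
That matches how the format works: when a notebook is saved, each code cell keeps its outputs (printed text, rich results, and images as base64-encoded PNG) inside the JSON, which is why GitHub can display plots without running anything. A minimal sketch with a hypothetical two-cell notebook; the PNG bytes are a placeholder rather than a real image, and `plot_sensor()` is an invented name that never runs here.

```python
import base64
import json


def describe_outputs(nb: dict) -> list:
    """Return (cell index, output_type, sorted MIME types) for every saved output."""
    found = []
    for i, cell in enumerate(nb["cells"]):
        for out in cell.get("outputs", []):  # only code cells carry outputs
            found.append((i, out["output_type"], sorted(out.get("data", {}))))
    return found


def clear_outputs(nb: dict) -> dict:
    """Return a copy of the notebook with all code-cell outputs removed."""
    clean = json.loads(json.dumps(nb))  # simple deep copy via a JSON round-trip
    for cell in clean["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


fake_png = base64.b64encode(b"placeholder, not a real PNG").decode("ascii")
nb = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "source": ["print(21 * 2)\n", "21 * 2"],
         "outputs": [
             {"output_type": "stream", "name": "stdout", "text": ["42\n"]},
             {"output_type": "execute_result", "execution_count": 1,
              "metadata": {}, "data": {"text/plain": ["42"]}},
         ]},
        {"cell_type": "code", "execution_count": 2, "metadata": {},
         "source": ["plot_sensor()  # hypothetical plotting helper"],
         "outputs": [
             {"output_type": "display_data", "metadata": {},
              "data": {"image/png": fake_png, "text/plain": ["<Figure>"]}},
         ]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
}

print(describe_outputs(nb))
print(describe_outputs(clear_outputs(nb)))
```

What GitHub renders is simply whatever was saved last, so the displayed results can be stale relative to the code.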

One last question.

Let’s say I want to predict the failure of a piece of equipment given sensor data. First, I would use Jupyter to clean and then explore the sensor data. Second, I would train and test some models for prediction. Then, the resulting model is deployed to production; with R, there are Shiny applications, or the model can be exported via PMML for portability. How is this done with .ipynb?

This forum does not allow me to post links to Shiny and PMML; a Google search should quickly turn them up.

I’m still not clear on what you want for item 3. You can write plain Python scripts and run them from the command line; this is the more traditional way of running them. If you need to run a script from within a notebook, you can use the %run magic, or use %%bash or ! to call out to the command line from within the notebook. %run offers more options and better integration back into Jupyter. JupyterHub, of which MyBinder is a glorified version, includes access to a terminal, so you can run command-line things there.
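
For running a notebook from the command line, the supported tools are `jupyter nbconvert --to notebook --execute my.ipynb`, which runs it headless and writes the results into a new notebook, and `jupytext --to py my.ipynb`, which produces a script. To show roughly what the script route involves, here is a simplified stand-in (not jupytext’s actual algorithm): it concatenates the code cells, comments out IPython-only lines (`%` magics, `!` shell escapes, and whole `%%` cell-magic cells), and runs the result with the current Python interpreter. The notebook contents and file names are hypothetical.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def notebook_to_script(nb_path: Path) -> str:
    """Concatenate a notebook's code cells into plain Python source.

    Simplified: IPython-only syntax is commented out rather than translated.
    """
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown and raw cells are skipped
        src = cell["source"]
        text = "".join(src) if isinstance(src, list) else src  # both forms are valid
        lines = text.splitlines()
        if lines and lines[0].startswith("%%"):
            # Cell magics such as %%bash make the whole cell non-Python.
            lines = ["# " + line for line in lines]
        else:
            # Line magics (%run, %matplotlib) and shell escapes (!ls).
            lines = ["# " + line if line.lstrip().startswith(("%", "!")) else line
                     for line in lines]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


def run_notebook_code(nb_path: Path) -> str:
    """Write the script next to the notebook (overwriting any same-named .py), run it, return stdout."""
    script = nb_path.with_suffix(".py")
    script.write_text(notebook_to_script(nb_path), encoding="utf-8")
    result = subprocess.run([sys.executable, str(script)],
                            capture_output=True, text=True, check=True)
    return result.stdout


def make_demo_notebook(folder: Path) -> Path:
    """Write a small hypothetical notebook to disk for the demo."""
    nb = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Demo\n"]},
            {"cell_type": "code", "execution_count": None, "metadata": {}, "outputs": [],
             "source": ["%matplotlib inline\n", "readings = [3, 5, 7]\n"]},
            {"cell_type": "code", "execution_count": None, "metadata": {}, "outputs": [],
             "source": "print(sum(readings) / len(readings))"},
            {"cell_type": "code", "execution_count": None, "metadata": {}, "outputs": [],
             "source": ["%%bash\n", "echo never run by plain python\n"]},
        ],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
    }
    path = folder / "demo.ipynb"
    path.write_text(json.dumps(nb), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    print(run_notebook_code(make_demo_notebook(Path(tmp))), end="")
```

Commenting out magics is crude: a magic that is the only statement inside an indented block would leave that block empty and break the script, which is one reason to prefer nbconvert or jupytext in practice.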

Regarding migrating your Mac to Python 3, I understand. However, thanks to MyBinder and other places, I can leave my local computer alone and run Python 3 remotely. That has allowed me to explore Python 3 a lot more easily without messing around with my working local system.

I’m familiar with Shiny. For converting a Jupyter notebook to an application you can use appmode or Voila. This recent answer might help you understand those better. Both can run on remote sessions served via MyBinder so that you can test the basics there.
For PMML, there are sklearn2pmml and Nyoka; see here for the announcement of Nyoka replacing py2pmml. #7 here discusses various ways. Or see here.
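
The notebook-to-production step boils down to training in the notebook, writing the fitted model to a file, and loading that file somewhere else. The sketch below illustrates only that pattern, under loud assumptions: the "model" is a hypothetical one-feature threshold rule, the sensor rows are made up, and plain JSON stands in for PMML. For real PMML from scikit-learn pipelines, use sklearn2pmml or Nyoka as mentioned above.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean


@dataclass
class ThresholdModel:
    """Hypothetical toy model: predict failure when one sensor reading exceeds a cutoff."""
    feature: str
    cutoff: float

    def predict(self, reading: dict) -> bool:
        return reading[self.feature] > self.cutoff


def train(rows: list, feature: str) -> ThresholdModel:
    """Put the cutoff halfway between the mean healthy and mean failed readings."""
    healthy = [r[feature] for r in rows if not r["failed"]]
    failed = [r[feature] for r in rows if r["failed"]]
    return ThresholdModel(feature, (mean(healthy) + mean(failed)) / 2)


# Made-up rows standing in for cleaned sensor data (the clean/explore/train steps).
rows = [
    {"vibration": 0.25, "failed": False},
    {"vibration": 0.5, "failed": False},
    {"vibration": 1.0, "failed": True},
    {"vibration": 1.5, "failed": True},
]
model = train(rows, "vibration")

# "Export": save the fitted parameters in a language-neutral file format
# (JSON here, standing in for PMML).
artifact = json.dumps(asdict(model))

# A separate production service could rebuild the model from that file alone.
restored = ThresholdModel(**json.loads(artifact))
print(restored, restored.predict({"vibration": 0.9}))
```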

  1. Internally, an .ipynb notebook is a JSON file with a lot of metadata. An .ipynb file can include Markdown and can also include code in different languages.

  2. An .ipynb file is a data file. You can execute it remotely.

  3. Yes. An .ipynb file is readable, which means you can open it in a text editor and process it automatically.

  4. It’s better to use Python 3 for any new development. Python 2 is in maintenance mode, and the only time you should use Python 2 is if you need a package that has not been ported to Python 3.

  5. The installation refers to conda because conda is a package manager that a lot of people use, particularly on Windows and Mac. It’s not necessary to use conda with Jupyter; you can install it with pip alone.
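
On the pip-aliasing question, one way to avoid caring which Python a bare `pip` points to is to run pip as a module of a specific interpreter (`python3 -m pip install ...`); inside a notebook, IPython’s `%pip install ...` magic does the same for the kernel’s environment. A small sketch that builds such a command for whatever interpreter runs it; the helper name is hypothetical and the actual install is left commented out.

```python
import sys


def pip_install_command(*packages: str) -> list:
    """Build a pip command bound to the interpreter running this code.

    `python -m pip` installs into that interpreter's environment, so it does
    not matter which Python a bare `pip` on the PATH is aliased to.
    """
    return [sys.executable, "-m", "pip", "install", *packages]


print("Running Python", ".".join(map(str, sys.version_info[:3])), "at", sys.executable)
cmd = pip_install_command("jupyter")
print(" ".join(cmd))
# To actually install (requires `import subprocess`):
# subprocess.run(cmd, check=True)
```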