I like the idea of a Twitter poll. Maybe it doesn’t deliver the highest-quality data, has constraints, etc., but we can do it in 5 minutes once we decide to.
What do you think of offering two answers plus a third, “Other, please reply”, as a way to get around the 4-answer limit? I’ve seen people do this before and you get some interesting replies, though I assume a lot of people don’t bother replying.
Analyzing mybinder and putting a poll on that website makes sense. Gigantum will at least keep doing some in-person usability testing, and maybe we can blast our email list, and I’m happy to share what we learn.
But as far as Twitter goes… how’s this?
This is a poll for individuals who use Python at least weekly (for work, learning, pleasure, etc.).
What is your preference for Project Jupyter’s web notebooks - JupyterLab and the “classic” Jupyter Notebook? Feel free to expand in a comment!
I prefer Lab
I prefer Notebook
I prefer editor/terminal
Other (please comment!)
Choices are limited to 25 characters, but you can add explanations in the main tweet “body”. I’m trying to synthesize what folks are thinking here, but I’m open to refinement.
To me that poll wording seems slightly inconsistent. The intro says it’s aimed at all regular Python users, but the poll question focuses on Jupyter notebooks while offering an “I prefer editor/terminal” option - presumably no one edits the raw JSON of .ipynb files?
If you’re interested in all Python users perhaps it’d be clearer to have two separate polls:
One obvious thing to change is scoping the question to folks in the “PyData / NumFocus” ecosystem. I guess “for data-driven tasks (so generally not websites, devops, etc.)”
I think there’s enough buy-in that doing a coordinated Twitter poll seems worthwhile, which means one question, max 4 responses. There are a set of use-cases where this set of choices is relevant: Binder, Jupyter Hub deployers and educators in general, my company (Gigantum) and other companies who are deploying web front-ends for data processing with Python (Bloomberg, IBM, etc.), and of course Jupyter developers and promoters.
https://archive.analytics.mybinder.org/ is public. The logs of individual sessions aren’t; we’d have to dig them out of our logging archive and, as part of that process, extract just the paths of the requests people made, or some such.
The results are pretty comparable to our 3:1 preference for classic Jupyter Notebooks: 13% of users use “Notebooks” vs 4% for JupyterLab. But the stark contrast with PyCharm (28%) and VSCode (23%) and even Vim (7%) is actually pretty surprising to me!
So, in light of this, I chatted with some folks at Gigantum, and we still feel that for the parties I’ve been thinking of (e.g., Bloomberg, Binder, Gigantum…) we do have a more pointed interest in how people prefer to work with notebooks specifically - we all have platforms built around notebook use / presentation.
I therefore propose the following small change, which perhaps we can start posting to Twitter on Wednesday or so if we get consensus by end-of-day-ish on Tuesday?
I’ve bolded the change in wording, and welcome edits / clarifications, but I also don’t want to spend too much time on what is ultimately not going to be a super-representative poll anyway.
I also had a quick look at the Binder logs. I suppose I should make a Binder project (instead of a Gigantum one) that pulls those JSONL files in and then checks the individual GitHub repos to try to figure out whether they’re configured for the classic Jupyter Notebook or JupyterLab. The scraping is not hard, but I’m not sure there’s a reliable way to determine the actual tool a Binder repo is configured for.
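For what it’s worth, a first pass over those archive files doesn’t need any scraping. Here’s a minimal sketch assuming the event files are JSONL with `spec` and `status` fields, which is how I read the public archive’s format - treat the field names as assumptions, and note the sample lines below are fabricated for illustration:

```python
import json
from collections import Counter

# Fabricated sample lines mimicking the mybinder.org public event
# archive (https://archive.analytics.mybinder.org/). The field names
# ("provider", "spec", "status") are my reading of that schema.
sample_jsonl = """\
{"timestamp": "2020-05-01T00:00:00+00:00", "provider": "GitHub", "spec": "guiwitz/DaskCourse/master", "status": "success"}
{"timestamp": "2020-05-01T00:01:00+00:00", "provider": "GitHub", "spec": "jupyterlab/jupyterlab-demo/master", "status": "success"}
{"timestamp": "2020-05-01T00:02:00+00:00", "provider": "GitHub", "spec": "guiwitz/DaskCourse/master", "status": "failed"}
"""

def tally_launches(lines):
    """Count successful Binder launches per repo spec."""
    counts = Counter()
    for line in lines.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("status") == "success":
            counts[event["spec"]] += 1
    return counts

counts = tally_launches(sample_jsonl)
print(counts.most_common())
```

From the tallied specs, one could then fetch each repo’s Binder config files to guess the configured tool - that’s the part I’m unsure is reliable.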
So maybe I meant this Wednesday, right? Or maybe I let this slip for a week.
I realized I was waiting to do some initial analysis of the binder repos to launch the survey, but I’ll just launch that survey from @gigantumscience tomorrow AM, and I don’t think we need to be super-coordinated about it. Other folks like the Binder team can either tweet the same thing, retweet, or perhaps learn from my mistakes!
And I will get that analysis of existing Binder repos done soon… there’s just a lot going on (for all of us!).
Just a comment as a very regular (and happy) Jupyter and Binder user: I don’t think that Binder is very representative of how people use notebooks in general. My guess is that Binder is mainly used for demos, and except for rare cases where one might need a JupyterLab extension (Dask comes to mind), it’s probably not very common to use the more “productivity-oriented” JupyterLab in that context. I use JupyterLab all the time, but almost never on Binder - in particular, never when I create e.g. content for online courses, where people unfamiliar with notebooks might be confused by JupyterLab.
Right - that unfamiliarity and the challenge that comes with it are ultimately what (I think) we’re all interested in. I would be curious to hear more about your specific experience - did you actually see more student difficulty with JupyterLab? Care to point us at your courses and give some sense of student characteristics?
Your point is well-taken though, that we should be careful to generalize Binder usage to the population in general.
I’ll try to give a complete explanation here (sorry for the length of the post). I’ll start by summarising how I typically proceed:
I create course content as Jupyter notebooks for a “real” in-person course that I have to give. The material usually ends up in a Github repository. Note that these courses are “crash-courses” of 1-2 days for people with some Python experience but not necessarily with Jupyter.
For the live course, I set up a JupyterHub with all the course material (TLJH + nbgitpuller) and students can just login. At the beginning of the course, I go through notebooks super-basics, just so that people can execute and modify code.
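The TLJH + nbgitpuller setup above boils down to handing students a single link that logs them in and pulls the material. Here’s a small sketch of how such a link can be assembled; the `git-pull` URL shape follows nbgitpuller’s documented pattern, but the hub address is hypothetical, so double-check against your own deployment:

```python
from urllib.parse import urlencode

def nbgitpuller_link(hub_url, repo, branch="master", urlpath="tree"):
    """Build an nbgitpuller link that pulls course material into a
    student's JupyterHub session. URL shape follows the nbgitpuller
    docs as I understand them -- verify against your Hub."""
    query = urlencode({"repo": repo, "branch": branch, "urlpath": urlpath})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"

link = nbgitpuller_link(
    "https://myhub.example.org",            # hypothetical TLJH address
    "https://github.com/guiwitz/DaskCourse",
)
print(link)
```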
The course itself is a mix of me going through notebooks/live coding and exercises that students can do in the notebook. In that sense, I see notebooks as “in the background”. They are just a medium to offer content and a simple platform to start coding immediately.
I try to make the course available and runnable to anyone using Binder.
I try to give instructions on how to run the course locally, usually using conda.
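To make that concrete, the local-install and Binder paths can share one file. This is a hypothetical minimal example (package names and versions are illustrative, not from the actual courses):

```yaml
# environment.yml -- the same file works for a local
# "conda env create -f environment.yml" and for Binder.
name: daskcourse
channels:
  - conda-forge
dependencies:
  - python=3.8
  - jupyter
  - dask
  - matplotlib
```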
This is a course on Dask. It’s the only example where I used Jupyterlab to have access to the dask-dashboard (it’s not yet entirely runnable on Binder because of issues with large datasets): https://github.com/guiwitz/DaskCourse
Now I have two main reasons for which I tend to use the classic notebook interface:
It offers notebooks as simple documents which look just like on GitHub or nbviewer with the additional option that one can run things. This makes the user experience (especially for beginners) very consistent. If there are “other things” on the page, people get naturally distracted and/or confused.
Installation: in general, people want to try to install things locally at some point. Given that ranting about the difficulty of installing things in Python is a popular activity, I try to show people how simple it is: I usually tell them to install Miniconda and then provide an environment.yml file (often the same one as for Binder) for installation. That usually works great.

But here comes the JupyterLab drawback: installing extensions can be a pain. First, they cannot be added to the environment.yml file (if I’m wrong about that, I’d be super happy to learn about a solution!). Second, the installation is not straightforward: extensions need to be built, one often has to restart, sometimes there are version problems and extensions work only with a certain JupyterLab version, etc. As I said, I enjoy working with JupyterLab, and those problems can be overcome with a bit of experience, but I think it can scare away people who have been primed with the “Python is difficult to install” motto. In some cases, one even runs into real issues: for example, I really struggled at some point with installing extensions on a JupyterHub because somehow there was a limit on “build size” and I had to use some additional options.
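For concreteness, the multi-step dance described above looks roughly like this for the source-build extension workflow (the extension name is just one common example; exact commands vary by JupyterLab version):

```shell
# nodejs is required before any labextension build will work
conda install -c conda-forge nodejs

# install and build the extension (the slow, error-prone part)
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter lab build

# verify the extension registered, then restart JupyterLab
jupyter labextension list
```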
I’m not entirely sure how valid my point 1 is; it rests more on an impression. The Dask course that I gave was on JupyterLab, and in the end people were not very confused (but these were probably more experienced users). So if the extension installation process were easier, I would probably tend to use JupyterLab most of the time. In my opinion, some extensions should actually be included by default (like matplotlib widgets, ipywidgets, table of contents, etc.), but I’m not at all familiar with the technology behind them, so I don’t know if that’s feasible. I know that there is the extension manager, but I sometimes ran into trouble with it. Maybe it has become more stable now and this is the way to go?
I’m completely aware that there’s no major “objection” to using JupyterLab in what I wrote, and I often hear from experienced users that these are minor issues. But beginners get frustrated very easily, and my goal is to get them using Python so that they see all the benefits before they run into those issues.
I hope that helps! Sorry again for the length of the post!
As an update to my previous comment (which was moved elsewhere because it was technical), after a couple of weeks, I couldn’t solve the problem until someone suggested I disable all my ad blockers [I had only been ‘whitelisting’]. [Apologies to everyone else who had tried to help me]
[My wish still stands - an independent installer as JupyterLab develops.]
However - after fixing this, JupyterLab is looking more promising, indeed!
[users may have to live without ad-blockers, though]
Thanks @guiwitz for the extensive answer. It is actually helpful - you seem to be pragmatic and facing the same general question of “how do I quickly on-board ‘normal’ students?” By “normal” I mean students who are at least somewhat motivated about the subject, but probably not about software installation.
This is perhaps part of the reason that I am more positive on JupyterLab - at UC Berkeley I worked on a standard campus VM, and now I work at Gigantum, where we use Docker for everything. I assume repo2docker can solve your issue as well, but Gigantum will definitely allow an expert to install arbitrary software using arbitrary Linux commands in the “advanced” section. For example, here’s a short post about Dask Dashboards in Gigantum (I hesitate somewhat here, as I don’t want to be a skeezy self-promoter - I welcome PMs or comments on that concern!). With these kinds of fully virtualized methods, you pay a somewhat substantial up-front cost, but then you get superior on-the-same-page-ness from that point forward.
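If you want to test the repo2docker route locally, the basic workflow is just two commands - it builds a Docker image from the same config files Binder reads (environment.yml, etc.) and launches a notebook server in it. The repo here is just your Dask course as an example:

```shell
# build and run a repo's environment locally, Binder-style
pip install jupyter-repo2docker
jupyter-repo2docker https://github.com/guiwitz/DaskCourse
```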
But switching tracks, and answering this query:
I think as long as you’ve got nodejs installed, it’s not so hard to install extensions via the JupyterLab extension manager - it’s quick to try! It’s on by default now in JupyterLab 2.1. While that’s actually a minor annoyance to me - I want to support JupyterLab for naive users without nodejs - your use-case helps me see the wisdom in that choice.
So, I think virtualization solves some of your issues. It seems that changes in the JupyterLab extension manager may also sufficiently solve some of your issues. As always in Jupyter-land, there are ever-expanding sets of choices!