I’m putting together an internal proposal for replacing our JupyterHub deployment with BinderHub. It will be private for the initial evaluation, but may be made public if it all works out. I’d like to include an estimate of the human resources required if it becomes public (e.g. total hours per week/month; I don’t need a breakdown), including tasks such as:
- general maintenance and upgrades
- investigating outages and problems
- monitoring usage and dealing with abusive usage
- maintenance of the main backend gke.mybinder.org
I imagine this is all of the work on https://github.com/jupyterhub/mybinder.org-deploy/
Obviously it’ll depend on our setup and usage but I think the requirements for the main mybinder.org are a good starting point, and this information could be useful for others planning a public BinderHub.
I don’t have a good estimate unfortunately. I spend about 30-60min on a working day trying to keep up with all things related to mybinder.org. This includes a lot of different stuff: checking on the deployment, admin tasks to move things forward, admin tasks to repair stuff, checking chat/forum, thinking about stuff, evangelising for mybinder.org, and PRs on the various projects that Project Binder works on (BinderHub & repo2docker mainly, zero2jupyterhub sometimes, kubespawner and various notebook extensions rarely).
Without wanting to tempt fate, the actual day-to-day operation of mybinder.org requires very little effort. Most of it repairs itself through restarts. Usually day-to-day stress comes when we add new features or otherwise (try to) improve things. The time between failures/events that require attention is probably “a few weeks” now. Every time we add a new cluster it goes down to “a few hours” until we overcome the growing pains for that cluster.
Usage on mybinder.org has gone up significantly now that university term has started again in Europe and the Americas. We are seeing more traffic than before, so I suspect we are in for some more growing pains soon (but I don’t yet know what they will be).
In summary I’d say operating a “not brand new” BinderHub deployment on a good quality cloud provider like GKE takes a few minutes per day of checking things over. If you want to keep up-to-date with new development you need a bit more time to understand how risky an update is (can I apply it before lunch or do I need to be available for problems?). The way to do that is to spend time engaging with the various repos and their day-to-day activity (you could spend an infinite amount of time on this, but the pace of change is generally low).
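For the hours per week/month format asked for in the question, here is a rough conversion of the 30-60min per working day figure above. It is only a sketch: the 5 working days per week and 21 working days per month are assumptions, the figure covers one person only, and it includes non-operations work such as forum activity and PRs.

```python
# Rough conversion of a per-working-day time range into weekly/monthly hours.
# Assumptions (not from the thread): 5 working days/week, 21 working days/month.
WORKING_DAYS_PER_WEEK = 5
WORKING_DAYS_PER_MONTH = 21


def hours_range(min_per_day_low, min_per_day_high, working_days):
    """Return (low, high) total hours for the given number of working days."""
    return (
        min_per_day_low * working_days / 60,
        min_per_day_high * working_days / 60,
    )


# 30-60 minutes per working day, the one-person figure quoted above.
weekly = hours_range(30, 60, WORKING_DAYS_PER_WEEK)
monthly = hours_range(30, 60, WORKING_DAYS_PER_MONTH)

print(f"per week:  {weekly[0]}-{weekly[1]} h")
print(f"per month: {monthly[0]}-{monthly[1]} h")
```

Under those assumptions this comes to roughly 2.5-5 hours per week, or 10.5-21 hours per month, for one person.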
Thanks! 1h/day doesn’t sound that much for a resource as important as mybinder.
Just to echo what @betatim says - it’s very rarely more than 1hr a day these days, and at least personally it comes and goes in waves (e.g. the last few weeks I’ve been traveling and generally have been absent from Binder ops and issues). That said, the challenge with “it almost always ‘just works’” is the “except when it doesn’t” part. I don’t think this is unique to BinderHubs at all, but there’s a certain degree of uncertainty associated with a continuously-running service - especially when you’re improving and changing things. That said, I’m happy that running a BinderHub is now much more stable than it was, say, 16 months ago (just look back through our team-compass post-mortem notes for a taste).
Another thing worth noting is that none of us (maybe with the exception of a few) are professionally trained in dev-ops (at least, initially, though we do learn on the job). When things break in particular ways, it does require somebody that knows kubernetes more intimately, but that is becoming increasingly rare.
One thing to remember is that there are several people who spend some time every day on Binder. So one person with 1hr/day wouldn’t be able to run mybinder.org for the long term. It also depends on several other projects and the effort invested in them (for example JupyterHub, which does a huge amount of heavy lifting for BinderHub and on which Min and Georgiana have spent a lot of time).
I think we made a conscious choice to focus on fixing and automating things so that operating the service doesn’t take a lot of time because we don’t have time :). At first this sounds like a good thing but it does mean the time we have had was spent on this instead of adding new features, documentation and the like. I’m happy we did what we did and will keep lobbying for us to keep working like this but it isn’t “free” :-/
The upside to all this is that if someone can negotiate “Binder as my 20% project” at work (and add a bit of their free time once in a while) you can make significant contributions and shape where we are going!