I’d like to start a discussion about ways to recognize ‘significant and sustained’ contributions by companies/institutes. I work at Quansight and we have several developers contributing to JupyterLab core and various JupyterLab extensions. To be clear, I’m not talking about one-off contributions but about places where significant work has been done in a community-friendly manner. Some of our clients would like to show that they are good open source citizens, since it helps their reputation as an open-source-friendly working environment. In the past we have done this in blog posts, but by their nature these tend to be ephemeral. Do folks in the community have ideas on how this could be done?

I realize that there probably needs to be a way to distinguish these kinds of contributions from the direct financial support Jupyter has received from organizations. How the community does this might need to differ for JupyterLab core vs. extensions, or maybe it can be the same. One idea we have, for extensions specifically, is to add a ‘development supported by’ section to the README of the extension on GitHub.
I think the advantage of this for the Jupyter community would be to encourage other companies/institutes to step up and commit their own resources.
This is a big topic, so to kick things off I thought I would make a suggestion. I created a PR to the JupyterLab Git repo that attempts to give a narrative history of the contributors to it:
I have initially left off the particular company that funded our work at Quansight, since we have not checked in with them yet on the idea. However, I wanted to put it out there to get feedback from the larger community on whether adding a “history” section to the README is a good way to highlight the different people and organizations who have supported a project. Obviously, it’s easier for newer projects like an extension, and it might be too long and complicated for a project like JupyterLab.
I’d welcome a section in the README (or equivalent) for companies and institutions that fund(ed) work to display their logo. I think it is good for those who are doing the funding (they get some street cred) and for the project (it functions a bit like a “testimonials” section).
I would not like a history section that mentions people or organisations specifically. A short “This was started at PyCon 2002, then revived again in 2012 and has been going strong ever since!” is nice. Anything more detailed encourages a cult of personality and will almost always miss out people or be biased in some other way. When I do read such sections in other projects, they are rarely interesting.
Intermediaries should not be mentioned. This means that if the National Funding Agency is funding people at the University of Averagetown to work on Project Amazing, it should be the banner of the National Funding Agency that appears (the same goes for companies that provide labour for their clients).
One immediate question to which I have no good answer is how to decide when to add the banner. An objective metric would probably be best, as it would make it simple to decide whether the threshold has been crossed. However, I can’t think of such a metric. A fuzzier method based on assuming good-faith actors will eventually lead to someone (out of not knowing better, or because they are more “outgoing”) asking to be represented after contributing “much less” than others. A fixed “after N funded hours/PRs/issues/lines of code you can add your banner” also seems naive and unimplementable. Another question is whether your spot is held in perpetuity or only while you maintain your level of contributions.
Now an idea from my past life in High Energy Physics (HEP). The field of HEP has been dominated by large collaborations formed of hundreds to thousands of people from tens to hundreds of research institutes. They form a collaboration which then builds and operates an experiment (for example, there are four big experiments at the LHC right now, with a total of 2-5 thousand authors (guesstimate)). The currency of academia is publications, and in HEP this manifests in everyone from a PhD student to a professor wanting to become a recognised “author” within their collaboration.
As an author, your name is on the author list of each journal publication submitted by the collaboration (you might have seen three-page journal articles with a ten-page author list). Generally it is only “authors” who get sent to scientific conferences to speak on behalf of the collaboration.
You have to earn your spot on the author list through service to the collaboration. Once you stop covering your service requirements (for example, when people leave the field), you have a certain amount of “retirement benefits”, after which you will no longer be listed as an author on future papers.
What counts as service then? Usually all the tasks that have to be done for the experiment to continue to function and for the collaboration to keep producing papers. This means being on shift operating the experiment, writing and maintaining the software needed to take and analyse the data, reviewing draft publications, outreach activities, serving in certain parts of the management, etc. There are tasks for which PhD students are better suited and tasks better suited to senior professors. The common theme is that these are usually tasks that need doing but have no “academic glory” attached to them. Hence “service”. The concrete tasks and positions are usually organised by the governing body of the collaboration or the appointed management.
The system is generally “self-governing”: the collaborations themselves create, modify, and police the rules. They aren’t imposed by the labs hosting the experiments or by funding agencies. The different collaborations I was part of had quite different setups, and my friends in other collaborations report yet different rules/procedures. There is, of course, politics involved in assigning the more interesting of the thankless tasks, etc. However, on the whole it seems to work well for collaborations from ~400 people up to ~1500 (this is what I’ve experienced personally). From what others tell me, collaborations of >3000 people seem to have become dysfunctional (there are often not enough service tasks for all the people to maintain their service requirements, which creates all sorts of weird social results).
Key takeaways:
very large groups bound by not much more than a memorandum of understanding manage to work together over 20-year time scales to operate some of the most complex pieces of kit ever created
qualifying as an “author” is governed by fairly objective rules, with a side of politics
benefits of authorship status are “self evident” (to those part of the system)
you generally can’t double dip (tasks that qualify you as author are rarely also useful for other forms of academic credit)
status as an author is “personal” (you don’t become one just because you join a particular research group at a university)
maintaining your status requires continuous effort
Another thought is to do something more in line with our copyright policy:
If individual contributors want to maintain a record of what changes or contributions they have specific copyright on, they should indicate their copyright in the commit message of the change when they commit the change to one of the Project Jupyter repositories.
If funding institutions want credit, they can immediately start by noting it in the commits they fund. Additionally, I think the next level of visibility would be to note contributions (code and funding) in the changelog and/or release announcement. For example, see the VS Code announcement that lists such things: Visual Studio Code November 2019
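To make the commit-message idea concrete, here is a minimal sketch of what such a commit could look like. The change description, company, and funder names are hypothetical placeholders, and the trailer wording is just one possible convention, not an official Jupyter format:

```
Add a "development supported by" section to the README

Copyright (c) <year> Example Corp.
Development of this change was funded by the Example Funding Agency.
```

A changelog or release-announcement section could then be assembled by scanning the commit history for these notes, though the exact wording and tooling would be up to the maintainers.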
If a company wanted even more exposure, they could make a blog post of their own, in conjunction with our release announcement, which elaborates on the things they sponsored, why they think it is cool, how excited they are to support the community, etc.
Having a bunch of logos is also easier to keep updated than a history section! I would be fine with that; I was just worried it might come across as more “ad”-like than some text (similar to how NPR only reads sponsor-provided text aloud instead of letting sponsors submit their own audio?).
I drafted a history section for the Git plugin; curious what your thoughts are (too boring?). It also includes intermediary agencies (Quansight), but we could remove them; I understand your point about attributing only to the original funders.