Autograding Notebooks with Otter-Grader

Hi all,

UC Berkeley’s Data Science Education Program is excited to announce the beta release of version 1.0.0 of Otter-Grader!

Otter-Grader is a new open source Python and Jupyter notebook autograder that allows instructors to customize their assignment creation and grading pipeline. It supports many different types of autograding infrastructure, from grading locally on the instructor’s machine to a deployable grading service that students can submit to. It was originally designed to be a serverless autograding solution for use at institutions that can’t afford or maintain the server overhead required for traditional autograding services.

Otter is on the verge of its v1 release, which includes updates to its existing functionality, including a reorganized CLI and several bug fixes. It also includes a variety of new features, chief among them being a deployable grading server called Otter Service and an assignment development tool called Otter Assign (a fork of okpy’s jassign that is also backward-compatible with jassign’s format). These updates are intended to make Otter a robust full-scale assignment development and grading tool.

With the updates in v1, Otter is divided into six main tools:

  • Otter Assign, which allows instructors to write questions, solutions, and tests in a single notebook and parses this notebook into the requisite files
  • Otter Check, which allows students to run checks from the command line and in a notebook against public tests
  • Otter Export, which exports notebooks to PDFs using pandoc and a custom LaTeX template
  • Otter Generate, which takes autograder test files and generates an autograder configuration file compatible with Gradescope’s proprietary autograding service
  • Otter Grade, which allows instructors to grade students’ submissions locally in parallel Docker containers
  • Otter Service, which builds and manages a deployable Otter grading service

The main goal of Otter is cross-compatibility. We want instructors to be able to fit Otter into their existing assignment pipelines with as little adaptation as possible. For example, when grading with Otter Grade or Otter Service, PDFs of notebooks can be automatically uploaded to Gradescope so that manually graded questions are easy to grade there. Planned features are intended to extend this compatibility, including Canvas LTI support for Otter Grade and Otter Service and integrations with other LMSs.

Otter is designed to be a scalable autograding solution with a low barrier to entry. It aims to cover all relevant steps of the assignment pipeline in a way that fits any instructor’s preferred platform.

We are always looking for users and contributors, so please don’t hesitate to reach out to us with any questions or for more information!

Thank you!


Thank you very much for sharing the news! I have two questions:

  1. Could you elaborate on when to use Otter and when to use nbgrader? What do you see as the key differences from an instructor’s point of view?
  2. Gradescope and Canvas seem to be first-class citizens in the tool: they have their own flags and are not modelled as plugins or extensions. If I want to use other third-party software, like ILIAS (for publishing test results) or Stud.IP (for downloading the initial source code), how difficult do you estimate it would be to integrate them? If I plan to do so, what is the proposed workflow?
  1. The main difference between otter and nbgrader is that the former is agnostic of the organization of Hubs whereas the latter is designed to be used in a Hub-per-class system. Because Otter is used on the students’ end through a normal Python import and by the instructor through the command line or a 3rd-party LMS, it doesn’t rely on having assignments organized and graded by the Hub itself, freeing it from the restriction of needing to provision individual Hubs for each class.
  2. Gradescope and Canvas are currently the supported LMSs because they are the LMS stack available at UC Berkeley, where Otter is being developed. We do have plans down the road to integrate other LMSs, but retrofitting the existing code shouldn’t be too difficult as it stands. If you’re grading locally, the main step would be building a metadata parser that can parse the export format of your LMS. Gradescope additionally has support for its proprietary autograding service. Beyond export formats, the distribution and collection of materials is left to instructors, so there isn’t much need to retrofit other parts of Otter to make them compatible with other LMSs.
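
To make the "metadata parser" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the CSV layout, the column names `student_id` and `submission_file`, the function name `parse_lms_export`, and the sample rows are all hypothetical, and the output shape (a list of identifier/filename records) is an assumption rather than something taken from Otter's documentation. Check what your LMS actually exports and what your Otter version expects before relying on it.

```python
import csv
import io
import json


def parse_lms_export(csv_text, id_column="student_id", file_column="submission_file"):
    """Map each submitted file to a student identifier.

    The column names are hypothetical; adjust them to your LMS export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        identifier = (row.get(id_column) or "").strip()
        filename = (row.get(file_column) or "").strip()
        if not identifier or not filename:
            # Skip rows that have no student or no submitted file.
            continue
        records.append({"identifier": identifier, "filename": filename})
    return records


# Made-up export used only to exercise the parser.
example_export = """student_id,submission_file,submitted_at
s001,s001_hw01.ipynb,2020-03-01
s002,,2020-03-01
s003,s003_hw01.ipynb,2020-03-02
"""

metadata = parse_lms_export(example_export)
print(json.dumps(metadata, indent=2))
```

The resulting list could then be written to a JSON or YAML file next to the downloaded submissions. Keeping the parser as a small standalone function means that supporting another LMS should mostly be a matter of changing the column names or swapping the reader.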

Thank you very much for your explanation!

Hi Chris,

Thanks for the extensive information. My team and I are developing a learning platform based on Jupyter Notebooks. We were prototyping a solution with OkPy when we stumbled upon Otter Grader. I would like to ask you two questions in that context:

  1. Can Otter Grader be seen as a successor to OkPy? Both tools have similar functions, although Otter offers more.
  2. What are the key differences between OkPy and Otter Grader (apart from the additional functionality)?
  1. I don’t know that “successor” is necessarily the correct term. Otter is more of an alternative to OkPy, designed to do the same thing in a different way. Lately OkPy has become more of a legacy solution, and Otter is coming into the space and filling many of the gaps that the shrinkage of OkPy has left behind. In that sense, maybe “successor” is more apt, but the design philosophies are still somewhat different, given that OkPy is primarily a service and Otter is primarily a client.
  2. OkPy is primarily a server that organizes and grades submissions, paired with a smaller client-side package that interfaces with students. Otter is different in that it is primarily a client package, with tools and APIs designed to interact with instructors and students, rather than a server. Otter is designed to be lightweight and modular, giving instructors much more room to customize their assignment distribution, collection, and grading pipelines while requiring minimal overhead.

Hope this helps!