Enhancing Data Collection in JupyterHub for Learning Analytics

Hello,

We deployed Z2JH (Zero to JupyterHub) at our institution as a trial for programming courses. After completing one semester, we are considering collecting data from JupyterHub to perform learning analytics with the goal of improving the student learning experience. However, upon examining the logs from Kubernetes pods, we found that the available information primarily includes login and logout events, while other details seem less useful.

We aim to collect more granular information, such as logs that capture when students type code in cells, run cells, or copy and paste code. Is it possible to gather such detailed activity logs? Additionally, are there other types of useful information we can collect to better understand student behavior?

Our data collection will comply with ethical guidelines and include only anonymized data. After gathering the data, we plan to analyze it to identify patterns that might influence the submission or non-submission of programming exercises. We are considering a 6-month or longer project to achieve this and would greatly appreciate your guidance on where to start and how to approach this process.

Best


The jupyter-events system provides ways to instrument additional server-side features which can be configured to report to JupyterHub.

A browser client such as JupyterLab or Notebook >=7 can emit well-known, JSON schema-described events, but this is not enabled by default. Enabling it is relatively straightforward, but requires some frontend development to produce an installable package that would be included in a learner’s runtime environment.

After building the backend schema to accept your events, the basic pattern (minus boilerplate) would be something like the status bar which captured common “plumbing” commands:

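// plugin.ts
import { JupyterFrontEnd } from '@jupyterlab/application';

/* a bunch of boilerplate, see docs */

// Command IDs worth recording; extend as needed.
const INTERESTING_COMMANDS = ['notebook:run-cell'];

function activate(app: JupyterFrontEnd) {
  // Depending on the JupyterLab version, the event manager may be exposed as
  // app.serviceManager.events rather than app.events.
  const { commands, events } = app;
  // Fires after any command runs, whether from a menu, button, or shortcut.
  commands.commandExecuted.connect((_registry, executed) => {
    if (!INTERESTING_COMMANDS.includes(executed.id)) {
      return;
    }
    // The payload must match the schema registered on the server.
    events
      .emit({
        data: { id: executed.id },
        version: '0',
        schema_id: 'https://example.com/command/schema#'
      })
      .catch(console.warn);
  });
}

/* more boilerplate */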

Thank you very much for your response. I spent some time on the tutorials you shared, but I am still a little confused. The frontend part is quite clear: I need to build a frontend extension and install it in each learner’s notebook environment.

My most important question is:
Assuming that I make a simple JavaScript extension that collects mouse clicks and movements and sends the data to a backend every 50 ms, what backend should the data be sent to? What URL and port exactly? The hub or the learner’s server? And how can I configure the server to listen for these events?

I have other questions as well, such as how I can detect in the JS extension that the learner clicked “run cell”.
In the “Logging your first event!” tutorial in the jupyter_events documentation, the events are logged to a file, but everything runs on the notebook server. In a JS extension we don’t have access to the file system.

I was wondering which of these extension examples are most related to my case?

learner clicks on

Operating at the raw mouse level is not recommended, as adding custom dynamic listeners to all elements could be very expensive.

The previously recommended approach using commands relies on the common underlying code that the menu bar, context menus, keyboard shortcuts, etc. all use, so it captures user intent more accurately, even if it does not record the actual mouse, keyboard, or speech input.

A look at the “run cell” button shows it has an attribute data-command="notebook:run-cell-and-select-next": this is the same command run by shift+enter. Without knowing what experiments you’ll be running on your learners, commands are likely the best way to gauge intent.
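As a rough sketch of that command-based filtering, separated from the JupyterLab plumbing so it can be tried on its own: the function below maps a command ID to an event payload, or to null if the command is not of interest. The CommandEvent interface and the toCommandEvent name are hypothetical, and the schema URI is the placeholder from the plugin example earlier in the thread.

```typescript
// Hypothetical payload shape, mirroring the object passed to events.emit()
// in the plugin example earlier in this thread.
interface CommandEvent {
  schema_id: string;
  version: string;
  data: { id: string };
}

// Placeholder schema URI from the earlier example; replace with your own.
const SCHEMA_ID = 'https://example.com/command/schema#';

// Commands worth recording. Both IDs are mentioned in this thread.
const TRACKED_COMMANDS = new Set<string>([
  'notebook:run-cell',
  'notebook:run-cell-and-select-next'
]);

// Return an event payload for tracked commands, or null for everything else.
function toCommandEvent(commandId: string): CommandEvent | null {
  if (!TRACKED_COMMANDS.has(commandId)) {
    return null;
  }
  return { schema_id: SCHEMA_ID, version: '0', data: { id: commandId } };
}
```

Inside the commandExecuted handler, one would call toCommandEvent(executed.id) and emit the result only when it is not null. Keeping the filter in a plain function makes it easy to unit test without a running JupyterLab.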

What backend

By using the built-in app.events.emit(), events are sent to the jupyter_server instance, that is, the learner’s single-user server rather than the hub.

The first link posted above, to JupyterHub’s documentation, describes how to write events to a file on the learner’s single-user server file system. The EventLog could also be extended to send JSON events to another system, such as a log aggregator, which could in theory be co-deployed as a JupyterHub service.

which of these extension examples

Probably none of the extension examples are going to show how to collect lots of data, as that’s generally not something a single-user application really cares about. As a site administrator, you have the ability to change this, but so does a learner, to some extent.

A very simple pip- or conda- installable extension would have both a server side (defines your custom events) and a client side (js, etc) in one package. Starting from the extension-template, the server piece would follow and then register an event type while the above code snippet could go in the ts plugin.

After building this wheel, this would get added to the learner environment, and likely further configured (e.g. don’t hard-code your log aggregator port, put that in a spawn hook, env var, or something jupyterhub can control).


Hi @bpfrd :wave:,
At Paris Saclay University we are doing something similar, with every student running their own JupyterLab container.
In order to record the results of the execution of specific cells (identified by configurable metadata), I have started to write an extension that you can find here.
We chose to store the records as learning traces in JSON Lines format.
The extension is at a very early stage of development, but if it can be useful I would love to have feedback on its use in different infrastructures and environments.
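
For readers unfamiliar with JSON Lines, here is a minimal sketch of the format: one compact JSON object per line. The TraceRecord fields and function names are hypothetical and are not taken from the extension mentioned above, whose actual record layout may differ.

```typescript
// Hypothetical record shape; the real extension's fields may differ.
interface TraceRecord {
  timestamp: string; // ISO 8601 timestamp
  cellId: string;    // identifier of the executed cell
  event: string;     // e.g. "execute"
}

// Serialize records as JSON Lines: one JSON object per line,
// with a trailing newline when there is at least one record.
function toJsonLines(records: TraceRecord[]): string {
  if (records.length === 0) {
    return '';
  }
  return records.map(r => JSON.stringify(r)).join('\n') + '\n';
}

// Parse JSON Lines back into records, skipping blank lines.
function fromJsonLines(text: string): TraceRecord[] {
  return text
    .split('\n')
    .filter(line => line.trim().length > 0)
    .map(line => JSON.parse(line) as TraceRecord);
}
```

Because each line is an independent JSON document, new records can be appended to a file without rewriting it, and traces from many students can be concatenated for analysis.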


Thank you very much for sharing your experience and GitHub repo. I was wondering if you also have a backend extension for handling the collected data sent from the frontend.

We decided to store the learning traces in each student’s workspace, so they can be submitted together with the assignments. We are using git and a forge to collect the assignments, so the teaching infrastructure is independent from JupyterLab, even if in practice we are essentially using Jupyter notebooks.
The learning traces are collected afterwards from the git repositories.

The creation, submission, and collection of the assignments are managed with Travo, but this is outside the JupyterLab extension, so I’m not sure it is relevant for your application.
