Is there a way to bulk delete old users?

Does anyone have an example of current best practices for user management that they could share?

I have a cluster that serves over 500 users, but about half of them leave the institution each year (we’re a two-year school, so each year we gain about 250 new users and lose about 250 who graduate). In the past, when we only had about 100 users, I would manually delete old users from JupyterHub and their associated PVC/PV from GCP. However, it’s getting unwieldy, and this seems ripe for automation.

It’s easy to bulk delete users’ PVCs/PVs since users have their graduation year included in their username, and GCP lets you filter PVCs by searching (e.g. I can search for ‘23’ and see all users with 23 in their username). Once I have that list, it’s easy to “select all” and then “delete” the storage claims. The admin panel on JupyterHub also lets you filter users in a similar way, but there’s no “select all” and then “delete” feature.

Is there another way to do this, perhaps from the command line with a call to an API? Thanks for any insight!

AFAIK, there is no API to delete multiple users at a time. I can think of two options:

  • Write a script to list all users from the API and delete one by one based on your filters.
  • Use the idle culler service, which can delete users and servers that have been inactive for a while. I think this option would be easier to manage: users who have graduated will not have any activity, and you can configure it so that if user X is inactive for period Y, that user is culled. You can also configure how often the service runs.
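For the second option, here is a rough sketch of what the culler setup might look like, assuming the jupyterhub-idle-culler package is installed in the Hub's environment. The service and role names and the one-year threshold are made up for illustration; the flags and scopes are the ones I understand the culler to use, so check them against its README for your version.

```python
import sys

# Assumed threshold: treat an account as stale after ~1 year of inactivity.
INACTIVE_SECONDS = 365 * 24 * 60 * 60

culler_service = {
    "name": "stale-user-culler",  # hypothetical service name
    "command": [
        sys.executable,
        "-m",
        "jupyterhub_idle_culler",
        f"--timeout={INACTIVE_SECONDS}",  # inactivity threshold, in seconds
        "--cull-every=86400",  # check once a day
        "--cull-users",  # delete inactive users, not only stop their servers
    ],
}

culler_role = {
    "name": "stale-user-culler-role",  # hypothetical role name
    "services": ["stale-user-culler"],
    "scopes": [
        "list:users",
        "read:users:activity",
        "read:servers",
        "delete:servers",
        "admin:users",  # needed when using --cull-users
    ],
}

# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.JupyterHub.services = [culler_service]
# c.JupyterHub.load_roles = [culler_role]
```

As far as I can tell, the same `--timeout` also decides when idle servers are stopped, so a long threshold like this one probably belongs in its own culler instance, separate from any short-timeout instance that shuts down idle servers.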

There isn’t a single API call to delete lots of users, but you can script against the API to find and/or delete the users. A few hundred shouldn’t take more than a few seconds.

Here’s a script that would find users matching a given pattern using fnmatch (e.g. '*23'):

import argparse
from fnmatch import fnmatch

import requests

# manage this token separately,
# or get from $JUPYTERHUB_API_TOKEN
token = "super-secret"
headers = {
    "Accept": "application/jupyterhub-pagination+json",
    "Authorization": f"Bearer {token}",
}


def should_delete(user, pattern):
    """Return whether we should delete this user

    Currently: glob match on usernames,
    but could be any other condition on the user model
    """
    return fnmatch(user["name"], pattern)


def find_users(pattern, hub_url):
    """Returns generator of user models that match `pattern`"""
    url = hub_url.rstrip("/") + "/hub/api/users"

    next_page = True

    params = {}

    while next_page:
        r = requests.get(url, headers=headers, params=params)
        r.raise_for_status()
        resp = r.json()
        user_list = resp["items"]
        for user in user_list:
            # only yield users that should be deleted
            if should_delete(user, pattern):
                yield user
        pagination = resp["_pagination"]
        next_page = pagination["next"]
        if next_page:
            params = {
                "offset": next_page["offset"],
                "limit": next_page["limit"],
            }


def delete_user(name, hub_url):
    """Delete a given user by name via JupyterHub API"""
    print(f"Deleting user {name}")
    r = requests.delete(
        hub_url.rstrip("/") + f"/hub/api/users/{name}",
        headers=headers,
    )
    r.raise_for_status()


def delete_matching_users(pattern, hub_url):
    """Delete users whose name matches a glob-style pattern"""
    # complete list before deleting because deleting changes the offsets!
    for user in list(find_users(pattern, hub_url)):
        name = user["name"]
        delete_user(name, hub_url)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("pattern", help="The pattern of username to delete")
    # default URL is the default direct Hub access URL,
    # assumes running on the same machine
    parser.add_argument(
        "--hub-url", help="The JupyterHub base URL", default="http://127.0.0.1:8081/"
    )
    args = parser.parse_args()

    delete_matching_users(args.pattern, hub_url=args.hub_url)


if __name__ == "__main__":
    main()

Use with e.g. python delete-users.py '*23' to delete all users whose names end with 23.
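Before running a deletion, it can help to check what a pattern actually matches. A quick sketch with made-up usernames: note that '*23' only matches names that end in 23, which is narrower than a search for 23 anywhere in the name.

```python
from fnmatch import fnmatch

# Hypothetical usernames, for illustration only
names = ["jsmith23", "adoe24", "k23lee", "mwong2023"]

# '*23' matches names ending in 23
ends_with_23 = [n for n in names if fnmatch(n, "*23")]

# '*23*' matches names containing 23 anywhere
contains_23 = [n for n in names if fnmatch(n, "*23*")]

print(ends_with_23)
print(contains_23)
```

One caveat: `fnmatch` normalizes case on some platforms (Windows), so `fnmatchcase` from the same module is the safer choice if usernames are case-sensitive.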

And here’s the JupyterHub config to grant the script permission to do this:

c.JupyterHub.services = [
    {
        "name": "stale-user-deleter",
        # manage this token separately:
        "api_token": "super-secret",
    },
]

c.JupyterHub.load_roles = [
    {
        "name": "stale-user",
        "services": ["stale-user-deleter"],
        "scopes": [
            "list:users",
            "delete:users",
        ],
    }
]
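To avoid hard-coding the token in the script, one option is to read it from the environment, as the comment in the script suggests. A minimal sketch; the helper name `load_api_token` is hypothetical, and the variable name is just the one mentioned in the script's comment.

```python
import os


def load_api_token(env=os.environ, var="JUPYTERHUB_API_TOKEN"):
    """Return the API token from the environment, or exit with a clear error."""
    token = env.get(var, "").strip()
    if not token:
        raise SystemExit(f"Set ${var} to the service's API token before running")
    return token
```

In the script, `token = load_api_token()` would replace the hard-coded string. The config file can do something similar for `"api_token"`, so the secret lives in one place outside both files.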

If you want to be extra safe, you could do this in two steps. The first step only collects the matching usernames and dumps them to a file (or stdout) for you to review manually; the second is a simple deletion script that iterates through a given list of usernames and deletes them. I’ve organized the functions in the script above so that you can use the same functions either way.
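A minimal sketch of the file handling for that two-step approach. The helper names `dump_usernames` and `load_usernames` and the one-name-per-line format are assumptions for illustration, not part of the script above.

```python
def dump_usernames(names, path):
    """Step 1: write one username per line so the list can be reviewed by hand."""
    with open(path, "w") as f:
        for name in names:
            f.write(name + "\n")


def load_usernames(path):
    """Step 2: read the reviewed list, skipping blank lines and '#' comments."""
    names = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                names.append(line)
    return names
```

Step 1 would then be something like `dump_usernames([u["name"] for u in find_users(pattern, hub_url)], "to-delete.txt")`. After reviewing the file (and commenting out anyone who should stay), step 2 loops over `load_usernames("to-delete.txt")` and calls `delete_user(name, hub_url)` for each name.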

This worked perfectly, thanks. I think for now I’ll run this on an as-needed basis instead of automating it fully, to avoid any unexpected deletion of users. Thanks so much for the help.


Here’s the updated script… for some reason I couldn’t edit my post and fix the GitHub URL: