ZTJH settings for large Microsoft Azure AD groups

Hi,

I’ve been setting up a JupyterHub instance as a POC for my company. So far things are looking really amazing; however, I’m struggling a bit with mapping group membership to resource allocation. The core of my challenge is maintaining the group membership information. At present, I’ve figured out a path to get the full list of groups from Microsoft and then update the profiles and mounts based on group membership; however, JupyterHub forgets the groups after some time, making it all for naught. Can anybody provide any insight into how to prevent the group membership information from being forgotten?

Details:

I happen to be a member of several hundred groups, so Microsoft will not return the full list of groups I belong to in the response token. One workaround is for my IT department to define a list of groups in the SSO configuration, so that the list of groups returned to JupyterHub is the intersection of the pre-configured list and the groups the user is a member of. This is painful because we have high turnover in IT, and it takes days of re-explaining what they need to do every time. The other option suggested by Microsoft is that, after I get the access token, I make a second call to the Graph API to get the full list of groups. I’ve figured out a recipe that works for this second path, and with that information I can update the profiles and mounts.
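
A minimal sketch of that second path, with the paging logic separated from the HTTP client so it can be exercised without a real token. The names `collect_group_names`, `fetch_json`, `FAKE_PAGES`, and `fake_fetch` are hypothetical and only for illustration; `fetch_json` is assumed to be any coroutine that takes a URL and returns the decoded JSON body.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical type: a coroutine that takes a URL and returns the decoded JSON body.
FetchJson = Callable[[str], Awaitable[dict]]

GRAPH_MEMBER_OF_URL = "https://graph.microsoft.com/v1.0/me/memberOf?$select=displayName"


async def collect_group_names(
    fetch_json: FetchJson, first_url: str = GRAPH_MEMBER_OF_URL
) -> list:
    """Follow @odata.nextLink until the last page; return sorted display names."""
    names = []
    url: Optional[str] = first_url
    while url:
        page = await fetch_json(url)
        # Entries without a displayName are skipped.
        names.extend(
            d["displayName"] for d in page.get("value", []) if d.get("displayName")
        )
        # The last page has no @odata.nextLink, which ends the loop.
        url = page.get("@odata.nextLink")
    return sorted(names)


# Canned pages standing in for Graph responses (illustrative data only).
FAKE_PAGES = {
    "page1": {"value": [{"displayName": "TEAM-B"}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"displayName": "TEAM-A"}, {"displayName": None}]},
}


async def fake_fetch(url: str) -> dict:
    return FAKE_PAGES[url]


demo_groups = asyncio.run(collect_group_names(fake_fetch, "page1"))
print(demo_groups)
```

In a real hook, `fetch_json` would wrap an HTTP client call that sends the `Authorization: Bearer <access_token>` header; using a non-blocking client there avoids stalling the hub while a user with many groups logs in.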

values.yaml (fragment)

...
hub:
  config:
    JupyterHub:
      authenticator_class: azuread
    AzureAdOAuthenticator:
      admin_groups:
        - TEAM-JUPYTERHUB-ADMIN
      allowed_groups:
        - ORG-ALL-PEOPLE
      auth_state_groups_key: user.roles
      authorize_url: https://login.microsoftonline.com/${redacted}/oauth2/v2.0/authorize
      auto_login: yes
      client_id: ${redacted}
      client_secret: ${redacted}
      enable_auth_state: yes
      login_service: https://login.microsoftonline.com
      manage_groups: yes
      oauth_callback_url: https://${redacted}/hub/oauth_callback
      tenant_id: ${redacted}
      username_claim: samAccountname
      scope:
        - openid
        - email
        - profile
        - GroupMember.Read.All
        - Group.Read.All
        - User.Read
...

Note 1: I did need to work with my IT organization to enable the GroupMember.Read.All and Group.Read.All scopes and get access pre-approved by the admin, and then they had to do something on the back end to inject the samAccountName into the auth response. They didn’t share any details about what they did, so I’m unable to provide that information.

Note 2: The auth_state_groups_key is set to ‘user.roles’, which is a list of groups IT pre-configured for my app. I was originally using the Authenticator ‘post_auth_hook’ method and found that it ran too late to discover I was an admin. I figured it would not be too painful to have IT set up that one group as a specially blessed group that would always be present in the token.

jupyterhub_config.py (fragments)

...
import logging
...
from kubespawner import KubeSpawner
from kubernetes_asyncio.client import V1Pod
from oauthenticator.azuread import AzureAdOAuthenticator

logger: logging.Logger = logging.getLogger(__name__)
...
async def insert_microsoft_entra_id_groups(
        authenticator: AzureAdOAuthenticator,
        auth_state: dict) -> dict:
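    '''Add the user's Microsoft Entra ID groups to the auth state.

    :param authenticator: The authenticator instance.
    :type authenticator: oauthenticator.azuread.AzureAdOAuthenticator
    :param auth_state: The auth state instance.
    :type auth_state: dict

    :returns: The auth state with the Entra ID groups added.
    :rtype: dict
    '''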
    if not isinstance(authenticator, AzureAdOAuthenticator):
        return auth_state

    get_members_url: str = 'https://graph.microsoft.com/v1.0/me/memberOf?$select=displayName'
    access_token: str = ''
    groups_user_is_a_member_of: list[str] = []

    access_token = auth_state['token_response']['access_token']

    get_more_data: bool = True
    while get_more_data:
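        # logging calls do not accept print()'s `end` argument.
        logger.debug(f"Get data from {get_members_url}...")
        # `get` is assumed to be requests.get, imported in an elided part of
        # this file. Note that it is a blocking call inside an async hook.
        get_members_response: dict = get(
            get_members_url,
            headers={'Authorization': f"Bearer {access_token}"}
        ).json()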

        received_groups: list[str] = [d['displayName'] for d in get_members_response.get('value', [])]
        logger.debug(f"got {len(received_groups)} groups.")
        groups_user_is_a_member_of.extend(received_groups)

        if '@odata.nextLink' in get_members_response:
            get_members_url = get_members_response['@odata.nextLink']
        else:
            get_more_data = False
    groups_user_is_a_member_of.sort()
    auth_state['entra_id_groups'] = groups_user_is_a_member_of

    if auth_state.get('entra_id_groups'):
        logger.debug('\n'.join(auth_state['entra_id_groups']))
    else:
        logger.warning("Microsoft Entra ID Groups Not Found")

    return auth_state

async def custom_options_form(spawner: KubeSpawner) -> str:
    '''Custom options form for the spawner.

    :param spawner: The KubeSpawner instance.
    :type spawner: kubespawner.KubeSpawner

    :returns: The html form
    :rtype: str
    '''
    groups = await get_groups(spawner)

    if not hasattr(spawner, 'default_profile_list'):
        spawner.default_profile_list = spawner.profile_list

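    # Copy the defaults so the extend() below does not mutate them; otherwise
    # the GPU profile is appended again each time the form is rendered.
    spawner.profile_list = list(spawner.default_profile_list)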
    if 'TEAM-JUPYTERHUB-GPU-ACCESS' in groups:
        spawner.profile_list.extend([
            {
                'display_name': 'AI Team GPU server',
                'description': 'Dynamically added for members of \"TEAM-JUPYTERHUB-GPU-ACCESS\"',
                'default': False,
            }
        ])

    return spawner._options_form_default()


async def pod_customization(
        spawner: KubeSpawner,
        pod: V1Pod) -> V1Pod:
    '''
    This is a hook that can be used to modify the pod before it is created.

    :param spawner: The KubeSpawner instance.
    :type spawner: kubespawner.KubeSpawner
    :param pod: The pod to modify.
    :type pod: kubernetes_asyncio.client.V1Pod

    :returns: The modified pod.
    :rtype: kubernetes_asyncio.client.V1Pod
    '''
    groups = await get_groups(spawner)

    if 'TEAM-JUPYTERHUB-AI-TEAM-SHARE-ACCESS' in groups:
        inject_mount(pod, 'ai-team-share')

    return pod


async def get_groups(spawner: KubeSpawner) -> list[str]:
    '''
    Get the groups the user is a member of.

    :param spawner: The KubeSpawner instance.
    :type spawner: kubespawner.KubeSpawner

    :returns: The list of groups the user is a member of.
    :rtype: list[str]
    '''
    groups: list[str] = []
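    # get_auth_state() returns None when no auth state is stored, so fall
    # back to an empty dict instead of failing on None.get(...).
    auth_state = (await spawner.user.get_auth_state()) or {}
    if auth_state.get('entra_id_groups'):
        groups = auth_state['entra_id_groups']
    elif (auth_state.get('user') or {}).get('roles'):
        groups = auth_state['user']['roles']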

    if groups:
        logger.debug(f"User \"{spawner.user.name}\" is a member of {len(groups)} groups.")
    else:
        logger.debug(f"User \"{spawner.user.name}\" is not a member of any groups")

    return groups


def inject_mount(
        pod: V1Pod,
        name: str = 'ai-team-share'
        ) -> None:
    '''Inject a mount to the pod.

    :param pod: The pod to modify.
    :type pod: kubernetes_asyncio.client.V1Pod
    :param name: The name of the mount.
    :type name: str

    :returns: Nothing
    :rtype: None
    '''
    pod.spec.volumes.append({
        'name': name,
        'persistentVolumeClaim': {
            'claimName': name
        }
    })
    pod.spec.containers[0].volume_mounts.append({
        'name': name,
        'mountPath': f"/mnt/{name}"
    })
...
c.Authenticator.modify_auth_state_hook = insert_microsoft_entra_id_groups
c.KubeSpawner.options_form = custom_options_form
c.KubeSpawner.modify_pod_hook = pod_customization
...
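
The group lookup and the profile selection in the fragments above can also be sketched as two small pure functions, which makes the edge cases (no stored auth state, repeated form renders) easy to check outside the hub. This is only an illustration under stated assumptions: `groups_from_auth_state`, `profiles_for_groups`, `GPU_GROUP`, and `GPU_PROFILE` are hypothetical names, and the auth state layout is taken from the fragments above.

```python
import copy

GPU_GROUP = "TEAM-JUPYTERHUB-GPU-ACCESS"

# Profile taken from the options form fragment above.
GPU_PROFILE = {
    "display_name": "AI Team GPU server",
    "description": f'Dynamically added for members of "{GPU_GROUP}"',
    "default": False,
}


def groups_from_auth_state(auth_state):
    """Return the user's groups, or [] when auth_state is missing or empty."""
    auth_state = auth_state or {}
    return (
        auth_state.get("entra_id_groups")                 # full list from the Graph call
        or (auth_state.get("user") or {}).get("roles")    # fallback: groups in the token
        or []
    )


def profiles_for_groups(default_profiles, groups):
    """Build a fresh profile list; the defaults are never mutated."""
    profiles = copy.deepcopy(default_profiles)
    if GPU_GROUP in groups:
        profiles.append(copy.deepcopy(GPU_PROFILE))
    return profiles


# Illustrative data only.
defaults = [{"display_name": "Small server", "default": True}]
first = profiles_for_groups(defaults, [GPU_GROUP])
second = profiles_for_groups(defaults, [GPU_GROUP])
print(len(defaults), len(first), len(second))
```

Because each call starts from a copy of the defaults, rendering the form several times does not accumulate duplicate GPU entries, and a user whose auth state is missing simply sees the default profiles.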

Can you turn on debug logging and show us the logs for a user login with the correct groups, and the logs corresponding to the groups being removed?