Need some help with spawners

dstack4273 · June 6, 2022, 4:58pm

I need some help with a couple of things that I’ve been struggling with sorting out.

The first thing should be pretty easy. I have two Docker images. One uses SystemUserSpawner so that it can mount the user home directories on the system where everything runs. The other is self-contained and doesn’t require any volume mounts from the host system. I think I would need to implement a custom spawner that uses either SystemUserSpawner or DockerSpawner depending on which image is selected in the image picker. I haven’t been able to find a good example of how to implement this, though. I would be grateful for any pointers or examples of something similar.

This second thing is less straightforward, and I’m not even sure where it makes the most sense to handle the problem. As I mentioned above, we have an image that uses SystemUserSpawner. The system itself has an automated account request process, so a user who doesn’t already have access can request an account and, after some amount of time, it will be ready for them to work. The problem is that if a user attempts to use our Hub without an account on the host system, they get stuck: they can neither spawn a container successfully nor stop any of the phantom containers they attempted to spin up. As an admin, I can see their account and the “containers” they attempted to start, but I cannot stop the containers[1] or delete the user[2]. I made a new API token and tried to bypass the Admin UI to purge the containers and users, with no luck. I also tried dropping the users’ rows from the sqlite table[3]; that removed them from the Admin user server view, but the user still couldn’t log in.

The only resolution we’ve found so far is to restart the Hub, which clears up the problematic entries in the user table and drops the phantom containers from the hub/proxy. It would be great if there were some way to reset things for those users without requiring a restart. Even better would be some logic added to the spawning process that prevents the bad state from ever being reached.

Our production instance is running Hub version 1.4.2, but I’ve gotten 2.3.0 staged and should be deploying this week if that matters any.

Thank you so much for reading and if you have any questions please let me know.

These aren’t the exact same problem, but the errors I get in my scenario match these two:
[1] - Deleting a spawning server · Issue #2975 · jupyterhub/jupyterhub (github.com)
[2] - Deleting user who has a running server is creating a warning message · Issue #647 · jupyterhub/jupyterhub (github.com)

dstack4273 · June 6, 2022, 4:59pm

Sorry I could only include 2 links in my initial post :-/

[3] - As detailed in the “how to fix” section from jstaf:
Deleting OS users causes Jupyterhub server to fail to start · Issue #1060 · jupyterhub/jupyterhub (github.com)

manics · June 7, 2022, 11:09am

Have a look at jupyterhub/wrapspawner on GitHub (“Mechanism for runtime configuration of spawners for JupyterHub”); it may work for you.
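
A minimal sketch of how wrapspawner’s ProfilesSpawner might be configured for the two-image case from the first post. The image names and profile keys are placeholders, not taken from this thread. The 4-tuple profile format (display name, key, spawner class, config dict) follows the wrapspawner README as I understand it, so check it against the version you install.

```python
# Sketch for jupyterhub_config.py. Image names and keys below are hypothetical.

def build_profiles(home_image, standalone_image):
    """Return ProfilesSpawner entries: (display name, key, spawner class, config)."""
    return [
        (
            "Notebook with host home directory",
            "home",
            "dockerspawner.SystemUserSpawner",
            {"image": home_image},
        ),
        (
            "Self-contained notebook",
            "standalone",
            "dockerspawner.DockerSpawner",
            {"image": standalone_image},
        ),
    ]


PROFILES = build_profiles(
    "example/home-notebook:latest",
    "example/standalone-notebook:latest",
)

# Inside jupyterhub_config.py, where `c` is the config object JupyterHub provides:
# c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
# c.ProfilesSpawner.profiles = PROFILES
```

Each profile picks its own spawner class, so the image that needs host home directories can keep SystemUserSpawner while the self-contained one uses plain DockerSpawner.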

Would you mind expanding on the exact problem you need to solve? I think if you modify your spawner to return an error if the user doesn’t exist on the host system you won’t have any stale servers since they’re never started.

dstack4273 · June 7, 2022, 1:15pm

Sorry if I was meandering to the point. Yes exactly, I believe that would resolve the issue since as far as I’ve been able to tell there is no other way to interrupt or clear the bad state a user gets into unless you restart the Hub. I think I remember seeing a spawner function that resolves just before spawning happens, would that be the most reasonable place to check for a user on the host system?

Thanks,
-Dustin

minrk · June 8, 2022, 7:14am

That would be Spawner.pre_spawn_hook.

dstack4273 · June 13, 2022, 9:46pm

If I return False when the check for the user account fails, would the spawn process be interrupted and everything else that is pending get stopped on its own, or would I also need to call stop() to terminate spawning? If I’m understanding the Spawner class code correctly, run_pre_spawn_hook just returns whatever pre_spawn_hook returns, so returning False should stop the rest?

manics · June 14, 2022, 9:08am

You’ll need to raise an exception in your hook. The server should be stopped:

jupyterhub/jupyterhub/blob/3b59c4861f155f868bcf29c00dfa78034d289950/jupyterhub/user.py#L802-L820 (github.com)

            except Exception as e:
                if isinstance(e, AnyTimeoutError):
                    self.log.warning(
                        f"{self.name}'s server failed to start"
                        f" in {spawner.start_timeout} seconds, giving up."
                        f"\n{start_timeout_message}"
                    )
                    e.reason = 'timeout'
                    self.settings['statsd'].incr('spawner.failure.timeout')
                else:
                    self.log.exception(
                        "Unhandled error starting {user}'s server: {error}".format(
                            user=self.name, error=e
                        )
                    )
                    self.settings['statsd'].incr('spawner.failure.error')
                    e.reason = 'error'
                try:
                    await self.stop(spawner.name)

dstack4273 · June 26, 2022, 4:56pm

I attempted to get this functioning on our test and prod environments last week and it didn’t seem to accomplish what I was hoping. As I’ve thought about it, I think that I might need to implement the check up a layer earlier.

In our system we have a custom log-in page where we use gitlab for oauth to authenticate. The user then is presented with the image dropdown to choose what they want to spawn and then it fails because they don’t have an account on the system.

I added a pre_spawn_hook function to my config file and rebuilt the hub image as recommended:

def system_account_check(spawner):
    import pwd
    username = spawner.user.name
    try:
        pwd.getpwnam(username)
    except KeyError:
        print("User doesn't have an account on this system or it isn't finished being created. Try again later")
        return False
    return True

c.Spawner.pre_spawn_hook = system_account_check

A user without a system account ends up with failed spawn attempts, and after they attempt to spawn a container I see a sea of lines like this in the logs:

[I <datetime> JupyterHub pages:401] <username> is pending stop

That was how it had been before, and that’s not really anything new. What is new however is now after the user gets in this state, I can no longer load the admin page. The browser console shows a 500 error response in the call to /hub/api/users?offset=0&limit=50. If I take a look in the logs I see this:

[E <datetime> JupyterHub web:1789] Uncaught exception GET /hub/api/users?offset=0&limit=50 (IP)
    HTTPServerRequest(protocol='https', host=<hostname>, method='GET',
    uri='/hub/api/users?offset=0&limit=50', version='HTTP/1.1', remote_ip=IP)
    Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/tornado/web.py", line 1702, in _execute
        result = method(*self.path_args, **self.path_kwargs)
    File "/usr/local/lib/python3.8/dist-packages/jupyterhub/scopes.py", line 494, in _auth_func
        return func(self, *args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/jupyterhub/apihandlers/users.py", line 161, in get
        user_model = self.user_model(u)
    File "/usr/local/lib/python3.8/dist-packages/jupyterhub/apihandlers/base.py", line 302, in user_model
        servers[name] = self.server_model(spawner)
    File "/usr/local/lib/python3.8/dist-packages/jupyterhub/apihandlers/base.py", line 209, in server_model
        model['state'] = spawner.get_state()
    File "/srv/jupyterhub/jupyterhub_config_prod.py", line 176, in get_state
        state = super().get_state()
    File "/usr/local/lib/python3.8/dist-packages/dockerspawner/systemuserspawner.py", line 187, in get_state
        if self.user_id >= 0:
    File "/usr/local/lib/python3.8/dist-packages/traitlets/traitlets.py", line 577, in __get__
        return self.get(obj, cls)
    File "/usr/local/lib/python3.8/dist-packages/traitlets/traitlets.py", line 540, in get
        default = obj.trait_defaults(self.name)
    File "/usr/local/lib/python3.8/dist-packages/traitlets/traitlets.py", line 1580, in trait_defaults
        return self._get_trait_default_generator(names[0])(self)
    File "/usr/local/lib/python3.8/dist-packages/dockerspawner/systemuserspawner.py", line 164, in _user_id_default
        return pwd.getpwnam(self.user.name).pw_uid
KeyError: "getpwnam(): name not found: <username>"

So I guess this means that the pre_spawn check is going to be insufficient, because the user is inserted into the users table once they are authenticated, and somehow there is a relationship between SystemUserSpawner and the users API that causes other (arguably worse) problems than just the user not being able to get into the system until after we restart the hub.

manics · June 27, 2022, 1:45pm

You need to throw an exception and let it propagate up the stack. The return value of pre_spawn_hook is ignored by JupyterHub.
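
A hedged sketch of what a raising hook could look like, based on the check posted above. The `MissingSystemAccount` exception and the `lookup` parameter are illustrative names introduced here so the check can be exercised without real system accounts; neither is part of JupyterHub.

```python
import pwd


class MissingSystemAccount(Exception):
    """Raised when the Hub user has no matching account on the host."""


def system_account_check(spawner, lookup=pwd.getpwnam):
    """pre_spawn_hook sketch: signal failure by raising, not by returning False.

    `lookup` is a hypothetical parameter that defaults to the real passwd
    lookup and can be replaced with a fake for testing.
    """
    username = spawner.user.name
    try:
        lookup(username)
    except KeyError:
        # The hook's return value is ignored, so failure has to be an exception.
        raise MissingSystemAccount(
            f"No system account for {username!r} yet; try again later."
        ) from None


# In jupyterhub_config.py:
# c.Spawner.pre_spawn_hook = system_account_check
```

This only shows how to make the hook fail the spawn. Whether that alone is enough to avoid the stuck state with SystemUserSpawner is not something the sketch can guarantee.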

dstack4273 · June 28, 2022, 11:53pm

Thanks for the pointer @manics, I spent a ton of time trying to wrap my head around exceptions yesterday and I’m still a little confused. I would’ve expected the exception raised from the pwd.getpwnam to propagate up the stack, but I suppose my try/except is catching it so I have to do something with the exception myself. I tried changing my function today to remove the return values and just reraise the KeyError like so:

def system_account_check(spawner):
    import pwd
    username = spawner.user.name
    try:
        pwd.getpwnam(username)
    except KeyError:
        print("User doesn't have an account on this system or it isn't finished being created. Try again later")
        raise

c.Spawner.pre_spawn_hook = system_account_check

But that left things in the same state. A user can still get to the spawn page and select an image, and the spawn then fails because they don’t have an account on the host system, so I think this probably needs to be handled at authentication time. We use LocalGitLabOAuthenticator, which as far as I can tell is just a class that combines LocalAuthenticator and GitLabOAuthenticator. I see a couple of methods from LocalAuthenticator that might help here (pre_spawn_start and system_user_exists). The problem is that everyone who uses these methods seems to do so in their own authenticator classes, which seems like overkill for this problem. I haven’t been able to work out how I might call either of those methods directly. Is there something that I’m overlooking, or is this even possible?

manics · June 30, 2022, 10:52pm

If you need to override some class methods that’s easy to do in jupyterhub_config.py if you want, e.g.

from jupyterhub.auth import DummyAuthenticator

class CustomAuthenticator(DummyAuthenticator):
    async def authenticate(self, handler, data):
        username = await super().authenticate(handler, data)
        if username:
            # ... do stuff ...
            return username
        return None

c.JupyterHub.authenticator_class = CustomAuthenticator

dstack4273 · July 10, 2022, 7:14pm

That ended up working brilliantly, thanks for the pointer. For whatever reason I couldn’t get a call to system_user_exists to work, but since that account check is essentially what I had already implemented in my pre_spawn_hook function, I just reused that code and it works great. Now when someone tries to log in without a system account, I raise an HTTP error and show a fancy little template telling them to contact our team for assistance. Thank you so much!

-Dustin
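
The final code was not posted, so the following is only a sketch of the approach described above, not the author’s implementation. It assumes the check is added as a mixin in front of the existing authenticator class. `SystemAccountCheckMixin`, `account_lookup`, and `account_missing_error` are hypothetical names, and the sketch raises `PermissionError` to stay free of dependencies where a Hub deployment would more likely raise an HTTP error.

```python
import pwd


class SystemAccountCheckMixin:
    """Reject logins for users without a host account (illustrative sketch)."""

    # Hypothetical hook point so the lookup can be swapped out in tests.
    account_lookup = staticmethod(pwd.getpwnam)

    def account_missing_error(self, username):
        # In a Hub deployment this could instead be tornado.web.HTTPError(403, ...)
        # so that a custom error template can be shown to the user.
        return PermissionError(f"No system account for {username!r}")

    async def authenticate(self, handler, data=None):
        result = await super().authenticate(handler, data)
        if not result:
            return None
        # authenticate() may return a plain username or a dict with a "name" key.
        username = result["name"] if isinstance(result, dict) else result
        try:
            self.account_lookup(username)
        except KeyError:
            raise self.account_missing_error(username) from None
        return result


# In jupyterhub_config.py (assuming oauthenticator is installed and still
# provides LocalGitLabOAuthenticator at this import path):
# from oauthenticator.gitlab import LocalGitLabOAuthenticator
# class CheckedAuthenticator(SystemAccountCheckMixin, LocalGitLabOAuthenticator):
#     pass
# c.JupyterHub.authenticator_class = CheckedAuthenticator
```

Doing the check at login means a user without a host account never reaches the spawn page, so no spawner object is left in a state where its user ID lookup can fail.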
