SLURM batch spawner failing with client process running, config issue

I am running a Slurm cluster with batchspawner, but the spawned single-user server does not communicate with the frontend (Hub) server and keeps getting killed. My config:

 c = get_config()                                                                                                                                                                                                                                                                 
 import batchspawner                                                                                                                                                                                                                                                              
 import wrapspawner                                                                                                                                                                                                                                                               
 c.JupyterHub.ip = '0.0.0.0'                                                                                                                                                                                                                                                      
 c.JupyterHub.hub_ip = '0.0.0.0'                                                                                                                                                                                                                                                  
 c.JupyterHub.hub_connect_ip = 'server_ip'                                                                                                                                                                                                                                       
 c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'                                                                                                                                                                                                                       
 c.Spawner.http_timeout = 60                                                                                                                                                                                                                                                      
 c.BatchSpawnerBase.req_nprocs = '1'                                                                                                                                                                                                                                              
 c.BatchSpawnerBase.ip = 'server_ip'                                                                                                                                                                                                                                             
 c.BatchSpawnerBase.req_runtime = '12:00:00'                                                                                                                                                                                                                                      
 c.BatchSpawnerBase.start_timeout = 240                                                                                                                                                                                                                                           
 c.BatchSpawnerBase.req_host = 'server_ip'                                                                                                                                                                                                                                       
 c.BatchSpawnerBase.exec_prefix = ''                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                  
 c.SlurmSpawner.batch_script = '''#!/bin/bash                                                                                                                                                                                                                                     
 #                                                                                                                                                                                                                                                                                
 #SBATCH --output=/nfs/cluster/jupyterhub/jupyterhub_slurmspawner_%j.log                                                                                                                                                                                                          
 #SBATCH --job-name=jupyterhub-spawner                                                                                                                                                                                                                                            
 {% if partition  %}#SBATCH --partition={{partition}}                                                                                                                                                                                                                             
 {% endif %}{% if runtime    %}#SBATCH --time={{runtime}}                                                                                                                                                                                                                         
 {% endif %}{% if gres       %}#SBATCH --gres={{gres}}                                                                                                                                                                                                                            
 {% endif %}{% if nprocs     %}#SBATCH --cpus-per-task={{nprocs}}                                                                                                                                                                                                                 
 {% endif %}{% if options    %}#SBATCH {{options}}{% endif %}                                                                                                                                                                                                                     
 #!/usr/bin/scl enable devtoolset-8 -- /bin/bash                                                                                                                                                                                                                                  
 eval "$(conda shell.bash hook)"                                                                                                                                                                                                                                                  
 conda activate deep_learning                                                                                                                                                                                                                                                     
 {{prologue}}                                                                                                                                                                                                                                                                     
 which jupyterhub-singleuser                                                                                                                                                                                                                                                      
 printenv                                                                                                                                                                                                                                                                         
 {% if srun %}{{srun}} {% endif %}{{cmd}}                                                                                                                                                                                                                                         
 echo "jupyterhub-singleuser ended gracefully"                                                                                                                                                                                                                                    
 {{epilogue}}                                                                                                                                                                                                                                                                     
 '''                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                  
 c.ProfilesSpawner.ip = 'server_ip'

I am hoping it is an IP configuration issue. I am new to JupyterHub configuration, so I have probably made a trivial mistake somewhere. The log from the spawned job:

 [I 2021-11-15 11:56:08.804 SingleUserNotebookApp mixins:576] Starting jupyterhub-singleuser server version 1.5.0                                                                                                                                                                 
 [I 2021-11-15 11:56:08.808 SingleUserNotebookApp notebookapp:2302] Serving notebooks from local directory: /home/abc                                                                                                                                                             
 [I 2021-11-15 11:56:08.808 SingleUserNotebookApp notebookapp:2302] Jupyter Notebook 6.4.5 is running at:                                                                                                                                                                         
 [I 2021-11-15 11:56:08.808 SingleUserNotebookApp notebookapp:2302] http://xxxx:34577/                                                                                                                                                                          
 [I 2021-11-15 11:56:08.808 SingleUserNotebookApp notebookapp:2303] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).                                                                                                                     
 [I 2021-11-15 11:56:08.816 SingleUserNotebookApp mixins:557] Updating Hub with activity every 300 seconds                                                                                                                                                                        
 slurmstepd: error: *** STEP 156.0 ON xxxx CANCELLED AT 2021-11-15T12:06:05 ***                                                                                                                                                                                 
 [C 2021-11-15 12:06:05.807 SingleUserNotebookApp notebookapp:1972] received signal 15, stopping 

Fixed by downgrading traitlets. See "ProfilesSpawner stops Jupyterhub from recognizing running worker" (Issue #41, jupyterhub/wrapspawner on GitHub).

Has anyone managed to get this working without downgrading traitlets? I’ve been banging my head against this for some time now and am going crazy.

I’m using: jupyterhub 3.0.0; latest batchspawner (git+https://github.com/jupyterhub/batchspawner@435d7ce7cbe02091becbc22c90c57d2c2de36b7f) and wrapspawner (git+https://github.com/jupyterhub/wrapspawner@83781e1fc6085f8939b6fea6304d8d0f024b0884)
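For anyone comparing environments, the relevant versions can be read from the Python environment the Hub runs in. This is only a minimal sketch: it assumes the distributions are installed under the names jupyterhub, batchspawner, wrapspawner and traitlets, and the helper name installed_versions is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumption: distribution names match the PyPI project names.
    packages = ["jupyterhub", "batchspawner", "wrapspawner", "traitlets"]
    for name, ver in installed_versions(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both the Hub environment and the conda environment activated in the batch script should show whether the two sides see the same traitlets version.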

In batchspawner.py, in BatchSpawnerBase.start (async def start(self)), I see a final while loop that runs once it detects that the submitted job is running:

            await gen.sleep(self.startup_poll_interval)
            # Test framework: For testing, mock_port is set because we
            # don't actually run the single-user server yet.
            if hasattr(self, "mock_port"):
                self.port = self.mock_port

It never escapes this while loop and I cannot find anything in the code where the port should be getting set (except initially where it’s set to 0).
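A toy model of what such a loop is waiting for may help frame the question: the poller can only exit if some other code path assigns a non-zero port on the same object. As far as I understand it, in batchspawner that assignment is supposed to come from the single-user side reporting its port back to the Hub, but treat that as an assumption. Every name below (ToySpawner, report_port, demo) is invented for illustration and is not batchspawner's code.

```python
import asyncio


class ToySpawner:
    """Hypothetical stand-in for a spawner; not batchspawner's actual class."""

    def __init__(self, startup_poll_interval=0.01):
        self.port = 0  # 0 means "no port reported yet"
        self.startup_poll_interval = startup_poll_interval

    async def start(self, timeout=1.0):
        # Same shape as the loop quoted above: poll until some other
        # code path assigns a non-zero port, or give up after `timeout`.
        async def wait_for_port():
            while self.port == 0:
                await asyncio.sleep(self.startup_poll_interval)

        await asyncio.wait_for(wait_for_port(), timeout)
        return self.port


async def report_port(spawner, port, delay=0.05):
    """Stands in for whatever reports the port back (assumed, not confirmed)."""
    await asyncio.sleep(delay)
    spawner.port = port


async def demo():
    spawner = ToySpawner()
    # Run the waiter and the reporter concurrently; start() returns once
    # report_port() has assigned the port on this same object.
    port, _ = await asyncio.gather(spawner.start(), report_port(spawner, 34577))
    return port


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model, if the port is assigned on a different object than the one being polled, or is never reported at all, start() just spins until the timeout. That matches the symptom of a job that is running but never recognized.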

Hi Dane,

Have you managed to get batchspawner/ProfilesSpawner working with jupyterhub 3.x? Current Jupyter requires traitlets 5.x, so batchspawner can no longer be run against traitlets 4.3.3.

Thanks,

Alex