Does mybinder limit background processes? Differing behavior between mybinder and repo2docker

Hi. I’m getting different runtime behavior between running a Binderized repo on the public mybinder instance and running it locally on my machine with repo2docker. Does mybinder limit the ability to run background processes alongside the Jupyter server?

Minimal reproducible failing/working example: I’m writing a notebook talk for a conference, GitHub - pyhf/pyhep-2021-notebook-talk (Jupyter notebook talk given at PyHEP 2021 on distributed statistical inference with pyhf). If you run the Binderized instance of the draft example at commit f3fd846e9a24cf8c0f148e43ca033a8cecb92012 (c.f. the launch button)

and go through the necessary authentication with Globus, and then run the example on localhost, you will get the following error for this cell:

# retrieve output
result = fxc.get_result(task_id)
---------------------------------------------------------------------------
TaskPending                               Traceback (most recent call last)
<ipython-input-13-ddf3088a7076> in <module>
      1 # retrieve output
----> 2 result = fxc.get_result(task_id)

/srv/conda/envs/notebook/lib/python3.7/site-packages/funcx/sdk/client.py in get_result(self, task_id)
    250         task = self.get_task(task_id)
    251         if task['pending'] is True:
--> 252             raise TaskPending(task['status'])
    253         else:
    254             if 'result' in task:

TaskPending: Task is pending due to {self.reason}

Normally this would mean that the underlying tool (funcX) hasn’t finished executing the job yet, but the test case is designed to be trivial to compute and so should finish in well under a second of runtime. It appears from inspection of the process table

$ ps -u $USER
    PID TTY          TIME CMD
      1 ?        00:00:00 python3
     18 ?        00:00:07 jupyter-noteboo
     84 pts/0    00:00:00 bash
    123 ?        00:00:00 off_process_che <defunct>
    124 ?        00:00:00 funcx-endpoint <defunct>
    125 ?        00:00:20 funcx-endpoint
    128 ?        00:00:00 off_process_che
    139 ?        00:00:59 funcx-endpoint
    142 ?        00:00:00 off_process_che
    146 ?        00:00:00 bash <defunct>
    147 ?        00:00:00 bash <defunct>
    164 ?        00:00:00 bash <defunct>
    165 ?        00:00:01 funcx-manager <defunct>
    175 ?        00:00:00 funcx-worker <defunct>
    224 ?        00:00:00 off_process_che <defunct>
   2117 pts/0    00:00:00 ps

that the funcx-worker processes are having problems as well (most of them are defunct). Does mybinder forbid additional background processes?
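As an aside, one way to confirm the defunct (zombie) entries from inside the container without eyeballing ps output is to read /proc directly. A minimal sketch, written for this illustration (Linux-only; zombie_children is not part of any library here):

    import os

    def zombie_children():
        """Return (pid, command) pairs for all processes in the Z (zombie) state.

        Reads /proc/<pid>/stat; the command name sits in parentheses and the
        process state is the first field after the closing ")".
        """
        zombies = []
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                with open(f"/proc/{pid}/stat") as f:
                    stat = f.read()
            except OSError:
                continue  # process exited between listdir and open
            # comm may contain spaces or parens, so split on the LAST ")"
            comm = stat[stat.index("(") + 1 : stat.rindex(")")]
            state = stat[stat.rindex(")") + 1 :].split()[0]
            if state == "Z":
                zombies.append((int(pid), comm))
        return zombies

Run inside the Binder pod this should list the same defunct funcx-endpoint/funcx-worker entries that ps shows above.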
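For completeness, the TaskPending above is retryable; in a working setup one would normally poll for the result rather than call get_result once. A minimal sketch of that pattern (TaskPending is stubbed here so the snippet is self-contained; in the notebook it comes from the funcx SDK, and this doesn't fix the underlying problem since the workers never run the task, it only distinguishes "slow" from "stuck"):

    import time

    class TaskPending(Exception):
        """Stand-in for the funcx SDK's TaskPending, stubbed for illustration."""

    def wait_for_result(client, task_id, timeout=30.0, poll_interval=0.5):
        """Poll client.get_result(task_id) until it stops raising TaskPending.

        Raises TimeoutError if the task is still pending after `timeout` seconds.
        """
        deadline = time.monotonic() + timeout
        while True:
            try:
                return client.get_result(task_id)
            except TaskPending:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"task {task_id} still pending after {timeout}s")
                time.sleep(poll_interval)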

If I build from the same commit with repo2docker, though,

$ repo2docker --ref f3fd846e9a24cf8c0f148e43ca033a8cecb92012 https://github.com/pyhf/pyhep-2021-notebook-talk

and run through the same process, everything works with no issues:

# retrieve output
>>> result = fxc.get_result(task_id)
>>> result
{'cls_obs': 0.0457153916230104, 'fit-time': 0.06638431549072266}

Any thoughts on what might be causing the difference in behavior between the two?

Main differences I can think of are:

I can reproduce your error on mybinder. Is there a way to get the funcx daemon to output some logs?

Is there a way to get the funcx daemon to output some logs?

I’ll ask the devs (there will be a delay in their response, as there is a US holiday), but from the docs it looks like it might require setting up a Redis server and then altering the funcX config to point at it.

From the Globus Compute Endpoints page of the Globus Compute 2.8.0 documentation:

The endpoint establishes three outbound ZeroMQ channels to the forwarder (on the three ports returned during registration) to retrieve tasks, send results, and communicate command information.

This might be the problem: if those are three random ports, they’ll be blocked.
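If so, that's easy to probe from inside the Binder pod: try outbound TCP connections on a well-known port versus a high port of the kind those ZeroMQ channels would use. A quick sketch (the host and port numbers in the comment are placeholders for illustration, not the actual forwarder address or the ports returned during registration):

    import socket

    def can_connect(host, port, timeout=3.0):
        """Return True if an outbound TCP connection to (host, port) succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Substitute the real forwarder host and the three registration ports:
    # can_connect("example.org", 443)    # almost always allowed
    # can_connect("example.org", 55555)  # fails if high-port egress is blocked

If the well-known port connects but the registration ports don't, that would confirm the egress restriction as the difference from a local repo2docker run.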
