Running A Cell In A Background Thread

I’m trying to run a long-running cell in a background thread so I can check on it from other cells. If I just run the cell as normal, the other cells hang waiting on the long-running cell.

Surprisingly, I can get pretty close to what I want just by running the long-running cell in a background thread. The only issue is that the output from the background thread starts creeping into the output of the other cells that I run. It looks like this:

# Long running cell
import threading, time

def network_call():
    for i in range(20):
        print(i)
        time.sleep(1)
    
threading.Thread(target=network_call).start()

0
1
2

# Another cell
print("Output from another cell")

Output from another cell
3
4
5
6

Are there any tricks I can try that might keep the long-running cell’s output on that cell? Or other workarounds?

I was hoping that, as a workaround for this flaw, you could use the %%capture out magic on the long-running cell to collect its output stream so that it doesn’t contaminate the other cells’ output. I explain it here; you can ignore the %store information if you just want to display the output in the notebook.

However, when I tested it, %%capture on the long-running cell did not contain the thread’s output. What does work is the reverse: put %%capture out on the normal cell, which keeps its output isolated from the long-running output. You can then show the normal cell’s output (the print() in your example) in another cell with:

import sys
sys.stdout.write(out.stdout);

A better approach to try is running the long-running cell in a multiprocessing process; see Python 3 Module of the Week: multiprocessing – Manage processes like threads.
Following your posted example, you’d run:

# Long running cell
import multiprocessing, time

def network_call():
    for i in range(20):
        print(i)
        time.sleep(1)
    
multiprocessing.Process(target=network_call).start()


# Another cell
print("Output from another cell")

This is working in my tests in the classic notebook interface and JupyterLab.
The output from the first cell stays isolated in the first cell as it continues to run and doesn’t pollute elsewhere. Yet, while the first ‘long-running’ cell keeps running, you are able to run the other ‘normal’ cells.

That works PERFECTLY! Thanks!

I’ve generally had better luck using the multiprocessing module in Python. This is another example where multiprocessing shines. (I couldn’t tell you why.) Fortunately, as noted in that link, the developers of the multiprocessing module developed it following the threading API so that you can pretty much swap in one or the other without changing much to be able to test either.

Here is a slightly more illustrative example showing the concurrent/interleaved output:

# Long running cell
import multiprocessing, time
t0 = time.time()

def network_call():
    for i in range(5):
        # Arguments must match the format string: step index first, then elapsed time
        print("Step %i at time %1.1f" % (i, time.time() - t0))
        time.sleep(1)
    
multiprocessing.Process(target=network_call).start()

Step 0 at time 0.0
Step 1 at time 1.0
Step 2 at time 2.0
Step 3 at time 3.0
Step 4 at time 4.0

print("doing other stuff at time %1.1f" % (time.time() - t0))

doing other stuff at time 1.6
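
To make the earlier point about the shared API concrete, here is a minimal sketch of a helper that starts the same function with either class. The name `run_in_background` and the `steps`/`delay` parameters are made up for illustration; the only thing relied on is that `threading.Thread` and `multiprocessing.Process` both accept `target` and `args` and both offer `start()`, `join()`, and `is_alive()`.

```python
import multiprocessing
import threading
import time


def network_call(steps=3, delay=0.1):
    """Stand-in for the long-running job from the examples above."""
    for i in range(steps):
        print(i)
        time.sleep(delay)


def run_in_background(worker_cls, target, *args):
    """Start `target` with the given worker class and return the worker.

    `worker_cls` is expected to be threading.Thread or
    multiprocessing.Process (hypothetical helper, not part of either module).
    """
    worker = worker_cls(target=target, args=args)
    worker.start()
    return worker
```

In a notebook you would call `run_in_background(threading.Thread, network_call)` in one run and `run_in_background(multiprocessing.Process, network_call)` in the next to compare where the output lands. One caveat I have not tested here: on platforms where multiprocessing uses the spawn start method, a function defined in a notebook cell may not be importable by the child process, so the Process variant can fail there.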

A problem with multiprocessing is that the processes have separate memory spaces. This means that I have to design inter-process communication rather than being able to quick-and-dirty use the same variables. What I often do during development is run some long-running job in a separate thread and control it via some global variables. For this it would be awesome to have the long-runner’s output in the right cell. But I’m not sure this is possible at all, because the Jupyter server would have to figure out which cell’s input caused the kernel to write to stdout at a particular moment.

But I’m amazed that this works for multiprocessing (where the process can be used for this identification, I guess). I wasn’t aware of this.
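
For the thread-plus-globals workflow, one possible workaround is to not print from the thread at all: the thread appends its messages to a shared list, and whichever cell wants the output reads that list. This is only a sketch under that assumption; `log`, `stop_requested`, `long_runner`, and `start_long_runner` are hypothetical names, and it sidesteps the routing problem rather than solving it.

```python
import threading

# Shared state that any cell can read or modify (hypothetical names).
log = []                             # messages from the background thread
stop_requested = threading.Event()   # set this from another cell to stop the job


def long_runner(steps=20, delay=1.0):
    """Simulated long-running job that records progress instead of printing."""
    for i in range(steps):
        if stop_requested.is_set():
            log.append("stopped at step %d" % i)
            return
        log.append("step %d" % i)
        # wait() pauses like time.sleep() but returns early once stop is requested
        stop_requested.wait(delay)
    log.append("finished")


def start_long_runner(steps=20, delay=1.0):
    """Reset the shared state, start the job in a daemon thread, return the thread."""
    stop_requested.clear()
    log.clear()
    worker = threading.Thread(target=long_runner, args=(steps, delay), daemon=True)
    worker.start()
    return worker
```

You would run `worker = start_long_runner()` in one cell, then `print("\n".join(log))` in any other cell to see the progress so far, or `stop_requested.set()` to end the job early. The output always appears in the cell that asked for it, because the thread itself never writes to stdout.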

In my case, the thing I’m kicking off in the long-running cell runs out-of-process anyway, so I’m no worse off introducing additional processes. I’m guessing that even though it runs in a separate process, it blocks the kernel so it can return stdout to the caller (that’s just a guess, though). In any event, the multiprocessing invocation does exactly what I needed. Obviously, YMMV…
