I get `Read timed out` errors when pushing image on OVH

Hi,

I’m getting repeated build errors when launching https://ovh.mybinder.org/v2/gh/geomar-tm/python-intro-201804/add-dask-labview?urlpath=lab (on OVH):

[...]
Pushing image
Pushing image
Error during build: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.

On GKE, the same build is working fine.

Thanks for letting us know. For some images we end up with this problem on the OVH Docker registry (another example). There is unfortunately nothing you can do about it. I’ll create a ticket to have the image removed from the storage that backs our registry.

Still trying to work out what the pattern is that leads to some images having this problem :-/

Thanks for having a look, @betatim. It worked just after you opened the ticket. Does this mean the registry got fixed or was it just a lucky coincidence?

After pushing more changes to the repo, I see the same timeout on OVH for the new (and final, for now) Git ref again.

I think it was just coincidence. We have reduced the traffic to the OVH cluster for now till we figure this one out so you should always end up on GKE.

Is there anything special that comes to mind that you do (download huge files, delete lots of stuff…?)?

There’s not much going on:

  • graphviz installed via apt.txt
  • a fairly standard scientific python env file
  • dask-labextension installed in postBuild

The v2.0.0 worktree is 45 MB.

We’ve increased the CPU and RAM available to the registry, let’s see if that fixes the problem. It wasn’t clear from the logs what was going wrong and reproducing the problem didn’t work reliably either.

The problem is still here.

It seems that the last part of the docker push is what takes too long: after all container layers have been pushed to the registry, the registry constructs a global manifest to send back to the docker client. The timeout (sometimes) occurs at this point on the OVH registry.
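To make the failure mode concrete, here is a small stdlib-only sketch of it. It is not the registry or repo2docker code: `fake_registry`, `push`, and the delays are made up for illustration. The "registry" acknowledges the layers right away, then pauses before sending the final manifest line, and the client gives up if no data arrives within its read timeout.

```python
import socket
import threading
import time


def fake_registry(conn, manifest_delay):
    """Hypothetical stand-in for a registry: fast layer acks, slow manifest."""
    try:
        for layer in range(3):
            conn.sendall(b"layer %d pushed\n" % layer)
        # Simulate the slow step: building the global manifest.
        time.sleep(manifest_delay)
        conn.sendall(b"manifest ready\n")
    except OSError:
        # The client already gave up and closed its end.
        pass
    finally:
        conn.close()


def push(read_timeout, manifest_delay):
    """Read the push stream until EOF or until a single read times out."""
    client, server = socket.socketpair()
    client.settimeout(read_timeout)
    threading.Thread(
        target=fake_registry, args=(server, manifest_delay), daemon=True
    ).start()

    received = b""
    status = "ok"
    try:
        while True:
            chunk = client.recv(4096)
            if not chunk:  # peer closed the connection: push finished
                break
            received += chunk
    except socket.timeout:
        status = "timed out"
    finally:
        client.close()
    return status, received.decode().splitlines()


if __name__ == "__main__":
    print(push(read_timeout=0.2, manifest_delay=1.0))
    print(push(read_timeout=2.0, manifest_delay=0.1))
```

With a short read timeout, all layers show up as pushed and the call still fails, which matches what the build log shows. With a longer timeout the same "push" completes.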

While we are working on our side to fix this issue, I think it would also be great to be able to configure the timeout parameter on the repo2docker command.
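As a rough sketch of what a configurable timeout could look like: the Docker SDK for Python accepts a `timeout` argument (in seconds, 60 by default) when a client is created. The environment variable `R2D_DOCKER_TIMEOUT` and both helper functions below are hypothetical and only here for illustration; repo2docker does not provide them as far as this thread goes.

```python
import os

# Default request timeout used by the Docker SDK for Python, in seconds.
DEFAULT_DOCKER_TIMEOUT = 60


def docker_client_kwargs(env=None):
    """Build client keyword arguments with an overridable timeout.

    R2D_DOCKER_TIMEOUT is a hypothetical variable name, not an existing
    repo2docker setting.
    """
    env = os.environ if env is None else env
    timeout = int(env.get("R2D_DOCKER_TIMEOUT", DEFAULT_DOCKER_TIMEOUT))
    return {"version": "auto", "timeout": timeout}


def make_client(env=None):
    """Create a low-level Docker client (needs the `docker` package)."""
    import docker  # imported lazily so the helper above works without it

    return docker.APIClient(**docker_client_kwargs(env))


if __name__ == "__main__":
    # Example: allow up to 10 minutes per request to the Docker daemon.
    print(docker_client_kwargs({"R2D_DOCKER_TIMEOUT": "600"}))
```

A longer timeout would only hide the slow manifest step on the registry side, so it is a workaround rather than a fix.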

Here is the Python error log from the push:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 397, in _error_catcher
    yield
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 479, in read
    data = self._fp.read(amt)
  File "/usr/lib/python3.6/http/client.py", line 449, in read
    n = self.readinto(b)
  File "/usr/lib/python3.6/http/client.py", line 483, in readinto
    return self._readinto_chunked(b)
  File "/usr/lib/python3.6/http/client.py", line 578, in _readinto_chunked
    chunk_left = self._get_chunk_left()
  File "/usr/lib/python3.6/http/client.py", line 546, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/usr/lib/python3.6/http/client.py", line 506, in _read_next_chunk_size
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/jupyter-repo2docker", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python3.6/site-packages/repo2docker/__main__.py", line 344, in main
    r2d.start()
  File "/usr/lib/python3.6/site-packages/repo2docker/app.py", line 723, in start
    self.push_image()
  File "/usr/lib/python3.6/site-packages/repo2docker/app.py", line 456, in push_image
    for line in client.push(self.output_image_spec, stream=True):
  File "/usr/lib/python3.6/site-packages/docker/api/client.py", line 345, in _stream_helper
    data = reader.read(1)
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 496, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 402, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.

I created https://github.com/jupyter/repo2docker/issues/711 to track this.

If someone has time to help track down where the timeout is controlled/documented, that would be great. A patch would be even greater :slight_smile:

This is also happening with pyhf's Binder builds (fine on GKE, push times out on OVH).

Pushing image
Pushing image
Pushing image
<class 'str'>
https://github.com/diana-hep/pyhf
Error during build: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.

Thanks for making the repo2docker Issue, Tim. I have no free cycles (thesis writing) or I would join you on working on it.

This should (or at least might) resolve itself after today. We are experimenting with using a different docker registry for the OVH cluster.

OVH is currently not in the federation redirect so you should only get sent to the GKE cluster while we fiddle.