Too aggressive caching of nbviewer?

Hello all,

I received some personal communication that nbviewer seems to cache some notebooks indefinitely. I’ve dug into the Discourse posts and it seems this is not the first time it has happened:

Who is experiencing this, how often, and can you share links?

Also, for transparency: I haven’t been involved in the nbviewer deployment for quite some time, so here is my current understanding of how it is deployed. I guess that might interest a few of you.

nbviewer.jupyter.org is proxied by Cloudflare to Fastly, which is used as a CDN. When you add ?flush_cache=1, that is the cache that gets flushed. Fastly has (after I tweaked it) a max TTL of 600s for the cache. It was set up to keep serving the cached page if nbviewer itself did not respond, but it no longer does. Otherwise the TTL is taken from nbviewer’s response headers.
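To see which layer is answering, it can help to look only at the cache-related response headers. Below is a minimal Python sketch, not something nbviewer ships. The header list is my assumption about what is useful here (`X-Cache` and `X-Served-By` are commonly added by Fastly, `CF-Cache-Status` by Cloudflare), and `fetch_cache_headers` is a hypothetical helper name.

```python
from urllib.request import Request, urlopen

# Headers that usually say something about caching (assumed list, adjust as needed).
CACHE_HEADERS = (
    "cache-control", "age", "expires",
    "x-cache", "x-cache-hits", "x-served-by",  # typically set by Fastly
    "cf-cache-status",                          # typically set by Cloudflare
)


def cache_headers(headers):
    """Return only the cache-related entries of a header mapping (lower-cased names)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in CACHE_HEADERS if name in lowered}


def fetch_cache_headers(url):
    """Send a HEAD request to `url` and return its cache-related headers."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return cache_headers(dict(response.headers.items()))
```

Calling `fetch_cache_headers` on the same notebook URL with and without `?flush_cache=1`, and comparing `age` between the two, should give a first hint of whether the page is being served from a cache.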

nbviewer itself is hosted on OVH (a European provider) using Helm/Kubernetes (see GitHub - jupyter/nbviewer.org-deploy: Deployment files for nbviewer.jupyter.org), and I’m not sure who has access to the actual credentials to deploy nbviewer.

I believe nbviewer itself does some caching. It used to use memcache, but I’m unsure whether it still does.

Any help on how to investigate caching issues would be welcome.

I’ll also tag this with the mybinder.org ops tag, as I’m not sure there is a better one.


So after a bit of investigation, it looks like the content is correct if I hit the Fastly URL, so Cloudflare is over-caching.

Correction: the Fastly one is incorrect, and now the nbviewer one is as well. So I’m confused.
Right now I have 3 tabs, where

  • GitHub rendering is correct,
  • Fastly is not,
  • Cloudflare is.

So, some more debugging later.

If you are an admin on Fastly, as I am, you can query the cache state of a given object (for a given URL) on all nodes. It looks like this:

Notice that some of the hashes are not identical, so different nodes have the object cached differently.

The format of the TTL string is described here in the Fastly docs. The TTLs don’t look good. You can check from your computer which node/cache you get with

curl -svo /dev/null -H "Fastly-Debug:1" $URL

and the reply should be more like

< fastly-debug-ttl: (H cache-bur17532-BUR 588.449 0.000 12)

Without dashes.
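To compare several nodes without reading the header by eye, the debug string can be parsed. Here is a minimal Python sketch. The field meanings (hit/miss state, node, remaining TTL, grace period, age) are my reading of the Fastly docs, and treating `-` as "no value" is an assumption; `parse_debug_ttl` is a hypothetical helper, not part of any Fastly tooling.

```python
import re
from typing import List, NamedTuple, Optional

# One parenthesised group per cache node: (state node ttl grace age)
ENTRY = re.compile(r"\((\S+) (\S+) (\S+) (\S+) (\S+)\)")


class TtlEntry(NamedTuple):
    state: str              # "H" for hit, "M" for miss (assumed meaning)
    node: str               # e.g. "cache-bur17532-BUR"
    ttl: Optional[float]    # remaining TTL in seconds, None if "-"
    grace: Optional[float]  # grace period in seconds, None if "-"
    age: Optional[float]    # age of the object in seconds, None if "-"


def _number(text: str) -> Optional[float]:
    """Convert a field to float, mapping a bare dash to None."""
    return None if text == "-" else float(text)


def parse_debug_ttl(value: str) -> List[TtlEntry]:
    """Parse the value of a fastly-debug-ttl header into one entry per node."""
    return [
        TtlEntry(m.group(1), m.group(2),
                 _number(m.group(3)), _number(m.group(4)), _number(m.group(5)))
        for m in ENTRY.finditer(value)
    ]


def suspicious(entries: List[TtlEntry]) -> List[TtlEntry]:
    """Return the entries that have no TTL at all (a dash in the header)."""
    return [entry for entry in entries if entry.ttl is None]
```

Feeding it the header value from the curl output above, `suspicious(parse_debug_ttl(value))` lists the nodes whose TTL is a dash.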

My guess is that some pages had an infinite cache lifetime and are sticking in the cache, so I’ll try to flush them. Maybe I’ll flush all caches.

So even after flushing the cache for this key, the hash value still seems to differ between caching nodes, and some TTLs are still infinite/None.