Hi all,
I’m having trouble getting access to a GCP instance. Several issues are compounded here, but I think I’ve narrowed down what’s going on.
Relevant context:
–I set this up on behalf of a student team
–I followed this guide to a T back in July (https://the-littlest-jupyterhub.readthedocs.io/en/latest/install/google.html)
–I created a VM with a 20 GB disk running Ubuntu 18.04
–I did not have (or use) a domain name pointing to the IP address when I followed this guide (https://the-littlest-jupyterhub.readthedocs.io/en/latest/howto/admin/https.html); I set up HTTPS with the GCP external IP address itself
–Clicking “SSH” under the “Connect” column on the GCP VM instances page opens a window that hangs and never connects me to the VM.
It says this indefinitely:
"Connecting…
Transferring SSH keys to the VM.
The key transfer to project metadata is taking an unusually long time. Transferring instead to instance metadata may be faster, but will transfer the keys only to this VM. If you wish to SSH into other VMs from this VM, you will need to transfer the keys accordingly.
[CLICK HERE] to transfer the key to instance metadata. Note that this setting is persistent and needs to be disabled in the Instance Details page once enabled.
You can drastically improve your key transfer times by migrating to [OS Login.]"
The TLJH I set up worked until two days ago. It broke when a student uploaded a 2 GB file, most likely because the GCP instance ran out of disk space. Here is what the student said:
"Don’t know whether this info would help: I tried to upload a 2GB file last night and it got stuck at 63% uploading, so I refreshed the page. It gave me bad gateway at that time. Then I tried to log in again, after I typed in password, it shows a 500: Internal Server Error. This morning, when I go to D-Lab looking for help, I cannot even log in the page, like your state now"
At this point I stepped in to try to fix the issue, but I think I made it worse.
My first reaction was to restart the instance by stopping and starting it in GCP. However, this changed the external IP, and because I had set up HTTPS with the (old) IP address, I think it caused “scrambled credentials to be sent.”
I didn’t realize at the time that this was probably what was happening, so I restarted the instance again. Interestingly, the error changed from “scrambled connection” to “connection refused,” and Chrome suggested there might be a firewall problem. I did some reading on this error and found that it can occur for TLJH notebooks when the instance runs out of space, which is more or less what I suspect happened when the student uploaded the large file. So I increased the instance’s disk size and restarted again, but with no luck: accessing the external IP still gave me a “connection refused” error.
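Since the suspected root cause is a full disk, a quick usage check once I can get a shell on the VM (or on any machine with the old disk attached) should confirm it, and comparing the reported total against the new disk size would show whether the resize actually reached the filesystem. Here is a minimal sketch using only the Python standard library; the 1 GB threshold is an arbitrary assumption of mine, not a TLJH limit.

```python
import shutil
from typing import NamedTuple


class DiskReport(NamedTuple):
    path: str
    total_gb: float
    free_gb: float
    percent_used: float


def disk_report(path: str = "/") -> DiskReport:
    """Summarize usage of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    gb = 1024 ** 3
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return DiskReport(path, usage.total / gb, usage.free / gb, percent_used)


def is_nearly_full(report: DiskReport, min_free_gb: float = 1.0) -> bool:
    """Flag a filesystem whose free space is below `min_free_gb` (arbitrary threshold)."""
    return report.free_gb < min_free_gb


if __name__ == "__main__":
    r = disk_report("/")
    print(f"{r.path}: {r.free_gb:.1f} GB free of {r.total_gb:.1f} GB ({r.percent_used:.0f}% used)")
    if is_nearly_full(r):
        print("Warning: less than 1 GB free on /")
```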
Next, I created a snapshot of the instance in case we might lose any data (unfortunately I don’t have a snapshot from before any of these problems, so this was my last-ditch effort at saving the data). I tried creating a new instance from this snapshot with a larger disk (in case my earlier resize had only added unformatted storage), but this didn’t work either.
I then set up a new TLJH instance from scratch, hoping to load the snapshot as an additional disk and access its contents. I was able to reach the new hub over HTTP and added the snapshot as an additional disk via GCP. However, when I looked through the notebook’s terminal, I couldn’t find the disk anywhere to load the data from.
I then experimented with enabling HTTPS, and found that setting up Automatic HTTPS with Let’s Encrypt using the external IP alone also caused “scrambled credentials” to be sent and locked me out of the instance. This seems to be a new problem, since in the past I was able to access the external IP address over HTTPS without any trouble (so I need to do more testing). My suspicion is that I lost track and stopped/started the instance, which gave it a new external IP address and thereby reproduced the first issue I ran into (note: my issue, not the student’s).
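From what I’ve read since, an additional disk isn’t mounted automatically, which might be why it never showed up in the terminal. On GCP, attached disks appear as symlinks under /dev/disk/by-id/ with a google- prefix (partitions get a -partN suffix). Below is a minimal Python sketch that lists those entries and builds, without running, the commands to mount one partition read-only. The disk name old-disk and the mount point /mnt/old-disk are hypothetical, and I’m assuming the data sits on partition 1, as it would on a disk made from an Ubuntu boot-disk snapshot.

```python
import os
from pathlib import Path
from typing import Dict, List


def list_google_disks(by_id_dir: str = "/dev/disk/by-id") -> Dict[str, str]:
    """Map google-* symlink names to the block devices they resolve to."""
    base = Path(by_id_dir)
    if not base.is_dir():
        return {}
    disks = {}
    for entry in sorted(base.iterdir()):
        if entry.name.startswith("google-"):
            # e.g. google-old-disk-part1 -> /dev/sdb1
            disks[entry.name] = os.path.realpath(entry)
    return disks


def mount_commands(disk_name: str, partition: int = 1,
                   mount_point: str = "/mnt/old-disk") -> List[List[str]]:
    """Build (but do not run) commands that mount one partition read-only."""
    device = f"/dev/disk/by-id/google-{disk_name}-part{partition}"
    return [
        ["sudo", "mkdir", "-p", mount_point],
        ["sudo", "mount", "-o", "ro", device, mount_point],  # ro: avoid writing to the recovered data
    ]


if __name__ == "__main__":
    for name, dev in list_google_disks().items():
        print(f"{name} -> {dev}")
    for cmd in mount_commands("old-disk"):  # hypothetical disk name
        print(" ".join(cmd))
```

Mounting read-only is a deliberate choice here: the goal is only to copy the students’ files off, not to repair the old system.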
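If the stale HTTPS setup is what’s blocking access, my understanding is that TLJH’s HTTPS settings live in its config and can be switched off with tljh-config, after which the proxy should serve plain HTTP again on whatever the current IP is. Separately, GCP lets you promote an ephemeral external IP to a static one so that stop/start no longer changes it. A minimal dry-run sketch in Python, assuming I can get a shell on the VM at all; it only prints the commands unless dry_run is set to False.

```python
import subprocess
from typing import List


def disable_https_commands() -> List[List[str]]:
    """tljh-config commands that turn HTTPS off and reload the proxy."""
    return [
        ["sudo", "tljh-config", "set", "https.enabled", "false"],
        ["sudo", "tljh-config", "reload", "proxy"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command


if __name__ == "__main__":
    run_all(disable_https_commands())  # dry run by default
```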
So, now unable to access the notebook directly, I tried to SSH into the instance to update some configuration, but ran into problems there too. I created an SSH key and added it in GCP under “SSH Keys.” I then tried to connect with the command ssh account@some.ip.address.here but kept getting “Permission denied (publickey)” as an error.
I’m not sure this is the proper ssh invocation, so please correct me if I’m wrong, but I’m about 80% sure I should have been able to SSH into the instance from my terminal at that point, since I had copied my SSH key into GCP.
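One thing I may have gotten wrong: as far as I can tell, GCP ties a metadata SSH key to a Linux username taken from the key’s trailing comment (which is why its docs suggest generating keys with ssh-keygen -C USERNAME), so the “account” in my ssh command has to match that name, and ssh has to use the matching private key via -i. Below is a minimal Python sketch that builds such a command. Stripping an @host suffix from the comment is my assumption about how GCP treats comments like user@laptop, and the key, path, and IP are hypothetical.

```python
import os
import shlex
from typing import List


def username_from_pubkey(pubkey_line: str) -> str:
    """Return the username implied by a public key's comment field.

    Assumption: GCP uses the comment as the username; for user@host
    comments, only the part before '@' is kept here.
    """
    parts = pubkey_line.strip().split()
    if len(parts) < 3:
        raise ValueError("public key has no comment; regenerate with ssh-keygen -C USERNAME")
    return parts[2].split("@", 1)[0]


def ssh_command(pubkey_line: str, private_key_path: str, external_ip: str) -> List[str]:
    """Build an ssh invocation that pairs the right private key with the right username."""
    user = username_from_pubkey(pubkey_line)
    return ["ssh", "-i", os.path.expanduser(private_key_path), f"{user}@{external_ip}"]


if __name__ == "__main__":
    key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIExample jdoe@laptop"  # hypothetical key
    cmd = ssh_command(key, "~/.ssh/gcp_key", "203.0.113.10")  # hypothetical path and IP
    print(shlex.join(cmd))
```

Adding -v to the resulting ssh command prints which keys the client offers, which should show whether the server is rejecting the key or the username.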
In any case, I ended up deleting the newly created instances and am back to square one trying to fix the problem.
Where I’m at now: my old instance gives me “connection refused” errors and I can’t SSH into it (I’ve also tried adding my SSH key to this instance). I’m pretty sure there are configuration problems on the instance, so SSH seems like a necessary(?) step, but I can’t figure out how to get in.