CSV file gets uploaded incorrectly

Whenever I try to upload a CSV file to work with the data in Jupyter Notebook, the file is uploaded incorrectly. It does not show the header properly and loses some of the data as well. Here is how I fixed it:

  1. Create a new file in JupyterLab
  2. Rename it to whatever you would like, ending in “.csv”
  3. Open the file in the editor in JupyterLab
  4. Open your original CSV file in a text editor such as Notepad
  5. Copy the contents of your original CSV into the Jupyter editor of your new file

What does your JupyterLab say in the corner for the kernel? Or better yet, where are you accessing it from since you say you are ‘uploading’?

I suspect this is the issue as described here; however, without more information it is hard to be sure.

I have downloaded a dataset (CSV format) from Kaggle. To use it in Jupyter, I am uploading the CSV using the “Upload Files” option in Jupyter.

This is what the CSV looks like when I open it in Jupyter:

Link for the Dataset

The Original CSV file looks something like this:

You specifically cut off the right side upper corner that shows the kernel I asked about.

Fortunately, we can still make progress without an answer to that question, because the icon in the far right corner matches the JupyterLite branding you can see here at the main JupyterLite page.

You’ll note that the ‘Status’ on that page with the charge symbols around it is trying to warn you that JupyterLite is very much under development and so consider it experimental. This notice is echoed at the Try Jupyter page. Look for the caution symbols highlighting Experimental.

JupyterLite runs inside the browser on your machine, using WebAssembly (WASM) to do the computation, and so it is limited to what browser functionality easily allows. Progress is being made to work around these limitations, but development takes time. Hence the warnings.

Thus, for troubleshooting what could be happening with your CSV file, I suggest you proceed as I described elsewhere: in the post already linked above, and in the bottom paragraph of another recent post on this forum with a similar title. (IMPORTANT UPDATE: As of March 29, 2023, the problem with uploads greater than 1 MB silently getting truncated has been fixed if you use JupyterLite version 0.1.0b22 or later.) Use a remote JupyterLab session backed by a typical, full Python kernel. See if you can drag and drop the CSV file from your local machine into a session launched from MyBinder with a typical, full Python kernel backing it.

Using JupyterLab with a typical, full Python kernel elsewhere would also work for troubleshooting.

By determining whether the same thing happens there, we can confirm or rule out JupyterLite/browser limits on upload size as the issue. Given several recent posts related to this, I suspect that is the culprit; however, if it is ruled out, we can look into other things.
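To make that comparison concrete, here is a minimal sketch of one way to check whether two copies of a file match. It only uses the Python standard library, so it should run both on a full Python kernel and under Pyodide. The function name `summarize_file` is made up for this illustration, and the 1 MB threshold is simply the figure mentioned above, not a value read from JupyterLite itself.

```python
from pathlib import Path

# Threshold taken from the discussion above (uploads over 1 MB were
# reported as truncated in older JupyterLite versions); an assumption,
# not something queried from JupyterLite.
ONE_MB = 1024 * 1024


def summarize_file(path):
    """Return basic facts about a file so two copies can be compared."""
    data = Path(path).read_bytes()
    return {
        "bytes": len(data),                         # total size in bytes
        "lines": data.count(b"\n"),                 # number of newline characters
        "ends_with_newline": data.endswith(b"\n"),  # False can hint at a cut-off last row
        "over_1mb": len(data) > ONE_MB,             # was the file above the reported limit?
    }
```

Run `summarize_file("your_file.csv")` (with your own file name) once on the original file on your machine and once on the uploaded copy in Jupyter. If the byte or line counts differ, the upload did not transfer the whole file.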


Hi, here you go with the full image. Sorry about cutting the right hand corner part. I thought you were interested in the Kernel Details on the bottom part.

Since I am new to this, I did not know what to look for as the “status” of the kernel, hence the chopped-off snapshot. Thanks for the detailed explanation. I posted this just to make other users aware so they do not waste time. I did not know in so much detail what works and what doesn’t in the Lite version, but I hope I did not offend the developers with this post.

You’re fine. I should have specified what corner.
You can clearly see that it says ‘Pyodide’. So it is the WASM-based Python kernel for JupyterLite.

Hence quite a few of the things that work in Jupyter that everyone else describes aren’t going to work there. As a learner, you are better off using typical, full Python kernels.

However, while I believe the issue may be the size limit for upload, it may also be that you aren’t reading in the CSV file correctly. I cannot rule that out. I don’t have a Kaggle account, so I don’t have access to the file. Occasionally, though, pd.read_csv() needs additional options set to read text data correctly. Given your post above with a view of the CSV using the CSV viewer in JupyterLite, I don’t think that is the issue. While I’d prefer the actual text be viewed with the editor view in your JupyterLite, i.e., right-click on the CSV file in the file navigator panel and select ‘Open with...’ > ‘Editor’, I suspect it would show the same thing you already supplied. (Note: after suggesting the latter way to view the uploaded content, I realized you had already supplied something that pretty much addressed this, and I edited these last few lines after the fact.)
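As an illustration of the kind of options pd.read_csv() sometimes needs, here is a small sketch on made-up data. The sample text is hypothetical and has nothing to do with the Kaggle file; it just shows a file that would be misread with the defaults because it uses semicolons, a leading comment line, and decimal commas.

```python
import io

import pandas as pd

# Hypothetical sample, not the Kaggle dataset: semicolon-delimited text
# with one comment line on top and commas as decimal separators.
raw = "# exported 2023\nname;score\nalice;1,5\nbob;2,0\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",      # fields are separated by semicolons, not commas
    skiprows=1,   # skip the comment line before the header
    decimal=",",  # treat "1,5" as the number 1.5
)

print(df)
```

Other options that are often relevant are `encoding` and `header`. None of these would help, though, if the file itself was cut short during upload.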

Hi! If I have understood correctly, does having Full Python mean that you are suggesting installing JupyterLab on my local PC?

While that is an option, that wasn’t what I was suggesting for now. You can follow what I pointed out above to use a typical, full Python kernel on a temporary remote machine. As I suggested above, you can follow the instructions here to use sessions served by MyBinder.org to have typical, full Python kernels in JupyterLab in your browser, without installing anything and without requiring a login. If the behaviour is different there and it works without cutting off anything in your CSV file, then you know that JupyterLite/Pyodide/browser limitations are the culprit.

I think I missed in the original version of my last post that you posted a view of the uploaded CSV already:

While that isn’t looking directly at the lines of text in the Editor view, I suspect it does indicate the file upload process got cut off in some way as if it was too large to be completely uploaded. In other words, you are having the trouble I specifically suggested in my first reply.
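If you want a quick check that doesn’t rely on eyeballing the viewer, one rough heuristic is to look for rows whose number of fields differs from the header, since a file cut off mid-record often ends in a short row. Below is a minimal sketch using only the standard library; `find_ragged_rows` and the sample text are made up for illustration. It will not catch a cut that happens to land exactly at the end of a row, and it also flags blank lines.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return (record_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to compare
        return []
    expected = len(header)
    return [
        (record_number, len(row))
        # The header is record 1, so data records start at 2.
        for record_number, row in enumerate(reader, start=2)
        if len(row) != expected
    ]


# Hypothetical sample whose last row was cut off mid-record.
sample = "id,name,score\n1,alice,90\n2,bob,85\n3,car"
print(find_ragged_rows(sample))  # prints [(4, 2)]
```

To run it on the uploaded file, read the file’s text first (for example with `open(...).read()`) and pass that in.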