Hi All,
What is the maximum file size that Jupyter Notebook can import and convert into a CSV file? I have ORC files, sample.orc (2 GB) and sample2.orc (63 GB), to import into a Jupyter notebook, but I cannot even load sample.orc to read and convert into a CSV file.
I would appreciate your kind help and suggestions.
Does it work when you run the same process from a terminal?
Is there a shell command you can use to offload this to another process? Something along the lines of:
!orc2csv in.orc out.csv
With file sizes that large, keeping the whole thing in RAM (twice) can have fairly negative impacts when working interactively, whether in a notebook or just in a kernel.
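If there is such a tool, the same offload works from a plain script too; a minimal sketch, where orc2csv is a hypothetical command on the PATH, as above:

    import subprocess

    # Run the conversion in a separate process so the kernel's own
    # memory use stays flat; "orc2csv" is a placeholder, as above.
    subprocess.run(["orc2csv", "in.orc", "out.csv"], check=True)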
Hi bollwyvl,
Thanks for the reply. I am using Jupyter Notebook 6.4.12 and Python 3.9.13.
Selected Jupyter core packages…
IPython : 7.31.1
ipykernel : 6.15.2
ipywidgets : 7.6.5
jupyter_client : 7.3.4
jupyter_core : 4.11.1
jupyter_server : 1.18.1
jupyterlab : 3.4.4
nbclient : 0.5.13
nbconvert : 6.4.4
nbformat : 5.5.0
notebook : 6.4.12
qtconsole : 5.2.2
traitlets : 5.1.1
How shall I apply the code that you suggested? The code below tries to read an 18.5 MB ORC file in the Jupyter notebook, but I get a parse error:
import csv
import pyorc

# Read 19297__currentWLAN.orc, which is 18.5 MB
with open("./19297__currentWLAN.orc", "rb") as example:
    reader = pyorc.Reader(example)
    rows = reader.read()
    with open("Sample123.csv", "w", newline="") as out:
        csv_out = csv.writer(out)
        csv_out.writerow(reader.schema.fields.keys())  # write the header (column names)
        csv_out.writerows(rows)
Sorry, I don’t really know much about that file format.
Can you put that code into a Python script and run it that way? I'm just not sure whether this is actually related to Jupyter, or even IPython/ipykernel. Is there any chance the file is malformed? Do you have any other tools that can open this file?
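For what it's worth, a minimal standalone version of the same conversion might look like this, streaming rows instead of materializing them all with read() (assuming pyorc behaves as in the snippet above; its Reader object is iterable). Running it with plain python would also show whether the parse error reproduces outside Jupyter:

    # convert_orc.py -- minimal sketch; run with: python convert_orc.py
    import csv
    import pyorc

    with open("./19297__currentWLAN.orc", "rb") as src, \
         open("Sample123.csv", "w", newline="") as dst:
        reader = pyorc.Reader(src)
        writer = csv.writer(dst)
        writer.writerow(reader.schema.fields.keys())  # header from the ORC schema
        writer.writerows(reader)  # rows stream one at a time, keeping memory low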