Hi,
I am new to Jupyter Notebook.
How do I convert a large ORC file to a CSV file using Python in a Jupyter notebook? Is there a limit on the size of ORC file that a Jupyter notebook can handle?
I am trying to convert a large ORC file (58 GB) to CSV in a Jupyter notebook, but no CSV file is generated when I run this Python code:
import pyorc
import csv
example = open("./ORC-A.orc", "rb")
reader = pyorc.Reader(example)
rows = reader.read()
with open('orc.csv', 'w') as out:
    csv_out = csv.writer(out)
    csv_out.writerow(reader.schema.fields.keys())
    csv_out.writerows(rows)
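A possible cause with the 58 GB file is that reader.read() loads every row into memory before anything is written. Below is a minimal sketch of a streaming variant, assuming pyorc.Reader can be iterated row by row. The helper names write_rows_to_csv and orc_to_csv and the batch_size default are hypothetical, chosen only for illustration, and this has not been run against the 58 GB file.

```python
import csv
from itertools import islice


def write_rows_to_csv(rows, header, csv_path, batch_size=10_000):
    """Write an iterable of row tuples to csv_path in batches; return the row count."""
    written = 0
    rows = iter(rows)
    # newline="" is what the csv module docs recommend for files given to csv.writer
    with open(csv_path, "w", newline="") as out:
        csv_out = csv.writer(out)
        csv_out.writerow(header)
        while True:
            # Pull at most batch_size rows so memory use stays bounded
            batch = list(islice(rows, batch_size))
            if not batch:
                break
            csv_out.writerows(batch)
            written += len(batch)
    return written


def orc_to_csv(orc_path, csv_path, batch_size=10_000):
    """Hypothetical helper: stream an ORC file to CSV without calling reader.read()."""
    import pyorc  # imported here so write_rows_to_csv works without pyorc installed

    with open(orc_path, "rb") as orc_file:
        reader = pyorc.Reader(orc_file)
        header = list(reader.schema.fields.keys())
        # Assumption: iterating the reader yields one row tuple at a time
        return write_rows_to_csv(reader, header, csv_path, batch_size)
```

Calling orc_to_csv("./ORC-A.orc", "orc.csv") would then write the file batch by batch, so the notebook kernel should not need to hold the whole data set at once. Columns with nested types (structs, lists, maps) would still be written as their Python string representation.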
When I tried a smaller ORC file, 19297__currentWLAN.orc (18.5 MB), I got a Parse Error and could not convert it to a CSV file:
> #Checking version
> !python -V
> !jupyter --version
> !jupyter notebook --version
>
> import pyorc
> import csv
> example = open("./19297__currentWLAN.orc", "rb") #19297__currentWLAN.orc -file size is 18.5 MB
> reader = pyorc.Reader(example)
> rows = reader.read()
> with open('19297.csv', 'w') as out:
>     csv_out = csv.writer(out)
>     csv_out.writerow(reader.schema.fields.keys())
>     csv_out.writerows(rows)
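For the Parse Error on the smaller file, one thing that could be checked first is whether the file is really a complete ORC file. As far as I understand the ORC format, a file normally begins with the three bytes "ORC". The sketch below only tests that header; the function name looks_like_orc is hypothetical, and passing this check would not prove that the rest of the file is intact.

```python
def looks_like_orc(path):
    """Return True if the file starts with the 3-byte magic b"ORC"."""
    with open(path, "rb") as f:
        # Only the first three bytes are read, so this is cheap even for huge files
        return f.read(3) == b"ORC"
```

If looks_like_orc("./19297__currentWLAN.orc") returns False, the file may be compressed, truncated, or in a different format, which might explain why pyorc cannot parse it.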