Cleaning Data - Splitting Data

Desstro · June 20, 2022, 6:16pm

Hello Jupyter world!

I am extremely new to Jupyter notebooks. So new, in fact, that I am currently working on a project for my DMDA degree and I am stumped about how to clean a column that has multiple data points per cell. I can easily do this in the Excel data file by splitting the data, but I was curious whether there is a way to do so in the Jupyter notebook itself.

Example: column header = “genres”
Data contained in a cell example: “Action|Adventure|Science Fiction|Thriller”

How can I split that up into individual genres with the first being the main genre, or should I just do this in the data table itself?

bollwyvl · June 21, 2022, 3:41pm

You’ll want to check out the official pandas docs: they are really quite excellent.
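As a rough sketch of what the pandas route could look like for the question above: `Series.str.split` with `expand=True` turns the pipe-delimited cell into one column per position, and the first of those can be treated as the main genre. The sample rows, the `title` column, and the `genre_1`, `genre_2`, ... and `main_genre` names are made up for illustration; only the `genres` column and its first value come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the "genres" header and the first cell
# value are taken from the question.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": ["Action|Adventure|Science Fiction|Thriller", "Drama"],
})

# Split each cell on the pipe character; expand=True returns a DataFrame
# with one column per position (missing positions are filled with None).
genre_cols = df["genres"].str.split("|", expand=True)

# Give the new columns readable (illustrative) names: genre_1, genre_2, ...
genre_cols.columns = [f"genre_{i + 1}" for i in range(genre_cols.shape[1])]

# Treat the first listed genre as the main genre, then attach everything
# to the original frame.
df["main_genre"] = genre_cols["genre_1"]
df = df.join(genre_cols)

print(df[["title", "main_genre", "genre_1", "genre_2"]])
```

Rows with fewer genres than the longest one simply end up with empty values in the trailing columns, so nothing has to be edited in the Excel file first.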

This Stack Overflow answer has some more insights. What would be most useful kind of depends on whether this is a strict tree or more of a “tag” system: tags might benefit from a sparse adjacency matrix.
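If the genres behave more like tags, one possible starting point is a 0/1 indicator table with one column per genre, or a long table with one row per (movie, genre) pair. The sketch below uses `Series.str.get_dummies` and `DataFrame.explode`; note that it produces an ordinary dense table rather than a true sparse matrix, and the sample rows and the `genre` column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data using the "genres" format from the question.
df = pd.DataFrame({
    "genres": ["Action|Adventure|Science Fiction|Thriller", "Drama|Thriller"],
})

# Tag-style view: one 0/1 column per distinct genre (dense, not sparse).
tags = df["genres"].str.get_dummies(sep="|")

# Long view: one row per (original row, genre) pair.
long = df.assign(genre=df["genres"].str.split("|")).explode("genre")

print(tags)
print(long[["genre"]])
```

The indicator table makes it easy to filter or count by genre, while the long table keeps the original order of genres within each cell.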
