Cleaning Data - Splitting Data

Hello Jupyter world!

I am extremely new to Jupyter notebooks. I am currently working on a project for my DMDA degree and I am stumped about how to clean a column that has multiple data points per cell. I can easily do this in the Excel data file by splitting the data, but I was curious whether there is a way to do so in the Jupyter notebook itself.

Example: column header = “genres”
Data contained in a cell example: “Action|Adventure|Science Fiction|Thriller”

How can I split that up into individual genres with the first being the main genre, or should I just do this in the data table itself?

You’ll want to check out the official pandas docs: they are really quite excellent.
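For the specific case in the question, here is a minimal sketch using the pandas string methods. Only the `genres` column name and the first sample value come from the question; the other rows and the `main_genre` and `genre_1`... column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the "genres" column name and the first
# value come from the question; the other rows are invented.
df = pd.DataFrame({
    "genres": [
        "Action|Adventure|Science Fiction|Thriller",
        "Comedy|Romance",
        "Drama",
    ]
})

# Split each cell on the pipe character into a list of genres.
genre_lists = df["genres"].str.split("|")

# Treat the first entry as the main genre (an assumption from the question).
df["main_genre"] = genre_lists.str[0]

# Alternatively, spread the genres across separate columns.
# Rows with fewer genres get missing values in the trailing columns.
genre_cols = df["genres"].str.split("|", expand=True)
genre_cols.columns = [f"genre_{i + 1}" for i in range(genre_cols.shape[1])]
```

If you load the real file with `pd.read_csv`, the same calls should work on that DataFrame, as long as the column really is named `genres` and uses `|` as the separator.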

This Stack Overflow thread has some more insights. What would be most useful depends on whether the genres form a strict tree or are more of a “tag” system: tags might benefit from a sparse adjacency matrix.
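If the genres behave more like tags, one rough way to get a matrix-like layout is one indicator column per genre. This is only a sketch: the `title` column and both rows are made up for illustration, and `str.get_dummies` returns an ordinary dense DataFrame rather than a truly sparse structure.

```python
import pandas as pd

# Hypothetical sample data; the "title" column is invented for illustration.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": [
        "Action|Adventure|Science Fiction|Thriller",
        "Action|Comedy",
    ],
})

# One column per distinct genre, with 1 where the row carries that tag.
indicators = df["genres"].str.get_dummies(sep="|")

# Long form: one row per (title, genre) pair, handy for groupby and counts.
long_form = df.assign(genre=df["genres"].str.split("|")).explode("genre")
```

The indicator layout makes questions like "how many movies are tagged Action" a simple column sum, while the long form is usually easier to group and plot.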
