Problem displaying dataframe in Jupyterlab notebook using python

Hi,

I’m trying to get a notebook running in a jupyterlab instance to display a dataframe.

According to some Stack Overflow posts, I should be able to import display from IPython.display
and call it on the dataframe. However, this does not seem to do anything special; in particular, it does
not display the dataframe in tabular format.

Any tips on how to output the contents of the dataframe as a table within jupyterlab?

I’m using the jupyterlab/pyspark-notebook container from a couple of days ago on our k8s cluster.

Thanks in advance for any pointers,
Sean.

You can just use display(...) in JupyterLab, you shouldn’t need to import anything.

For example: Binder

Thanks for the quick response - I tried the sample you provided in a notebook on our system and everything works as you indicate.

It seems to be an issue with using Spark and the DataFrame that is returned from
a sqlContext.createDataFrame() call: when I use display on such a dataframe, I get a
summary of the types in the dataframe, rather than the contents of the dataframe itself.

For example:

df1 = sqlContext.createDataFrame(departmentsWithEmployeesSeq1)
display(df1)

returns

DataFrame[department: struct<id:string,name:string>, employees: array<struct<firstName:string,lastName:string,email:string,salary:bigint>>]

(for a simple example, I’m working with…)

Any idea how to deal with this?

Thanks, rgds,
Sean.

It seems you are working with a Spark DataFrame, which is a different beast than the more common Pandas DataFrame. (See here, although that may be outdated now, as I expect development has continued.)
I came across some information about displaying them a while back when trying to help someone with an issue. Maybe the resources I pointed to at the top of my post here could be useful for you?
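
In case it helps, here is a rough sketch of the usual workarounds rather than a definitive answer. A Spark DataFrame is lazy, so display() only prints its schema summary; to see rows you either ask Spark to print them with show(), or pull a small slice to the driver as a pandas DataFrame, which Jupyter does render as a table. The names preview and demo and the sample rows are made up for illustration (they are not the departmentsWithEmployeesSeq1 data from the question), and toPandas() assumes pandas is installed alongside pyspark.

```python
def preview(spark_df, n=20):
    """Return the first n rows of a Spark DataFrame as a pandas DataFrame.

    limit() keeps the collected slice small; toPandas() brings those rows
    to the driver, where Jupyter can render them as an HTML table.
    """
    return spark_df.limit(n).toPandas()


def demo():
    # Imported here so preview() itself has no hard dependency on pyspark.
    from pyspark.sql import Row, SparkSession

    # Local session for illustration only; on a cluster you would reuse
    # the session or sqlContext you already have.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("preview-demo")
        .getOrCreate()
    )

    # Hypothetical sample data.
    rows = [
        Row(name="finance", headcount=3),
        Row(name="engineering", headcount=7),
    ]
    sdf = spark.createDataFrame(rows)

    sdf.show(truncate=False)  # plain-text table printed to the cell output
    sdf.printSchema()         # schema printed as an indented tree

    # pandas DataFrame; shown as a table when it is the last expression
    # in a notebook cell or when passed to display().
    return preview(sdf, n=5)
```

In a notebook cell, display(preview(df1)) should then show the first rows as a table. Nested struct and array columns come across as Python objects inside the pandas cells, so they may not look pretty. Newer Spark versions also have a spark.sql.repl.eagerEval.enabled setting that is meant to make DataFrames render directly in notebooks, but I would check that against the Spark version in your container before relying on it.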