Reading web based url data into data frame using Jupyter Notebook

I am a newbie to Python and Jupyter notebooks and I have a problem which I currently cannot solve.

It seems I am not able to load web-based URL data into a data frame in a Jupyter notebook. I have no problem with local CSV files.

I believe I have the correct syntax loading pandas as pd and numpy as np. When I take the following actions in the example below I get a long error message and I have no clue what it means. I get similar errors with other websites. I have also installed the following packages: lxml, htmllib, BeautifulSoup4, to no avail.

import pandas as pd

import numpy as nd

url= “https://en.wikipedia.org/wiki/Wikipedia:Fundraising_statistics”

tables= pd.read_html (url)

Any suggestions will be appreciated!

This isn’t pertinent to this forum. You are using Python in a Jupyter notebook and looking for it to work. Probably you’d get the same behavior if you ran this code as a Python script or in the Python console/interpreter and so it isn’t a Jupyter issue. This is a good test to consider, at least as a thought experiment if not an actual trial, when trying to decide where to seek support.
Additionally, how you’ve posted it isn’t helping people help you. You’d need to share the error because as you can see below it isn’t reproducible. (In fact, sharing what you see as text is very important 99.999999% of the time on forums such as these. You should use here and here as guides to posting in such forums.)

That being said, a lot of people here know a bit of Python…

As I explained above, I cannot discern what you are experiencing because you didn’t share your error. As you can see below, it runs just fine once you fix the quotes (see more about that below this screenshot).

(The code block I ran is below as code text since normally screenshots are highly discouraged. In this case there wasn’t much else going on, and so I refrained from posting it as a gist I could have shared with you for rendering in nbviewer, sort of like this, which was illustrating some grander features, see here.)

You can see it working for yourself and compare that to your own experience, so that you can troubleshoot further on your own. Go here and put https://github.com/binder-examples/requirements in the top line of the form where it says ‘GitHub repository name or URL’ and then hit the orange launch button on the right. When the temporary, remote Jupyter session comes up, you can install lxml via %pip install lxml, then restart the kernel and try to run the code I pasted below.
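If you want to compare your local setup with that session, one quick check is whether the parser packages are importable from the kernel you are actually running. pandas.read_html parses with lxml, or with BeautifulSoup4 (import name bs4) together with html5lib. This is only a rough sketch; the helper name installed_parsers is made up for illustration and is not part of pandas.

```python
import importlib.util

# Packages pandas.read_html can parse with: lxml on its own,
# or the pair bs4 (BeautifulSoup4) + html5lib.
PARSER_PACKAGES = ("lxml", "bs4", "html5lib")


def installed_parsers(names=PARSER_PACKAGES):
    """Hypothetical helper: report whether each package can be found by this kernel."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_parsers()
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If something shows as missing even though you installed it, the package probably went into a different environment than the one the notebook kernel uses. Running %pip install inside the notebook, as above, targets the kernel's environment.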

Other tips:
Please provide code here formatted as code blocks and not text, see here, with special attention to ‘Block code formatting’.

Preferably, always re-run your shared code, too. The quotes in what you posted came through as odd characters where I ran it. Here is what I ran in the screenshot above:

import pandas as pd
import numpy as nd
url= "https://en.wikipedia.org/wiki/Wikipedia:Fundraising_statistics"
tables= pd.read_html(url)
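On the quotes: code pasted through a word processor or a forum post as plain text often ends up with typographic (curly) quotes, which Python rejects as invalid characters. A small sketch of how one might straighten them before running pasted code is below. The function name straighten_quotes is my own invention for illustration, and it blindly replaces every curly quote, including any you actually wanted inside a string.

```python
# Map typographic quotes to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def straighten_quotes(code: str) -> str:
    """Hypothetical helper: replace curly quotes in pasted code with straight ones."""
    return code.translate(str.maketrans(SMART_QUOTES))


# The url line as it appeared in the original post, with curly quotes.
pasted = "url= \u201chttps://en.wikipedia.org/wiki/Wikipedia:Fundraising_statistics\u201d"
fixed = straighten_quotes(pasted)
print(fixed)
```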

Thank you very much for taking the time to respond and for your insights. Hopefully, over time I will crawl my way through the learning process.

I pointed you to a place it works via a MyBinder session. I’d start there. Just save anything useful you make immediately back to your local system because the remote sessions are temporary. Having a place where things work can save you from trying to learn what is going on, especially if it is a network issue out of your control. Hopefully you are using a full Python kernel, too. Where I sent you, you’ll have that, too.

Thank you. I am using a full python kernel installed via Anaconda. I’ve read in another thread that perhaps I have an SSL issue and it was suggested that I install openssl?

Installing the openssl package did the trick!
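For anyone landing here with a similar long traceback, a rough way to tell an SSL or network problem apart from a parsing problem is to fetch the page yourself with the standard library first. This is only a sketch under some assumptions: fetch_html is a made-up helper name, and the User-Agent string and 30 second timeout are arbitrary choices of mine, not anything pandas requires.

```python
import ssl
import urllib.error
import urllib.request


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: download a page and flag SSL failures explicitly."""
    # The User-Agent value is an arbitrary assumption; some sites reject
    # requests that arrive with the default Python one.
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (read_html troubleshooting)"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.URLError as exc:
        # Certificate and handshake failures arrive wrapped in URLError.
        if isinstance(exc.reason, ssl.SSLError):
            raise RuntimeError(f"SSL problem while contacting {url}: {exc.reason}") from exc
        raise


# Shows which OpenSSL build this Python is linked against.
print(ssl.OPENSSL_VERSION)
```

If fetch_html(url) fails with the SSL message, the problem is in the Python/OpenSSL setup rather than in pandas. If it succeeds, you can hand the text to pandas with pd.read_html(io.StringIO(html)) after import io, which takes the network out of the picture.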
