Hello,
I have created a script running in a notebook in JupyterLab.
For some reason the script stops, but there is no error message; I am just seeing the status of the kernel as “unknown”.
Any idea why this is happening and how to solve it?
Try out the same code through another interface, e.g. the IPython shell. Once the program crashes it might provide you with some message you can use for further debugging. Often it is related to running the wrong binaries (numpy, matplotlib, …) and not to the native Python code.
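For example (assuming the script is saved as a .py file; scrape_urls.py is just a placeholder name here), you could start an IPython shell from a terminal and type %run scrape_urls.py. If it crashes there, you should see the full traceback.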
Thanks 1kastner.
Will try to run it in another interface.
It is my first Python script, so I will have to go through the whole process of installing it ;-(
Hi Emmanuel,
Chances are that if you have JupyterLab working, you already have Python installed there. However, if you don’t know that to be the case, maybe working things out on a standardized environment on a remote machine will give you an idea of how to try it on your own machine. So read on to try running your script on a remote machine…
As long as it isn’t anything that requires extreme security and isn’t overly demanding computationally, you can launch Jupyter from MyBinder.org and run Python directly there.
Go here and click on the launch binder badge there. A temporary machine will spin up for you. When the Jupyter interface opens, upload your script and any necessary input files using the Upload button in the upper right. Then open a terminal under the New dropdown menu next to the Upload button. In the terminal that comes up, type python, then a space, and the name of the script you uploaded. It should try to run, and you’ll then be doing what @1kastner suggested without needing to install anything. (This is the traditional way to run a script.) Keep in mind the remote machine is temporary, so make sure you save any changes to your local machine, as they will be lost when the remote, temporary machine dies after 10 minutes of inactivity.
There’s a caveat here. You’ll also need to install any dependencies that you need. This is going to sound odd because you are trying to avoid the notebook interface, but I just tested it and it works. I am going to advise you to open a new Python 3 notebook in your session and type %pip install followed by a space and then the needed package (or %conda install followed by a space and then the needed package) to install any required packages. You’ll know you have them all when you try to run your script as I described above and you don’t get any more errors about modules not being present. Also, once things seem all installed, you may want to try running your script from inside the notebook with %run and a space followed by the name of your script. You may see different behavior on this standardized environment than on your own machine anyway. But trying Python directly as suggested by @1kastner & described above should give you the most information about whether all was really right with your original local Jupyter in the first place.
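For example, assuming (just for illustration) your script needs requests, BeautifulSoup, Pillow, and pandas, the notebook cells might look something like:

%pip install requests beautifulsoup4 pillow pandas

and then, to run the script from a notebook cell (scrape_urls.py is a placeholder name):

%run scrape_urls.py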
Once you have things working (or trying to work) on the remote machine, seeing what was needed to do that may give you ideas for troubleshooting how to run your script on your local machine.
Thanks. I have tried to run the script in Spyder and I have the same issue, even though I am able to run more loops within the script.
The script is supposed to scrape data from about 4,000 URLs. With the notebook in JupyterLab I was able to scrape about 400 URLs in one go; with Spyder I got more than 1,000. But it still doesn’t do them all in one go.
I wonder if it is due to the way I am reading my URLs.
Here are the lines of the script:
import requests
from bs4 import BeautifulSoup
from PIL import Image
import os.path
import re
import csv
import pandas

contents = []

with open('List_URL.csv', 'r') as csvf:  # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url)  # Add each url to list contents

for url in contents:
    page = url[0]
    response = requests.get(page)
    Page_content = response.content
    soup = BeautifulSoup(Page_content, 'html.parser')
    # ... get all the data I need
Your code is essentially unreadable because it has lost the indentation. Please see the ‘Block code formatting’ and ‘Automatic Code Styling’ sections here to better understand how to post code on Discourse. Or have it posted somewhere where you know how to post & share code, such as GitHub’s Gist or another code snippet site, and link to it.
Also, this is clearly not a Jupyter problem if you get essentially the same result running it without JupyterLab involved. Please seek help in more appropriate places down the road. It sounds like a memory problem, since you get farther when there is less overhead without Jupyter. If that is the case, then refactoring your code to hold less in memory at any one time might be one way to adjust it. Another option would be to split up your sets and combine results later. Or you can try running it somewhere with more memory.
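To make the “split up your sets” idea concrete, here is a rough sketch (not tested against your data; it assumes List_URL.csv has one URL per row like your code does, and the chunk size, output file names, and the title-grabbing line are placeholders for your real scraping):

import csv
import requests
from bs4 import BeautifulSoup

CHUNK_SIZE = 500  # placeholder size; tune to what fits in memory

def process_chunk(urls, chunk_index):
    # Scrape one batch of URLs and write its results out right away,
    # so nothing from earlier batches has to stay in memory.
    results = []
    for page in urls:
        response = requests.get(page)
        soup = BeautifulSoup(response.content, 'html.parser')
        results.append(soup.title.string if soup.title else '')  # placeholder for your real scraping
    with open(f'results_{chunk_index}.csv', 'w', newline='') as out:
        writer = csv.writer(out)
        for r in results:
            writer.writerow([r])

with open('List_URL.csv', 'r') as csvf:
    chunk, chunk_index = [], 0
    for row in csv.reader(csvf):
        chunk.append(row[0])
        if len(chunk) == CHUNK_SIZE:
            process_chunk(chunk, chunk_index)
            chunk, chunk_index = [], chunk_index + 1
    if chunk:  # leftover URLs that didn't fill a full chunk
        process_chunk(chunk, chunk_index)

Afterwards you would combine the results_*.csv files (for example by reading them back in and concatenating with pandas), which is the “combine results later” part; only one chunk of pages is in memory at any time.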
Thanks for the reply Wayne,
I have followed the instructions and reformatted the code above so it would be clearer for anyone interested.
I think you are right; it looks like a memory issue.
Do you see where I could save memory within the query?
My guess is that it could come from the loop below. Are you seeing any memory issue with it?
for url in contents:
    page = url[0]
Sorry, I cannot really run it without some fraction of the urls in List_URL.csv.
One thing that jumps out at me though is that this part seems unnecessary:
for url in urls:
    contents.append(url)  # Add each url to list contents

for url in contents:
Why not something more along the lines of this below, so that you’d save memory by not making the large list contents when you already have your csv.reader object that will iterate over the lines of the csv file?
with open('List_URL.csv', 'r') as csvf:  # Open file in read mode
    urlsreader = csv.reader(csvf)
    for url in urlsreader:
        page = url[0]
        response = requests.get(page)
        Page_content = response.content
        soup = BeautifulSoup(Page_content, 'html.parser')
You may need to edit part of that because I don’t know if your lines are directly usable, but I suspect they are.
Thanks Wayne, it is working perfectly now.
I had to adapt your instructions a bit, and I merged the 2 loops together.
The csv file I am using is pretty standard; it is basically just the list of URLs for which I need to collect data.
import requests
from bs4 import BeautifulSoup
from PIL import Image
import os.path
import re
import csv
import itertools

# Define the URL and get the data from this URL
for i in range(5000):
    with open('List_URL.csv', 'r') as csvf:  # Open file in read mode
        page = next(csv.reader(itertools.islice(csvf, i, i + 1)))
        page = re.sub(r"\'", "", str(page))
        page = re.sub(r"\[", "", str(page))
        page = re.sub(r"\]", "", str(page))
        response = requests.get(page)
        Page_content = response.content
        soup = BeautifulSoup(Page_content, 'html.parser')
        # followed by scraping instructions