XGBoost in Python and Voila in notebook, runs locally

Hello everyone, thank you for any help you can offer. I was finally able to launch mybinder.org with my Demo2 application last night thanks to BIG help from fomightez. I have been adding steps from a working version in JupyterLab and all was well until I added a step for xgboost. When I launch to a web site, it doesn’t crash but it also never resolves (by "never" I mean I waited no longer than 30 min). It resolves fine locally, and the data set I am using is still quite small (I have not thrown everything at it yet), so I don't really understand why it's not resolving. This morning I waited for an hour but nothing. Even if help is at a high level, any hint would be very appreciated. I left the last step (xgboost) commented out in this public project just in case someone can help. It's under marylouwho/Demo2. My Orange man from New York made a button in the readme file for easy launching, what a great man! Thank you in advance!! I do have xgboost called both ways (DMatrix and sklearn) in the full program, but in this sample, which runs locally, I am just trying to get the sklearn method to work.

You’ve added an incredibly long-running cell to your notebook. Such a code cell is pretty much at odds with how Voila works. Voila first runs all the cells in a notebook and then renders the view from the output that run generated. You’ve added a cell that runs on and on, so the notebook doesn’t finish running in a timely manner, and you sit there waiting for it to finish so that Voila can gather the output to display.

The cell does finish running after quite a while in JupyterLab. I haven’t seen such a long-running cell used with Voila before, so I don’t know whether it would finish if you waited long enough; MyBinder may time out for lack of interaction before the cell completes.
If the goal is to make a dashboard to monitor the run, then you’d have to implement it differently, because the cell doesn’t appear to create anything in ipywidgets that would be viewable. If you just want to show the score at the end, then you’d run this process separately and load the results into the notebook.

It could be as simple as adding the score in as markdown:

the score of this training is :
0.906885579598705

Or you could reload the data from a CSV/TSV file or a pickled form and quickly generate something in output from it.
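As a minimal sketch of the pickle route: run the slow fit once, outside the notebook Voila renders, save whatever you want to show, and have the Voila notebook only load it. The file name `demo2_results.pkl` and the helper functions here are hypothetical; the score is the one quoted above. A fitted sklearn-style estimator can generally be pickled the same way, though I have not tried that with your notebook.

```python
import pickle
import tempfile
from pathlib import Path


def save_results(results, path):
    """Write the results of the slow training run to disk (done once, offline)."""
    with open(path, "wb") as fh:
        pickle.dump(results, fh)


def load_results(path):
    """Read the saved results back; this is fast enough for a Voila cell."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Offline step: after the long fit, store what you want to display.
# The dict below stands in for the real output of the training cell.
results = {"score": 0.906885579598705}
results_path = Path(tempfile.gettempdir()) / "demo2_results.pkl"  # hypothetical file name
save_results(results, results_path)

# Voila notebook step: no training here, only loading and displaying.
loaded = load_results(results_path)
print(f"The score of this training is: {loaded['score']}")
```

In practice the saved file would be committed to the repository next to the notebook, so the MyBinder session only ever runs the loading half.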

If you wanted to monitor the fitting as it progresses, then you’d have to implement things very differently. That can be done in theory; however, not as you’ve approached it now by just apparently pasting in code meant to be run in a console or notebook.

So far I’m not understanding why you cannot just share this notebook as a notebook others can run via MyBinder? I’m not seeing how Voila fits in here, especially with such a step involved.


Wow you are the best. Thank you for looking into this issue too! I was able to follow your advice, and I think I will attempt the pickle suggestion. Thank you! Great walk-through. Oh how I would love to just give out the notebook, I’d be so much further along. No, I have folks in other countries who don’t have the setups/systems needed, but they all do have a web browser and all wanted to see this three weeks ago, you know the story. I made a video when I got xgboost working with our data set and now there is a request to get it out for many to play with. I was hoping I could use Voila and show them each step with documentation, but I am FAR from being able to do on the Web what I am doing in JupyterLab. I have attempted to export the notebook to Python and basically launch that, but ran into just as many issues. I wish for more time… anyway, thank you for all your help and so quickly, I’m impressed and bless you!!

Maybe a previously run notebook viewed via nbviewer?
Here is an example using yours, with the fit step showing. It would be viewable by anyone with a browser at:

https://nbviewer.org/github/marylouwho/Demo2/blob/7dc34a2cfc1d3c1d72c9dc392735297171ba2e25/Demo2.ipynb

It’s just a matter of pointing nbviewer at a URL.
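To make that concrete, here is a small sketch that builds the address from its parts. The function name `nbviewer_url` is just for illustration; the URL pattern is taken from the link above, and I am assuming the notebook lives in a public GitHub repository.

```python
def nbviewer_url(user, repo, ref, notebook_path):
    """Build an nbviewer address for a notebook in a public GitHub repository.

    `ref` can be a branch name or a commit hash; a commit hash pins the view
    to one exact version of the notebook.
    """
    return f"https://nbviewer.org/github/{user}/{repo}/blob/{ref}/{notebook_path}"


url = nbviewer_url(
    user="marylouwho",
    repo="Demo2",
    ref="7dc34a2cfc1d3c1d72c9dc392735297171ba2e25",
    notebook_path="Demo2.ipynb",
)
print(url)
```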

I am going to share my initial concept with you; perhaps you can see a way I do not. I created a Jupyter notebook that reads in a file with about 5000 lines. Each line holds the ‘characteristics’ of a part and the actual cost ($) to make the part. I take that dataset, split it, and apply about 5 different processes that show that xgboost does the best in predictions. I train on the dataset with XGBoost and then prompt the operator for all the “characteristics” of a new part and let the model predict the price ($) of a part with those characteristics. I output the prediction back to the operator.
I really like the fact that Voila lets you interact with the notebook in real time. I was sad to see that the only way to input a value was with a slider, but I love the concept. At this point, what I can see to do is:

  1. make a Jupyter notebook and have that available for those who can run it
  2. make a static HTML version for the browser-only folks to view
  3. make a Python-only exe out of the code to allow for the full interaction of operator input and output

If you can see any other avenue, I would be thankful, as I have not had any courses and am all self-taught.

It is definitely not the case that a slider is the only way to input a value. There are several ways to input a value using ipywidgets.
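For instance, a rough sketch with a numeric text box, a dropdown, and a button. The characteristic names and the `predict_cost` function are made up for illustration; in your notebook the function body would call your trained model instead.

```python
import ipywidgets as widgets
from IPython.display import display


def predict_cost(length_mm, material):
    """Hypothetical stand-in for a call to the trained model's predict method."""
    rate_per_mm = {"steel": 2.0, "aluminum": 3.0}[material]
    return rate_per_mm * length_mm


# Input widgets other than a slider: a typed-in number and a pick list.
length_box = widgets.FloatText(value=10.0, description="Length (mm):")
material_menu = widgets.Dropdown(options=["steel", "aluminum"], description="Material:")
predict_button = widgets.Button(description="Predict")
result_label = widgets.Label(value="")


def on_predict_clicked(button):
    """Read the current widget values and show the prediction in the label."""
    cost = predict_cost(length_box.value, material_menu.value)
    result_label.value = f"Predicted cost: ${cost:.2f}"


predict_button.on_click(on_predict_clicked)

# In a notebook (or under Voila) this renders the form.
display(widgets.VBox([length_box, material_menu, predict_button, result_label]))
```

Because the prediction only runs when the button is pressed, nothing slow happens while Voila is rendering the page.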

Those all seem like good ways. A static HTML version is probably moot if you can use nbviewer, since with nbviewer viewers would get the current, updated version each time.

What you’ve described seems like good options to pursue. Obviously there are more complex ways to share runnable versions, such as using JupyterHub and/or Docker. And I don’t know if XGBoost is possible via WASM at this point. If it were, that would make things much easier.

And since you brought it up, one way is to have your users step through a notebook as a narrative, with guidance in markdown supplemented with useful images, and to limit the code to a separate script, much like you describe in option 3. Then you just use the notebook to set values in the namespace and call the script (or scripts) to run in that namespace using those values, with %run -i myscript.py. The -i flag lets the script run using the namespace set in the notebook. I have a fairly mature version of this type of thing here. The computational resources needed are minimal (the big thing it adds is running the core approach on many input samples with minimal dealing with code and set-up), and so you can step through actually running it by going to here and pressing ‘launch binder’.
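Since %run is an IPython magic and only works inside a notebook or IPython session, here is a plain-Python illustration of the same idea using the standard library's `runpy.run_path` instead. This is a swapped-in technique, not what `%run -i` does internally: `runpy` seeds a fresh namespace with the values you pass and hands back the resulting names, whereas `%run -i` runs the script directly in the notebook's own namespace. The script contents and variable names are hypothetical.

```python
import runpy
import tempfile
from pathlib import Path

# A tiny stand-in for myscript.py. It relies on names it never defines
# itself (unit_cost, quantity), just as a script run with %run -i would.
script_path = Path(tempfile.mkdtemp()) / "myscript.py"
script_path.write_text("total_cost = unit_cost * quantity\n")

# Values that the notebook user would set in a cell before calling the script.
notebook_values = {"unit_cost": 2.5, "quantity": 4}

# Run the script with those values already present in its namespace.
result_namespace = runpy.run_path(str(script_path), init_globals=notebook_values)
print(result_namespace["total_cost"])
```

In the notebook itself, the equivalent would be one cell that assigns `unit_cost` and `quantity`, followed by a cell containing only the %run -i line.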