Hi there, newbie here. Been learning Python and Jupyter over these last few covid months. Getting pretty good at it and really enjoying the learning.
Sorry if I get the terminology wrong in my question, but here goes.
So let’s say I’ve got multiple cells. I’ve done a calculation up above and have a new variable (or a new value has been assigned to a variable). MOST of the time, two cells down I can access that variable fine as long as I’ve run the upper cell at least once. But SOMETIMES it won’t let me. The only way to access that variable and run my cell is to run that upper cell every time before running the cell in question (or “run all”).
Is there some logic to this that I’m missing? I haven’t been able to discover a pattern so far.
Thanks in advance for your input. It’s just one of those quality of life things that I would love to solve!
Generally, think of the notebook cells as all saving to the same global variable space in the notebook. You can check what is in it by calling locals() outside a function, or globals() inside a function.
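As a rough sketch of that shared namespace (the "cells" here are just comments in one script, and is_defined is a made-up helper name, not anything Jupyter provides):

```python
# "Cell 1": assign a variable at the top level of the notebook
total = 3 + 4

# "Cell 2": a later cell sees the same global namespace
def is_defined(name):
    """Return True if `name` exists in the shared global namespace."""
    return name in globals()

print(is_defined("total"))      # True
print(is_defined("never_set"))  # False
```

Once the first cell has run at least once, the name stays in that namespace until it is deleted or the kernel restarts.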
But SOMETIMES it won’t let me.
This seems odd. Either A) you are deleting the global variable, B) you restarted the kernel (the process actually running your Python code), in which case the variable would no longer be available, or C) you have a local variable inside the function you are using that shadows the global one, and its value (e.g. None) is making you think the variable doesn’t exist.
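To illustrate cases A and C, here is a minimal sketch in plain Python. The names value and report are invented for the example and are not taken from your notebook:

```python
value = 42

def report():
    value = None  # local name shadows the global one inside this function
    return value

shadowed = report()
print(shadowed)  # None, which can look like the variable "doesn't exist"
print(value)     # 42, the global is untouched

# Case A: deleting the global makes any later use fail
del value
try:
    value
except NameError as exc:
    print("NameError:", exc)
```

Case B (a kernel restart) behaves like the del, but for every variable at once.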
You may want to create the simplest example that reproduces the issue and that will either quickly reveal the problem as you remove surrounding code snippets, or give a clearer picture about what execution patterns you are using here.
OK, good to know that it is supposed to work the way I thought it was. I guess I wanted to make sure I wasn’t missing something fundamental before I spent more time researching it.
It will definitely be my mission now to figure it out the next time it happens. I like your idea of removing code until I narrow down what is causing the problem. I will report back!
Hi, there. I finally found some time to do further research and can now report back.
Below are two simplified cells of code to demonstrate what happened to me:
[python]
import pandas as pd
df1 = pd.read_csv('filename.csv')  # load the data
[/python]
[python]
df2=df1
df2['col_new'] = df1['colA'].rolling(12).sum().round(0)
df2.drop(['colA'], axis=1, inplace=True)  # also drops colA from df1
[/python]
In this case, I could only run the second cell after running the first cell. Every time.
What I eventually found was that the .drop() method was dropping colA from BOTH dataframes. So the next time this cell was run, it would choke on the col_new assignment, 'cause it couldn’t find colA.
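For anyone who wants to reproduce this without the CSV file, here is a small sketch. The inline dataframe is a made-up stand-in for filename.csv, and the second cell is wrapped in a function (second_cell, a name invented here) only so it can be run twice in one script:

```python
import pandas as pd

# Hypothetical stand-in for the CSV file in the post
df1 = pd.DataFrame({"colA": range(1, 25)})

def second_cell():
    """Body of the second cell from the post, wrapped so it can be re-run."""
    df2 = df1  # alias, not a copy
    df2["col_new"] = df1["colA"].rolling(12).sum().round(0)
    df2.drop(["colA"], axis=1, inplace=True)
    return df2

second_cell()             # first run works
print(list(df1.columns))  # ['col_new'], so colA is gone from df1 too

try:
    second_cell()         # second run cannot find colA
    rerun_failed = False
except KeyError as exc:
    rerun_failed = True
    print("KeyError:", exc)
```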
I don’t understand why this is happening, but my solution was to use the .copy() method which fixed this problem:
[python]
df2=df1.copy()
[/python]
Yes, that wasn’t a Jupyter issue. You would find the same thing running Python in a console/command line. It is a basic Python issue.
It is something you’ll want to understand going forward but you stumbled upon the solution already.
Your line df2=df1 is the issue here and has nothing to do with Jupyter cells. That isn’t a good practice when you want a separate dataframe, because it doesn’t copy anything. You’ll encounter the same problem with other mutable Python data types, such as a list, if you do that type of assignment. This Stack Overflow answer covers it for lists, but the same concept holds for other data types, like your dataframe.
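The list version of the same behavior, as a short sketch using only built-in types (the variable names are arbitrary):

```python
a = [1, 2, 3]

b = a          # b is another name for the same list
b.append(4)
print(a)       # [1, 2, 3, 4], so the change shows up through a as well

c = a.copy()   # c is a new, separate list
c.append(5)
print(a)       # [1, 2, 3, 4], a is unchanged this time
print(c)       # [1, 2, 3, 4, 5]
```

Note that list.copy() is a shallow copy; for lists that contain other lists, copy.deepcopy from the standard library is the usual tool.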
Your line df2=df1 just copies the reference to the dataframe, not the dataframe itself, so both df2 and df1 refer to the same dataframe after the assignment. You thought your line df2.drop(['colA'],axis=1, inplace=True) was dropping that column only in df2; however, it was dropping it in df1 as well. That came as a surprise because you didn’t yet realize that both df2 and df1 were referencing the same dataframe object. Copying lists and dataframes is the way to go when you want to keep the original intact but do operations on another.
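One way to check whether two names point at the same object is the is operator. The sketch below uses a small made-up dataframe (original, alias and independent are illustrative names only):

```python
import pandas as pd

original = pd.DataFrame({"colA": [1, 2, 3], "colB": [4, 5, 6]})

alias = original               # same object under a second name
independent = original.copy()  # new object with its own data

print(alias is original)        # True
print(independent is original)  # False

# Dropping from the copy leaves the original alone
independent.drop(["colA"], axis=1, inplace=True)
print(list(original.columns))   # ['colA', 'colB']

# Dropping through the alias changes the original
alias.drop(["colA"], axis=1, inplace=True)
print(list(original.columns))   # ['colB']
```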