I’m using Jupyter Notebook from Anaconda on macOS to forecast data. So far I’ve got this code to try to plot my data, but the plot won’t show:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from statsmodels.tools.eval_measures import rmse
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")
#NOTE:if 3 data cells, 0-3. If 928 data cells
for a in range(552):
df=pd.read_excel('DataLSTMReady.xlsx')
df.date=pd.to_datetime(df.Date)
df=df.set_index('Date')
df.head()
plt.plot(x="Date",y="107057")
plt.show()
The Excel data has Date in column 1, and each subsequent column is a sales time series for one product. The column headers are the product codes, the first of which is 107057, like this:
If your other plots work in Jupyter, then it is a Python error. Not particular to Jupyter.
That being said …
What is the purpose of the following line in your code?
for a in range(552):
You don’t use a in your posted code.
In the bottom code block you seem to make a dataframe and then don’t use it in your plot? Find an example using a data frame to assign x and y. Seaborn is a nice package built on top of matplotlib for that which makes it easy to work with Pandas dataframes.
Good advice is to start with a simple example that works to make a plot in Jupyter and then build on that. The code you’ve shared is not a self-contained minimal, reproducible example.
For now what I would like is to be able to plot the data from the Date column and the other column mentioned. What I was hoping was to use iloc to select a column and plot the columns one by one as the need arose.
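The idea of picking a column by position with iloc and plotting it against the dates can be sketched as follows. This is only a minimal sketch on made-up data: the second product code (107058), the values, and the helper name plot_column are assumptions for illustration, and the real sheet would be loaded with pd.read_excel instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the Excel sheet: a Date column plus one column per product code.
# The product code 107058 and all values are invented for this example.
dates = pd.date_range("2020-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Date": dates,
    107057: rng.integers(50, 150, size=24),
    107058: rng.integers(10, 60, size=24),
})
df = df.set_index("Date")


def plot_column(frame, position):
    """Plot the column at integer `position` against the Date index."""
    series = frame.iloc[:, position]      # a Series; its name is the product code
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.plot(series.index, series.values)  # x and y are passed as data, not as column names
    ax.set_xlabel("Date")
    ax.set_ylabel("Unit sales")
    ax.set_title(str(series.name))
    return ax


ax = plot_column(df, 0)   # first product column
```

The key difference from plt.plot(x="Date", y="107057") is that pyplot’s plot function expects the actual values, so the data has to be pulled out of the dataframe first. In a notebook the figure is drawn at the end of the cell; in a plain script, plt.show() would be needed.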
Hard to tell for sure without seeing what you get from df.columns, but I think you are mixing attribute notation (a.k.a. ‘dot notation’) and bracket notation in your Pandas dataframe calls, and using invalid Python identifiers.
Example, from the cell I get a dataframe similar to yours:
Look at what it shows when you run df.columns. See how the 3 column is an integer, and not a text string. Note that you can use display(df.num_legs) to display the first column. That won’t work for the second column. Nor will something akin to your mix attempt of display(df.'3').
The reason is that the column name 3 isn’t a valid Python identifier: you cannot use a number as a variable name in Python. For a related reason, an identifier cannot start with a digit either, so the following is also invalid Python:
2bfare = 55
So you want to use the ‘bracket notation’ to call your column. In my example, this will work for the two columns to use ‘bracket notation’:
display(df['num_legs'])
display(df[3])
In your case, I predict display(df[107057]) might work.
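The difference between the two notations can be checked on a small toy frame. This is a sketch with invented values; the column names num_legs and 3 mirror the example discussed above, and whether the real header comes back from read_excel as the integer 107057 or the string '107057' is something only df.columns can confirm.

```python
import pandas as pd

# Toy frame: one column labelled with a string, one labelled with the integer 3.
df = pd.DataFrame({"num_legs": [2, 4, 8, 0], 3: [10, 20, 30, 40]})

print(df.columns.tolist())      # the labels keep their types: a str and an int

legs = df["num_legs"]           # bracket notation works for any label
same_legs = df.num_legs         # attribute notation needs an identifier-like string label
three = df[3]                   # integer label: use brackets with the integer itself

# A label stored as the integer 3 is not the same as the string "3".
try:
    df["3"]
    string_lookup_worked = True
except KeyError:
    string_lookup_worked = False

print(string_lookup_worked)
```

If the lookup with the integer fails on the real data, trying the string form of the product code is the next thing to check.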
Additionally, because you aren’t showing the rest of your code or showing an actual minimal reproducible example or referencing your source material, I’m not sure why you are using pyplot.plot()? Can you point to code that suggested that? Earlier you had import matplotlib.pyplot as plt.
Combining using something more standard for plotting with my assigned dataframe example code from above, the code below yields a plot:
# based on https://matplotlib.org/stable/tutorials/introductory/usage.html
import matplotlib.pyplot as plt
plt.plot([5,5], df[3])
Or this yields a nicer looking one with less effort:
%pip install seaborn
# or based on https://seaborn.pydata.org/generated/seaborn.lineplot.html?highlight=lineplot#seaborn.lineplot
import seaborn as sns
sns.set(style="darkgrid")
sns.lineplot(x=5, y=df[3],estimator=None, linewidth = 1.5) # see https://stackoverflow.com/a/71611795/8508004 for why added `estimator=None,`
And thus you might try something like below, where I have df[3]:
ok i got it working, it was a syntax error. I also started anew because my code was from an old notebook. See, I have this excel file where I have many product unit sales per column. And I was trying to forecast the sales by product using LSTM because it sounded fancy to me and also because I have more than 500 products which may or may not be interrelated.
Started about a couple of years ago and then forgot about it. When I tried taking it back up I ran into some Python 2.7 versus Python 3.x errors which I am just sorting out. It did not help that my notebook used syntax that I believe has been deprecated not to mention the fact that I migrated to Python 3.x on my Mac which is also partly responsible for the errors.
So I decided to start this new notebook and realized that I also needed to explore the seasonality of my individual time series. I am at the point now where I would like to create seasonal plots, which I have seen videos for using RStudio, but I am not familiar with the notation in Python.
If I pick the first column using this code, I can see my first data series plotted and some simple trend and seasonality plots below. I’ll post this image in a subsequent reply because I’m not allowed to post more than 1 image per post.
So right now I’m at the point where I want to learn:
How to make seasonal plots or subplots as in RStudio. I expect to explore the seasonality of all the time series. I found this resource which sort of helps. I’m thinking this will help me find patterns I haven’t yet seen. I already know we have a seasonal pattern where Q2 and Q3 have peaks.
Find correlation between the different products if it exists. Add other data to see if I can find correlation between it. I tried adding a column for inflation but I could only find annual inflation for my country, so the monthly value is the same for all 12 months of a given year, which I don’t think is very useful. I’m still trying to find monthly inflation data. I was able to get the markup for that particular product from our ERP (sales price minus cost price) but I would think there will be obvious correlation there. I’m wondering if I should look for correlation between a product and other products’ unit sales. I mean I want to try to add variables that help forecast each individual product, but not if it doesn’t make sense to do so.
Return to the LSTM forecast, but I need to understand more about it before I do this. Because in the end what I would like is to forecast at least my top 75 steel products. I don’t need to forecast all 552 products. That’s not even all steel products: there are 1100 steel products in general, but about half are sold almost every year whereas the other half haven’t moved in quite a while and have little inventory. So I’m focusing on the 552 with highest rotation, and of those 552, the top 75 are my prime candidates.
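The first two goals in the list above (a seasonal plot with one line per year, and correlation between products) could be sketched roughly like this. Everything here is an assumption for illustration: the data is synthetic, the product names A and B are placeholders, and seasonal_table is a hypothetical helper, not something from a library.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic monthly sales for two placeholder products, with a mid-year peak.
dates = pd.date_range("2018-01-01", periods=48, freq="MS")
rng = np.random.default_rng(1)
month = dates.month.to_numpy()
season = 20 * np.sin((month - 4) / 12 * 2 * np.pi)   # highest value in July
df = pd.DataFrame(
    {
        "A": 100 + season + rng.normal(0, 5, size=48),
        "B": 60 + 0.5 * season + rng.normal(0, 5, size=48),
    },
    index=dates,
)


def seasonal_table(series):
    """Reshape a monthly series so rows are months 1-12 and columns are years."""
    frame = pd.DataFrame({
        "year": series.index.year,
        "month": series.index.month,
        "value": series.to_numpy(),
    })
    return frame.pivot(index="month", columns="year", values="value")


table = seasonal_table(df["A"])
ax = table.plot(marker="o", figsize=(8, 4))   # one line per year, months on the x-axis
ax.set_xlabel("Month")
ax.set_ylabel("Unit sales")

# Pairwise Pearson correlation between the product columns.
corr = df.corr()
print(corr)
```

Overlaying the years this way makes a recurring Q2/Q3 peak easy to spot. Note that two products sharing the same seasonal pattern will show up as correlated even when neither helps predict the other, so the correlation table is a starting point rather than evidence of a useful predictor.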
import pandas as pd
from pandas import Series
%matplotlib inline
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.preprocessing import RobustScaler
#nifty = Series.from_csv('ManualStepTimeSeries.csv',header=0)
# Set 1 of commands to explore dataseries
#nifty.head()
#nifty.size
#nifty.describe()
# Set 2 of commands to plot series
# from matplotlib import pyplot
# pyplot.plot(nifty)
# pyplot.show()
# Set 3 of commands to plot 1 column from multiple series on excel file
df=pd.read_excel('DataLSTMReady.xlsx')
df=df.set_index('Date')
df=df.iloc[:,[0]]
print(df)
df['107057'].plot()
# THIS IS FOR SOMETHING I FORGOT
#xcoords = ['2016-01-01','2017-01-01', '2018-01-01', '2019-01-01', '2020-01-01','2021-01-01']
# for xc in xcoords:
# pyplot.axvline(x=xc, color='black', linestyle='--')
#pyplot.show()
# Set 4 of commands - look for skus with stationality - WORKS
# add seasonal to imports
analysis = df.copy()
# when i use inplace=True in set_index I get "AttributeError: 'NoneType' object has no attribute 'iloc'"
decompose_result_mult = seasonal_decompose(analysis, model="additive")
# multiplicative gives inappropriate error...
trend = decompose_result_mult.trend
seasonal = decompose_result_mult.seasonal
residual = decompose_result_mult.resid
decompose_result_mult.plot();
# seasonal plots - MUST LEARN HOW TO
# plots grouping hours and days(holidays) separated by hue - not very useful, perhaps by client?
# add overall margins to data series?????
# ALL THIS STUFF BELOW WAS A SECOND ATTEMPT AT LSTM BUT HAVENT FINISHED TUTORIAL
# SET 5 of commands - forecast using LSTM & RobustScaler & create_datasets
# train_size=int(len(df)*0.9)
# test_size=len(df)-train_size
# train, test = df.iloc[0:train_size],df.iloc[train_size:len(df)]
# print(train.shape, test.shape)
# preprocess data by scaling so import sklearn scaler - didnt use this yet
#f_columns=['columns to be used']
#f_transformer=RobustScaler()
#cnt_transformer=RobustScaler() * only transform train data, not test data
#f_transformer=f_transformer.fit(train[f_columns].to_numpy())
#cnt_transformer=cnt_transformer.fit(train[['cnt']])
#train.loc[:,f_columns]=f_transformer.transform(train[f_columns].to_numpy())
#train['cnt']=cnt_transformer.transform(train[['cnt']])
#test.loc[:,f_columns]=f_transformer.transform(test[f_columns].to_numpy())
#test['cnt']=cnt_transformer.transform(test[['cnt']])
# cut data into datasets
# def create_dataset(X)
# stopped following this tutorial https://youtu.be/xaIA83x5Icg and looked for seasonal plot analysis
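The AttributeError mentioned in the comment about inplace=True is what pandas produces when the result of an in-place call is assigned back to the variable, because in-place methods return None. A small sketch on toy data (the values are invented) shows both patterns side by side:

```python
import pandas as pd

raw = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=3, freq="MS"),
    "107057": [5, 7, 6],
})

# Pattern 1: reassign the returned frame (what the notebook above does).
df = raw.set_index("Date")

# Pattern 2: modify in place and do NOT reassign.
df2 = raw.copy()
result = df2.set_index("Date", inplace=True)   # returns None; df2 itself is changed

print(result)                  # writing df2 = df2.set_index(..., inplace=True) would leave df2 as None

first_col = df.iloc[:, [0]]    # works
first_col2 = df2.iloc[:, [0]]  # also works, because df2 was never overwritten
```

Either pattern is fine on its own; the error only appears when the two are mixed.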
OK, right now I’m following this tutorial and the issue I’m having is that I can’t get the boxplot function correct. Here is the code (I’ve eliminated the commented-out code that I use for my reference):
import pandas as pd
from pandas import Series
%matplotlib inline
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.preprocessing import RobustScaler
# Seasonal exploration visualizing ts
import matplotlib.pyplot as plt
import seaborn as sns
# Set 3 of commands to plot 1 column from multiple series on excel file
df=pd.read_excel('DataLSTMReady.xlsx')
df=df.set_index('Date')
df=df.iloc[:,[0]]
#print(df)
# IDEAS TO INCORPORATE
# 1. seasonal plots - MUST LEARN HOW TO
# imported matplotlib as plt & seaborn
# i think i need to create a month column
df['Month']=df.index.month
print(df)
sns.set(rc={'figure.figsize':(11, 4)})
df['107057'].plot(linewidth=0.5);
# boxplots
axes = df.plot(marker='.', alpha=0.5, linestyle='None', figsize=(11, 9), subplots=True)
fig, axes = plt.subplots(3,1, figsize=(11,10), sharex=True)
sns.boxplot(data=df, x='Month', y='107057', ax=ax)
ax.set_ylabel('Lps')
ax.set_title('Unit Sales')
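One likely problem in the code as posted is that the name ax is never defined: plt.subplots(3,1, ...) returns the figure and an array of three Axes called axes, so ax=ax has nothing to refer to. A minimal sketch of a single monthly boxplot, using made-up numbers in place of the Excel data, might look like this:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Invented monthly data standing in for the first product column.
dates = pd.date_range("2018-01-01", periods=48, freq="MS")
rng = np.random.default_rng(2)
df = pd.DataFrame({"107057": rng.integers(50, 150, size=48)}, index=dates)
df["Month"] = df.index.month

# With a single plot, plt.subplots returns one Axes object, named ax here.
fig, ax = plt.subplots(figsize=(11, 4))
sns.boxplot(data=df, x="Month", y="107057", ax=ax)
ax.set_ylabel("Lps")
ax.set_title("Unit Sales")
```

With a grid such as plt.subplots(3, 1), each panel would be addressed by index instead, for example ax=axes[0].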
For all anyone knows from what you’ve provided, you clobbered boxplot or seaborn with a numpy array in the code that you don’t show.
If the box plot examples in the tutorial and at Seaborn work, then you just need to troubleshoot. As for the tutorial itself, it is from 2019 and Seaborn has evolved. You are better off using the examples in the documentation to get the tutorial working, and then applying what you learn to your own code.
As stated in my first reply, this forum is targeted for Jupyter issues. So far you’ve just been posting Python syntax errors that aren’t on topic for this forum.
And for future help when you post to forums seeking help, you want to actually include code and data in text form so that others can use it. A screenshot of a lot of numbers isn’t helping the people who you want to help you. Specific example:
That isn’t code and that data isn’t useable.
The title of the Prophet article you link to, ‘Time Series Analysis with Jupyter Notebooks and Socrata’, should be ‘Time Series Analysis with Python using Jupyter Notebooks and the Socrata API’. A lot of languages can be run in Jupyter notebooks, and it confuses learners starting out who think they are having issues with Jupyter when they are having issues with Python, and often more specifically with packages such as Pandas or Matplotlib. Later the author, Robert Voyer, somewhat acknowledges this with:
“We will start with a dataset downloaded using the Socrata API and loaded into a data frame in a Python Jupyter notebook.”
More clear would be ‘Python-backed’ there or something similar.
I understand, and I’m sorry, because I’ve been trying to figure it out on my own and I make changes without remembering and post incoherent stuff. OK, so like you said, I’m looking at the seaborn documentation because I’m at the point where I want to do the boxplots, so I’m at this url:
I’ve got it working. I have follow-up questions, like how do I get the month name displayed at the bottom of the seaborn chart instead of the number of the month, or how do I make that y= dynamic on the sns.boxplot line. Should I create a new post for that?
If you already had the Months as words, you may be able to put them back by removing that line. And then add something like here to define the order.
list_for_order = ["January", <etc. in the order you want, with text matching the months in your dataframe>]
ax=sns.boxplot(x='Month', y='107057', order=list_for_order, data=df)
But you may need to keep df['Month']=df.index.month and then after it put the following based on here:
Or use Pandas’ month_name() function to do that conversion, like here.
Combine that with the use of order= in your boxplot call like I put above to control the order of the months.
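The month_name() route, together with a list for order=, could be sketched like this. The data is invented, and the name month_order is just a placeholder for the list passed to seaborn; the seaborn call itself is left as a comment so the sketch runs with pandas alone.

```python
import pandas as pd

# Invented monthly data with a DatetimeIndex, as after set_index('Date').
dates = pd.date_range("2020-01-01", periods=24, freq="MS")
df = pd.DataFrame({"107057": range(24)}, index=dates)

# Month names instead of month numbers.
df["Month"] = df.index.month_name()

# Calendar order, generated by pandas so the spelling matches the column exactly.
month_order = pd.date_range("2000-01-01", periods=12, freq="MS").month_name().tolist()
print(month_order)

# With seaborn imported as sns, the list would then control the x-axis order:
# sns.boxplot(data=df, x="Month", y="107057", order=month_order)
```

Without order=, the categories would typically appear in the order they are first encountered in the data, which is only calendar order if the series happens to start in January.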
Because you aren’t providing a minimal reproducible example that I could run, I cannot tell you exactly what will work.
Your latest question should be another post somewhere. Not on this forum. That question is off-topic for this forum and does not belong here. Please, only post questions pertinent to Jupyter here, not general Python/Pandas questions.
Especially about ‘creating a minimal, complete, and verifiable example’. Those trying to help you should be able to paste in the code and data you provide (possibly toy data similar to yours, provided as text and not a screenshot) and get what you get. Go to another computer now and try using only what you provided here as text to run your code and produce what you show in, say, 3 minutes. I suspect you won’t be able to, because you never provided a minimal reproducible example. You’ll want to do that when you post your additional questions elsewhere.