Understanding time series in layman’s terms
How to start learning time series?
You are lucky if you get the chance to learn the subject well before you have to use it in an actual work environment, or if the search engine points you to exactly the right place to start. There is good content out there too, but the search engine may not bring it up on the first page, and that is where we miss out.
What is a time series?
In its simplest form, any data with just two columns: a date and a variable. There may be ‘n’ other variables that influence it, but they may not be available for analysis or prediction. Examples: stock prices, gold rates, energy savings achieved.
Is it a new topic?
Definitely not. It is as old as mathematics, but modern computational tools have taken it to the next level.
Where to start?
While many sophisticated, dedicated open source tools are available, even Excel can help. You can start with Excel and move to the next level with R or Python. The first and foremost step is to convert the data into a series. For instance, a utility runs a program to save energy over a period of 5 years, and you have the energy savings data starting from, say, Jan 1, 2012. If you want a series with monthly frequency, you should have at least 12 x 5 = 60 rows (records); if daily, about 365 x 5 = 1825 rows, plus leap days. There is a chance that some of the data is missing. Fix that first.
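A quick way to check for gaps before building the series is sketched below. The data frame `savings` and its columns `month` and `kwh` are made up for illustration; your own data will have different names.

```r
# Hypothetical monthly energy-savings records, Jan 2012 to Dec 2016,
# with two months deliberately removed to simulate gaps.
set.seed(1)
all_months <- seq(as.Date("2012-01-01"), as.Date("2016-12-01"), by = "month")
savings <- data.frame(month = all_months, kwh = round(runif(60, 100, 200)))
savings <- savings[-c(15, 40), ]

# Expected number of monthly records for 5 years
expected <- 12 * 5
nrow(savings) < expected   # TRUE means some records are missing

# Which months are missing?
missing_months <- all_months[!all_months %in% savings$month]
missing_months

# Rebuild a complete table; the missing months get NA in kwh
complete <- merge(data.frame(month = all_months), savings, all.x = TRUE)
```

How to fill the NA values (interpolate, carry forward, or go back to the source) depends on the data, so the sketch stops at finding them.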
In R, a few packages need to be installed. I shall update the list in a later part of this content. To begin with, install ‘tseries’, ‘forecast’, ‘fpp2’ and ‘GGally’. Then assign the data to a variable as a time series using the function ts.
Y <- ts(data, frequency = 12, start = c(2012, 1))
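To see what ts gives back, here is a small sketch with made-up numbers standing in for the energy savings data (the values in `data` are simulated, not real):

```r
# Hypothetical data: 60 monthly values starting January 2012.
set.seed(42)
data <- 100 + 0.5 * (1:60) + 10 * sin(2 * pi * (1:60) / 12) + rnorm(60)

Y <- ts(data, frequency = 12, start = c(2012, 1))

start(Y)      # 2012 1
end(Y)        # 2016 12
frequency(Y)  # 12
print(Y)      # printed as a year-by-month table
```

With frequency = 12 R treats each value as one month, so 60 values starting in January 2012 end in December 2016.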
What to view and understand?
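We look for a trend: upward, downward or flat. Second, look for seasonality: a pattern that repeats with a fixed period, such as a particular month every year [gold prices rise during Diwali, mangoes arrive in summer] or weekend sales vs weekday sales within a week. Third, look for a cyclical pattern: rises and falls with no fixed period, such as business cycles.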
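One way to look at trend and seasonality separately is classical decomposition with decompose() from base R. This is only one option and not something the steps above depend on; the sketch uses AirPassengers, a monthly series that ships with R, as a stand-in for your own data.

```r
# AirPassengers: monthly airline passenger totals, 1949 to 1960 (built into R).
parts <- decompose(AirPassengers, type = "multiplicative")

plot(parts)             # panels: observed, trend, seasonal, random
round(parts$figure, 2)  # the 12 seasonal factors, January to December
```

A seasonal factor above 1 means that month usually sits above the trend, and below 1 means it usually sits under it.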
Graphs to view and understand
R has a lot of packages and options. Start with autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot() and ggAcf() from the ‘forecast’ package. You can try these and share what you find in the comments section.
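A minimal sketch of these plots, assuming the ‘forecast’ package is installed (in that package the seasonal plot function is named ggseasonplot() and the ACF plot is ggAcf()). AirPassengers is used only as sample data.

```r
library(forecast)  # autoplot() for ts objects and the gg* plotting helpers
library(ggplot2)

y <- AirPassengers  # built-in monthly series, used here as sample data

autoplot(y)         # the series over time
ggseasonplot(y)     # one line per year, plotted against the month
ggsubseriesplot(y)  # one mini-panel per month, across the years
gglagplot(y)        # the series against its own lagged values
ggAcf(y)            # autocorrelation at each lag
```

Run the lines one at a time so each plot can be studied before the next one replaces it.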
This topic will be continued in the next post.
autoplot() is a wonderful function which plots the time series. The x axis is time and the y axis has the values. Plotting is as simple as
autoplot(thevariable)
We get a good plot of the variable spread over time. The next step is to find the ACF, the autocorrelation function.
acf(thevariable)
What to understand from it?
When all the lines stay within the blue dashed band, there is no significant autocorrelation and the series behaves like random values (white noise).
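To get a feel for the blue band, the sketch below compares a purely random series with AirPassengers, which has a strong trend. The band drawn by acf() is roughly plus or minus 1.96 / sqrt(n); the random series here is simulated only for illustration.

```r
set.seed(123)
noise <- ts(rnorm(120), frequency = 12, start = c(2012, 1))

# plot = FALSE returns the numbers instead of drawing the chart
a_noise <- acf(noise, plot = FALSE)
a_air   <- acf(AirPassengers, plot = FALSE)

# Approximate 95% bounds, the blue dashed lines on the ACF plot
bound_noise <- qnorm(0.975) / sqrt(length(noise))
bound_air   <- qnorm(0.975) / sqrt(length(AirPassengers))

# Share of lags (lag 0 dropped) that fall outside the band
mean(abs(a_noise$acf[-1]) > bound_noise)  # small for random data
mean(abs(a_air$acf[-1]) > bound_air)      # large for a trended series
```

Even for random data a spike or two may poke outside the band by chance, so look at the overall picture rather than a single lag.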
To choose a part of the time series, use the window command. Here filename is the name of the time series object.
Variable <- window(filename, start = 1973, end = c(1993, 11))
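Here is the same idea on a made-up monthly series, so the result can be checked. The name `full_series` and its values are hypothetical.

```r
# Hypothetical monthly series from January 1970 to December 1999.
set.seed(7)
full_series <- ts(cumsum(rnorm(360)), frequency = 12, start = c(1970, 1))

# Keep January 1973 through November 1993.
part <- window(full_series, start = 1973, end = c(1993, 11))

start(part)   # 1973 1
end(part)     # 1993 11
length(part)  # 251
```

A start given as a single year means the first period of that year, and c(1993, 11) means the 11th period (November) of 1993.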
So what else is required?
Analysis and forecasting.
If you want to estimate whether consumption has any relation with income over time, you can use tslm(variable1 ~ variable2, data=) with your time series data passed as the data argument.
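A small sketch of tslm from the ‘forecast’ package follows. The consumption and income figures are simulated with a known slope of 0.3, purely so the fitted result can be compared with something; real data would replace the object `econ`.

```r
library(forecast)

# Simulated quarterly data: consumption depends on income with slope 0.3.
set.seed(10)
n <- 80
income <- rnorm(n, mean = 0.7, sd = 0.9)
consumption <- 0.5 + 0.3 * income + rnorm(n, sd = 0.3)

# Both columns go into one multivariate time series
econ <- ts(cbind(consumption = consumption, income = income),
           frequency = 4, start = c(2000, 1))

fit <- tslm(consumption ~ income, data = econ)
summary(fit)  # coefficients, standard errors, R-squared
coef(fit)     # the income coefficient should land near 0.3
```

The coefficient on income tells how much consumption moves, on average, for a one unit change in income over the period.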
Forecasting?
In the next update.