Wednesday, July 25, 2018

Understanding Time series part-2


Understanding time series in layman's terms
How to start learning time series?
               You are lucky if you get an opportunity to learn time series well before you get hands-on in an actual work environment, or if the search engine points you to exactly where to start. There is good content out there too, but the search engine may not bring it up on the first page, and that is where we miss out.
               What is a time series?
Any data that has just two columns: a date and a variable. There may be 'n' other related variables, but they may not be available for analysis / prediction. Examples: stock prices, gold rates, energy savings made, etc.
               Is it a new topic?
Definitely not. It is as old as mathematics, but computational tools have reinvented it and taken it to the next level.
               Where to start?
While there are many sophisticated, dedicated, open source tools available, even Excel can help. You can start with Excel and move to the next level using R and Python. The first and foremost thing is to convert the data into a series. For instance, a utility conducts a program to save energy over a period of 5 years, and you have the energy savings data starting from, say, Jan 1, 2012. If you expect a series with a monthly frequency, you should have at least 12 x 5 = 60 rows [records]; if daily, about 365 x 5 = 1825 rows. There are chances that some of the data may not be available. Fix that first.
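As a rough sketch of the "fix it first" step, the snippet below checks whether all 60 monthly rows are present and fills the gaps. The data frame savings, its columns month and kwh, and the two dropped months are made up for illustration. Linear interpolation with approx() is only one possible way to fill a gap, not necessarily the right one for your data.

```r
# Hypothetical data: 5 years of monthly energy savings with two months missing
expected <- seq(as.Date("2012-01-01"), by = "month", length.out = 12 * 5)
set.seed(1)
savings <- data.frame(
  month = expected[-c(7, 30)],            # drop two months to simulate gaps
  kwh   = round(runif(58, 100, 200))      # made-up savings values
)

# Which months should be there but are not?
missing_months <- expected[!expected %in% savings$month]
print(missing_months)

# Put the data on the full monthly calendar; absent months become NA
full <- merge(data.frame(month = expected), savings, by = "month", all.x = TRUE)

# One simple way to fill the gaps: linear interpolation between neighbours
full$kwh_filled <- approx(x = as.numeric(full$month), y = full$kwh,
                          xout = as.numeric(full$month))$y

nrow(full)            # should match 12 x 5
sum(is.na(full$kwh))  # number of gaps found
```

Once the row count matches what the frequency demands, the filled column can be passed to ts().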
               In R, a few packages need to be installed. I shall update the list in a later part of the content. Initially, start with 'tseries', 'forecast', 'fpp2' and 'GGally'. Then assign the data to a variable as a time series using the function ts().
Y<-ts(data,frequency=12,start=c(2012,1))
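To see what the ts() call gives you, here is a small sketch with made-up numbers standing in for the real energy savings data. It uses only base R, and the values in data are random placeholders.

```r
# Made-up monthly values standing in for 5 years of real data
set.seed(42)
data <- round(runif(60, min = 100, max = 200))

# frequency = 12 means monthly; start = c(2012, 1) means January 2012
Y <- ts(data, frequency = 12, start = c(2012, 1))

frequency(Y)  # observations per year
start(Y)      # first period (year, month)
end(Y)        # last period (year, month)
print(Y)      # prints as a year-by-month table
```

With 60 monthly values starting in January 2012, the series ends in December 2016, which is a quick check that no rows are missing.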
What to view and understand?
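               We look for a trend: upward, downward or flat. Next, seasonality: a pattern that repeats at a fixed period, such as a particular month every year [gold prices increase during Diwali, mangoes arrive in summer] or weekend sales vs weekday sales within every week. Thirdly, look for a cyclical pattern: rises and falls that do not have a fixed period, such as business cycles.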
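One way to look at these patterns separately is base R's decompose(), sketched here on a made-up series built from a known upward trend plus a yearly pattern. The numbers are invented for illustration. The trend that decompose() returns is a moving average, so it carries any cycle along with it.

```r
# Made-up monthly series: upward trend + yearly seasonal pattern + noise
set.seed(7)
months        <- 1:60
trend_part    <- 100 + 0.8 * months
seasonal_part <- 10 * sin(2 * pi * months / 12)
noise         <- rnorm(60, sd = 2)
Y <- ts(trend_part + seasonal_part + noise, frequency = 12, start = c(2012, 1))

# Additive decomposition into trend, seasonal and random parts
parts <- decompose(Y)
plot(parts)      # four panels: observed, trend, seasonal, random

parts$figure     # the 12 estimated monthly seasonal effects
```

The trend panel should climb steadily, and the seasonal panel should repeat the same shape every year.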
Graphs to view and understand
               R has a lot of packages and options. Start with autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot() and ggAcf(). You can try these and share an update in the comments section.
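A minimal sketch of these plots, assuming the 'forecast' and 'ggplot2' packages are installed. The built-in AirPassengers monthly series is used here only as a stand-in for your own data.

```r
library(forecast)  # ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf()
library(ggplot2)   # autoplot() generic

# AirPassengers ships with R; replace it with your own ts object
p_line   <- autoplot(AirPassengers)         # values over time
p_season <- ggseasonplot(AirPassengers)     # one line per year, by month
p_sub    <- ggsubseriesplot(AirPassengers)  # one mini-series per month
p_lag    <- gglagplot(AirPassengers)        # series against its own lags
p_acf    <- ggAcf(AirPassengers)            # autocorrelation by lag

print(p_line)
print(p_season)
```

Each call returns a plot object, so you can print them one at a time and compare what each view shows.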
This topic will be continued in the next post.
              



autoplot() is a wonderful function which plots the time series. The x axis is time and the y axis has the values.
Plotting is as simple as
               autoplot(thevariable)
We get a good plot of the variable spread over time. The next step is to find the ACF, the autocorrelation function.
acf(thevariable)
What to understand from it?
When all the lines stay within the blue dashed band, there is no significant autocorrelation and the series behaves like random values (white noise).
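To make the blue band concrete, here is a small base R sketch on a purely random series. The band drawn by acf() sits at roughly plus or minus 1.96 / sqrt(n). The series noise is made up, and even for random data about 1 in 20 lines can fall outside the band by chance.

```r
# A purely random series: no real correlation between time points
set.seed(123)
noise <- ts(rnorm(120))

a <- acf(noise, plot = FALSE)   # compute without drawing

# The 95% band that acf() draws as blue dashed lines
bound <- qnorm(0.975) / sqrt(length(noise))

r <- a$acf[-1]                  # drop lag 0, which is always 1
share_inside <- mean(abs(r) <= bound)

bound          # half-width of the band
share_inside   # fraction of lags that stay within the band
```

For a series with trend or seasonality, many lines would stick out well beyond this band.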
To choose a part of the time series, use the window() function on the time series object [called filename here].
Variable <- window(filename, start=1973, end=c(1993,11))
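Here is the same idea on a made-up monthly series, using base R only. The name full_series and its random values are placeholders. For monthly data, start = 1973 means January 1973 and end = c(1993, 11) means November 1993.

```r
# Made-up monthly series running from Jan 1970 to Dec 1995 (26 years)
set.seed(3)
full_series <- ts(rnorm(312), frequency = 12, start = c(1970, 1))

# Keep only Jan 1973 through Nov 1993
part <- window(full_series, start = 1973, end = c(1993, 11))

start(part)   # first period kept
end(part)     # last period kept
length(part)  # number of months kept
```

The result is still a time series with the same frequency, so it can go straight into autoplot() or acf().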
So what else is required? Analysis and forecasting.
If you want to estimate whether consumption has any relation with income over time, you can use tslm(variable1~variable2, data=) from the 'forecast' package, which fits a linear regression to time series data.
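A rough sketch of tslm(), assuming the 'forecast' package is installed. The quarterly income and consumption series are simulated, with consumption built as about 0.7 times income plus noise, so the numbers mean nothing beyond showing the call.

```r
library(forecast)

# Simulated quarterly data: consumption depends on income by construction
set.seed(11)
income      <- ts(cumsum(rnorm(80, mean = 0.5)), frequency = 4, start = c(2000, 1))
consumption <- 2 + 0.7 * income + rnorm(80, sd = 0.5)

# tslm() expects the variables together in one time series object
both <- cbind(consumption, income)

fit <- tslm(consumption ~ income, data = both)
summary(fit)   # coefficient table, R-squared, etc.
coef(fit)      # intercept and slope on income
```

The slope on income should come out close to the 0.7 used to simulate the data.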
Forecasting?
In the next update
