Hadoop Data Analysis in R
Analysis of Hadoop Data in R
In this post I attempt to analyze stock and commodity prices and see whether I can spot any patterns using R functions. A graphical representation of these patterns is important because it helps to determine the demand forces in the stock market. Charts of historical stock prices, which form repeating patterns or shapes, are commonly used in stock markets.
Time Series Analysis
A time series analysis is useful for seeing how a security or an economic variable changes over time, or how it changes compared to other variables over the same period. The intention is to analyze a time series of daily closing prices for the NSE (National Stock Exchange), the BSE (Bombay Stock Exchange) and gold over a period of years. We obtain the closing prices for each year and list them in chronological order. This gives the closing-price time series for the specified range of years, i.e. from 2000 to 2013.
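The step described above can be sketched in base R. This is only an illustration under stated assumptions: the prices are synthetic, the column names `Date` and `Close` are made up, and "closing prices for each year" is read here as the last close of each year, which is one possible interpretation.

```r
# Synthetic daily closing prices standing in for the real NSE/BSE/gold data.
set.seed(1)
dates  <- seq(as.Date("2000-01-01"), as.Date("2013-12-31"), by = "day")
prices <- data.frame(Date = dates, Close = 1000 + cumsum(rnorm(length(dates))))

# Shuffle the rows, then restore chronological order, since a raw file
# is not guaranteed to be sorted by date.
prices <- prices[sample(nrow(prices)), ]
prices <- prices[order(prices$Date), ]

# Last closing price of each year (assumed meaning of "yearly closing price").
year         <- as.integer(format(prices$Date, "%Y"))
yearly_close <- tapply(prices$Close, year, function(x) x[length(x)])

# Annual time series starting in 2000.
close_ts <- ts(as.numeric(yearly_close), start = 2000, frequency = 1)
print(close_ts)
```

Keeping the full daily series instead would only change the `ts()` call, for example by passing the daily values with a suitable `frequency`.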
ARIMA Forecast
With the demand for gold on the rise and a complex set of factors influencing the investment demand for gold, forecasting the price of gold is seen as essential. With the limited data available to me (i.e. the daily closing price of gold), I attempted to forecast the price of gold in the short run through time series modeling. One of the most widely adopted methods for time series modeling, the autoregressive integrated moving average (ARIMA) algorithm provided by R, was used and tested to forecast prices from the daily prices for the period 1979 to 2013.
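As a rough illustration of ARIMA forecasting in R, the sketch below uses `arima()` and `predict()` from the base `stats` package. The gold series is synthetic and the model order `c(1, 1, 0)` is an arbitrary assumption chosen for the example; it is not the order used in this study.

```r
# Synthetic random-walk series standing in for the daily gold closing price.
set.seed(42)
gold <- ts(1000 + cumsum(rnorm(500, mean = 0, sd = 5)))

# Fit an ARIMA(1,1,0) model; the order is an assumption for illustration.
fit <- arima(gold, order = c(1, 1, 0))

# Forecast the next 10 observations; $pred holds the point forecasts
# and $se the corresponding standard errors.
fc <- predict(fit, n.ahead = 10)
print(fc$pred)
print(fc$se)
```

If the forecast package is installed, `auto.arima()` can choose the order automatically instead of fixing it by hand.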
ETS Forecast
Similarly, we use the ets function available in the forecast package to analyze the NSE and BSE data with the exponential smoothing method.
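To show the idea of exponential smoothing without requiring extra packages, the sketch below swaps in base R's `HoltWinters()` with trend and seasonality switched off, which is simple exponential smoothing and a simpler relative of `ets()`. The NSE series is synthetic. With the forecast package installed, the corresponding calls would be `ets(nse)` followed by `forecast(fit, h = 5)`.

```r
# Synthetic series standing in for the NSE daily closing price.
set.seed(7)
nse <- ts(5000 + cumsum(rnorm(300, sd = 20)))

# Simple exponential smoothing: no trend (beta) and no seasonal (gamma) term.
# The smoothing parameter alpha is estimated from the data.
fit <- HoltWinters(nse, beta = FALSE, gamma = FALSE)

# Forecast five steps ahead; simple exponential smoothing gives a flat forecast.
fc <- predict(fit, n.ahead = 5)
print(fit$alpha)
print(fc)
```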
For this study we make use of R, Shiny and RHadoop functionality. Beginners who are new to Hadoop and R should go through the links below to set up the environment before starting on the code in R.
Once the configuration is done and you are feeling comfortable, continue with the code in R.
The user interface is defined in a source file named ui.R:
library(shiny)

# Page layout: date-range dropdowns and analysis selector in the sidebar,
# three plot areas in the main panel.
shinyUI(pageWithSidebar(
  headerPanel("Stock Analysis!"),
  sidebarPanel(
    # Dropdowns are rendered by the server (renderUI) for the from/to dates.
    wellPanel(
      uiOutput("from_day_dropdown"),
      uiOutput("from_month_dropdown"),
      uiOutput("from_year_dropdown"),
      uiOutput("to_day_dropdown"),
      uiOutput("to_month_dropdown"),
      uiOutput("to_year_dropdown")),
    div(div(
      selectInput(inputId = "stock_options",
                  label = "Analysis Option",
                  choices = c("Comparitive", "Detailed", "Combined",
                              "Time-Series", "Forecasting-BSE",
                              "Forecasting-NSE", "Forecasting-GOLD",
                              "Forecasting-CURRENCY", "BoxPlot"),
                  # 'selected' is the full argument name; 'select' is ambiguous
                  # in current Shiny versions.
                  selected = "Comparitive")))),
  mainPanel(
    plotOutput("SelectedCombinedPlot", height = "350px", width = "800px"),
    plotOutput("MonthBack_SelectedCombinedPlot", height = "350px", width = "800px"),
    plotOutput("TwoMonthBack_SelectedCombinedPlot", height = "800px", width = "800px")
  )
))
The server side of the application is shown below. Let's go through it step by step for better understanding.
The first few lines of code load the R packages xts and forecast, which provide the time-series, ARIMA and ETS functions. We use rhdfs to read the CSV files from HDFS. We use the merge function to create a data frame that contains the combined data for NSE, BSE and gold, joined on the date field. Our data frame is multi-dimensional; to fit all the data in one graph, the data columns must be made proportionate, so we divide the data columns by a constant integer and do a little tweaking of the data (you can do this your own way in code, or make the changes directly in the CSV file). We define five functions, each of which accepts a date range as its parameter.
Functions defined:
- Comp_graph and detailed_graph create the data frame for the selected date range and plot the graph.
- Detailed_graph_timeseries creates the xts object, which is then passed to ts (the time series function) with the start and end years of the range.
- Detailed_graph_Forecast_GOLD creates the ts object, which is then passed to the ARIMA function to plot the forecast.
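The merge, scaling and date-range steps described above can be sketched in base R as follows. Everything here is illustrative: the data frames are synthetic stand-ins for the CSV files read from HDFS, the column names and divisors are assumptions, and `subset_range` is a hypothetical helper rather than one of the functions from the application.

```r
# Synthetic stand-ins for the three CSV files (column names are assumptions).
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = 10)
nse   <- data.frame(Date = dates,      NSE  = seq(5900,  by = 10, length.out = 10))
bse   <- data.frame(Date = dates,      BSE  = seq(19500, by = 25, length.out = 10))
gold  <- data.frame(Date = dates[1:8], GOLD = seq(30000, by = 50, length.out = 8))

# Join on the date field; merge() keeps only dates present in both inputs.
combined <- merge(merge(nse, bse, by = "Date"), gold, by = "Date")

# Divide the larger columns by a constant so all series fit on one axis.
# The divisors are arbitrary illustration values.
combined$BSE  <- combined$BSE / 3
combined$GOLD <- combined$GOLD / 5

# Hypothetical helper: rows whose date falls inside [from, to].
subset_range <- function(df, from, to) {
  df[df$Date >= as.Date(from) & df$Date <= as.Date(to), , drop = FALSE]
}

selected <- subset_range(combined, "2013-01-03", "2013-01-06")
print(selected)
```

A plotting function in the style of Comp_graph could then call `subset_range` on the combined data frame and pass the result to `plot()` and `lines()`.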