Analysis of Hadoop Data in R

This is an attempt to analyze stock and commodity prices and see whether I can spot any patterns using R functions. A graphical representation of these patterns is important because it helps to gauge the demand forces in the stock market. Charts of historical stock prices that form repeating patterns or shapes are commonly used in stock markets.

Time Series Analysis

A time series analysis is useful for seeing how a security or economic variable changes over time, or how it changes compared with other variables over the same period. The intention is to analyze a time series of daily closing prices for the NSE (National Stock Exchange), the BSE (Bombay Stock Exchange) and gold over a period of years. We obtain the closing prices for each year and list them in chronological order. This gives a closing-price time series for the specified range of years, i.e. from 2000 to 2013.
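As a rough sketch of that idea, the snippet below builds a yearly closing-price series with base R's ts function. The prices are synthetic (a made-up random walk, not NSE, BSE or gold data), and taking the last close of each year is only one reading of "closing prices for each year"; the variable names are my own.

```r
# Synthetic stand-in for a table of daily closes (values are made up).
set.seed(1)
dates <- seq(as.Date("2000-01-03"), as.Date("2013-12-31"), by = "day")
# Drop weekends; wday is 0 for Sunday and 6 for Saturday.
dates <- dates[!as.POSIXlt(dates)$wday %in% c(0, 6)]
prices <- data.frame(
  date  = dates,
  close = 1000 + cumsum(rnorm(length(dates)))
)

# Keep rows in chronological order, then take the last close of each year.
prices <- prices[order(prices$date), ]
year <- as.integer(format(prices$date, "%Y"))
yearly_close <- tapply(prices$close, year, function(x) x[length(x)])

# Annual time series running from 2000 to 2013.
close_ts <- ts(as.numeric(yearly_close), start = 2000, frequency = 1)
print(close_ts)
```

With real data, the data frame would come from the CSV files instead of the random walk.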

ARIMA Forecast

With demand for gold on the rise, and a complex set of factors influencing investment demand for gold, forecasting the price of gold is seen as essential. With the limited data available to me (i.e. the daily closing price of gold), I attempted to forecast the price of gold in the short run through time series modeling. One of the most widely adopted methods for time series modeling, the autoregressive integrated moving average (ARIMA) model as provided by R, was used and tested to forecast prices using the daily prices for the period from 1979 to 2013.
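To give a feel for what an ARIMA forecast looks like in R, here is a minimal sketch using arima and predict from the base stats package. The series is simulated, not real gold prices, and the ARIMA(1,1,0) order is an arbitrary choice for illustration; it is not necessarily the order used in this study.

```r
set.seed(42)
# Simulated series standing in for daily gold closes (values are made up).
gold <- 300 + arima.sim(list(order = c(1, 1, 0), ar = 0.5), n = 500)

# Fit an ARIMA(1,1,0) model; the order is an assumption for this sketch.
fit <- arima(gold, order = c(1, 1, 0))

# Forecast the next 10 observations with approximate 95% intervals.
fc <- predict(fit, n.ahead = 10)
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
print(cbind(forecast = fc$pred, lower = lower, upper = upper))
```

The interval widens with the horizon, which is one reason to keep such forecasts short-run.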

ETS Forecast

Similarly, we use the ets function available in the forecast package to analyze the NSE and BSE data with the exponential smoothing method.
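A minimal sketch of an ets fit follows, assuming the forecast package is installed. The index values are simulated monthly numbers, not actual NSE or BSE closes, and the 12-step horizon is picked only for the example.

```r
library(forecast)

set.seed(7)
# Simulated monthly index closes (made-up values, not NSE or BSE data).
index_close <- ts(5000 + cumsum(rnorm(120, mean = 5, sd = 40)),
                  start = c(2004, 1), frequency = 12)

# Let ets() choose the error, trend and seasonal components automatically.
fit <- ets(index_close)

# Forecast 12 months ahead and plot the result with prediction intervals.
fc <- forecast(fit, h = 12)
plot(fc)
```

Printing fit shows which exponential smoothing model was selected for the series.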

For this study we will make use of R, Shiny and RHadoop. Beginners who are new to Hadoop and R should go through the links below to set up the environment before starting on the code in R.

Once the configuration is done and you're feeling comfortable, continue with the code in R.

The user interface is defined in a source file named ui.R:

library(shiny)

shinyUI(pageWithSidebar(
  headerPanel("Stock Analysis!"),
  sidebarPanel(
    # Date-range dropdowns; their contents are filled in by the server
    wellPanel(
      uiOutput("from_day_dropdown"),
      uiOutput("from_month_dropdown"),
      uiOutput("from_year_dropdown"),

      uiOutput("to_day_dropdown"),
      uiOutput("to_month_dropdown"),
      uiOutput("to_year_dropdown")
    ),
    # Which analysis to run on the selected date range
    div(div(
      selectInput(inputId = "stock_options",
                  label = "Analysis Option",
                  choices = c("Comparitive", "Detailed", "Combined",
                              "Time-Series", "Forecasting-BSE", "Forecasting-NSE",
                              "Forecasting-GOLD", "Forecasting-CURRENCY", "BoxPlot"),
                  selected = "Comparitive")
    ))
  ),
  # Output plots for the selected option
  mainPanel(
    plotOutput("SelectedCombinedPlot", height = "350px", width = "800px"),
    plotOutput("MonthBack_SelectedCombinedPlot", height = "350px", width = "800px"),
    plotOutput("TwoMonthBack_SelectedCombinedPlot", height = "800px", width = "800px")
  )
))


The server side of the application is shown below. Let's go through it step by step for better understanding.

The first few lines of code load the R packages xts and forecast, which provide the time-series, ARIMA and ETS functions. rhdfs is used to read the CSV files from HDFS. We use the merge function to create a data frame that contains the combined data for NSE, BSE and gold, joined on the date field.

The data frame holds several series on different scales, so to fit them on one graph it is essential to make the data columns proportionate: divide a data column by a constant integer and tweak the data a little (you can do this in code or make the changes directly in the CSV file). We define five functions, each of which accepts a date range as its parameters.
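The merge-and-scale step can be sketched with base R alone. The three small data frames below are made-up stand-ins for the CSV data (the HDFS read via rhdfs is left out), and the divisor of 10 is only an assumed scaling constant.

```r
# Made-up sample rows standing in for the NSE, BSE and gold CSV files.
nse <- data.frame(
  date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03")),
  nse  = c(5950, 5993, 6009)
)
bse <- data.frame(
  date = as.Date(c("2013-01-01", "2013-01-02", "2013-01-03")),
  bse  = c(19580, 19714, 19765)
)
gold <- data.frame(
  date = as.Date(c("2013-01-02", "2013-01-03", "2013-01-04")),
  gold = c(1690, 1680, 1655)
)

# Join on the date field; merge() keeps only dates present in every table.
combined <- merge(merge(nse, bse, by = "date"), gold, by = "date")

# Bring BSE onto a scale comparable with the others (divisor is an assumption).
scaled <- transform(combined, bse = bse / 10)
print(scaled)
```

Only 2013-01-02 and 2013-01-03 survive the join here, because those are the dates shared by all three tables.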

Functions defined

  1. Comp_graph and detailed_graph create the data frame for the selected date range and plot the graph.
  2. Detailed_graph_timeseries creates the xts object, which is then passed to ts (the time series function) with the start and end years of the range.
  3. Detailed_graph_Forecast_GOLD creates the ts object and then passes it to the ARIMA function to plot the forecast.
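A date-range function of this kind might look roughly like the sketch below. The name comp_graph_sketch, its arguments and the demo data are all hypothetical and written only for illustration; this is not the code of Comp_graph itself.

```r
# Hypothetical helper: subset a data frame to a date range and plot two series.
comp_graph_sketch <- function(df, from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  stopifnot(from <= to)

  # Keep only the rows inside the requested range (inclusive).
  sel <- df[df$date >= from & df$date <= to, ]

  # Draw both series on one set of axes.
  plot(sel$date, sel$nse, type = "l", col = "blue",
       ylim = range(sel$nse, sel$bse), xlab = "Date", ylab = "Close")
  lines(sel$date, sel$bse, col = "red")
  legend("topleft", legend = c("NSE", "BSE (scaled)"),
         col = c("blue", "red"), lty = 1)

  invisible(sel)
}

# Made-up demo data: one row per day for January 2013.
set.seed(3)
demo <- data.frame(
  date = seq(as.Date("2013-01-01"), by = "day", length.out = 31),
  nse  = 6000 + cumsum(rnorm(31, sd = 20)),
  bse  = 1970 + cumsum(rnorm(31, sd = 10))
)

picked <- comp_graph_sketch(demo, "2013-01-10", "2013-01-20")
```

Returning the subset invisibly makes it easy to reuse the same rows for a follow-up plot or forecast.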


