IBM Accelerates Open Database-as-a-Service on IBM Power Systems

IBM (NYSE: IBM) announced a new Database-as-a-Service (DBaaS) toolkit on Power Systems optimized for open source databases, including MongoDB, EDB PostgreSQL, MySQL, MariaDB, Redis, Neo4j, and Apache Cassandra to help deliver more speed, control, and efficiency for enterprise developers and IT departments.

from insideBIGDATA http://insidebigdata.com/2017/05/28/ibm-accelerates-open-database-service-ibm-power-systems/

TOP 10 insideBIGDATA Articles for April 2017

In this continuing regular feature, we give all our valued readers a monthly heads-up for the top 10 most viewed articles appearing on insideBIGDATA. We understand that busy big data professionals can’t check the site every day.

from insideBIGDATA http://insidebigdata.com/2017/05/28/top-10-insidebigdata-articles-april-2017/

Cloudera Launches Altus to Simplify Big Data Workloads in the Cloud

Cloudera, Inc. (NYSE: CLDR), the provider of a leading modern platform for machine learning and advanced analytics, announced the release of Cloudera Altus, a Platform-as-a-Service (PaaS) offering that makes it easier to run large-scale data processing applications on the public cloud.

from insideBIGDATA http://insidebigdata.com/2017/05/27/cloudera-launches-altus-simplify-big-data-workloads-cloud/

How Telcos Can Balance Consumer Privacy and Targeted Ads

In this special guest feature, Charlie Thomas, SVP of Insights and Analytics at Synchronoss, points out that there’s big money in big data for telecommunications companies. Mobile service providers have a host of consumer-based and behavioral information on their clients, and the market to turn that targetable data into dollars is at its highest point ever.

from insideBIGDATA http://insidebigdata.com/2017/05/27/telcos-can-balance-consumer-privacy-targeted-ads/

Gain a Competitive Advantage With Machine Learning

Machine learning may seem like something out of a science fiction movie, but it is quickly becoming a driving force for many new products and businesses. There are a number of ways to apply machine learning to nearly any business, and doing so can create a competitive advantage. In fact, 48% of companies are planning […]

from insideBIGDATA http://insidebigdata.com/2017/05/27/gain-competitive-advantage-machine-learning/

TWC9: New Surface Pro, World’s Largest Git Repo, P Language, Exiting Vim and more… | This Week On Channel 9

This week on Channel 9, Nikola and Vlad discuss the week’s top developer news, including:

Pick of the Week!

Please leave a comment or email us at twc9@microsoft.com.

from Channel 9 https://channel9.msdn.com/Shows/This+Week+On+Channel+9/TWC9-New-Surface-Pro-Worlds-Largest-Git-Repo-P-Language-Exiting-Vim-and-more

Parallel Training of Multiple Foreign Exchange Return Models

In a variety of machine learning applications, there is a need to train many models at once. For example, in the Internet of Things (IoT) industry, a separate model may need to be built for each household with installed sensors that measure temperature, light, or power consumption. Another example can be found in online advertising: to serve personalized advertisements or recommendations, a huge number of individualized models have to be built and maintained, because each online user has a unique browsing history. Moreover, these models have to be updated frequently to capture changes in consumer behavior. When the number of models grows large, even a carefully designed and well-validated algorithm can be challenging to run in production; for time-sensitive applications in particular, training cannot afford the extra delay of iterating through a huge number of models one by one. The financial industry is a good example. In this article, we fit multiple foreign exchange (FX) rate models and demonstrate how to leverage the powerful parallel computation capability provided by Oracle R Enterprise (ORE).

FX Rate Data

The FX rate data can be obtained from Federal Reserve Economic Data (FRED). Instead of fetching the data manually from the website, the R package ‘quantmod’ provides a convenient way to download it. Here is the code used for this purpose.

library(quantmod)
symbol <- "DEXCAUS"                # Canadian dollar / US dollar exchange rate on FRED
getSymbols(symbol, src = "FRED")
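
getSymbols() loads the downloaded series into the R session as an xts object named after the symbol, so a quick sanity check might look like:

class(DEXCAUS)   # "xts" "zoo"
head(DEXCAUS)    # first few daily observations of the CAD/USD rate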

The symbol “DEXCAUS” refers to the FX rate of the Canadian dollar to the US dollar. In this example, we downloaded foreign exchange rates for 22 currencies and focused on the time range from 1999 to 2015.

rm(list = ls())
symbols <- c("DEXBZUS", "DEXCAUS", "DEXCHUS", "DEXDNUS", "DEXHKUS", "DEXINUS", "DEXJPUS", "DEXKOUS", "DEXMXUS", "DEXNOUS", "DEXSDUS", "DEXSFUS", "DEXSIUS", "DEXSLUS", "DEXSZUS", "DEXTAUS", "DEXTHUS", "DEXUSAL", "DEXUSEU", "DEXUSNZ", "DEXUSUK", "DEXVZUS")
for (symbol in symbols) { getSymbols(symbol, src = "FRED") }
# Build and evaluate a call such as merge(DEXBZUS, DEXCAUS, ...) to combine all series
mergecode <- paste("merge(", paste(symbols, collapse = ","), ")")
merged.df <- eval(parse(text = mergecode))
fxrates.df.raw <- data.frame(date = index(merged.df), coredata(merged.df))
fxrates.df.raw <- fxrates.df.raw[fxrates.df.raw$date > '1999-01-04', ]

Non-stationarity

Let us take a first look at the FX rate data. We plot the FX rate of the Canadian dollar against the US dollar from 2015 onward:

library(ggplot2)
ggplot(fxrates.df.raw[fxrates.df.raw$date > '2015-01-01', ], aes(date, DEXCAUS)) +
  geom_line() +
  labs(x = "day", title = "CA dollar ~ US dollar FX Rate") +
  theme(plot.title = element_text(hjust = 0.5))
https://blogs.oracle.com/R/resource/fx.jpg 

At first glance, the series does not look stationary. To confirm this, we can run an Augmented Dickey–Fuller test to check whether it has a unit root, i.e., whether the series follows F(t) = ρF(t-1) + a(t) with ρ = 1. We can use the R package fUnitRoots for the test. The null hypothesis is that a unit root exists.

library(fUnitRoots)
adfTest(fxrates.df.raw$DEXCAUS)
The result is as follows:

Title:
 Augmented Dickey-Fuller Test

Test Results:
  PARAMETER:
    Lag Order: 1
  STATISTIC:
    Dickey-Fuller: -0.8459
  P VALUE:
    0.3467

Since p >> 0.05, we cannot reject the null hypothesis. This suggests that there is a unit root in the series, confirming that the time series is non-stationary.
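
As a quick sanity check (not part of the original workflow, just an illustrative sketch), one can difference the series and re-run the test; for a random walk in levels, the first differences should look stationary.

# Sketch: the daily changes of the CAD/USD rate should reject the unit-root null
cad <- fxrates.df.raw$DEXCAUS
cad.diff <- diff(cad)
cad.diff <- cad.diff[!is.na(cad.diff)]
adfTest(cad.diff)   # expect a small p-value here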

FX Rate Prediction 

Foreign exchange rate series are known to be difficult to predict. Their predictability has long been questioned, since exchange rates often appear disconnected from economic fundamentals. Thus, a random walk model is often used as a benchmark, and that is the model we implement here for demonstration purposes.
A random walk model is formulated as 

F(t) = F(t-1) + a(t), 

where a(t) is zero-mean random noise.
In R, we can use the following function to fit a random walk model. 

arima(data, order = c(0, 1, 0))

This means that we drop both the AR and MA parts and keep only the integration (differencing) part, which is exactly the random walk model.
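
For example, a minimal sketch of fitting the model and producing a one-step-ahead forecast on the CAD/USD series downloaded above (the 300-observation window is an arbitrary choice for illustration):

cad <- fxrates.df.raw$DEXCAUS
cad <- cad[!is.na(cad)]
train <- cad[1:300]                 # an arbitrary training window
model <- arima(train, order = c(0, 1, 0))
predict(model, n.ahead = 1)$pred    # forecast for the next observation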

The prediction is often backtested in a moving-window fashion. For each time step t, the model is trained on data over [t-L, t-1], a window of length L, and the prediction is then evaluated on out-of-sample (OOS) data. We then move the window forward for every t and compute the out-of-sample error. Here, we use only one observation as OOS data, which means we use a window of historical data to predict the next day’s FX rate.

There are many ways to evaluate the result of backtesting. Here, we adopt the R squared as a measure of goodness of fit: the closer the R squared is to 1, the more accurate the prediction.
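Concretely, over the out-of-sample window the measure used below is

R squared = 1 - sum_t (pred(t) - rate(t))^2 / sum_t (mean(rate) - rate(t))^2,

where the sums run over the out-of-sample dates and mean(rate) is the mean of the actual out-of-sample rates; this is exactly what the R code below computes.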
Combining all these ingredients, we can now write an R function that makes the predictions for one currency:

pred.fxrate <- function(data.fxrate) {
  data.fxrate <- data.fxrate[order(data.fxrate$date), ]
  N <- nrow(data.fxrate)
  L <- 300                                   # length of the moving training window
  pred <- rep(0, N - L)
  country <- data.fxrate$country[1]
  for (i in (L + 1):N) {
    # fit a random walk (ARIMA(0,1,0)) on the window [i-L, i-1] and predict day i
    model <- arima(data.fxrate$rate[(i - L):(i - 1)], c(0, 1, 0))
    pred[i - L] <- predict(model, 1)[[1]][1]
  }
  # out-of-sample R squared (na.rm = TRUE because the series contains missing values)
  R.sq <- 1 - sum((pred - data.fxrate$rate[(L + 1):N])^2, na.rm = TRUE) /
              sum((mean(data.fxrate$rate[(L + 1):N], na.rm = TRUE) - data.fxrate$rate[(L + 1):N])^2, na.rm = TRUE)
  pred.df <- as.data.frame(data.fxrate$date[(L + 1):N])
  pred.df$pred <- pred
  names(pred.df) <- c("date", "pred")
  plot(data.fxrate$date, data.fxrate$rate, type = "l")
  lines(pred.df$date, pred.df$pred, col = "red")
}

Note that the lines that compute the R squared use the option na.rm = TRUE because the data contains missing values.

We can test the function on the Canadian dollar using data from 2014 to 2017. The R squared is 0.97, so it seems we have a decent model!
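
A sketch of how this test might be invoked (the column names follow the schema that pred.fxrate expects; the long-format data frame fxrates.df built in the next section serves the same purpose):

cad.df <- data.frame(date    = fxrates.df.raw$date,
                     country = "DEXCAUS",
                     rate    = fxrates.df.raw$DEXCAUS)
cad.df <- cad.df[cad.df$date > '2014-01-01', ]
pred.fxrate(cad.df)   # draws the plot; the R squared is computed inside the function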

https://blogs.oracle.com/R/resource/dexcaus.jpg 
Parallel Prediction

As mentioned at the beginning, there are quite a few currencies, and we probably do not want to loop through them sequentially. A solution is to use the “group apply” capability provided by Oracle R Enterprise (ORE). It allows us to store the data as a table in Oracle Database (in many cases, that is where the data already lives) and then run the function we wrote above in parallel for each currency.

First, we merge all the FX data together and reshape it into a long format with the following schema.

# Reshape from wide (one column per currency) to long format: (date, country, rate)
fxrates.df <- NULL
date.col <- fxrates.df.raw$date
symbols  <- names(fxrates.df.raw)[-1]
n.row    <- length(date.col)
for (symbol in symbols) {
  # one block of rows per currency
  symbol.data <- data.frame(date    = date.col,
                            country = rep(symbol, n.row),
                            rate    = fxrates.df.raw[, symbol])
  fxrates.df <- rbind(fxrates.df, symbol.data)
}
fxrates.df <- fxrates.df[fxrates.df$date > '2014-01-01', ]
fxrates.df <- fxrates.df[order(fxrates.df$date), ]

The data frame we obtained looks like:

        date country   rate
  2014-01-02 DEXCAUS 1.0634
  2014-01-03 DEXCAUS 1.0612
  2014-01-06 DEXCAUS 1.0658
  2014-01-07 DEXCAUS 1.0742
  2014-01-08 DEXCAUS 1.0802
  2014-01-09 DEXCAUS 1.0850

Then, we create the table in Oracle Database with ORE.

ore.drop(table="FX_RATE") # to remove the table if it already exists
ore.create(fxrates.df, table="FX_RATE")
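
These ORE calls assume an R session already connected to Oracle Database, established beforehand with something like the following sketch (placeholder credentials, adjust for your environment):

library(ORE)
# Placeholder connection details -- replace with your own user, SID, host, and password
ore.connect(user = "rquser", sid = "orcl", host = "localhost",
            password = "password", port = 1521, all = TRUE)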

After the table is created, we call the ore.groupApply function, grouping on the column ‘country’. This runs the function pred.fxrate on the FX rate series of each currency, using at most four R engines spawned by Oracle Database and executing in parallel.

res <- ore.groupApply(FX_RATE,
                      FX_RATE$country,
                      pred.fxrate,
                      ore.connect=TRUE,
                      parallel = 4) 

Another way to store the result is to create an object in the ORE R datastore. For instance, we can add the following code to the function pred.fxrate.

R.sq <- 1 - sum((pred - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE) /
            sum((mean(data.fxrate$rate[(L+1):N], na.rm = TRUE) - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE)
name <- paste("Rsq_", country, sep = "")
assign(name, R.sq)
try(ore.save(list = name, name = "Rsquares", append = TRUE))

Then, after running the ore.groupApply function, we can retrieve the objects through ORE functions as below.
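
For example, a sketch using the ORE datastore functions (“Rsquares” is the datastore name used in the ore.save call above):

ore.datastore(name = "Rsquares")         # datastore metadata
ore.datastoreSummary(name = "Rsquares")  # the Rsq_* objects saved per currency
ore.load(name = "Rsquares")              # load the saved objects into the local R session
Rsq_DEXCAUS                              # inspect one of them, e.g. the Canadian dollar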

https://blogs.oracle.com/R/resource/table_fx.jpg 

Based on the R squared values, the results look decent, and they would likely be even better if we could incorporate data on other economic fundamentals and build an ensemble model. That exploration is beyond the scope of this blog, so we leave it to the reader.

Invoking R Scripts from the SQL Side

Another scenario may require storing the results, such as the R squared scores, in a structured format as a table in the database, or storing the generated images in the database. These tasks can also be done by invoking the R functions from the SQL side, using capabilities provided by Oracle Advanced Analytics (OAA).

Let us first look at how to store the R squared values as a table. Suppose we want to build the model for each currency from SQL. We can first create a SQL function with group-apply capability. Recall that all the data is stored in FX_RATE; all we need to do is create a group-apply function and supply the script that builds the model.

CREATE OR REPLACE PACKAGE fxratePkg AS
TYPE cur IS REF CURSOR RETURN FX_RATE%ROWTYPE;
END fxratePkg;
 
CREATE OR REPLACE FUNCTION fxrateGroupEval(
inp_cur fxratePkg.cur,
par_cur SYS_REFCURSOR,
out_qry VARCHAR2,
grp_col VARCHAR2,
exp_txt CLOB)
RETURN SYS.AnyDataSet
PIPELINED PARALLEL_ENABLE (PARTITION inp_cur BY HASH("country"))
CLUSTER inp_cur BY ("country")
USING rqGroupEvalImpl;

This PL/SQL function performs the group apply; you can view it as the SQL counterpart of ore.groupApply. Next, we store the script that builds the model in the database.

begin
sys.rqScriptDrop('RW_model'); -- call if the script already exists
sys.rqScriptCreate('RW_model',
    'function (data.fxrate) {
  data.fxrate <- data.fxrate[order(data.fxrate$date),]
  N <- nrow(data.fxrate)
  L <- 300
  pred <- rep(0, N-L)
  country <- data.fxrate$country[1]
  for(i in (L+1):N){
    model <- arima(data.fxrate$rate[(i-L):(i-1)], c(0,1,0))
    pred[i-L] <- predict(model,1)[[1]][1]
  }
  R.sq <- 1 - sum((pred - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE)/sum((mean(data.fxrate$rate[(L+1):N], na.rm = TRUE) - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE)
  data.frame(CURRENCY=country, RSQ = R.sq)
 }');
end;

Note that in order to form a table, the script returns a data frame containing each single result. With both the group-apply SQL function and the script stored, we can now call it from SQL.

select *
from table(fxrateGroupEval( cursor(select /*+ parallel(t, 4) */ * from FX_RATE t),
cursor(select 1 as "ore.connect" from dual),
'SELECT ''aaaaaaa'' CURRENCY, 1 RSQ FROM DUAL', 'country', 'RW_model'));

Note that “aaaaaaa” in the dummy query declares the format of the output: CURRENCY is a seven-character text column and RSQ is numeric. Moreover, we can even store the plot generated for each FX model. We can modify the script as below.

begin
sys.rqScriptDrop('RW_model_plot');
sys.rqScriptCreate('RW_model_plot',
    'function (data.fxrate) { 
      data.fxrate <- data.fxrate[order(data.fxrate$date),]
 
  N <- nrow(data.fxrate)
  L <- 300
  pred <- rep(0, N-L)
  country <- data.fxrate$country[1]
  for(i in (L+1):N){
    model <- arima(data.fxrate$rate[(i-L):(i-1)], c(0,1,0))
    pred[i-L] <- predict(model,1)[[1]][1]
  } 
 
  pred.df <- as.data.frame(data.fxrate$date[(L+1):N])
  pred.df$pred <- pred
  names(pred.df) <- c("date", "pred")
  plot(data.fxrate$date, data.fxrate$rate, type = "l")
  lines(pred.df$date, pred.df$pred, col="red")   
 }');
end;

Then, we can call the new SQL function and generate a table of images.

select * from table(fxrateGroupEval( cursor(select /*+ parallel(t, 4) */ * from FX_RATE t),
 cursor(select 1 as "ore.connect" from dual),
'PNG', 'country', 'RW_model_plot'));

The output, viewed in SQL Developer, is as follows.

https://blogs.oracle.com/R/resource/table_blob.jpg 

Note that the image is now generated and stored as a BLOB (binary large object) in the table. We can double-click the BLOB item and view the image in a pop-up window (make sure the “View as Image” box is checked).

https://blogs.oracle.com/R/resource/blob.png 
Conclusion

In this blog, we demonstrated parallel training of multiple FX rate models using a benchmark random walk model. The implementation takes only a few lines of code, showcasing the powerful in-database functionality enabled by Oracle R Enterprise.

from Oracle Blogs | Oracle R Technologies Blog https://blogs.oracle.com/r/parallel-training-of-multiple-foreign-exchange-return-models