Parallel Training of Multiple Foreign Exchange Return Models

In many machine learning applications, multiple models must be trained. For example, in the Internet of Things (IoT) industry, a separate model needs to be built for each household with installed sensors that measure temperature, light, or power consumption. Another example comes from online advertising: to serve personalized advertisements or recommendations, a huge number of individualized models must be built and maintained, because each online user has a unique browsing history. Moreover, these models must be updated frequently to capture changes in consumer behavior. When the number of models grows large, even a carefully designed and proven algorithm can be a challenge to implement in production. Time-sensitive applications in particular cannot afford the extra delay of iterating serially through a huge number of models; the financial industry is a good example. In this article, we show how to fit multiple foreign exchange (FX) rate models and demonstrate how to leverage the powerful parallel computation capability provided by Oracle R Enterprise (ORE).

FX Rate Data

The FX rate data can be obtained from Federal Reserve Economic Data (FRED). Instead of going online to fetch the data manually, the R package 'quantmod' provides a convenient way to download it. Here is the code used for this purpose.

library(quantmod)   # provides getSymbols()
symbol <- "DEXCAUS"

The symbol "DEXCAUS" denotes the FX rate of the Canadian dollar to the US dollar. In this example, we downloaded foreign exchange rates for 22 currencies and focused on the period from 1999 to 2015.

# symbols is the character vector of the 22 FRED series IDs (e.g. "DEXCAUS")
for (symbol in symbols) { getSymbols(symbol, src = "FRED") }
mergecode <- paste("merge(", paste(symbols, collapse = ","), ")", sep = "")
merged.df <- eval(parse(text = mergecode))
fxrates.df.raw <- data.frame(date = index(merged.df), coredata(merged.df))
fxrates.df.raw <- fxrates.df.raw[fxrates.df.raw$date > '1999-01-04', ]


Let us take a first look at the FX rate data. We plot the FX rate of the Canadian dollar to the US dollar:

library(ggplot2)
ggplot(fxrates.df.raw[fxrates.df.raw$date > '2015-01-01', ], aes(date, DEXCAUS)) +
  geom_line() +
  labs(x = "day", title = "CA dollar ~ US dollar FX Rate") +
  theme(plot.title = element_text(hjust = 0.5))

At first glance, the series does not appear stationary. To confirm this, we can run an Augmented Dickey-Fuller (ADF) test to check for a unit root, i.e., whether the series follows F(t) = ρF(t-1) + a(t) with ρ = 1. We can use the R package fUnitRoots for the test; the null hypothesis is that a unit root exists.
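The test itself is one line. A sketch, assuming the quantmod download above has populated fxrates.df.raw and that the fUnitRoots package is installed:

```r
# Run the Augmented Dickey-Fuller test on the Canadian dollar series;
# lags = 1 matches the lag order reported below. NA values (market
# holidays) are dropped first.
library(fUnitRoots)
adfTest(na.omit(fxrates.df.raw$DEXCAUS), lags = 1)
```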

The result is as follows:

Title: Augmented Dickey-Fuller Test
Test Results:
  PARAMETER: Lag Order: 1
  STATISTIC: Dickey-Fuller: -0.8459
  P VALUE: 0.3467

Since p >> 0.05, we cannot reject the null hypothesis. This suggests that a unit root is present in this series, confirming that the time series is non-stationary.

FX Rate Prediction 

Foreign exchange rate series are notoriously difficult to predict. Their predictability has long been questioned, since exchange rates appear disconnected from several economic fundamentals. Thus, a random walk model is often used as a benchmark, and we implement one here for demonstration purposes.
A random walk model is formulated as 

F(t) = F(t-1) + a(t), 

where a(t) is zero-mean random noise.
In R, we can use the following function to fit a random walk model. 

arima(data, order = c(0, 1, 0))

This basically means that we remove both the AR and MA parts and retain only the integrated (differencing) part, which is exactly the random walk model.
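As a quick sanity check on simulated data (base R only): for ARIMA(0,1,0) without drift, the one-step-ahead forecast is simply the last observed value, which is exactly what a random walk predicts.

```r
set.seed(42)
x <- cumsum(rnorm(500))              # a simulated random walk
fit <- arima(x, order = c(0, 1, 0))  # no AR, one difference, no MA
fc <- predict(fit, n.ahead = 1)$pred[1]
# The forecast equals the last observation, as a random walk implies.
stopifnot(isTRUE(all.equal(as.numeric(fc), x[length(x)])))
```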

The prediction is often backtested in a moving-window fashion. For each time step t, the model is trained on data over [t-L, t-1], a window of length L. The prediction is then evaluated on out-of-sample (OOS) data. We move the window forward for every t and compute the out-of-sample error. Here, we use only one sample as OOS data, which means that we use a window of historical data to predict the next day's FX rate.
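To make the indexing concrete, here is a toy illustration (with made-up numbers) of the training windows. Note the parentheses in (t-L):(t-1): in R, (t-L):t-1 would be parsed as ((t-L):t) - 1.

```r
L <- 3
x <- c(10, 11, 12, 13, 14, 15)
# Training window for each prediction step t = (L+1)..length(x):
windows <- lapply((L+1):length(x), function(t) x[(t-L):(t-1)])
# windows[[1]] is x[1:3]; the model fit on it predicts x[4], and so on.
```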

There are many ways to evaluate the result of backtesting. Here, we adopt R squared as a measure of goodness of fit; the closer the R squared is to 1, the more accurate the prediction.
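Concretely, the out-of-sample R squared compares the model's squared errors with those of a constant-mean benchmark. A small example with made-up numbers:

```r
actual <- c(1.06, 1.07, 1.05, 1.08)   # hypothetical observed rates
pred   <- c(1.05, 1.06, 1.06, 1.07)   # hypothetical one-step predictions
# 1 minus (model SSE / benchmark SSE); here r.sq evaluates to 0.2
r.sq <- 1 - sum((pred - actual)^2) / sum((mean(actual) - actual)^2)
```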
Combining all the ingredients, we can now write an R function that makes the predictions for one currency.

pred.fxrate <- function(data.fxrate) {
  data.fxrate <- data.fxrate[order(data.fxrate$date), ]
  N <- nrow(data.fxrate)
  L <- 300
  pred <- rep(0, N - L)
  country <- data.fxrate$country[1]
  for (i in (L+1):N) {
    model <- arima(data.fxrate$rate[(i-L):(i-1)], order = c(0, 1, 0))
    pred[i-L] <- predict(model, 1)[[1]][1]
  }
  R.sq <- 1 - sum((pred - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE) /
              sum((mean(data.fxrate$rate[(L+1):N], na.rm = TRUE) - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE)
  pred.df <- data.frame(date = data.fxrate$date[(L+1):N])
  pred.df$pred <- pred
  names(pred.df) <- c("date", "pred")
  plot(data.fxrate$date, data.fxrate$rate, type = "l")
  lines(pred.df$date, pred.df$pred, col = "red")
  R.sq
}

Note that the lines computing the R squared use the option na.rm = TRUE, because the data contains missing values.

We can test the function on the Canadian dollar using data from 2014 to 2017. The R squared is 0.97, so it seems we have a decent model!
Parallel Prediction

As mentioned at the beginning, there are quite a few currencies, and we would rather not loop through them serially. A solution is to use the "group apply" capability provided by Oracle R Enterprise (ORE). That allows us to store the data as a table in Oracle Database (in many cases, that is where the data already lives), then run the function we wrote above in parallel for each currency.

First, we need to merge all the FX data together and change the schema as follows.

fxrates.df <- data.frame(date = character(), country = character(), rate = numeric())
date.col <- fxrates.df.raw$date
symbols <- names(fxrates.df.raw)[-1]
n.row <- length(date.col)
for (symbol in symbols) { <- data.frame(date = date.col)$country <- rep(symbol, n.row)$return <- fxrates.df.raw[, symbol]
  fxrates.df <- rbind(fxrates.df,
names(fxrates.df) <- c("date", "country", "rate")
fxrates.df <- fxrates.df[fxrates.df$date > '2014-01-01', ]
fxrates.df <- fxrates.df[order(fxrates.df$date), ]

The data frame we obtained looks like:

date country rate
2014-01-02 DEXCAUS 1.0634
2014-01-03 DEXCAUS 1.0612
2014-01-06 DEXCAUS 1.0658
2014-01-07 DEXCAUS 1.0742
2014-01-08 DEXCAUS 1.0802
2014-01-09 DEXCAUS 1.0850

Then, we create the table in Oracle Database with ORE.

ore.drop(table="FX_RATE") # to remove the table if it already exists
ore.create(fxrates.df, table="FX_RATE")

After the table is created, we call the ore.groupApply function, grouping on the column 'country'. That runs the function pred.fxrate on the FX rates of each currency, using at most four parallel R engines spawned by Oracle Database.

res <- ore.groupApply(FX_RATE,
                      INDEX = FX_RATE$country,
                      FUN = pred.fxrate,
                      parallel = 4)

Another way to store the results is to create an object in the ORE datastore. For instance, we can add the following code to the function pred.fxrate.

R.sq <- 1 - sum((pred - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE) /
            sum((mean(data.fxrate$rate[(L+1):N], na.rm = TRUE) - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE)
name <- paste("Rsq_", country, sep = "")
assign(name, R.sq)
try(, name = "Rsquares", append = TRUE))

Then, after running the ore.groupApply function, we can retrieve the objects through ORE functions as below. 
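For example (a sketch, assuming an active ORE connection), the saved R squared objects can be listed and loaded back into the current R session:

```r
ore.datastore(name = "Rsquares")         # confirm the datastore exists
ore.datastoreSummary(name = "Rsquares")  # list the saved Rsq_* objects
ore.load(name = "Rsquares")              # load them into the R session
Rsq_DEXCAUS                              # e.g., inspect one value
```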

Based on the R squared values, the results look decent, and they would be even better if we could access data on other economic fundamentals and build an ensemble model. That exploration is beyond the scope of this blog, so we leave it to the reader.

Invoking R Scripts from the SQL Side

Another scenario may require storing results, such as the R squared scores, in a structured format as a table in the database, or storing a generated image in the database. These tasks can also be done by invoking the R functions from the SQL side, using capabilities provided by Oracle Advanced Analytics (OAA).

Let us first look at how to store the R squared values as a table. Suppose we want to build the model for each currency from SQL. We first create a SQL function with the group apply capability. Recall that all the data is stored in FX_RATE, so all we need to do is create a group apply function and supply the script that builds the model.

CREATE OR REPLACE PACKAGE fxratePkg AS
  TYPE cur IS REF CURSOR RETURN FX_RATE%ROWTYPE;
END fxratePkg;
/

CREATE OR REPLACE FUNCTION fxrateGroupEval(
  inp_cur fxratePkg.cur,
  par_cur SYS_REFCURSOR,
  out_qry VARCHAR2,
  grp_col VARCHAR2,
  exp_txt CLOB)
RETURN SYS.AnyDataSet
PIPELINED PARALLEL_ENABLE (PARTITION inp_cur BY HASH ("country"))
CLUSTER inp_cur BY ("country")
USING rqGroupEvalImpl;
/

This PL/SQL function performs the group apply; you can view it as the SQL counterpart of ore.groupApply. Next, we store the script that builds the model in the database.

BEGIN
  sys.rqScriptDrop('RW_model'); -- call this only if the script already exists
  sys.rqScriptCreate('RW_model',
    'function(data.fxrate) {
       data.fxrate <- data.fxrate[order(data.fxrate$date), ]
       N <- nrow(data.fxrate)
       L <- 300
       pred <- rep(0, N - L)
       country <- data.fxrate$country[1]
       for (i in (L+1):N) {
         model <- arima(data.fxrate$rate[(i-L):(i-1)], order = c(0, 1, 0))
         pred[i-L] <- predict(model, 1)[[1]][1]
       }
       R.sq <- 1 - sum((pred - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE) /
                   sum((mean(data.fxrate$rate[(L+1):N], na.rm = TRUE) - data.fxrate$rate[(L+1):N])^2, na.rm = TRUE)
       data.frame(CURRENCY = country, RSQ = R.sq)
     }');
END;
/

Note that in order to form a table, each single result must be returned as a data frame. With both the group apply SQL function and the stored script in place, we can now call it from SQL.

select *
from table(fxrateGroupEval(
       cursor(select /*+ parallel(t, 4) */ * from FX_RATE t),
       cursor(select 1 as "ore.connect" from dual),
       'SELECT ''aaaaaaa'' CURRENCY, 1 RSQ FROM DUAL',
       'country',
       'RW_model'));

Note that 'aaaaaaa' declares the format of the column: a seven-character text column. Moreover, we can even store the plot generated by each FX model, by modifying the stored script as below.

BEGIN
  sys.rqScriptCreate('RW_model_plot',
    'function(data.fxrate) {
       data.fxrate <- data.fxrate[order(data.fxrate$date), ]
       N <- nrow(data.fxrate)
       L <- 300
       pred <- rep(0, N - L)
       country <- data.fxrate$country[1]
       for (i in (L+1):N) {
         model <- arima(data.fxrate$rate[(i-L):(i-1)], order = c(0, 1, 0))
         pred[i-L] <- predict(model, 1)[[1]][1]
       }
       pred.df <- data.frame(date = data.fxrate$date[(L+1):N])
       pred.df$pred <- pred
       names(pred.df) <- c("date", "pred")
       plot(data.fxrate$date, data.fxrate$rate, type = "l")
       lines(pred.df$date, pred.df$pred, col = "red")
     }');
END;
/

Then, we can call the new SQL function and generate a table of images.

select * from table(fxrateGroupEval( cursor(select /*+ parallel(t, 4) */ * from FX_RATE t),
 cursor(select 1 as "ore.connect" from dual),
'PNG', 'country', 'RW_model_plot'));

The output, viewed in SQL Developer, is as follows.

Note that each image is generated and stored as a BLOB (binary large object) in the table. We can double-click a BLOB item and view the image in a pop-up window (make sure the "view as image" box is checked).

In this blog, we demonstrated the parallel training of multiple FX rate models using a benchmark random walk model. The implementation takes only a few lines of code, showcasing the powerful in-database functionality enabled by Oracle R Enterprise.

from Oracle Blogs | Oracle R Technologies Blog
