CRAN reports UB in your R-package: How to fix it?

CRAN automatically checks packages with C/C++ source code for Undefined Behaviour (UB). If the automated checks find UB it might result in the removal of the package from CRAN. This is a good thing: the presence of UB is a potential source of bugs and has no positive side effects. These checks have been improving, and they found that the scrypt package actually had some UB which needed to be fixed. This scared me a bit because, although I'm the maintainer, the package contains some relatively complicated code which I didn't write or change. A bit of history: Colin Percival conceived and implemented the algorithm in C and Andrew Kipp ported his code to R. I only revived the package after improved automated checks found problems that needed to be solved.

Reproducing the issue was my first challenge. My machine runs Ubuntu LTS and doesn't have all the fancy tools installed to detect the UB. If I'm not able to reproduce the issue I can't know whether I fixed it either, so reproducing seemed like a good first step. I asked on Twitter and Dirk Eddelbuettel pointed me to the repo of a Docker image by Winston Chang, purpose-built for exactly this kind of analysis. The instructions are worth reading, but this is all I needed:

docker run --rm -ti --security-opt seccomp=unconfined -v $(pwd):/rscrypt wch1/r-debug

This downloads the Docker image if you don't have it yet and starts it with the current directory mounted at /rscrypt inside the container. Within the container you can access versions of R built with extra instrumentation. For detecting UB two versions are available: RDsan ('san' is short for sanitizer), which is compiled with gcc, and RDcsan, which is compiled with clang. So now I could do (some output elided):

$> RDcsan CMD INSTALL scrypt_0.1.3.tar.gz  # The last version with the problem
$> RDcsan
R> scrypt::hashPassword("password")
scrypt-1.1.6/lib/crypto/sha256.c:254:24: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
    #0 0x7f21fe3bde7f in scrypt_SHA256_Update /tmp/.../sha256.c:254:3
    #1 0x7f21fe42b1e6 in scrypt_HMAC_SHA256_Update /tmp/.../sha256.c:335:2
    #2 0x7f21fe42b6a1 in PBKDF2_SHA256 /tmp/../sha256.c:377:2
    #3 0x7f21fe42c9e6 in crypto_scrypt /tmp/.../crypto_scrypt-ref.c:258:2
    #4 0x7f21fe46f822 in getcpuperf(double*) /tmp/.../util.cpp:145:13
    #5 0x7f21fe46420c in (anonymous namespace)::getparams(double, double, int*, unsigned int*, unsigned int*) /tmp/.../scrypt.cpp:49:15
    #6 0x7f21fe462f81 in hashPassword(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, double, double) /tmp/.../scrypt.cpp:129:15
    #7 0x7f21fe439c72 in _scrypt_hashPassword /tmp/.../RcppExports.cpp:17:34
...
etc

The compiled code of the scrypt package is entered at frame #7.

The CRAN error report shows the UB with R compiled using both gcc and clang, but for some reason when I tried it the issue was only flagged by RDcsan. The important thing is that the error can be reproduced, so a possible fix can be validated.

In this particular case the error was easy to find. The code that triggers the UB is the following memcpy() call, and from the error message we know that the src argument (argument 2) is the culprit:

memcpy(&ctx->buf[r], src, len);

Tracing through the call stack I found that src in this case is the salt used when figuring out how many iterations scrypt should perform to run reasonably fast on the current machine. More importantly, for this performance test the salt has the value NULL. This explains the error found: memcpy() can't copy anything from NULL.

With the root cause identified, the solution is simple: set the initial salt to an actual value. Since this code is only called when checking the CPU performance, and since the old behaviour was UB anyway, it doesn't really make a difference which value is used; I chose all zeroes. After that the UB disappeared, a commit was pushed and a submission to CRAN was accepted.
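To illustrate the idea, here is a minimal sketch in C. It is not the actual commit: probe_cpu_performance() and its parameter values are made up for this post, only crypto_scrypt() and its signature come from the scrypt sources. The point is that the salt handed to crypto_scrypt(), and eventually to memcpy(), is a zero-filled buffer rather than a NULL pointer.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Signature from the scrypt reference implementation (crypto_scrypt-ref.c). */
int crypto_scrypt(const uint8_t *passwd, size_t passwdlen,
                  const uint8_t *salt, size_t saltlen,
                  uint64_t N, uint32_t r, uint32_t p,
                  uint8_t *buf, size_t buflen);

/* Hypothetical performance probe, for illustration only; the real code in
   the package's util.cpp looks different. */
static int probe_cpu_performance(void)
{
    const uint8_t password[] = "dummy password";
    uint8_t salt[32];
    uint8_t derived[64];

    /* Give the salt a real value (all zeroes) so that no NULL pointer ever
       reaches memcpy() inside sha256.c. */
    memset(salt, 0, sizeof(salt));

    return crypto_scrypt(password, sizeof(password) - 1,
                         salt, sizeof(salt),
                         1024, 8, 1,        /* small N, r, p for a quick probe */
                         derived, sizeof(derived));
}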

Lessons learned

  1. CRAN has been checking packages for UB since 2014 and the checks are still improving, great!
  2. Finding UB like this requires help from good tooling, which isn't as readily available as R and CRAN themselves
  3. Luckily, smart people have figured out a way to get this tooling on your computer in a few minutes
  4. Once I knew where to look, the problem was easily solved

FastR sudokus

A few days ago @JozefHajnala pointed me to Oracle's FastR, an implementation of R on the Java Virtual Machine (GraalVM). This implementation aims to be completely compatible with the GNU R implementation we all know and love.

In September 2018 I gave a talk (GitHub, presentation) on using Rcpp for solving sudokus at Amsterdam SatRday. The point of the talk was to show the performance benefits that can be achieved using the Rcpp package. This post describes my adventure of solving sudokus using FastR and compares the performance of the Rcpp solver with that of the plain R sudoku solver run on FastR.

Installation of FastR

The easiest way to get started is to obtain the GraalVM community edition from the GitHub releases page. Download the graalvm-ce tar.gz relevant for your platform, untar it, and use the gu tool to install and start R:

tar xzvf graalvm-ce-*-19.0.0.tar.gz
cd graalvm-ce-19.0.0/bin
./gu install R
./R

If all went well, you will now see a somewhat familiar sight:

R version 3.5.1 (FastR)
Copyright (c) 2013-19, Oracle and/or its affiliates
Copyright (c) 1995-2018, The R Core Team
Copyright (c) 2018 The R Foundation for Statistical Computing
Copyright (c) 2012-4 Purdue University
Copyright (c) 1997-2002, Makoto Matsumoto and Takuji Nishimura
All rights reserved.

FastR is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information.

Type 'q()' to quit R.

A quick test drive

Installing packages is as easy as with GNU R, so doing the first test doesn't take long:

install.packages('microbenchmark')
sqr <- function(x) x * x
microbenchmark::microbenchmark(sqr(runif(1000)))
Unit: microseconds
            expr min  lq mean median  uq max neval
sqr(runif(1000)) 145 154  179    165 178 653   100 

To be fair, this is not very promising. Running the same snippet in GNU R 3.4.4 gives much better performance:

sqr <- function(x) x * x
microbenchmark::microbenchmark(sqr(runif(1000)))
Unit: microseconds
             expr min lq mean median uq   max neval
 sqr(runif(1000))  24 25  152     25 25 11919   100

Hmm, it appears to me they should have called it SlowR instead. But before we dish out our final judgement, let's tweak the benchmark a bit (sqr.R in this gist):

library(microbenchmark)
sqr <- function(x) x * x
f1 <- function() for (n in 1:1000) sqr(runif(1:n))
print(R.version)
print(microbenchmark(f1(), control = list(warmup = 100L)))

Running this snippet in both R implementations motivated me to continue my research and actually made me excited to see what FastR has to offer:

> source('~/code/FastR/sqr.R')
...
language       R
version.string R version 3.4.4 (2018-03-15)
Unit: milliseconds
 expr     min       lq     mean  median       uq      max neval
 f1() 16.3209 17.15926 18.16575 17.4746 17.73782 51.27648   100

> source('~/code/FastR/sqr.R')
...
language       R
engine         FastR
version.string FastR version 3.5.1 (2018-07-02)
Unit: milliseconds
 expr      min       lq     mean   median      uq      max neval
 f1() 6.053445 6.374638 8.226994 6.765636 6.99983 30.11883   100

Solving sudokus

The sudoku below has been claimed to be the world’s hardest sudoku:

sudokuTxt <- "
8 0 0 0 0 0 0 0 0
0 0 3 6 0 0 0 0 0
0 7 0 0 9 0 2 0 0
0 5 0 0 0 7 0 0 0
0 0 0 0 4 5 7 0 0
0 0 0 1 0 0 0 3 0
0 0 1 0 0 0 0 6 8
0 0 8 5 0 0 0 1 0
0 9 0 0 0 0 4 0 0"
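The solver code itself lives in the gist; as a rough sketch of how a string like this can be turned into the 9x9 integer matrix the solver operates on, something like the following works (parseSudoku is my own illustration, not a function from the gist):

# Sketch only: convert the whitespace-separated text above into a 9x9
# integer matrix, with 0 marking an empty cell.
parseSudoku <- function(txt) {
  values <- as.integer(strsplit(trimws(txt), "\\s+")[[1]])
  stopifnot(length(values) == 81)
  matrix(values, nrow = 9, ncol = 9, byrow = TRUE)
}

sudoku <- parseSudoku(sudokuTxt)
sudoku[1, 1]  # 8, the only given in the first row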

On my machine, the R/Rcpp solver finds a solution within a second. The source can be found in the gist as sudoku.cpp and sudoku.R. Microbenchmarking this solution in GNU R gives:

R> source('sudoku.R')
R> microbenchmark(cpp = solve2(sudoku, findChoicesCpp))
Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
  cpp 721.5412 735.6565 756.2961 748.2816 773.9773 835.9321   100

while the plain R solver run on the FastR engine is roughly 2.8 times as fast (comparing the medians):

$> ./R --vm.Xss5000k
R> source('plain_R.R')
R> microbenchmark::microbenchmark(
     solve2(sudoku, findChoices2), control = list(warmup = 20L)
   )
Unit: milliseconds
                         expr min  lq mean median  uq  max neval
 solve2(sudoku, findChoices2) 257 260  293    266 271 1217   100

Caveats

Note the command line parameter in the example above that increases the stack size. If the stack size is not increased like that, the solver function will crash.

Another important thing to note is that FastR uses more cores and needs more memory than GNU R.

Conclusion

Oracle FastR is an interesting project that can improve the runtime of R calculations by quite a bit, and I will be watching its development closely. Resource use can be a problem though: after running the benchmark, the FastR process used almost 4 GB of RAM against less than 500 MB for the process running the Rcpp code.