R-statistics blog

Speed up your R code using a just-in-time (JIT) compiler

This post is about speeding up your R code using the JIT (just-in-time) compilation capabilities offered by the new (well, now a year old) {compiler} package. Specifically, it deals with the practical difference between the enableJIT() and cmpfun() functions.

If you do not want to read much, you can just skip to the example part.

As always, I welcome any comments on this post, and hope to update it as future JIT solutions come along.

Prelude: what is JIT

Just-in-time compilation (JIT):

is a method to improve the runtime performance of computer programs. Historically, computer programs had two modes of runtime operation, either interpreted or static (ahead-of-time) compilation. Interpreted code is translated from a high-level language to a machine code continuously during every execution, whereas statically compiled code is translated into machine code before execution, and only requires this translation once.
JIT compilers represent a hybrid approach, with translation occurring continuously, as with interpreters, but with caching of translated code to minimize performance degradation. It also offers other advantages over statically compiled code at development time, such as handling of late-bound data types and the ability to enforce security guarantees.

JIT in R

To date, there are two R packages that offer just-in-time compilation to R users: the {jit} package (through the Ra Extension to R), and the {compiler} package (which comes bundled with every R release since R 2.13).

The {jit} package

The {jit} package, created by Stephen Milborrow, provides just-in-time compilation of R loops and arithmetic expressions in loops, enabling such code to run much faster (loops can run 10 to 20 times faster). However, the drawback is that in order to use {jit}, you need to run it through “the Ra Extension to R” (Ra is like R, except that it supports the {jit} package). Sadly, the {jit} package has no effect under standard R. The package has not been updated since 2011-08-27, and I am curious to see how its future might unfold (it may continue on, merge with some other project, or sadly go out of use).

The {compiler} package

The {compiler} package, created by Luke Tierney, offers a byte-code compiler for R:

A byte code compiler translates a complex high-level language like Lisp into a very simple language that can be interpreted by a very fast byte code interpreter, or virtual machine. The internal representation of this simple language is a string of bytes, hence the name byte code. The compilation process eliminates a number of costly operations the interpreter has to perform, including variable lookup and setting up exits for nonlocal transfers of control. It also performs some inlining and makes the language tail-recursive. This means that tail calls are compiled as jumps and therefore iterations can be implemented using recursion. (the source of this quote)

The compiler produces code for a virtual machine that is then executed by a virtual machine runtime system.  The virtual machine is a stack based machine. Thus instructions for the virtual machine take arguments off a stack and may leave one or more results on the stack. Byte code objects consists of an integer vector representing instruction opcodes and operands, and a generic vector representing a constant pool. The compiler is implemented almost entirely in R, with just a few support routines in C to manage compiled code objects.
The virtual machine instruction set is designed to allow much of the interpreter internals to be re-used. In particular, for now the mechanism for calling functions of all types from compiled code remains the same as the function calling mechanism for interpreted code.  (the source of this quote)

From the perspective of using JIT with R, the above means that the {compiler} package does not offer a JIT compiler that produces machine code. Instead, it compiles R code into byte code, which is then run by R's byte-code interpreter.
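To make the idea of byte code a bit more concrete, here is a minimal sketch using functions exported by the {compiler} package (compile(), cmpfun() and disassemble()). The expression and the add_one function are made up for illustration, and the exact instruction listing printed by disassemble() will differ between R versions.

```r
library(compiler)

# Compile a single expression into a byte-code object.
expr <- quote(x * 2 + 1)
bc <- compile(expr)
typeof(bc)                 # "bytecode"

# A byte-code object can be evaluated just like the original expression.
eval(bc, list(x = 20))     # 41

# Byte-compile a closure and print its instructions and constant pool.
add_one <- cmpfun(function(x) x + 1)
disassemble(add_one)
```

The printed listing shows the opcodes and the constant pool mentioned in the quote above; it is meant for inspection only, not as something to program against.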

The byte compiler was first introduced with R 2.13, and starting with R 2.14, all of the standard functions and packages in R are pre-compiled into byte code. The benefit in speed depends on the specific function, but performance can improve by a factor of 2 or more.

In some early experiments, Dirk Eddelbuettel (our community’s R HPC guru) looked at what the {compiler} package can offer us when using the cmpfun() function. He showed that for various made-up functions, the compiled versions can run 2 to 5 times faster. This is great for the small amount of work (e.g., code modification) it requires on our part, but to put it in perspective, using Ra can make your code up to 25 times faster (as was also mentioned by Tierney himself in slides from 2010, see slide 20). Moreover, by combining C/C++ code with R code (for example, through the {Rcpp} and {inline} packages) you can improve your code’s running time by a factor of 80 (or even almost 120 relative to the worst manual implementation) compared with interpreted code. But to be fair to R, the code used in such examples is often unrealistic and not representative of real R work. Thus, effective speed gains can be expected to be smaller for all of the above solutions.

If you want to learn more about the {compiler} package, Prof. Tierney wrote a massive 100+ page paper on R’s byte-code compiler, which also includes points for future research and development.

Using the {compiler} package as a JIT for R

Description

(From the manual “A Byte Code Compiler for R“)

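JIT compilation, through the {compiler} package, can be enabled from within an active R session by calling the enableJIT() function with a non-negative integer argument (from 0 to 3), or by starting R with the environment variable R_ENABLE_JIT set to a non-negative integer. The possible values of the argument to enableJIT() and their meanings are:

0: JIT is disabled.
1: closures are compiled before their first use.
2: in addition, closures are compiled before they are duplicated (useful for packages, such as {lattice}, that store closures in lists).
3: in addition, all loops are compiled before they are executed.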

R may initially be somewhat sluggish if JIT is enabled and base and recommended packages have not been pre-compiled as almost everything will initially need some compilation.
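As a small sketch of how the levels can be switched from code: in current R releases, enableJIT() returns the level that was active before the call, and a negative argument only queries the current level without changing it. The variable names below are my own.

```r
library(compiler)

# Query the current JIT level without changing it.
current <- enableJIT(-1)

# Switch to level 3; the return value is the level that was active before.
previous <- enableJIT(3)

# ... run the code you want compiled here ...

# Restore whatever level was active earlier.
enableJIT(previous)

# From a shell, the same level can be requested at startup, for example:
#   R_ENABLE_JIT=3 Rscript my_script.R    (my_script.R is a placeholder name)
```

Saving and restoring the previous level keeps a script from silently changing the JIT setting for whatever runs after it.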

Example

The following example is an adaptation of the code taken from the ?compile help file.

Let us start by defining two functions we will use for our example:


##### Functions #####

is.compile <- function(func)
{
    # This function lets us know if a function has been byte-compiled or not.
    # If you have a better idea for how to do this - please let me know...
    if (!is.function(func)) stop("You need to enter a function")
    last_2_lines <- tail(capture.output(func), 2)
    # Returns TRUE if the text "bytecode:" appears in either of the
    # last two lines of the function's printout
    any(grepl("bytecode:", last_2_lines))
}

# old R version of lapply
slow_func <- function(X, FUN, ...) {
    FUN <- match.fun(FUN)
    if (!is.list(X))
        X <- as.list(X)
    rval <- vector("list", length(X))
    for (i in seq_along(X))
        rval[i] <- list(FUN(X[[i]], ...))
    names(rval) <- names(X)          # keep `names' !
    return(rval)
}

# Compiled versions
library(compiler)
slow_func_compiled <- cmpfun(slow_func)

Notice how in the last line of code we manually byte-compile our slow function. Next, let's define a function that will run the slow function (the raw and the compiled versions) many times, and measure the time it takes to run each:


fo <- function() for (i in 1:1000) slow_func(1:100, is.null)
fo_c <- function() for (i in 1:1000) slow_func_compiled(1:100, is.null)

system.time(fo())
system.time(fo_c())

# > system.time(fo())
   # user  system elapsed
   # 0.54    0.00    0.57
# > system.time(fo_c())
   # user  system elapsed
   # 0.17    0.00    0.17

We see in this example how using the byte compiler on "slow_func" gave us a speed gain of a bit over 3x. What will happen if we use the cmpfun() function on the function "fo" itself? Let's check:

fo_compiled <- cmpfun(fo)
system.time(fo_compiled()) # doing this, will not change the speed at all:
#   user  system elapsed
#   0.58    0.00    0.58

We can see that we did not get any speed gain from this operation. Why is that? The reason is that the slow function (slow_func) is still not compiled:

is.compile(slow_func)
# [1] FALSE
is.compile(fo)
# [1] FALSE
is.compile(fo_compiled)
# [1] TRUE
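The same behaviour can be reproduced in isolation. The sketch below uses made-up names (worker, wrapper, has_bytecode) and switches the JIT off first, since recent R releases enable it by default and would otherwise compile things on their own. It also shows one manual way around the problem: byte-compiling the inner function under its original name, so that the wrapper picks up the compiled version when it looks the name up at call time.

```r
library(compiler)

old_level <- enableJIT(0)   # keep the JIT from compiling anything for us

# Same printing-based check as is.compile() above, in one line.
has_bytecode <- function(f) any(grepl("bytecode:", capture.output(print(f))))

worker  <- function(n) { s <- 0; for (i in seq_len(n)) s <- s + i; s }
wrapper <- function() worker(1000)

wrapper_c <- cmpfun(wrapper)
has_bytecode(wrapper_c)     # TRUE
has_bytecode(worker)        # FALSE: only the wrapper was compiled

# Compile the inner function and keep its name.
worker <- cmpfun(worker)
has_bytecode(worker)        # TRUE
wrapper()                   # 500500, now running the compiled worker

enableJIT(old_level)        # restore the previous JIT level
```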

The cmpfun() function only compiled the wrapping function fo (fo_compiled), but not the functions nested within it (slow_func). And this is where the enableJIT() function kicks in:

enableJIT(3)
system.time(fo())
#   user  system elapsed
#   0.19    0.00    0.18

We can see that "suddenly" fo has become much faster. The reason is that turning the JIT on (using enableJIT(3)) basically started turning any function we run into byte code. So if we now check again, we find that:

is.compile(fo)
# [1] TRUE # when it previously was not compiled, only fo_compiled was...
is.compile(slow_func)
# [1] TRUE # when it previously was not compiled, only slow_func_compiled was...

This means that if you want to modify your code as little as possible, then instead of going through every function you use and running cmpfun() on it, you can simply run the following code once, at the beginning of your code:

require(compiler)
enableJIT(3)

And you will get the speed gains the byte compiler has to offer your code.
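How large the gain is depends on your code, so it is worth measuring. Below is a rough sketch of one way to compare a plain and a byte-compiled version of the same function, taking the median over a few repetitions to smooth out noise. The helper median_elapsed and the toy function cumulative_sum are hypothetical names of my own, and no timings are shown because they vary by machine. Note that on recent R releases the JIT is on by default, so the "plain" version may get compiled after a couple of calls and the difference can disappear.

```r
library(compiler)

# Median elapsed time of f() over a few repetitions.
median_elapsed <- function(f, reps = 5) {
  median(replicate(reps, system.time(f())[["elapsed"]]))
}

# A deliberately loop-heavy toy function.
cumulative_sum <- function(x) {
  out <- numeric(length(x))
  running <- 0
  for (i in seq_along(x)) {
    running <- running + x[i]
    out[i] <- running
  }
  out
}
cumulative_sum_c <- cmpfun(cumulative_sum)

x <- runif(1e5)

# Both versions must give the same answer before timing means anything.
stopifnot(isTRUE(all.equal(cumulative_sum(x), cumulative_sum_c(x))))

t_plain    <- median_elapsed(function() cumulative_sum(x))
t_compiled <- median_elapsed(function() cumulative_sum_c(x))
c(plain = t_plain, compiled = t_compiled)
```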

On a side note, we can now turn the JIT back off using "enableJIT(0)", but the functions that were already compiled (for example fo and slow_func) stay compiled. If we want to un-compile them, we will have to re-create these functions (by re-running the code that produced them in the first place).
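Re-running the original definition is the simplest route. If the definition is not at hand, one possible alternative (a sketch, not something taken from the {compiler} documentation) is to rebuild the closure from its formals, body and environment. This relies on my understanding that body() returns the original expression even for a compiled function. The names square_c, square_plain and has_bytecode are made up for the example, and the JIT must be off or the new function may simply be compiled again.

```r
library(compiler)

old_level <- enableJIT(0)   # otherwise the rebuilt function may be re-compiled

has_bytecode <- function(f) any(grepl("bytecode:", capture.output(print(f))))

square_c <- cmpfun(function(x) x^2)

# Build a fresh, uncompiled closure with the same arguments, body and environment.
square_plain <- as.function(
  c(formals(square_c), list(body(square_c))),
  envir = environment(square_c)
)

has_bytecode(square_c)      # TRUE
has_bytecode(square_plain)  # FALSE
square_plain(4)             # 16

enableJIT(old_level)        # restore the previous JIT level
```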

To conclude: in this post I have discussed the current state of just-in-time compilation of R code, and shown how to use the {compiler} package for using JIT in R.
