Over the past couple of weeks, I’ve been considering alternatives to R. I’d heard Python was much faster, so I translated a piece of R code with several nested loops into Python (it ran an order of magnitude faster). To find out more about Mathematica 9, I had an extended conversation with some representatives from Wolfram Research (Mathematica can run R code; I’ll post a detailed review soon). And I’ve been experimenting with JavaScript and HTML5’s “canvas” feature.
JavaScript may seem like an unlikely competitor for R, and in many ways it is. It has no repository of statistical analysis packages, doesn’t support vectorization, and requires the additional layer of a web browser to run. This last drawback, though, could be its killer feature. Once a piece of code is written in JavaScript, it can be instantly shared with anyone in the world directly on a web page. No additional software to install, no images to upload separately. And unlike Adobe’s (very slowly dying) Flash, the output renders perfectly on your smartphone. R has dozens of packages and hundreds of options for charts, but the interactivity of these is highly limited. JavaScript has fewer charting libraries, but it does have some which produce nice output.
Nice output? What matters is the content; the rest is just window dressing, right? Not so fast. Visually pleasing, interactive information display is more than window dressing, and it’s more in demand than ever. As statisticians have stepped up their game, consumers of data analysis have come to expect more from their graphics. In my experience, users spend more time looking at graphs that are pleasing, and get more out of charts with (useful) interactive elements. Beyond that, there’s a whole world of simulations which only provide insight if they are visual and interactive.
Pretty legs, but can she type?
Alright, so there are some advantages to using JavaScript when it comes to creating and sharing output, but what about speed? The last time I used JavaScript for a computationally intensive project, I was frustrated by its slow speed and browser (usually IE!) lockups. I’d heard, though, that improvements had been made, that a new “V8” engine made quick work of even the nastiest js code. Could it be true?
If there’s one thing I rely on R for, it’s creating random variables. To see if JavaScript could keep up on R’s home court, I ran the following code in R:
start = proc.time()[3]
x = rnorm(10^7,0,1)
end = proc.time()[3]
cat(end - start)
Time needed to create 10 million standard Normal variates in R? About half-a-second on my desktop computer. JavaScript has no native function to generate Normals, and while I know very little about how these are created in R, it seemed like cheating to use a simple inverse CDF method (I’ve heard bad things about these, especially when it comes to tails, can anyone confirm or deny?). After some googling, I found this function by Yu-Jie Lin for generating JS Normals via a “polar” method:
function normal_random(mean, variance) {
  if (mean == undefined)
    mean = 0.0;
  if (variance == undefined)
    variance = 1.0;
  var V1, V2, S, X; // X declared locally so it does not leak into the global scope
  do {
    var U1 = Math.random();
    var U2 = Math.random();
    V1 = 2 * U1 - 1;
    V2 = 2 * U2 - 1;
    S = V1 * V1 + V2 * V2;
  } while (S >= 1 || S == 0); // also reject S == 0, which would yield NaN
  X = Math.sqrt(-2 * Math.log(S) / S) * V1;
  // Y = Math.sqrt(-2 * Math.log(S) / S) * V2;
  X = mean + Math.sqrt(variance) * X;
  // Y = mean + Math.sqrt(variance) * Y ;
  return X;
}
So how long did it take Yu-Jie’s function to run 10 million times and store the results in an array? In Chrome, it took about half a second, the same as in R (in Firefox it took about 3 times as long). Got that? No speed difference between R and JS running in Chrome. When it comes to loops, JS seems blazing fast (compared to R). Take another look at the demo simulation I created. Each iteration of the code requires on the order of N-squared operations, and the entire display area is re-rendered from scratch. Try adding new balls using the “+” button and see if your browser keeps up.
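For anyone who wants to repeat the JavaScript timing, here is a rough sketch of how it could be set up. The names `polarNormal` and `timeNormals` are made up for this illustration, the sampler is a compact restatement of the polar method rather than Yu-Jie’s exact code, and storing the draws in a preallocated `Float64Array` is an assumption (a plain array works too). `Date.now()` only has millisecond resolution, so treat the result as a ballpark figure.

```javascript
// Compact polar-method sampler for standard Normals
// (illustrative restatement, not Yu-Jie's exact code).
function polarNormal() {
  var v1, v2, s;
  do {
    v1 = 2 * Math.random() - 1;
    v2 = 2 * Math.random() - 1;
    s = v1 * v1 + v2 * v2;
  } while (s >= 1 || s === 0); // keep only points inside the unit disc, excluding the origin
  return Math.sqrt(-2 * Math.log(s) / s) * v1;
}

// Fill a preallocated typed array with n draws and report elapsed milliseconds.
function timeNormals(n) {
  var x = new Float64Array(n);
  var start = Date.now();
  for (var i = 0; i < n; i++) {
    x[i] = polarNormal();
  }
  return { values: x, ms: Date.now() - start };
}

var result = timeNormals(1e7); // 10 million draws, as in the R test
console.log("Generated " + result.values.length + " normals in " + result.ms + " ms");
```

The same script should run unchanged in a browser console or under a standalone JS engine, which makes it easy to compare engines on the same machine.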
It’s only a flesh wound!
So have I found the Holy Grail of computer languages for statistical computation? That’s much too strong a statement, especially given the crude state of JS libraries for even basic scientific needs like matrix operations. For now, R is safe. In the long term, though, I suspect the pressures to create easily shared, interactive interfaces, combined with improvements in speed, will push more people to JS/HTML5. Bridges like The Omega Project (has anyone used this?) might speed up the outflow, until people pour out of R and into JavaScript like blood from a butchered knight.
Tags: javascript, monty python, python, r
And you can easily code a js page which submits the results of a calculation to a server. That way you could harvest cpu time from every visitor to a page.
Aslak did you hack my computer? I’ve just been looking into this way of distributed computing. Was reading about “web workers” and Ravan this afternoon.
If anyone is interested in going deep into this idea let me know.
How did Python fare relative to R?
I haven’t tested creating Normals in Python. If no one else steps up I’ll translate Yu-Jie’s code or try NumPy. My Python results came from a simulation similar to the one I did in JS (I used pyGame to output the graphics, and it was VERY fast).
UPDATE:
Wrote this one-liner, tried to do it in the most “pythonic” way possible. Make sure to “import numpy” first:
x = [numpy.random.normal() for i in range(10**7)]
Result: my laptop froze up for several minutes. Might be a memory (and not speed) issue since creating 10^6 normals happened in less than a second.
Why didn’t you do
x = numpy.random.normal(size=10**6)
?
That is an order of magnitude faster.
@ram Your method
x = numpy.random.normal(size=10**6)
is not only faster but also avoids running out of memory, since it does not involve the slower, memory-intensive “for loop” used in
x = [numpy.random.normal() for i in range(10**7)]
What about Julia (http://julialang.org/)? It’s new and blazingly fast, with a simple and powerful syntax.
The project I am following is Numeric Javascript:
http://www.numericjs.com/
Node (http://nodejs.org/) can execute javascript w/o a browser and is based on the Chrome v8 engine, so it should perform very well.
jStat (http://www.jstat.org/) will require a browser for graphing, but it aims to be[come] the javascript alternative to R.
You’re not the first to talk like this, see:
http://www.r-bloggers.com/the-best-statistical-programming-language-is-%E2%80%A6javascript/
Personally, I suspect that easy sharing of results (in the near future) will be achieved using Shiny. R and the like are not scalable when the scale you need is more than what the “average” statistician needs today, hence – I am not sure a flood out of R is something to expect soon.
Still, you raise a good point of frustration, one which I hope will inspire people to find good solutions (through R, or Julia, or JS, or whatever).
Cheers,
Tal
Thanks Tal !!
Don’t forget Dirk’s Rcpp, which is truly a game changer for R programmers.
(In a way R-bloggers too… seriously, it changed the way people learn about R… thank you for that.)
Hello Ahmadou,
Thank you for your kind words regarding r-bloggers.
I also agree with Rcpp being VERY important for the present and future of R. As someone wrote recently – R can be viewed as a front end layer for C++. I am not sure how true that is, but Rcpp is definitely making it more the case.
Tal
For fun I implemented an MCMC Bayesian “t-test” in JavaScript (demo here: http://www.sumsar.net/best_online/ ) and I was surprised at how fast it was. I can now do Bayesian computation on my iPhone. What’s missing from JS is some decent libraries, of course…
Great stuff! Did you write the plotting functions as well?
For plotting I used Flot (http://www.flotcharts.org/) and for some of the probability distributions I used jStat (http://www.jstat.org/).
Honestly, if speed is so important that you start looking at languages like JS, then you might as well go the C++ or FORTRAN route. FORTRAN is pretty simple, actually, you can learn enough to implement most univariate routines within a few days. E.g. EM algorithm.
Though, for the sake of re-usability, Rcpp is really the way to go. Rcpp and related packages open up the possibility of combining R and C++, where you let C++ do the heavy lifting and R the rest.
Julia might be an option… From what I have seen, it is mostly geared towards matrix calculations, which can be done quite simply using Armadillo. See Dirk’s recent paper: http://dirk.eddelbuettel.com/papers/RcppArmadillo.pdf
The problem is the amount of code and time you will waste. Imagine coding a random forest or an ARIMA fitting method using JavaScript.
Ughs!
You don’t need the “additional layer of a web browser” to run javascript – you just download node.js, a standalone JS interpreter, and you can run javascript files without one. You don’t get a DOM that you can manipulate, so visualisation is tricky, but you can run simple statistical stuff.
Nobody seems to be worried about numerical accuracy. I remember seeing some time ago a couple of seriously flawed arithmetic calculations in JS that were pretty scary. Has this got any better with newer JS engines?
You should check out IPython Notebook. Couple it with Numba and you can get crazy speeds.
There is already some work done to let you use d3js for graphing which is probably the most flexible vis package out there.
The random number generators in products such as R, MATLAB and numpy are of significantly higher quality than those used by JavaScript’s Math.random().
Your post inspired one from me where I go into more detail on this issue
http://www.walkingrandomly.com/?p=4855
Finally, google ‘Numeric javascript’ for a nice looking javascript library for computation (I haven’t tried it though).
Cheers,
Mike
Hi Mike,
Thanks for providing more information on the differences between how R and JS generate their random numbers. BTW it looks like node.js has a Mersenne library.
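On the generator-quality point, one way to keep the polar method but stop depending on `Math.random()` is to pass the uniform source in as an argument. This is only a sketch: `makeNormalSampler` and its `uniform` parameter are names invented here, and whatever Mersenne Twister (or other) library gets plugged in is assumed to expose a function returning uniforms in [0, 1).

```javascript
// Build a Normal sampler on top of any function that returns uniforms in [0, 1).
// `uniform` is a hypothetical parameter; it falls back to Math.random when omitted.
function makeNormalSampler(uniform) {
  uniform = uniform || Math.random;
  return function (mean, variance) {
    if (mean === undefined) mean = 0;
    if (variance === undefined) variance = 1;
    var v1, v2, s;
    do {
      v1 = 2 * uniform() - 1;
      v2 = 2 * uniform() - 1;
      s = v1 * v1 + v2 * v2;
    } while (s >= 1 || s === 0); // polar method: keep only points inside the unit disc
    return mean + Math.sqrt(variance) * Math.sqrt(-2 * Math.log(s) / s) * v1;
  };
}

// Default behaviour: driven by Math.random(), as in the original function.
var rnorm = makeNormalSampler();
console.log(rnorm(0, 1));
```

Swapping generators then only means changing the argument to `makeNormalSampler`; the sampling code itself stays the same.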
Yes, you are nuts. But I like it!
Great article. I’d like to see more examples of statistical processing in JS.
Being able to run statistical models in JS is really great for writing demos and tutorials!
With the help of the numericjs library and existing GitHub code samples, I tested Baum-Welch estimation of a Hidden Markov Model (HMM) with multi-dimensional Gaussian observations.
See the demo here :
https://mzaradzki.github.io/probabilistic-javascript/demos/hmm.html
Despite my rough knowledge of JavaScript, it turned out to be very easy to put together.
Previously, to run such a demo I would have relied on server-side code, which is definitely not as convenient when it comes to hosting.
“Pretty legs, but can she type?” That is sexist and offensive. Please exercise better judgement.
What is your view on JS vs R in 2017?
I almost exclusively use JS for simulations now, except when I want to bang out something small for internal use.
Thank you for posting your experience. I have now decided to use JS going forward and to pair it with https://nwjs.io to create a tailored solution for my needs.