"The next big thing", R, and Statistics in the cloud

A friend just e-mailed me about a blog post by Dr. AnnMaria De Mars titled “The Next Big Thing”.

In it, Dr. De Mars wrote (I took the liberty of emphasizing some parts of the text):

Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. […]
for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

Here are my two cents on the subject:

First, I agree with Dr. De Mars that R (out of the box) is not very friendly to non-programmers: it has (almost) no point-and-click capabilities.  And while there are several projects offering a GUI layer for R (a good list of them can be found here), none of them is yet at the level of refinement that software such as SPSS, JMP or SAS offers users today.

But is traditional “point and click” the next “big thing”?  My suspicion is that the answer is – no.
Dr. De Mars does not think so either, since her predictions for the next big thing are “Data visualization” and “Analyzing enormous quantities of unstructured data”. These are both areas where R offers quite powerful solutions (assuming you are willing to go through the learning curve).

Dr. De Mars's question is a fascinating one: what IS going to be the next big thing?

I think that the next BIG thing is (or is becoming) “Statistics in the Cloud“.  This intuition came from (among other things) my review of the “Future of Open Source” Survey (see “conclusion 3”).

In the near future, I believe, we will see more statisticians and data analysts tapping into the opportunities that cloud computing offers them.  Here are some examples of what I have come across (or covered) lately on the topic of cloud computing and R:

  1. Easy online data collection (via Google Forms)
  2. High-performance computing – Running statistical software in the cloud, either to access a powerful computer or to run computations in parallel.  The former can be done through services like Amazon cloud and Elastic R, and more recently R-Node, which combines running R and Protovis on a server.  The latter I don’t have experience with, but I understand there are various solutions in R (a known company in the field is, of course, REvolution Computing)
  3. Online statistical analysis/visualization of data – Having a web interface to a statistical analysis.  One wonderful example of that is Jeroen Ooms’s (beautiful) web interface to ggplot2.  Such projects offer “point and click” capabilities through the internet (/cloud)
  4. Online interactive visualization of data.  I came across three people offering to develop solutions for doing this with R in this year’s Google Summer of Code, and I hope something will come out of it

All of these are well connected to the emerging trend of “web of data”/“linked data web” that some are talking about. For example, here is a good TED talk by Tim Berners-Lee (the inventor of the World Wide Web), in which he talks about building a web for open, linked data that could do for numbers what the Web did for words, pictures and video: unlock our data and reframe the way we use it together.

The same plea is made by Hans Rosling in his famous TED talk showing GapMinder.
At the same time, though, some R users are saying: “You don’t have to bother linking the data. I’ll do with just the data, really, just release it…”

In conclusion, I don’t know what capabilities other projects/products offer for doing statistics in the cloud.  But it is clear to me that the R community is (not surprisingly) bringing very diverse and innovative solutions to the world.

Is R the next big thing?  I don’t think so.  But I do think that some of the next big things will be built with R.
* * *
I would love to know your thoughts about Dr. De Mars's post, and also about what the “next big thing” is going to be (and what role R will have in it).

11 thoughts on “"The next big thing", R, and Statistics in the cloud”

  1. yeah she is saying R will not be the next big thing because it has no decent GUI (fair enough), but the next big thing will not be R with GUI or web-enabled R but instead visualization and data mining.

    Gee thanks for being so specific.

    Here is my prediction: the next big thing will involve computers, or maybe sequencing. I’ll have to get back to you on that one.

  2. actually, as a student you do learn statistics and econometrics if you use R. (On average) As opposed to STATA and SPSS you need to understand what you are doing as opposed to just being able to interpret the regression output.

  3. R is already the next big thing; it isn’t about point and click that is for applications not tools. R is about solving complex statistical problems in a collaborative way not cleaning data or presenting data or capturing user input, these are higher functions away from the calculation.

    Statisticians aren’t looking for point and click when they are modelling extreme value theory problems or T-Tests or Bayesian ROC analysis or any other kind of statistical quest, they are looking for a solution.

    R-is extensible and evolving, it has a massive community and a statistical tool that is an applied solution with examples to world (business, physics, biology and so on) problems rather than a method.

    “the much greater cost of software is the time it takes to install it, maintain it, learn it and document”

    It takes about seven minutes to install R and I was able to learn it in an afternoon that is pretty straight forward in my opinion.

  4. This article is superb and amazingly written … I am going with R programming but still looking the alternate because of its instability in future but after going through your article I can stick myself to make more proficient in R. I used Jeroen Oom ggplot2 interface, it is fantastic and felt good to get its appreciation in your blog.
    Thanks dear for sharing your wonderful thoughts and clearing the scope and significance of R.
    “Is R the next big thing? I don’t think so. But I do think that some of the next big things will be built with R” this line is very touching…..

    1. Hello dear Kr Pawan, thank you very much for your kind words.

      It is so lovely to see that something I wrote over two years ago can still be of use to people such as yourself.

      By the way, Jeff Horner recently wrote a very far-sighted post regarding statistical computing, the cloud, and R. You (and other readers) might find it interesting to read:
      http://jeffreyhorner.tumblr.com/post/35782252672/innovation-in-statistical-computing

      Good luck in your R quests,

      With regards,
      Tal
