Your organization doesn’t have to be a Google, Facebook or Shell Oil to have a use for large volumes of data, such as the data sitting in a data warehouse or publicly available online. Unfortunately, cleaning and aggregating this data to create intelligible visualizations can take weeks. One solution: the R programming language.
Using just a few lines of code, R can generate a data plot from a large dataset. As a result, it has become a critical language not only for data scientists, but for anyone doing analytic research.
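As a rough illustration of the "few lines of code" claim, the sketch below draws a scatter plot using only base R. The built-in mtcars sample dataset stands in for a much larger dataset, and the output file name is an assumption made for this example.

```r
# mtcars ships with R; it stands in here for a much larger dataset.
data(mtcars)

# Write the chart to a PDF in a temporary folder so the script
# also runs on machines without a display.
plot_file <- file.path(tempdir(), "mpg_vs_weight.pdf")
pdf(plot_file)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")
dev.off()
```

In an interactive RStudio session, the plot() call alone is enough; the chart appears in the Plots pane.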
“When I was using Excel, I didn’t realize how much time I spent just gathering data in a form that I could use. I’d go to Yahoo Finance, navigate to the historical data page, download the series I needed, import into Excel, convert prices into %G/L, then go about analyzing the data. All of that would easily take 5 – 10 minutes, for just one data series. In R, I can do all of that in two lines of code.” – CIO Franklin Parker, quoted in Quora
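To make the price-to-percent step from the quote concrete, here is a minimal sketch in base R. The prices are made-up values, and the download step is left out because it depends on a data source and add-on packages that this article does not specify.

```r
# Hypothetical closing prices; real data would come from a download or a file.
prices <- c(100, 110, 99, 108.9)

# Percent gain/loss of each period relative to the one before it:
# (current price - previous price) / previous price * 100
pct_change <- diff(prices) / head(prices, -1) * 100

print(round(pct_change, 2))
```

With these sample prices, the three period-over-period changes are a 10% gain, a 10% loss and a 10% gain.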
What is the R Language?
R is a free, open-source programming language that was purpose-built for statistical computing and data analytics. It’s used for advanced analytics across a wide variety of verticals, including finance, healthcare, academia and manufacturing.
R is easy to install and runs on a variety of platforms. Thousands of packages (collections of functions) are available for free download. RStudio, one of the most common IDEs for working with R, makes it easy to install packages, display information about how they work, and run scripts. A new user can get started very quickly just by running some of the most common scripts. There is also plenty of online documentation and support from the R community for newcomers.
Being able to reproduce results is a fundamental requirement in data science. One benefit of working in R is that your analysis lives in code: as you manipulate a dataset in RStudio (or another IDE), the commands you run are kept in the session history, and saving them as a script preserves every step. That makes it easy to retrace and rerun your work after you’ve gone through several dozen iterations.
Another benefit of R often cited by its users is the quality of its visualizations. Many packages, such as ggplot2, plot3D, shiny and shinydashboard, produce attractive graphics and dashboards. You can see some examples in this ggplot tutorial.
However, there are a couple of issues to be aware of when using R. Some users find the syntax difficult, especially compared with Python. And, since it was developed specifically for statistics, R is not an all-purpose language like Python. Perhaps for these reasons, R has seen a slight drop in popularity recently. It should be noted, however, that other drawbacks cited by users (packages going out of support, or unclear documentation) also apply to Python, and in general to any open-source code.
R For Excel Users
“Big Data is anything that breaks Excel.” – Vince Fong, Data Analyst on Quora
A generation of Excel experts may find that R is an excellent complement to Excel. If your workbook has so many rows that it becomes slow to open, it may be time to give R a try. R packages like tidyr make it easy to clean and structure a dataset by identifying outliers, validating and reformatting fields and enhancing the dataset with additional information.
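The kind of cleanup described above can be sketched as follows. To stay free of dependencies, this example uses base R rather than tidyr, and the column names, sample values and the 1.5 x IQR outlier rule are all assumptions made for illustration.

```r
# Hypothetical raw export: messy region labels, dates stored as text,
# and one suspiciously large amount.
raw <- data.frame(
  region     = c(" east", "West ", "EAST", "west", "East"),
  order_date = c("2023-01-05", "2023-01-06", "2023-01-07",
                 "2023-01-08", "2023-01-09"),
  amount     = c(120, 135, 128, 9000, 131),
  stringsAsFactors = FALSE
)

clean <- raw

# Reformat: trim stray spaces and use one spelling per region.
clean$region <- tolower(trimws(clean$region))

# Validate: convert text to real dates; values that do not match
# the expected format become NA instead of slipping through.
clean$order_date <- as.Date(clean$order_date, format = "%Y-%m-%d")

# Identify outliers: flag amounts more than 1.5 x IQR outside the quartiles.
q <- quantile(clean$amount, c(0.25, 0.75))
fence <- 1.5 * (q[2] - q[1])
clean$is_outlier <- clean$amount < q[1] - fence | clean$amount > q[2] + fence

print(clean)
```

Here the 9000 row is the only one flagged, and the five region spellings collapse to "east" and "west". In Excel, each of these steps would typically be a separate helper column or manual pass.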
Harness the Power of R with Orbit Reporting and Analytics
Orbit lets you apply R to your live data, so you can use advanced analytics on your existing data sources, whether they are EBS, databases or standalone CSV files. You can build a query in Orbit, then run it as an R script. The resulting chart can be saved, shared, emailed or added to a dashboard. You get the benefit of R’s powerful statistics with access to all your existing database data.
If you want to take your data-guided decisions to the next level, Orbit Reporting and Analytics can help. Request your demo today!