SQL Server 2016 includes R Services and Advanced Analytics Extensions. R Services makes it possible to run R scripts within SQL Server. As a result, R analysis results can be delivered through Report Manager using SQL Server Reporting Services.
What is R?
R is a statistical programming language that is used extensively in scientific research for data analysis and predictive modelling. The language and packages are freely available through the R archive. Development is carried out in an integrated development environment such as RStudio.
The unique strength of R comes from the thousands of open source packages written for it by scientists and other volunteers. Many people have used these packages, and the bugs have been worked out of them. When issues do emerge in the open source code, they can be quickly addressed and resolved. R has gained critical momentum as the popular and reputable way to do this sort of analysis.
Parallel processing enhancements were added by Revolution Analytics, now part of Microsoft. Microsoft distributes the MKL math library and High Performance Analysis tools as part of Microsoft R Open and Microsoft R Server.
Packages, Packages and More Packages.
Detailed documentation is available within the development environment. For example, typing “?plot” in RStudio will pull up documentation on the plot command, which is part of the built-in “graphics” package. Other built-in packages include the “stats” package and the “datasets” package. The datasets package includes sample sets with a few hundred data points each. Type “?iris” to see documentation on one of these datasets, or “str(iris)” to see a summary of its structure and contents.
Downloadable packages generally focus on a specific problem, such as mapping genetic traits to particular locations on a chromosome. Others are tool packages, such as the ggplot2 package that offers enhanced data visualization. Here is how you would download the ggplot2 package from the archive from inside the RStudio development environment:
Finally, a package is loaded for use in the current session:
It is interesting to contrast three ways of accessing data:
Through a flat file
Through a connection to a database
From within SQL Server 2016 itself
Small data sets are commonly read in from flat files. For example, this code reads in a flat file with exchange rate data downloaded from the Federal Reserve (http://www.federalreserve.gov/releases/h10/Summary/):
The resulting recordset is called a “data frame” in R. Here is another way to plot the same data:
ggplot(d, aes(x = month, y = pound)) + geom_point();
Finally, SQL Server 2016 R Services makes it possible to run R scripts and access data directly from within the database instance. This offers security and performance benefits. It is implemented as a procedure that executes external scripts: sp_execute_external_script. There are a number of arguments to this procedure that specify the language, script, input and output datasets, and parameters. Learn more here.
Below is how it looks when the query and plot statements developed above in RStudio are folded into a stored procedure. There is no ODBC connection, since this is executed within the database instance. Data could be returned in a recordset, but this example returns a jpeg image as a varbinary(max) data type.
To distribute the result through Reporting Services, an SSRS project, report, and data source are created.
Next, a data set is created that points to the stored procedure.
An image object is set to display the resulting jpeg image.
The finished report is uploaded to Report Manager.
The report can be executed and saved to a PDF or other format using normal Report Manager functionality.
This form of reporting is exciting because it ties into the way many businesses already analyze and understand their data: through reporting. Data analytics reports can be made available to anyone in the company with Reporting Services, and distributed through an existing Report Manager.