Category Archives: R Programming

R Graphics: Multi-Graph Layouts

The layout() Function

The ability to manage multiple plots in one graphical device or window is a key capability to enhance data visualization and analysis.

The layout() function in base R is the most straightforward way to divide a graphical device into rows and columns.  The function requires a matrix that assigns each cell of the device to a plot number.  Column widths and row heights can be set with the widths and heights arguments.  layout.show() can then be used to preview the layout and see how the graphical device is split.
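
As a minimal sketch of the steps described above, the following places one wide plot over two smaller ones. The matrix values, the relative widths and heights, and the use of the built-in cars data set are illustrative choices, not part of the original post.

```r
# Layout matrix: plot 1 spans the top row; plots 2 and 3 share the bottom row.
layout_matrix <- matrix(c(1, 1,
                          2, 3), nrow = 2, byrow = TRUE)

# Draw to a null PDF device so the sketch runs without opening a window.
pdf(NULL)

# widths and heights set the relative column widths and row heights.
n_figures <- layout(layout_matrix, widths = c(2, 1), heights = c(1, 2))

# Outline the regions in the order they will be filled.
layout.show(n_figures)

# Each high-level plotting call fills the next region.
plot(cars, main = "Plot 1")
hist(cars$speed, main = "Plot 2")
boxplot(cars$dist, main = "Plot 3")

dev.off()
```

In an interactive session the pdf(NULL) and dev.off() calls can be dropped so that the layout appears in the default graphics window.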

Posted in R Graphics, R Programming

R Basics

The R Kernel

The R kernel is composed of code written in the R and C programming languages.  It includes a set of core function libraries, an interpreter to run R scripts, and a set of powerful graphical devices.  Together, these elements are more commonly referred to as base R.

Posted in R Programming

R Programming

R shares many programming constructs with other programming languages, but it also offers coding and memory management efficiencies, which simplify scripting and model building.  Fortunately, R programming is easy to learn.  This chapter on R programming is structured as follows:

Creating R Functions
Local vs Global Objects
Conditionals
Iterations
Special Functions
Debugging



Posted in R Programming

Building Layered Plots

Layered plots are one way to achieve new insight and actionable intelligence when working with complex data.  The ggplot2 package is well suited for layered plots.

Data Pre-Processing

To make graphs with ggplot(), the data must be in a data frame, typically in “long” (as opposed to wide) format.  Converting between “wide” and “long” data formats is facilitated with the reshape2 package.  Specifically, the melt() function converts wide to long format, and the dcast() function (the data frame counterpart of cast() from the older reshape package) converts long to wide format.  The following code block presents examples of the two data formats.
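
A minimal sketch of the two formats, assuming the reshape2 package is installed. The data frame and its column names are made up for illustration, and dcast() is used because it is the reshape2 function that returns a data frame.

```r
library(reshape2)  # assumed to be installed

# Wide format: one row per subject, one column per measurement.
wide <- data.frame(id = c(1, 2),
                   height = c(170, 182),
                   weight = c(65, 80))

# melt(): wide to long, one row per subject-measurement pair.
long <- melt(wide, id.vars = "id",
             variable.name = "measure", value.name = "value")

# dcast(): long back to wide, one column per level of 'measure'.
wide_again <- dcast(long, id ~ measure, value.var = "value")

print(long)
print(wide_again)
```

The long data frame has one row for every combination of id and measure, which is the shape ggplot() can map directly to aesthetics such as colour or facet.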

Posted in ggplot2, R Graphics, R Programming

Conditionals in R

Conditionals are expressions that perform different computations or actions depending on whether a predefined logical condition is TRUE or FALSE.  Conditional statements include if(), the combination if()/else, ifelse(), and switch().  Each statement supports source code branching by altering the control flow.
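
The branching forms listed above can be sketched side by side. The vector x and the helper summarize_values() are hypothetical names introduced only for this illustration.

```r
x <- c(-2, 0, 3)

# if()/else tests a single condition (a length-one logical).
if (length(x) > 2) {
  size <- "long"
} else {
  size <- "short"
}

# ifelse() is vectorized: it tests every element and returns a vector.
sign_label <- ifelse(x < 0, "negative", "non-negative")

# switch() selects one branch by matching a string; the unnamed
# final argument is the fallback when nothing matches.
summarize_values <- function(values, type) {
  switch(type,
         mean   = mean(values),
         median = median(values),
         stop("unknown type: ", type))
}
center <- summarize_values(x, "median")
```

The main design difference is that if() and switch() choose one branch for a single value, while ifelse() evaluates the condition element by element.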

The if() Statement
The if() statement is common to most programming languages.  It performs an operation only when a single condition is TRUE:
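
A minimal sketch of a simple if() statement; the temperature value and the threshold are made up for illustration.

```r
temperature <- 31  # hypothetical reading

message_text <- "no warning"

# The body runs only when the condition evaluates to TRUE.
if (temperature > 30) {
  message_text <- "heat warning"
}

print(message_text)
```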

Posted in R Programming

Creating R Functions

Writing functions and object-oriented scripts is the preferred way to use R.  R functions expand the capabilities of R.  By nature, R scripts are a way to organize and save data, complicated expressions, or sequences of operations for re-use.  Well-designed R functions rely on proper use of R language concepts and object-oriented structures.
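
As a small illustration of packaging a sequence of operations for re-use, the sketch below defines a function with default arguments. The name rescale() and its arguments are hypothetical and chosen only for this example.

```r
# Rescale a numeric vector to the interval [lower, upper].
rescale <- function(x, lower = 0, upper = 1) {
  rng <- range(x, na.rm = TRUE)              # minimum and maximum of x
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

scaled  <- rescale(c(10, 15, 20))            # uses the default interval
percent <- rescale(c(10, 15, 20), upper = 100)
```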

R Scripts vs. R Functions

Scripts and functions have several distinguishing characteristics:

Posted in R Programming

Data Sorting in R

Data sorting in R is simple and straightforward.  Key functions include sort() and order().  The variable by which you sort can be a numeric, a string or a factor variable.  Argument options also provide flexibility in how missing values are handled:  they can be listed first, last or removed.
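
A minimal sketch of sort(), order() and the na.last argument that controls missing values. The vectors and the data frame are made up for illustration.

```r
scores <- c(12, NA, 7, 30)

sorted_default <- sort(scores)                   # NAs are removed by default
sorted_na_last <- sort(scores, na.last = TRUE)   # NAs listed last
sorted_na_first <- sort(scores, na.last = FALSE) # NAs listed first

# order() returns the positions that would sort a vector,
# which makes it the usual tool for sorting data frame rows.
people <- data.frame(name = c("Cruz", "Avery", "Blake"),
                     age  = c(41, 29, 35))
by_name <- people[order(people$name), ]
```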

Data Sorting Examples

It is also possible to sort in reverse order by placing a minus sign (-) in front of a numeric sort variable in order().  For example:
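
A short sketch of reverse sorting with made-up data. Note that the minus sign only applies to numeric variables; for character variables the decreasing argument is used instead.

```r
people <- data.frame(name = c("Cruz", "Avery", "Blake"),
                     age  = c(41, 29, 35))

# Minus sign in front of a numeric sort variable: oldest first.
oldest_first <- people[order(-people$age), ]

# Character variables cannot be negated; use decreasing = TRUE.
reverse_alpha <- people[order(people$name, decreasing = TRUE), ]

# sort() accepts the same argument for plain vectors.
ages_desc <- sort(people$age, decreasing = TRUE)
```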

Posted in R Data Objects, R Programming

R Data Subscripting

Intro to R Data Subscripting

R data subscripting is a key “motor skill” to extract data by row, column or element.  Subscripting can also extract data using logical conditions or pattern matching.  Subscripting is also used to assign values to data object elements.

The syntax for data subscripting can take several forms depending on data structure and data object type. Examples are provided below.
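
The sketch below shows a few common subscripting forms for vectors, data frames and lists. All object names and values are made up for illustration.

```r
v <- c(a = 10, b = 20, c = 30)
second   <- v[2]          # by position
named    <- v["c"]        # by name
above_15 <- v[v > 15]     # by logical condition

df <- data.frame(city  = c("Oslo", "Lima", "Osaka"),
                 value = c(1.5, 8.0, 3.2))
first_row  <- df[1, ]                        # one row
value_col  <- df[, "value"]                  # one column as a vector
one_cell   <- df$value[2]                    # a single element
os_cities  <- df[grepl("^Os", df$city), ]    # rows matching a pattern

lst <- list(id = 1, tags = c("x", "y"))
second_tag <- lst[["tags"]][2]               # element inside a list component

# Subscripting on the left-hand side assigns values.
v["a"] <- 99
df$value[df$city == "Lima"] <- 9
```

Single brackets return an object of the same type as the original, while double brackets and $ extract the component itself.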

Posted in R Data Objects, R Programming

R Dates and Times

Preprocessing R dates and times requires synchronizing data and formats across data sources.  Dates and times therefore deserve care and attention.

Current Date/Time in R

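The functions date(), Sys.Date() and Sys.time() all return the current system date and time, but in different forms: date() returns a character string, Sys.Date() returns a Date object, and Sys.time() returns a POSIXct date-time object: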
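
A minimal sketch comparing the three functions. Note the capital D in Sys.Date(); the format strings at the end are illustrative choices.

```r
now_chr  <- date()       # character string
today    <- Sys.Date()   # Date object
now_time <- Sys.time()   # POSIXct date-time object

# Inspect the class of each result.
print(class(now_chr))
print(class(today))
print(class(now_time))

# format() converts Date and POSIXct objects to strings in a chosen layout.
today_iso <- format(today, "%Y-%m-%d")
now_iso   <- format(now_time, "%Y-%m-%d %H:%M:%S")
```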

Each of these functions returns a slightly different result, which raises the obvious question: how best to manage and format dates in large data objects?

Posted in R Data Objects, R Programming

Debugging in R

There are several utilities for debugging in R.

Debugging with traceback()

Whenever a custom function generates an error, the traceback() function is a good way to focus initial problem solving.  The function prints the call stack of the most recent uncaught error, starting with the function in which the error occurred and working outward to the original calling function.
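
A minimal sketch with three hypothetical nested functions. Because traceback() reports on the last uncaught error, the failing call is left as a comment to be run in an interactive session.

```r
# Three nested functions; the innermost one raises the error.
check_input <- function(x) {
  if (x < 0) stop("x must be non-negative")
  sqrt(x)
}
transform_value <- function(x) check_input(x) * 2
run_analysis <- function(x) transform_value(x) + 1

ok_result <- run_analysis(4)   # valid input, no error

# In an interactive session, trigger the error and then inspect the stack:
# run_analysis(-1)
# traceback()
```

After the failing call, traceback() lists the calls from the stop() inside check_input() outward through transform_value() to run_analysis().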

Posted in R Programming

What is R?

Statement of Purpose

Documentation on the R programming language has been developed to provide a comprehensive answer to the question “What is R?”  The approach seeks to appeal to new users, while the reliance on practical examples seeks to provide an applied, long-term reference for seasoned users.

What is R?

R is an open-source implementation of the S programming language, which was developed at Bell Labs “to improve data manipulation, analysis, and visualization.”

Posted in R Programming

Iteration in R

Iteration is core to many calculations.  Explicit loops are common in R, but they should be avoided where vectorized methods achieve the same goal.

Iteration, or traditional looping, is a brute force approach to data management that is effective, but costly.  When a large data object is modified inside a loop, R may copy the object in memory on each pass.  Thus, iteration can consume considerable time and memory.  R supports the following functions that replace explicit loops: apply(), lapply(), tapply(), sapply() and by().  More traditional functions for iteration in R are described below.
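
The contrast between an explicit loop and its alternatives can be sketched as follows. The vectors are made up for illustration, and the example is too small to show any timing difference.

```r
x <- 1:10

# Explicit for() loop with a preallocated result vector.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized arithmetic: no loop needed.
squares_vec <- x^2

# sapply() applies a function to each element and simplifies the result.
squares_sapply <- sapply(x, function(value) value^2)

# tapply() applies a function within groups.
group <- rep(c("a", "b"), each = 5)
group_means <- tapply(x, group, mean)
```

Preallocating squares_loop before the loop avoids growing the vector on every pass, which is one common source of repeated copying.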

Posted in R Programming

Local vs Global Objects

Local vs global objects in R serve to distinguish temporary and permanent data.

Local Objects and Frames

Data objects assigned within the body of a function are temporary.  That is, they are local to the function only.  Local objects have no effect outside the function, and they disappear when function evaluation is complete.    
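
A minimal sketch of this behaviour; the names total, subtotal and add_fee() are hypothetical.

```r
total <- 100   # global object

add_fee <- function(amount) {
  subtotal <- amount + 10   # local object, exists only during the call
  total <- subtotal         # local 'total' masks the global one inside the function
  total
}

result <- add_fee(50)

# The global 'total' is unchanged and 'subtotal' no longer exists.
print(total)
print(exists("subtotal"))
```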

Posted in R Data Objects, R Programming

Extract Data Tables from PDF Files in R

A new method to extract data tables from PDF files is introduced. Most of the data scraping tools available are browser-based.  The common tools are also manual in nature and limited to one table at a time. A solution is outlined to extract multiple tables at once.  The solution combines the R programming language with the open-source Java program Tabula. The result is a convenient method that transforms documents into databases.

Benefit Statement
The ability to train a machine to extract data tables from PDF files has several benefits:

Posted in Data, Misc Tricks, R Data Import, R Programming

Popularity of R Programming Language

TIOBE Index

The popularity of R is rapidly increasing and is well on its way to being a top 10 programming language.  The TIOBE index is a standard indicator of the popularity of all programming languages.  The TIOBE index confirms that a subset of languages – those for computational statistics and data analysis – are gaining increased attention. The clear winner of the pack is the open source programming language R.

Posted in Data Science, R Programming