Category Archives: R Data Import

R Data Import


Quantitative analysis depends on the ability to load and manage many different types of data and file formats.  R offers many data import functions: some ship with base R and others are provided by add-on packages.

Data Available in R

R ships with many data sets in the datasets package, which is included in the base distribution of R. The package is attached automatically when R starts, so its data sets are available by name.  A list of all data sets in the package is obtained using the following command:
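
The command itself is not reproduced in this excerpt. A minimal sketch, assuming the standard data() function from base R is what is meant, might look like this (the `catalog` name is only illustrative):

```r
# List the data sets bundled with the 'datasets' package.
available <- data(package = "datasets")

# The result holds a character matrix named 'results', one row per data set.
catalog <- as.data.frame(available$results[, c("Item", "Title")],
                         stringsAsFactors = FALSE)
head(catalog)

# Data sets are lazy-loaded, so they can be used by name without an explicit load.
str(mtcars)
```

Calling data(package = "datasets") on its own at the console opens the same listing in a pager.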

Posted in R Data Import, R Data Objects

Extract Data Tables from PDF Files in R

This post introduces a method for extracting data tables from PDF files. Most of the available data scraping tools are browser-based.  They are also manual and limited to one table at a time. The solution outlined here extracts multiple tables at once by combining the R programming language with the open-source Java program Tabula. The result is a convenient method that transforms documents into databases.
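
As a rough illustration of how R can drive Tabula, the sketch below shells out to the tabula-java command-line jar once per PDF. It is not the post's own code: the helper names `tabula_args` and `extract_pdf_tables` are hypothetical, and it assumes Java is on the PATH and that a Tabula jar has been downloaded to a known location.

```r
# Hypothetical helper: build the argument vector for one tabula-java run.
# 'jar' is the path to a locally downloaded Tabula jar (an assumption).
tabula_args <- function(jar, pdf, out_csv) {
  c("-jar", jar,
    "--pages", "all",      # scan every page of the document
    "--format", "CSV",
    "--outfile", out_csv,
    pdf)
}

# Hypothetical wrapper: process every PDF in a folder, one CSV per document.
extract_pdf_tables <- function(pdf_dir, jar) {
  pdfs <- list.files(pdf_dir, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  outs <- sub("\\.pdf$", ".csv", pdfs, ignore.case = TRUE)
  for (i in seq_along(pdfs)) {
    # shQuote() protects paths that contain spaces.
    status <- system2("java", shQuote(tabula_args(jar, pdfs[i], outs[i])))
    if (status != 0) warning("Tabula failed on ", pdfs[i])
  }
  outs  # paths of the CSV files that were requested
}
```

Each resulting CSV can then be read back into R for cleaning. Note that tables of different widths may end up in the same file, so some post-processing is likely needed.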

Benefit Statement
The ability to train a machine to extract data tables from PDF files has several benefits:

Posted in Data, Misc Tricks, R Data Import, R Programming

Aerosol Animation

Aerosol Optical Depth (AOD) quantifies the degree to which aerosols prevent the transmission of sunlight by absorption or scattering.  AOD is the extinction coefficient integrated over a vertical column of air.  The extinction coefficient can be used to analyze solar extinction and the performance of solar power systems as a function of location and time.
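
To make the column integral concrete, here is a minimal sketch in base R. The exponential extinction profile and its parameter values are assumptions chosen purely for illustration, not measured data.

```r
# Hypothetical extinction profile: exponential decay with altitude.
beta0 <- 0.1          # surface extinction coefficient, 1/km (assumed value)
scale_height <- 2     # aerosol scale height, km (assumed value)
z <- seq(0, 20, by = 0.01)               # altitude grid, km
beta <- beta0 * exp(-z / scale_height)   # extinction coefficient at each level

# Trapezoidal rule: integrate the extinction coefficient over the column.
aod <- sum(diff(z) * (head(beta, -1) + tail(beta, -1)) / 2)

# Beer-Lambert law: fraction of the direct beam transmitted at vertical incidence.
transmittance <- exp(-aod)
```

For this profile the integral is close to beta0 * scale_height, which gives a quick sanity check on the numerical result.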


Posted in Animation, Data, Dust, Meteorology, Modeling, Qatar RE, R Data Import, R Graphics, Resource Assessment, Saudi Arabia, Solar, Spatial Analysis

Crop Raster Images in R

The maptools package has a pruneMap() function to crop map objects in R.  In practice, the function extracts data from SpatialPolygon or SpatialLine objects given a bounding box or specific area of interest.  Unfortunately, there is no equivalent function for the large, high-resolution raster images that are common in many Earth Science applications.  The following post defines a custom function to crop raster images in R and to extract data from SpatialGridDataFrames.  The function is tested using a raster image from the Shuttle Radar Topography Mission (SRTM; shown at left).  The resulting data is then mapped using the image() function in R.
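
The post's custom function is not shown in this excerpt. The sketch below illustrates the general idea of cropping by bounding box using only base R, with the raster held as a plain matrix rather than a SpatialGridDataFrame. The `crop_grid` helper and the synthetic surface are hypothetical stand-ins.

```r
# Hypothetical helper: crop a gridded raster held as a matrix plus coordinates.
# z[i, j] is the value at longitude x[i] and latitude y[j], the layout that
# image() expects.
crop_grid <- function(x, y, z, xlim, ylim) {
  stopifnot(length(x) == nrow(z), length(y) == ncol(z))
  keep_x <- x >= xlim[1] & x <= xlim[2]
  keep_y <- y >= ylim[1] & y <= ylim[2]
  list(x = x[keep_x], y = y[keep_y], z = z[keep_x, keep_y, drop = FALSE])
}

# Synthetic surface standing in for a real elevation tile.
x <- seq(20, 60, by = 0.5)     # longitudes, degrees
y <- seq(-10, 40, by = 0.5)    # latitudes, degrees
z <- outer(x, y, function(lon, lat) sin(lon / 5) * cos(lat / 5))

# Keep only the cells inside the area of interest.
sub <- crop_grid(x, y, z, xlim = c(35, 45), ylim = c(10, 30))
dim(sub$z)
```

The cropped result can be drawn with image(sub$x, sub$y, sub$z). Subsetting before plotting keeps memory use down when the full tile is large.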

Posted in GDAL, GIS, R Data Import, R Graphics, Spatial Analysis

Fast File Reads (fread) for Large Data

The standard way to read text files into R is to use the read.table() command.  However, many users struggle with long delays when loading large data sets.  An alternative command that offers significant speed improvements is fread(), or fast read, which can be found in the data.table package.  The following code loads a tab-delimited file with a million elements and reveals that fread() reduces load time by almost 99%, as confirmed by the benchmark performance stats at left.  The function is still under development, but it is available for download and doesn’t suffer from stability issues.  Instead, expect argument structure and command syntax to change over time.
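
The benchmark code is not included in this excerpt. A comparable test can be sketched as follows, assuming the data.table package is installed. The file contents are synthetic, and timings will vary by machine and package version, so the figure quoted above should not be expected to reproduce exactly.

```r
# Build a tab-delimited test file: 100,000 rows x 10 columns = one million values.
set.seed(1)
n_rows <- 1e5
test_data <- as.data.frame(matrix(rnorm(n_rows * 10), nrow = n_rows))
path <- tempfile(fileext = ".txt")
write.table(test_data, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Baseline: the base R reader.
t_base <- system.time(
  df_base <- read.table(path, header = TRUE, sep = "\t")
)

# fread() lives in data.table, which is not part of base R, so check for it first.
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(
    dt_fast <- data.table::fread(path, sep = "\t")
  )
  # Elapsed wall-clock seconds for each reader.
  print(rbind(read.table = t_base["elapsed"], fread = t_fread["elapsed"]))
}

unlink(path)  # remove the temporary file
```

Note that fread() returns a data.table rather than a plain data frame, which matters if downstream code relies on data frame indexing semantics.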

Posted in Faster R, R Data Import, R Packages, R Programming

Binary Data In R

There are many reasons to work with binary data in R.  Solar resource data, solar PV performance data, and real-time grid monitoring data are typically stored and transmitted in binary data formats.  

In practice, accessing binary data in R is all but impossible without a vendor- or format-specific “can opener” and a properly configured scientific programming environment.  As a result, many business applications bypass binary data altogether or rely on secondary sources and summary statistics, with no ability to validate data integrity and accuracy.  
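
When the byte layout of a file is known, base R's readBin() can serve as a simple "can opener". The sketch below writes and then reads a small file. The record layout and the values are invented for illustration and do not correspond to any particular vendor format.

```r
# Hypothetical record layout (an assumption for illustration):
#   one 4-byte little-endian integer n, followed by n 8-byte doubles.
readings <- c(812.5, 640.25, 0, 955.75)   # made-up values
path <- tempfile(fileext = ".bin")

# Write the file first so the example is self-contained.
con <- file(path, open = "wb")
writeBin(length(readings), con, size = 4L, endian = "little")
writeBin(readings, con, size = 8L, endian = "little")
close(con)

# Read it back: the reader must know the layout in advance.
con <- file(path, open = "rb")
n <- readBin(con, what = "integer", n = 1L, size = 4L, endian = "little")
values <- readBin(con, what = "double", n = n, size = 8L, endian = "little")
close(con)
unlink(path)  # remove the temporary file
```

Getting the size or endian arguments wrong does not raise an error. It silently yields wrong numbers, which is one reason validating against a known record matters.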

Posted in Data, Data Science, GDAL, R Data Import