Category Archives: R Programming
A new method to extract data tables from PDF files is introduced. Most available data scraping tools are browser-based, manual in nature, and limited to one table at a time. The solution outlined here extracts multiple tables at once by combining the R programming language with the open-source Java program Tabula. The result is a convenient method that transforms documents into databases.
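The excerpt names the ingredients (R plus Tabula) but does not show the code. As a minimal sketch, assuming Java is installed and a tabula-java "jar-with-dependencies" file has been downloaded locally (the jar path below is hypothetical), R can loop over a folder of PDFs and call Tabula's command-line interface once per file. This is one way to combine the two tools, not necessarily the post's own method.

```r
# Sketch: batch-extract tables from every PDF in a folder with tabula-java.
# ASSUMPTIONS: Java is on the PATH and `jar` points to a local tabula-java
# jar-with-dependencies file; "tabula.jar" below is a hypothetical path.

# Build the command-line arguments for one PDF
tabula_args <- function(jar, pdf, outfile, pages = "all", format = "CSV") {
  c("-jar", shQuote(jar),
    "-p", pages,              # pages to scan: "all" or ranges such as "1-3,5"
    "-f", format,             # output format: CSV, TSV or JSON
    "-o", shQuote(outfile),   # file that receives the extracted tables
    shQuote(pdf))             # the input PDF comes last
}

# Run Tabula on each PDF in `pdf_dir`, writing one CSV next to each PDF
extract_pdf_tables <- function(pdf_dir, jar, run = TRUE) {
  pdfs <- list.files(pdf_dir, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  outs <- sub("\\.pdf$", ".csv", pdfs, ignore.case = TRUE)
  for (i in seq_along(pdfs)) {
    if (run) system2("java", tabula_args(jar, pdfs[i], outs[i]))
  }
  outs  # paths of the CSV files (created only when run = TRUE)
}

# Usage (hypothetical paths):
# csv_files <- extract_pdf_tables("reports", jar = "tabula.jar")
```

Setting `run = FALSE` returns the planned output paths without calling Java, which is handy for checking the file list first. A single CSV may hold several tables written one after another, so some cleanup in R is usually still needed.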
The ability to train a machine to extract data tables from PDF files has several benefits:
The popularity of R is rapidly increasing, and it is well on its way to becoming a top 10 programming language. The TIOBE index, a widely cited indicator of the popularity of programming languages, confirms that one group of languages, those for computational statistics and data analysis, is gaining attention. The clear winner of the pack is the open-source programming language R.
Source code access is one of the great benefits of R. Source code is available for base R and for over 5,000 open-source packages. There are many reasons to view source code: to learn what software does when documentation is vague or incomplete, to combine code objects in custom scripts or libraries, and to modify code as needed. The following post describes the different types of R source code and explains how to access them.
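A few built-in ways to reach R source code from the console are sketched below. These are standard base R and utils calls; the functions inspected (sd, summary.lm, sum) are just convenient examples.

```r
# Typing a function's name without parentheses prints its R source
sd

# body() and formals() return the pieces of a closure programmatically
body(sd)
formals(sd)

# Generic functions dispatch to methods; methods() lists the S3 methods
methods("summary")

# Non-exported methods can be reached with ::: or located with getAnywhere()
stats:::summary.lm
getAnywhere("summary.lm")

# Primitives such as sum() are implemented in C, so printing them shows
# only a .Primitive() stub; their C code lives in the R source tree
sum
```

Package sources that are not visible this way (compiled C, C++, or Fortran) can be read in the package's source tarball on CRAN.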
The standard way to read text files into R is the read.table() command. However, many users struggle with long delays when loading large data sets. An alternative that offers significant speed improvements is fread(), or fast read, which can be found in the data.table package. In a benchmark that loads a tab-delimited file with one million elements, fread() reduces load time by almost 99%. The function is still under development, but it is available for download and does not suffer from stability issues; instead, expect its arguments and command syntax to change over time.
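The benchmark code is not included in this excerpt, so here is a minimal sketch of how such a comparison can be set up. The file size (100,000 rows by 10 columns, one million values) is an assumption chosen to match the text, and timings will vary by machine, so no particular speedup is claimed here.

```r
# Build a tab-delimited test file: 100,000 rows x 10 numeric columns
set.seed(1)
n_rows <- 1e5
df <- as.data.frame(matrix(runif(n_rows * 10), ncol = 10))
path <- tempfile(fileext = ".txt")
write.table(df, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Baseline: base R read.table()
t_base <- system.time(
  d_base <- read.table(path, header = TRUE, sep = "\t")
)

# data.table::fread() detects the separator, header and column types itself
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(d_fread <- data.table::fread(path))
  # Compare elapsed (wall-clock) seconds for the two readers
  print(rbind(read.table = t_base, fread = t_fread)[, "elapsed"])
}
```

fread() returns a data.table, which also behaves as a data.frame, so most downstream code works unchanged.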
Beamer is a LaTeX document class that is by far the most practical tool for making presentations involving data science, business analytics, or general research. It is widely used at conferences and easily lends itself to data-intensive reporting and repetitive batch processing.
A custom beamer template is presented that is easy to extend or modify. The benefits of the beamer document are numerous:
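To illustrate the batch-processing point, here is a minimal sketch that generates beamer frames from R data, one frame per group, and writes them into a single .tex deck. The helper `beamer_frame()` is hypothetical and much simpler than the custom template described in the post; it assumes the data contain no characters that need LaTeX escaping.

```r
# Hypothetical helper: turn a data frame into the LaTeX lines of one beamer
# frame that holds a plain tabular environment
beamer_frame <- function(title, df) {
  header <- paste(names(df), collapse = " & ")
  rows <- apply(df, 1, paste, collapse = " & ")
  c("\\begin{frame}",
    paste0("\\frametitle{", title, "}"),
    paste0("\\begin{tabular}{", strrep("r", ncol(df)), "}"),
    paste0(header, " \\\\ \\hline"),
    paste0(rows, " \\\\"),
    "\\end{tabular}",
    "\\end{frame}")
}

# Batch step: one frame per iris species (first three rows of each)
species <- split(iris[1:4], iris$Species)
body <- unlist(Map(function(name, d) beamer_frame(name, head(d, 3)),
                   names(species), species))

# Wrap the frames in a minimal beamer document and write it to disk
deck <- c("\\documentclass{beamer}", "\\begin{document}", body,
          "\\end{document}")
out <- file.path(tempdir(), "iris_deck.tex")
writeLines(deck, out)
```

The resulting file can then be compiled with pdflatex; rerunning the script after the data change regenerates every slide.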
The standard function for correlation plots in R is pairs(), which generates a matrix of scatter plots for all pairwise combinations of variables in a data object. With a little color enhancement, the standard graph is easy to produce.
The code behind this plot is simple:
# Scatter-plot matrix of the four iris measurements, colored by species
pairs(iris[1:4],
      main = "Anderson's Iris Data",
      pch = 21,
      bg = c("red", "green2", "steelblue4")[unclass(iris$Species)])
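pairs() also accepts custom panel functions, which makes it easy to show the correlation coefficients themselves. The sketch below follows the idea of the panel.cor example on the ?pairs help page; the name `panel_cor` and its formatting choices are assumptions, not part of the original post.

```r
# Hypothetical panel function: print the Pearson correlation in each
# upper panel instead of a scatter plot
panel_cor <- function(x, y, digits = 2, ...) {
  usr <- par("usr")          # save the panel's user coordinates
  on.exit(par(usr = usr))    # restore them when the panel is done
  par(usr = c(0, 1, 0, 1))   # draw in a simple unit square
  r <- cor(x, y)
  text(0.5, 0.5, formatC(r, format = "f", digits = digits), cex = 1.2)
}

pairs(iris[1:4],
      main = "Anderson's Iris Data",
      pch = 21,
      bg = c("red", "green2", "steelblue4")[unclass(iris$Species)],
      upper.panel = panel_cor)
```

Extra arguments such as pch and bg are passed to every panel function, which is why panel_cor() accepts `...` and ignores it.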
Pretty R is an online tool and R syntax highlighter that transforms R source code into HTML code for website development. The result is easy-to-read R code for high-quality web presentations. The Pretty R webpage is also a good learning tool, because it shows the HTML details required to deliver syntax highlighting consistent with the R documentation on inside-r.org.
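To show what "transforming R source code into HTML" involves, here is a minimal sketch that tokenizes code with R's own parser (utils::getParseData) and wraps token types in `<span>` tags. This is not Pretty R's implementation; the CSS class names are hypothetical hooks for a stylesheet, and the code assumes indentation with spaces rather than tabs.

```r
# Sketch: R-to-HTML syntax highlighting using R's parser
highlight_r <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  tok <- pd[pd$terminal, ]
  tok <- tok[order(tok$line1, tok$col1), ]

  # Map parser token types to (hypothetical) CSS classes
  keywords <- c("FUNCTION", "IF", "ELSE", "FOR", "IN", "WHILE",
                "REPEAT", "BREAK", "NEXT")
  css <- ifelse(tok$token == "COMMENT", "comment",
         ifelse(tok$token == "STR_CONST", "string",
         ifelse(tok$token == "NUM_CONST", "number",
         ifelse(tok$token == "SYMBOL_FUNCTION_CALL", "fun",
         ifelse(tok$token %in% keywords, "keyword", NA)))))

  # Escape characters that are special in HTML (& must come first)
  esc <- function(s) {
    s <- gsub("&", "&amp;", s, fixed = TRUE)
    s <- gsub("<", "&lt;", s, fixed = TRUE)
    gsub(">", "&gt;", s, fixed = TRUE)
  }

  out <- character(0)
  line <- 1; col <- 1
  for (i in seq_len(nrow(tok))) {
    # Rebuild line breaks and spaces between tokens from their positions
    if (tok$line1[i] > line) {
      out <- c(out, strrep("\n", tok$line1[i] - line))
      col <- 1
    }
    out <- c(out, strrep(" ", max(0, tok$col1[i] - col)))
    txt <- esc(tok$text[i])
    if (!is.na(css[i])) txt <- sprintf('<span class="%s">%s</span>', css[i], txt)
    out <- c(out, txt)
    line <- tok$line2[i]; col <- tok$col2[i] + 1
  }
  paste0('<pre class="r">', paste(out, collapse = ""), "</pre>")
}

cat(highlight_r("x <- mean(1:10) # average"))
```

Relying on the parser rather than regular expressions means strings, comments, and function calls are classified exactly as R itself sees them.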