Category Archives: Data Science
Robert Hyndman is the author of the forecast package in R. I’ve been using the package for long-term time series forecasts. The package comes with some built-in methods for plotting forecast data objects in R that I’ve wanted to customize for improved clarity and presentation. The following article achieves that goal and shares two scripts for plotting forecast data objects using ggplot.
The linear model is one of the most widely used and most important data science tools. Another basic tool is the nearest neighbor method (NN). Both models can be used for classification as well as prediction. Classification is used by machines to recognize faces within a crowd, to “read” road signs by distinguishing one letter from another, and to set voter registration districts by separating population groups. This article applies and compares both classification methods.
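As a rough illustration of the comparison described above, here is a minimal sketch in base R. It is not the article's code: the `iris` subset, the 70/30 split, the 0.5 cutoff, `k = 5`, and the helper name `knn_predict` are all assumptions made for this example.

```r
# Two-class problem: versicolor vs. virginica, using two predictors from iris.
set.seed(1)
d <- subset(iris, Species != "setosa")
d$y <- as.numeric(d$Species == "virginica")   # 0/1 class label

idx   <- sample(nrow(d), 70)                  # assumed 70/30 train/test split
train <- d[idx, ]
test  <- d[-idx, ]
vars  <- c("Petal.Length", "Petal.Width")

# Linear model: regress the 0/1 label, classify as 1 when the fit exceeds 0.5.
fit     <- lm(y ~ Petal.Length + Petal.Width, data = train)
pred_lm <- as.numeric(predict(fit, newdata = test) > 0.5)

# Nearest neighbors: majority vote among the k closest training points.
# knn_predict is a hypothetical helper written for this sketch.
knn_predict <- function(train_x, train_y, test_x, k = 5) {
  apply(test_x, 1, function(p) {
    dist    <- sqrt(colSums((t(train_x) - p)^2))  # Euclidean distance to each training row
    nearest <- order(dist)[seq_len(k)]
    as.numeric(mean(train_y[nearest]) > 0.5)
  })
}
pred_nn <- knn_predict(as.matrix(train[, vars]), train$y,
                       as.matrix(test[, vars]), k = 5)

# Share of test observations classified correctly by each method.
acc_lm <- mean(pred_lm == test$y)
acc_nn <- mean(pred_nn == test$y)
```

The linear model draws a single straight boundary between the classes, while the nearest neighbor vote can follow an irregular boundary at the cost of being more sensitive to individual training points.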
Best subset regression is a technique for model building and variable selection. The method looks at all combinations of independent predictor variables for use in a multiple regression model. Model developers and analysts often struggle with variable selection, especially when the number of predictors is high. Ideally, each set of predictors is run and the best set is selected using a criterion for model performance. The following article provides custom functions for best subset selection that are fast and easy to use.
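The exhaustive search described above can be sketched in a few lines of base R. This is not the article's custom function: the name `best_subset`, the choice of BIC as the performance criterion, and the `mtcars` example are assumptions for illustration, and a brute-force loop like this only suits a small number of predictors.

```r
# Fit every non-empty combination of predictors and rank the models by BIC.
# best_subset is a hypothetical helper written for this sketch.
best_subset <- function(data, response, predictors) {
  results <- list()
  for (k in seq_along(predictors)) {
    for (vars in combn(predictors, k, simplify = FALSE)) {
      fit <- lm(reformulate(vars, response = response), data = data)
      results[[length(results) + 1]] <- data.frame(
        size       = k,
        predictors = paste(vars, collapse = " + "),
        bic        = BIC(fit),
        stringsAsFactors = FALSE
      )
    }
  }
  out <- do.call(rbind, results)
  out[order(out$bic), ]   # lowest BIC first
}

# Four candidate predictors give 2^4 - 1 = 15 models.
ranked <- best_subset(mtcars, "mpg", c("wt", "hp", "disp", "qsec"))
head(ranked, 3)
```

The number of models doubles with every added predictor, which is why the search becomes a struggle when the predictor count is high.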
The popularity of R is rapidly increasing, and the language is well on its way to becoming a top 10 programming language. The TIOBE index is a standard indicator of programming language popularity. It confirms that a subset of languages – those for computational statistics and data analysis – is gaining increased attention. The clear winner of the pack is the open source programming language R.
There are many reasons to work with binary data in R. Solar resource data, solar PV performance data, and real-time grid monitoring data are typically stored and transmitted in binary data formats.
In practice, accessing binary data in R is impossible without a vendor- or format-specific “can opener” and a properly configured scientific programming environment. As a result, many business applications bypass binary data altogether or rely instead on secondary sources and summary statistics, with no ability to validate data integrity and accuracy.
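To make the “can opener” idea concrete, here is a minimal sketch using base R's `writeBin()` and `readBin()`. The file layout (a 4-byte integer count followed by 8-byte little-endian doubles) and the sample values are invented for this example; a real vendor format would have its own documented layout.

```r
# Write a small binary file in an assumed layout, then read it back.
path   <- tempfile(fileext = ".bin")
values <- c(812.5, 640.25, 0, 955.75)   # made-up sample readings

con <- file(path, "wb")
writeBin(length(values), con, size = 4L, endian = "little")  # record count
writeBin(values, con, size = 8L, endian = "little")          # the readings
close(con)

# Reading requires knowing the same layout: types, sizes, and byte order.
con <- file(path, "rb")
n <- readBin(con, what = "integer", n = 1L, size = 4L, endian = "little")
x <- readBin(con, what = "double",  n = n,  size = 8L, endian = "little")
close(con)
```

Reading with the wrong size or byte order would return numbers without any error, which is why a format specification is needed before the data can be trusted.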
The standard function for correlation plots in R is pairs(), which generates a matrix of scatter plots based on all pairwise combinations of variables in a data object. The standard graph looks something like this after a little color enhancement:
The code behind this plot is simple:
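# Scatter plot matrix of the four numeric iris columns (column selection assumed), colored by species
pairs(iris[1:4],
      main = "Anderson's Iris Data",
      pch = 21,
      bg = c("red", "green2", "steelblue4")[unclass(iris$Species)])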