Just trying out some R programming as I read Machine Learning for Hackers by Drew Conway and John Myles White. The data and source code can be found at https://github.com/johnmyleswhite/ML_for_Hackers (thanks @ericnovik).

The first chapter, “Using R”, was a little too hard to follow for an R beginner like me, so I decided to learn the concepts and terms from chapter 2 (Data Exploration) first and go back to chapter 1 as needed.

Installing packages (the “simpler” / training-wheels version). Called with no arguments, the command opens GUI windows where you select the CRAN mirror and the packages you want to install. I will probably “graduate” to passing the package names as arguments though; the number of packages to scroll through is simply staggering…

> install.packages()
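For the “graduated” version, a minimal sketch might look like the one below. The helper name install_if_missing and the mirror URL are my own choices for illustration; they are not from the book.

```r
# Hypothetical helper: install only the packages that are not already available.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns TRUE when a package can be loaded
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  invisible(missing)  # the packages that had to be installed
}

# "stats" ships with R, so nothing is downloaded here
install_if_missing("stats")
```

Passing the names directly skips the package picker, and setting repos skips the mirror picker.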

Getting help:

> help(c)

If we’re unsure which function or package to use, there’s a handy built-in search. Say we don’t know how to install stuff:

> ??install
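A related trick, sketched here on the assumption that the default packages are attached: apropos() searches the names of the objects that are currently loaded, rather than the help pages.

```r
# apropos() returns the names of objects on the search path matching a pattern
matches <- apropos("^install")
print(matches)

# help.search("install") is the long form of ??install,
# and ?install.packages is the short form of help(install.packages)
```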

Generating “test” vectors is simple: just use the c function, which combines values into a vector or list.

> c(1:5)
[1] 1 2 3 4 5
> c(0:20)
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
[19] 18 19 20
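As far as I can tell, the colon operator already produces a vector on its own, so c() only becomes necessary when gluing several pieces together. A small sketch (the variable name v is just for illustration):

```r
# 1:5 already is a vector, so wrapping it in c() changes nothing
identical(c(1:5), 1:5)

# c() earns its keep when combining several pieces into one vector
v <- c(1:3, 10, 20)
v
length(v)
```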

Indentation and spacing inside a function are there for human readability; R parses the cramped and the nicely spaced versions below the same way.

> my.mean<-function(x){
+ return(sum(x)/length(x))
+ }
> my.mean
function(x){
return(sum(x)/length(x))
}
> my.mean(c(1:3))
[1] 2
> my.mean2 <- function(x) {
+     return(sum(x) / length(x))
+ }
> my.mean2
function(x) {
    return(sum(x) / length(x))
}
> my.mean2(c(1:3))
[1] 2
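A further variation, as a sketch: a function returns the value of its last expression, so the braces and the explicit return() can be dropped for a one-liner. The name my.mean3 is made up here, and the result is checked against the built-in mean().

```r
# One-line version: the last evaluated expression is the return value
my.mean3 <- function(x) sum(x) / length(x)

# Compare against the built-in mean()
x <- c(2, 4, 9)
my.mean3(x)
all.equal(my.mean3(x), mean(x))
```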

Generating sequences at non-integer intervals:

> seq(0, 1, by = 0.2) [1] 0.0 0.2 0.4 0.6 0.8 1.0
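If the number of points is known rather than the step size, seq also accepts a length.out argument. A quick sketch comparing the two approaches:

```r
# Ask for 6 evenly spaced points instead of giving the step size
s <- seq(0, 1, length.out = 6)
s

# Should agree with the by = 0.2 version, up to floating-point tolerance
all.equal(s, seq(0, 1, by = 0.2))
```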

It seems that arguments can be passed by name and in any order 😉 Of course, it would be better to keep them in their default order.

> help(seq)
{...}
Typical usages are
seq(from, to)
seq(from, to, by= )
{...}
> seq(0, 1, 0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> seq(0, 1, by = 0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> seq(0, by = 0.2, 1)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
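The rule behind this, as I understand it, is that R matches named arguments first and then fills the remaining parameters with the unnamed ones, left to right. The toy function show.args below is hypothetical and only reports what it received:

```r
# Hypothetical helper that reports which value landed in which parameter
show.args <- function(from, to, by) {
  paste("from =", from, "to =", to, "by =", by)
}

# by is matched by name; 0 and 1 then fill from and to in order
show.args(0, by = 0.2, 1)

# Fully named, in shuffled order, gives the same matching
show.args(by = 0.2, to = 1, from = 0)
```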

First practice at reading in a CSV file and getting the summaries:

> data.file <- "01_heights_weights_genders.csv"
> data.file
[1] "01_heights_weights_genders.csv"
> heights.weights <- read.csv(data.file)
> summary(heights.weights)
    Gender         Height          Weight
 Female:5000   Min.   :54.26   Min.   : 64.7
 Male  :5000   1st Qu.:63.51   1st Qu.:135.8
               Median :66.32   Median :161.2
               Mean   :66.37   Mean   :161.4
               3rd Qu.:69.17   3rd Qu.:187.2
               Max.   :79.00   Max.   :270.0
> heights <- heights.weights$Height
> summary(heights)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  54.26   63.51   66.32   66.37   69.17   79.00
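To try the same read-and-summarize steps without downloading the book's data, here is a self-contained sketch. The four rows and the names toy, toy.file and toy.read are invented for illustration and have nothing to do with the real data set.

```r
# Toy rows invented for illustration; they are not taken from the book's data
toy <- data.frame(
  Gender = c("Male", "Male", "Female", "Female"),
  Height = c(70.1, 68.4, 63.2, 64.9),
  Weight = c(180.5, 165.2, 130.8, 142.3)
)

# Write the toy data to a temporary CSV file, then read it back
toy.file <- tempfile(fileext = ".csv")
write.csv(toy, toy.file, row.names = FALSE)

# stringsAsFactors = TRUE makes summary() count the Gender levels;
# since R 4.0.0 the default is FALSE
toy.read <- read.csv(toy.file, stringsAsFactors = TRUE)
summary(toy.read)
summary(toy.read$Height)

unlink(toy.file)  # clean up the temporary file
```

On R 4.0.0 or later, leaving out stringsAsFactors = TRUE gives a Gender column summarized as plain character data instead of per-level counts.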

Trying out some visualization with the ggplot2 library:

> library(ggplot2)
> ggplot(heights.weights, aes(x=Height)) + geom_histogram(binwidth=1)

Kernel density estimate plot:

> ggplot(heights.weights, aes(x=Height)) + geom_density()

Height KDE separated by gender:

> ggplot(heights.weights, aes(x=Height, fill=Gender)) + geom_density()
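With two filled densities on the same axes, one curve can hide part of the other. A possible tweak is the alpha argument of geom_density, sketched here on simulated stand-in data; the data frame fake and its means and spreads are made up, not taken from the book.

```r
library(ggplot2)

# Simulated stand-in data; the means and spreads are made up for illustration
set.seed(1)
fake <- data.frame(
  Gender = rep(c("Female", "Male"), each = 500),
  Height = c(rnorm(500, mean = 64, sd = 3), rnorm(500, mean = 69, sd = 3))
)

# alpha below 1 makes the fills translucent so the overlap stays visible
p <- ggplot(fake, aes(x = Height, fill = Gender)) + geom_density(alpha = 0.5)
print(p)
```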

Faceted plotting (one panel per group, here one row per gender) is going to be VERY useful, along with the other nice built-in features.

> ggplot(heights.weights, aes(x=Height, fill=Gender)) + geom_density() + facet_grid(Gender ~ .)

Approximating a standard normal curve by plotting the density of 100,000 random draws:

> ggplot(data.frame(X = rnorm(100000, mean = 0, sd = 1)), aes(x = X)) + geom_density()
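To get a feel for how close the estimate from random draws is to the exact curve, here is a base R sketch that needs no plotting. It assumes the default settings of density(); the seed is arbitrary.

```r
set.seed(42)                            # make the random draws reproducible
x <- rnorm(100000, mean = 0, sd = 1)

d <- density(x)                         # kernel density estimate on a grid d$x
theory <- dnorm(d$x, mean = 0, sd = 1)  # exact normal density at the same points

# Largest gap between the estimate and the exact curve
max(abs(d$y - theory))
```

The exact curve itself can be computed with dnorm directly, with no sampling involved.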
