Learning some R

Just trying out some R programming as I read Machine Learning for Hackers by Drew Conway and John Myles White. The data and source code can be found at https://github.com/johnmyleswhite/ML_for_Hackers (thanks @ericnovik).

The first chapter, “Using R”, was a little too hard to follow for an R beginner like me, so I decided to learn the concepts and terms from chapter 2 (Data Exploration) first and go back to chapter 1 as needed.

Installing packages (the “simpler” / training-wheels version). Called without arguments, the command opens GUI windows for selecting the CRAN mirror and the packages to install. I will probably “graduate” to passing the package names as arguments, though; the number of packages to scroll through is simply staggering…

> install.packages()
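
Once the GUI gets old, the package name and mirror can be passed directly. A minimal sketch, assuming the https://cloud.r-project.org mirror and a helper name, install.if.missing, that I made up (neither comes from the book):

```r
# Hypothetical helper: install a package only when it is not already available.
install.if.missing <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (instead of erroring) when pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # Report whether the package is usable now
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never touches the network
install.if.missing("stats")
```

For a package that is actually missing, the same call would download it from the given mirror, e.g. install.if.missing("ggplot2").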

Getting help:

> help(c)

If we’re unsure which function or help page we need, there’s a handy built-in search. Say we don’t know how to install stuff:

> ??install
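
As far as I can tell, ?c is shorthand for help(c) and ??install is shorthand for help.search("install"). A lighter alternative that I found is apropos(), which only lists matching object names; the pattern below is just my example:

```r
# apropos() lists objects on the search path whose names match a pattern,
# without opening the help browser
matches <- apropos("^install")
print(matches)

# Each hit can then be looked up the usual way, e.g. help(install.packages)
```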

Generating test vectors is simple: just use the c function (it combines values into a vector or list).

> c(1:5)
[1] 1 2 3 4 5
> c(0:20)
  [1]   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
 [19]  18  19  20
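
A small side note I worked out while playing with this: the colon operator already returns a vector, so c() is only really needed when combining several pieces. A quick sketch (the values are arbitrary):

```r
# 1:5 is already an integer vector, so wrapping it in c() changes nothing
identical(c(1:5), 1:5)

# c() earns its keep when combining separate values or vectors
v <- c(1:3, 10, 20)
v
length(v)
```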

Indentation and spacing are for human readability; R parses both versions below the same way.

> my.mean<-function(x){
+ return(sum(x)/length(x))
+ }
> my.mean
function(x){
return(sum(x)/length(x))
}
> my.mean(c(1:3))
[1] 2
> my.mean2 <- function(x) {
+   return(sum(x) / length(x))
+ }
> my.mean2
function(x) {
  return(sum(x) / length(x))
}
> my.mean2(c(1:3))
[1] 2
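
To convince myself that layout really does not matter, here is a third variant on a single line, checked against the built-in mean(). The name my.mean3 and the test vector are my own; the sketch relies on R returning the value of the last expression in a function, so return() can be dropped:

```r
# Same calculation as my.mean, written on a single line without return()
my.mean3 <- function(x) sum(x) / length(x)

# Hypothetical test vector
x <- c(2, 4, 9)

# Both calls should agree
my.mean3(x)
mean(x)
all.equal(my.mean3(x), mean(x))
```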

Generating sequences at non-integer intervals:

> seq(0, 1, by = 0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0

It seems that arguments can be named and then given in any order 😉 Of course, it is clearer to keep them in their documented order.

> help(seq)
{...}
     Typical usages are

     seq(from, to)
     seq(from, to, by= )
{...}
> seq(0, 1, 0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> seq(0, 1, by = 0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
> seq(0, by = 0.2, 1)
[1] 0.0 0.2 0.4 0.6 0.8 1.0
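
My understanding of why the last call works: named arguments are matched first, and the leftover unnamed values fill the remaining parameters in order. A short sketch to check that, plus the length.out argument from the same help page (variable names are mine):

```r
# Fully named arguments can appear in any order
a <- seq(by = 0.2, to = 1, from = 0)

# Named arguments are matched first; the unnamed 0 and 1 then fill
# the remaining parameters (from, to) in positional order
b <- seq(0, by = 0.2, 1)

# length.out asks for a number of points instead of a step size
d <- seq(0, 1, length.out = 6)

all.equal(a, b)
all.equal(a, d)
```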

First practice at reading in a CSV file and getting summaries:

> data.file <- "01_heights_weights_genders.csv"
> data.file
[1] "01_heights_weights_genders.csv"
> heights.weights <- read.csv(data.file)
> summary(heights.weights)
    Gender         Height          Weight
 Female:5000   Min.   :54.26   Min.   : 64.7
 Male  :5000   1st Qu.:63.51   1st Qu.:135.8
               Median :66.32   Median :161.2
               Mean   :66.37   Mean   :161.4
               3rd Qu.:69.17   3rd Qu.:187.2
               Max.   :79.00   Max.   :270.0
> heights <- heights.weights$Height
> summary(heights)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  54.26   63.51   66.32   66.37   69.17   79.00
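
For anyone without the book's data file at hand, the same steps can be tried on a few inline rows. This is only a sketch: the six rows are made up by me in the same column layout, and I pass stringsAsFactors = TRUE explicitly because R 4.0 and later read strings as character by default, in which case summary() would not show the per-gender counts.

```r
# Hypothetical six-row sample in the same layout as the book's CSV
csv.text <- "Gender,Height,Weight
Male,73.8,241.9
Male,68.8,162.3
Male,74.1,212.7
Female,58.9,102.1
Female,65.1,141.3
Female,63.5,131.0"

# text= lets read.csv() parse a string instead of a file path;
# stringsAsFactors = TRUE turns Gender into a factor
sample.df <- read.csv(text = csv.text, stringsAsFactors = TRUE)

str(sample.df)
summary(sample.df)

# A single column comes out as a plain numeric vector
summary(sample.df$Height)
```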

Trying out some visualization with the ggplot2 library:

> library(ggplot2)
> ggplot(heights.weights, aes(x=Height)) + geom_histogram(binwidth=1)
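
The binwidth argument decides how coarse the picture is. To see the effect without plotting, here is a rough base-R sketch using hist(plot = FALSE) on simulated heights. The data, the variable names, and the choice of 1 versus 5 are my own assumptions, and ggplot2 may place its bin edges differently.

```r
set.seed(1)
# Simulated stand-in for the Height column (mean and sd are assumptions)
fake.heights <- rnorm(1000, mean = 66, sd = 4)

# Break points 1 unit and 5 units apart, both covering the whole data range
lo <- floor(min(fake.heights))
hi <- ceiling(max(fake.heights))
narrow <- hist(fake.heights, breaks = seq(lo, hi, by = 1), plot = FALSE)
wide <- hist(fake.heights, breaks = seq(lo, hi + 5, by = 5), plot = FALSE)

# Narrow bins give many small counts, wide bins give a few large ones
length(narrow$counts)
length(wide$counts)
```

Every observation lands in exactly one bin either way, so both sets of counts add up to the sample size.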

Kernel density estimate plot:

> ggplot(heights.weights, aes(x=Height)) + geom_density()
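
To get a feel for what a kernel density estimate contains, base R's density() can be inspected directly. As far as I know, ggplot2 uses a comparable estimator by default, but treat that as my assumption; the simulated data below is also made up.

```r
set.seed(1)
# Simulated stand-in for the Height column (mean and sd are assumptions)
fake.heights <- rnorm(1000, mean = 66, sd = 4)

# density() returns the evaluation grid (x) and the estimated density (y)
kde <- density(fake.heights)
kde$bw                      # bandwidth chosen by the default rule
length(kde$x)               # number of grid points (512 by default)

# A density should integrate to roughly 1; check with a Riemann sum
kde.area <- sum(kde$y) * diff(kde$x[1:2])
kde.area
```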

Height KDE separated by gender:

> ggplot(heights.weights, aes(x=Height, fill=Gender)) + geom_density()

Faceted plotting (a separate panel per group) is going to be VERY useful, as are the other nicely built-in features.

> ggplot(heights.weights, aes(x=Height, fill=Gender)) + geom_density() + facet_grid(Gender ~ .)
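
The way I picture faceting is as splitting the data by group and repeating the same plot for each piece. A base-R sketch of that partitioning, using simulated data whose group means and standard deviations are my own assumptions:

```r
set.seed(1)
# Simulated stand-in data; group means and sds are assumptions
fake.df <- data.frame(
  Gender = rep(c("Female", "Male"), each = 500),
  Height = c(rnorm(500, mean = 64, sd = 3), rnorm(500, mean = 69, sd = 3))
)

# split() breaks the column into one vector per group, which is the
# partitioning I would expect behind facet_grid(Gender ~ .)
by.gender <- split(fake.df$Height, fake.df$Gender)

# One density estimate and one mean per group
kdes <- lapply(by.gender, density)
group.means <- sapply(by.gender, mean)
group.means
```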

Sampling from a standard normal distribution and plotting the density of the samples:

> ggplot(data.frame(X = rnorm(100000, mean = 0, sd = 1)), aes(x = X)) + geom_density()
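
Since the plot is built from random samples, it is only an estimate of the true curve. A quick sanity check, sketched in base R with a fixed seed of my choosing, compares the estimate at 0 with the exact value from dnorm():

```r
set.seed(1)
x <- rnorm(100000, mean = 0, sd = 1)

# Kernel density estimate of the samples
kde <- density(x)

# Interpolate the estimate at 0 and compare with the exact density dnorm(0)
est.at.zero <- approx(kde$x, kde$y, xout = 0)$y
c(estimated = est.at.zero, exact = dnorm(0))
```

With this many samples the two numbers should be close, which is why the plotted curve looks like the textbook bell shape.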
