Learning some R #3

Carrying on from last night: I couldn’t get the separating hyperplane (aka decision line) to draw with this code:

Apparently, the sample code in git for chapter 2 does work. So what went wrong?!

heights.weights <- transform(heights.weights,
                             Male = ifelse(Gender == 'Male', 1, 0))

logit.model <- glm(Male ~ Weight + Height,
                   data = heights.weights,
                   family = binomial(link = 'logit'))

ggplot(heights.weights, aes(x = Height, y = Weight)) +
  geom_point(aes(color = Gender, alpha = 0.25)) +
  scale_alpha(guide = "none") + 
  scale_color_manual(values = c("Male" = "black", "Female" = "gray")) +
  theme_bw() +
  stat_abline(intercept = -coef(logit.model)[1] / coef(logit.model)[2],
              slope = - coef(logit.model)[3] / coef(logit.model)[2],
              geom = 'abline',
              color = 'black')

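It seems I wrote the logit.model part wrongly. The order of the terms in the formula matters here because the line's intercept and slope are taken from the coefficients by position: coef(logit.model)[2] has to be the Weight coefficient and coef(logit.model)[3] the Height one. Correcting the order to Male ~ Weight + Height did the trick.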

heights.weights <- transform(heights.weights,
                             Male = ifelse(Gender == 'Male', 1, 0))

logit.model <- glm(Male ~ Weight + Height,
                   data = heights.weights,
                   family = binomial(link = 'logit'))

ggplot(heights.weights, aes(x = Height, y = Weight, color = Gender)) +
    geom_point() +
    # geom_abline() is used here because stat_abline() was removed in ggplot2 2.0.0
    geom_abline(intercept = - coef(logit.model)[1] / coef(logit.model)[2],
                slope = - coef(logit.model)[3] / coef(logit.model)[2],
                color = 'black')
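
One way to make the line immune to this kind of slip is to pick the coefficients by name rather than by position. Below is a minimal sketch, not the book's code: the data is simulated with made-up numbers as a stand-in for heights.weights, and boundary_line is a hypothetical helper I named for illustration. It solves b0 + bW * Weight + bH * Height = 0 for Weight, which is where the intercept and slope expressions come from.

```r
# Simulated stand-in for heights.weights (made-up numbers, NOT the book's data).
set.seed(1)
n <- 500
toy <- data.frame(
  Gender = rep(c("Male", "Female"), each = n),
  Height = c(rnorm(n, 69, 3), rnorm(n, 64, 3))
)
toy$Weight <- -220 + 5.5 * toy$Height +
  ifelse(toy$Gender == "Male", 20, 0) + rnorm(2 * n, 0, 10)
toy$Male <- ifelse(toy$Gender == "Male", 1, 0)

# Hypothetical helper: on the boundary the log-odds are zero, so
#   b0 + bW * Weight + bH * Height = 0
#   Weight = -b0 / bW - (bH / bW) * Height
# Coefficients are looked up by name, so the formula order cannot break it.
boundary_line <- function(model) {
  b <- coef(model)
  c(intercept = unname(-b["(Intercept)"] / b["Weight"]),
    slope     = unname(-b["Height"] / b["Weight"]))
}

# Same model, terms written in both orders.
fit.wh <- glm(Male ~ Weight + Height, data = toy,
              family = binomial(link = "logit"))
fit.hw <- glm(Male ~ Height + Weight, data = toy,
              family = binomial(link = "logit"))

line.wh <- boundary_line(fit.wh)
line.hw <- boundary_line(fit.hw)
print(line.wh)
print(line.hw)
```

Both fits should give the same line, and the two values can then be passed as the intercept and slope of the abline layer in the plot.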

Dissecting the transform command a little, how DOES it work?

> heights.weights[1:10,]
   Gender   Height   Weight
1    Male 73.84702 241.8936
2    Male 68.78190 162.3105
3    Male 74.11011 212.7409
4    Male 71.73098 220.0425
5    Male 69.88180 206.3498
6    Male 67.25302 152.2122
7    Male 68.78508 183.9279
8    Male 68.34852 167.9711
9    Male 67.01895 175.9294
10   Male 63.45649 156.3997

> heights.weights <- transform(heights.weights,
+                              Male = ifelse(Gender == 'Male', 1, 0))

> heights.weights[1:10,]
   Gender   Height   Weight Male
1    Male 73.84702 241.8936    1
2    Male 68.78190 162.3105    1
3    Male 74.11011 212.7409    1
4    Male 71.73098 220.0425    1
5    Male 69.88180 206.3498    1
6    Male 67.25302 152.2122    1
7    Male 68.78508 183.9279    1
8    Male 68.34852 167.9711    1
9    Male 67.01895 175.9294    1
10   Male 63.45649 156.3997    1

OK, so it just adds a column (a “tag”), Male, populated according to the contents of the first column, Gender (a factor, i.e. a categorical variable): 1 where Gender is 'Male' and 0 otherwise.
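
Since the first ten rows are all Male, the output above never shows a 0. Here is a small sketch on a made-up four-row data frame (toy is my own example, not the book's data) that shows both values, next to the plain `$` assignment that should fill the column the same way:

```r
# Made-up data frame with both genders, so both tag values show up.
toy <- data.frame(Gender = factor(c("Male", "Female", "Male", "Female")),
                  Height = c(70, 64, 68, 62))

# transform() evaluates the expression with the columns in scope,
# so Gender can be used without the toy$ prefix.
with.transform <- transform(toy, Male = ifelse(Gender == "Male", 1, 0))

# The same column built with an explicit assignment.
with.dollar <- toy
with.dollar$Male <- ifelse(toy$Gender == "Male", 1, 0)

print(with.transform)
print(with.dollar)
```

Note that transform() returns a new data frame instead of modifying toy, which is why the original code assigns the result back to heights.weights.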

With this, the glm (generalized linear model) command makes more sense, though I still have little-to-no idea what the parameters mean even after reading the help pages…
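
As far as I can tell, the three arguments read like this: the formula Male ~ Weight + Height says "model Male as a function of Weight and Height", data says where those columns live, and family = binomial(link = 'logit') says the response is 0/1 and that the log-odds of a 1 are a linear combination of the predictors. A rough sketch to check that reading, using simulated data with made-up numbers (not the book's data set) and a hypothetical new.person:

```r
# Simulated stand-in data (made-up numbers, NOT heights.weights).
set.seed(2)
n <- 300
toy <- data.frame(Height = c(rnorm(n, 69, 3), rnorm(n, 64, 3)),
                  Male   = rep(c(1, 0), each = n))
toy$Weight <- -220 + 5.5 * toy$Height + 20 * toy$Male + rnorm(2 * n, 0, 10)

fit <- glm(Male ~ Weight + Height, data = toy,
           family = binomial(link = "logit"))

# A hypothetical person to score.
new.person <- data.frame(Height = 70, Weight = 190)

# type = "link" gives the log-odds, type = "response" the probability.
log.odds <- predict(fit, newdata = new.person, type = "link")
prob     <- predict(fit, newdata = new.person, type = "response")

# The log-odds by hand: intercept + bW * Weight + bH * Height.
# coef(fit) is ordered (Intercept), Weight, Height for this formula.
manual <- sum(coef(fit) * c(1, new.person$Weight, new.person$Height))

print(c(log.odds = unname(log.odds), manual = manual))
# plogis() is the inverse of the logit link.
print(c(prob = unname(prob), from.manual = plogis(manual)))
```

If the reading is right, the hand-computed log-odds match predict(), and the decision line is simply the set of points where they equal zero (probability 0.5).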

Anyway, onwards to chapter 3, binary classification! We’ll see if there’s a need to learn glm in-depth by myself later.
