Center for Analysis and Design of Intelligent Agents

See Myschool for original script (uncorrected)

The following R code should be copied task by task (i.e. comment and code) and executed in R. Its purpose is to show a simple prototype for each action; think about it, understand it, and try some variations.

# T-701-REM 4 
# 2007-10-01
# Subjects from ch. 4.1, 4.3, 4.6 in ISwR.
# limit "haed" data to men
# suppose the workspace "haed.R" has been loaded
men <- subset(haed, kyn=="karlar")
str(men)
attach(men)
mean(cm)
sd(cm)
length(cm)
 
# Do we need to change our established opinion (null hypothesis, "H0")
# that the average height of men is 180 cm?
t.test(cm, mu=180)
 
# Computation of results element by element:
# Standard error of the mean is the standard deviation of the sampling distribution
# of the mean (called SEM on pg. 82 in ISwR)
# Assume here that the Central Limit Theorem can be used
stderr <- sd(cm) / sqrt(length(cm))
stderr
# The value t is a measure of how far the sample is from the established opinion 
tvalue <- (mean(cm)-180) / stderr
tvalue
# Is this a big value in this context?
# The p-value is the probability of a sample this far or further from H0 if H0 is correct
# Low p-value (less than 1 - conf.level; conf.level defaults to 0.95, so the limit is 0.05)
# indicates that H0 is wrong or we have a freak sample
# Compute the two-sided p-value corresponding to this t for df = 38-1
# (using -abs(tvalue) gives the right answer whether t is positive or negative)
2 * pt(-abs(tvalue), 37)
# Confidence interval for the mean
c( mean(cm)+qt(0.025, 37)*stderr, mean(cm)+qt(0.975, 37)*stderr)
# 
# Try several variations of conf.level, 
# including 1-0.2659205 (Note: 1 minus the p-value, corrected 2007-10-06)
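#
# Sketch (not part of the original script): check the element-by-element
# computation against the components returned by t.test. To keep it runnable
# without the "haed" workspace it uses a simulated sample; the names sim_cm,
# sim_tt, sim_se, sim_t, sim_p, sim_ci and the simulation parameters are
# assumptions made for illustration only.
set.seed(1)
sim_cm <- rnorm(38, mean = 181, sd = 7)   # hypothetical heights, n = 38
sim_tt <- t.test(sim_cm, mu = 180)
sim_se <- sd(sim_cm) / sqrt(length(sim_cm))
sim_t  <- (mean(sim_cm) - 180) / sim_se
sim_p  <- 2 * pt(-abs(sim_t), df = length(sim_cm) - 1)
sim_ci <- mean(sim_cm) + qt(c(0.025, 0.975), df = length(sim_cm) - 1) * sim_se
# Each comparison should return TRUE
all.equal(sim_t, unname(sim_tt$statistic))
all.equal(sim_p, sim_tt$p.value)
all.equal(sim_ci, as.numeric(sim_tt$conf.int))
# With conf.level = 1 - p-value, one end of the interval should be 180 itself
t.test(sim_cm, mu = 180, conf.level = 1 - sim_p)$conf.int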
# ############################
 
# Compare heights of men and women (to see how t.test works)
detach(men)
rm(stderr, tvalue) # some cleanup (added 2007-10-06)
attach(haed)
# Test if the diff is 0 (reason for the name Null Hypothesis!)
t.test(cm~kyn)
# Conclusion?
 
# Test if cm minus 100 (height in cm in excess of 100) is a reasonable approximation to kg:
t.test(kg, cm-100, paired=TRUE)
# Conclusion?
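#
# Sketch (not part of the original script): a paired t-test is the same as a
# one-sample t-test on the pairwise differences. Simulated data, so that it
# runs without "haed"; sim_h, sim_w, sim_paired, sim_onesample and the
# simulation parameters are assumptions made for illustration only.
set.seed(2)
sim_h <- rnorm(52, mean = 175, sd = 9)               # hypothetical heights in cm
sim_w <- sim_h - 100 + rnorm(52, mean = 0, sd = 8)   # hypothetical weights in kg
sim_paired    <- t.test(sim_w, sim_h - 100, paired = TRUE)
sim_onesample <- t.test(sim_w - (sim_h - 100), mu = 0)
# Both comparisons should return TRUE
all.equal(unname(sim_paired$statistic), unname(sim_onesample$statistic))
all.equal(sim_paired$p.value, sim_onesample$p.value)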
 
 
######################################
#
# Another way of comparing group means (can be used with more than 2 grps)
# ANOVA, ISwR, Ch. 6 (6.1 intro, 6.1.2, 6.5)
# compute a linear model.
# Use a categorical explanatory variable:
lm(cm~kyn)
summary(lm(cm~kyn))
anova(lm(cm~kyn))
# Compare "Mean Sq" of "Residuals" with
var(cm)
# Compare the total of the "Sum Sq" column with the formula in the middle of pg. 112 and with
sum((cm-mean(cm))^2)
var(cm)*51   # 51 is n - 1, i.e. length(cm) - 1
# Conclusion?
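#
# Sketch (not part of the original script): the "Sum Sq" column of the ANOVA
# table adds up to the total sum of squares, and with two groups the F value
# is the square of the t statistic from the equal-variance t-test. Note that
# t.test uses the Welch test by default (var.equal = FALSE), so its t need not
# match exactly unless var.equal = TRUE is given. Simulated data; sim_grp,
# sim_y, sim_tab, sim_t2 and the simulation parameters are assumptions made
# for illustration only.
set.seed(3)
sim_grp <- factor(rep(c("a", "b"), times = c(20, 32)))   # hypothetical groups
sim_y   <- rnorm(52, mean = ifelse(sim_grp == "a", 168, 181), sd = 7)
sim_tab <- anova(lm(sim_y ~ sim_grp))
sim_t2  <- t.test(sim_y ~ sim_grp, var.equal = TRUE)
# Both comparisons should return TRUE
all.equal(sum(sim_tab[["Sum Sq"]]), var(sim_y) * (length(sim_y) - 1))
all.equal(sim_tab[["F value"]][1], unname(sim_t2$statistic)^2)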
 
######################################
#
# The classical "best line" is a linear model too:
lm(kg~cm)
summary(lm(kg~cm))
anova(lm(kg~cm))
# One reason for the limited standard output of lm:
plot(cm, kg)
abline(lm(kg~cm))
 
# Exercise:
# Try to find out if the point (mean(cm), mean(kg)) lies on the line