See Myschool for the original (uncorrected) script.

The following R code should be copied task by task (i.e. comment and code) and executed in R. Its purpose is to show a simple prototype for each action; you should think about it, understand it, and try some variations.

# T-701-REM 4
# 2007-10-01
# Subjects from ch. 4.1, 4.3, 4.6 in ISwR.

# limit "haed" data to men
# suppose the workspace "haed.R" has been loaded
men <- subset(haed, kyn=="karlar")
str(men)
attach(men)
mean(cm)
sd(cm)
length(cm)

# Do we need to change our established opinion (null hypothesis, "H0")
# that average height of men is 180cm?
t.test(cm, mu=180)

# Computation of results element by element:
# Standard error of the mean is the standard deviation of the sampling distribution
# of the mean (called SEM on pg. 82 in ISwR)
# Assume here that the Central Limit Theorem can be used
# (note: the name stderr masks the base R function stderr() until it is removed below)
stderr <- sd(cm) / sqrt(length(cm))
stderr

# The value t is a measure of how far the sample is from the established opinion
tvalue <- (mean(cm)-180) / stderr
tvalue
# Is this a big value in this context?

# The p-value is the probability of a sample this far or further from H0 if H0 is correct
# Low p-value (less than the significance level 1 - conf.level, which is 0.05
# because conf.level has a default value of 0.95)
# indicates that H0 is wrong or we have a freak sample
# Compute the two-sided p-value corresponding to this t for df = 38-1
# (abs() makes the computation valid whether t is positive or negative)
2 * pt(-abs(tvalue), 37)

# Confidence interval for the mean
c( mean(cm)+qt(0.025, 37)*stderr, mean(cm)+qt(0.975, 37)*stderr)

#
# Try several variations of conf.level,
# including 1-0.2659205 (Note: 1 minus the p-value, corrected 2007-10-06)
#

############################
# Compare heights of men and women (to see how t.test works)
detach(men)
rm(stderr, tvalue) # some cleanup (added 2007-10-06)
attach(haed)

# Test if the diff is 0 (reason for the name Null Hypothesis!)
t.test(cm~kyn)
# Conclusion?

# Test if cm minus 100 (height in cm above 100) is a reasonable approximation to kg:
t.test(kg, cm-100, paired=TRUE)
# Conclusion?
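
############################
# Sketch (not part of the original script): the element-by-element
# computations above can be checked against the object that t.test returns.
# Assumptions: the data below are simulated, NOT the "haed" data, and the
# names sim_cm, sim_kg, sem, t_manual, p_manual, ci_manual are made up
# for this illustration.
set.seed(1)
sim_cm <- rnorm(38, mean = 181, sd = 6)          # hypothetical heights in cm
sim_kg <- sim_cm - 100 + rnorm(38, mean = 0, sd = 8)  # hypothetical weights in kg

n   <- length(sim_cm)
sem <- sd(sim_cm) / sqrt(n)                      # standard error of the mean
t_manual  <- (mean(sim_cm) - 180) / sem          # distance from H0 in SEM units
p_manual  <- 2 * pt(-abs(t_manual), df = n - 1)  # two-sided p-value
ci_manual <- mean(sim_cm) + qt(c(0.025, 0.975), df = n - 1) * sem

res <- t.test(sim_cm, mu = 180)                  # same test done by R
all.equal(unname(res$statistic), t_manual)
all.equal(res$p.value, p_manual)
all.equal(as.numeric(res$conf.int), ci_manual)

# A paired t-test is a one-sample t-test on the differences:
paired_res <- t.test(sim_kg, sim_cm - 100, paired = TRUE)
diff_res   <- t.test(sim_kg - (sim_cm - 100), mu = 0)
all.equal(unname(paired_res$statistic), unname(diff_res$statistic))
all.equal(paired_res$p.value, diff_res$p.value)
rm(sim_cm, sim_kg, n, sem, t_manual, p_manual, ci_manual,
   res, paired_res, diff_res)                    # cleanup of the sketch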

######################################
#
# Another way of comparing group means (can be used with more than 2 grps)
# ANOVA, ISwR, Ch. 6 (6.1 intro, 6.1.2, 6.5)

# compute a linear model.
# Use a categorical explanatory variable:
lm(cm~kyn)
summary(lm(cm~kyn))
anova(lm(cm~kyn))
# Compare "Mean Sq" of "Residuals" with var(cm)
# Compare the total of the "Sum Sq" column with the formula in the middle of pg. 112 and with
sum((cm-mean(cm))^2)
var(cm)*51
# Conclusion?

######################################
#
# The classical "best line" is a linear model too:
lm(kg~cm)
summary(lm(kg~cm))
anova(lm(kg~cm))
# One reason for the limited standard output of lm:
plot(cm, kg)
abline(lm(kg~cm))

# Exercise:
# Try to find out if the point (mean(cm), mean(kg)) lies on the line
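
######################################
# Sketch (not part of the original script): how the ANOVA comparison of
# two group means relates to the two-sample t-test used earlier.
# Assumptions: the data frame sim and its columns grp and y are simulated
# and made up for this illustration; they are NOT the "haed" data.
# With exactly two groups, the ANOVA F value equals the square of the t value
# from a t-test that assumes equal variances (var.equal = TRUE).
# Note that t.test uses the Welch test (var.equal = FALSE) by default,
# so its default output need not match the ANOVA table exactly.
set.seed(2)
sim <- data.frame(
  grp = factor(rep(c("a", "b"), times = c(20, 18))),   # two hypothetical groups
  y   = c(rnorm(20, mean = 168, sd = 6), rnorm(18, mean = 181, sd = 6))
)

fit <- lm(y ~ grp, data = sim)       # linear model with a categorical variable
tab <- anova(fit)                    # ANOVA table of that model
tt  <- t.test(y ~ grp, data = sim, var.equal = TRUE)

all.equal(tab[["F value"]][1], unname(tt$statistic)^2)  # F = t^2
all.equal(tab[["Pr(>F)"]][1], tt$p.value)               # same p-value

# The "Sum Sq" column adds up to the total sum of squares around the mean:
all.equal(sum(tab[["Sum Sq"]]), sum((sim$y - mean(sim$y))^2))
rm(sim, fit, tab, tt)                # cleanup of the sketch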