# Center for Analysis and Design of Intelligent Agents


#### Concepts

- **Independent variables**: We select their values; the values are known when we start an experiment. Any independent variable must have at least 2 levels (values) so that its effect can be evaluated.
- **Levels**: Relating to an independent variable: the number of levels of an independent variable equals the number of variations of that variable used in the experiment.
- **Dependent variables**: Values are measured during and/or after the experiment.
- **Sample: subject selection from a "population"**: A representative subset, drawn from a population, of the phenomenon we are studying. Examples: (a) Siggi, Maggi and Biggi representing human males; (b) 10 lakes representing all freshwater on the Earth's surface; (c) rust on the bottom of doors representing the overall state of an automobile. A sample should be randomly chosen to (1) minimize spurious correlations and thus (2) maximize the generalizability of results obtained by measuring only a small subset of the phenomenon.
- **Spurious correlation**: A "false" correlation: one that implies a connection between the things measured where no causal relationship exists between them, in and of themselves.
- **Between-subjects design**: The control group in an experiment contains different instances than the experimental group.
- **Within-subjects design**: The instances in the experimental group serve as their own control group.
- **Internal validity**: How likely is it that the independent variables caused the changes observed in the dependent variables?
- **External validity**: How likely is it that the results generalize to other instances of the phenomenon under study?
- **Type I error**: Falsely rejecting the null hypothesis. The null hypothesis states that the variation in the dependent variable(s) between levels of the independent variable(s) is not due to the independent variables. Falsely rejecting it means you thought there was an "effect" (your manipulations made a difference) when in fact they didn't.
- **Type II error**: Falsely accepting the null hypothesis: you thought there was no "effect" (your manipulations made no difference) when in fact they did.
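
As a sketch of what a Type I error looks like in practice, the toy simulation below (pure Python, made-up numbers) repeatedly draws two samples from the *same* population, so the null hypothesis is true by construction, and counts how often a simple two-group z-test still rejects it. At a 5% significance level, roughly 5% of the trials should produce a false rejection.

```python
import math
import random

def z_test_rejects(a, b, z_crit=1.96):
    """Two-group z-test: reject H0 (equal means) if |z| exceeds z_crit (~p < 0.05)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return abs(z) > z_crit

random.seed(1)
trials = 2000
false_rejections = 0
for _ in range(trials):
    # Both groups come from the SAME population: any "effect" is coincidence.
    group_a = [random.gauss(100, 15) for _ in range(50)]
    group_b = [random.gauss(100, 15) for _ in range(50)]
    if z_test_rejects(group_a, group_b):
        false_rejections += 1  # a Type I error

print(false_rejections / trials)  # hovers around 0.05, the significance level
```

Lowering the significance threshold (e.g. to p < 0.01) reduces Type I errors but, all else being equal, increases the risk of Type II errors.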

#### True Experimental Designs: Procedure

1. **Identify the phenomenon to study.** Characterize it in ways that make it easy to study.
2. **Ask the right question(s).** "A question well asked is a question half-answered."
3. **Identify the variables that matter.** Independent and dependent.
4. **Choose the experimental design.** Based on the nature of the experiment, with some flexibility regarding how detailed/reliable/etc. the results should be.
5. **Design the setup.** Identify all factors that could potentially confound your results.
6. **Execute the experiment.** Double-blind procedure: the experimenter does not know which group a subject belongs to and/or which level of an independent variable is being tested.
7. **Collect the data.** Use tables and graphs as appropriate; choosing the right presentation method is very important.
8. **Apply statistical tests.** Select the right statistical test based on your design and your knowledge of the relationship between your sample and your population, and of the distribution and means of the population the sample is drawn from.
9. **Draw conclusions from the statistical tests.** Use inference, based on probabilities and statistical significance.
10. **Write up the report.**

#### Some Statistical Methods for Experimental Designs: What to Use When

Statistical tests help you select between hypotheses: they tell you whether the difference (in means and distribution) observed in a dependent variable, as measured between two samples, is large enough to indicate a non-coincidence. To make this judgment, the "natural" variation within each group is used as a baseline. The **significance level** measures how non-coincidental you want your result to be for it to count as "significant"; p < 0.05 and p < 0.01 are the most common thresholds (less than 5% or 1% probability of the result being random).

| What you study | What you use |
| --- | --- |
| Two factors varying along a continuum | Correlation/regression measures |
| Two factors, where the independent variable has (or can have) a few discrete values | t-test |
| One dependent variable, multiple independent variables, each with two or more levels | ANOVA (analysis of variance) |
| Many dependent variables, many independent variables | MANOVA (multivariate analysis of variance) |

Reference for ANOVA/MANOVA: https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/hypothesis-testing/anova/
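
The selection logic above can be written down as a toy lookup, useful as a mnemonic; the function name and boolean flag are illustrative, not a real library API, and a real choice of test involves more considerations (distributional assumptions, sample size) than this sketch captures.

```python
def choose_test(n_dependent, n_independent, iv_is_continuous):
    """Toy mnemonic for the test-selection table above (illustrative only)."""
    if n_dependent > 1 and n_independent > 1:
        return "MANOVA"
    if n_independent > 1:
        return "ANOVA"
    if iv_is_continuous:
        return "correlation/regression"
    return "t-test"

print(choose_test(1, 1, False))  # t-test: one IV with a few discrete values
print(choose_test(1, 1, True))   # correlation/regression: two continua
print(choose_test(1, 3, False))  # ANOVA: one DV, several IVs
print(choose_test(3, 3, False))  # MANOVA: many DVs, many IVs
```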

#### t-test

- **A fairly robust test for simple comparison experiments**: Assumptions about population means and distributions can be violated without too much trouble.
- **Sample size**: Good for small sample sizes.
- **Paired t-test**: Used for within-subjects designs.
- **Standard t-test**: For between-subjects designs.
- **One-tailed t-test**: If your hypothesis specifies in which direction your dependent variable will differ from the comparative (neutral) condition.
- **Two-tailed t-test**: If your hypothesis only says that your dependent variable will be affected, but does NOT specify how.
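
A minimal sketch of the standard (between-subjects) t statistic, computed by hand in pure Python; the data are made up for illustration, and a real analysis would use a statistics library (e.g. `scipy.stats.ttest_ind`) to also get the p-value.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (between-subjects design):
    the group difference measured against the within-group ("natural") variation."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) +
                                           variance(b) / len(b))

# Hypothetical measurements for an experimental and a control group.
experimental = [5.1, 4.9, 5.3, 5.0, 5.2]
control = [4.5, 4.7, 4.6, 4.8, 4.4]

print(welch_t(experimental, control))  # 5.0: a large difference relative
                                       # to the within-group variation
```

Whether a given t value is "significant" then depends on the degrees of freedom and on whether the hypothesis was one- or two-tailed.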

#### Example of an Experiment: Fish

- **Theory**: Temperature has an effect on the cell growth of animals. This goes for fish as well.
- **Motivation**: If we can find evidence for this, we might be able to grow larger fish in captivity; larger fish means fewer people starving (or more revenue, or both). Fishing further south might be better for everyone, even those living in the north.
- **Hypothesis**: The size of fish varies with ocean temperature.
- **Experiment**: Compare the size of fish in the Atlantic Ocean by taking samples at various latitudes. Argument: temperature falls the further north one goes; thus, fish at higher latitudes should be smaller.
- **Sample**: 100 fish south of Iceland; 100 fish north of Iceland.
- **Dependent variable**: Size of fish (continuous).
- **Independent variable**: Latitude (two levels: South and North).
- **Statistics**: Linear regression.

#### Example of an Experiment: Routers

- **Theory**: Congestion on networks gets worse the smaller the "visibility horizon" each node in a network has about traffic on adjacent nodes (information about traffic, including past, present, and predicted).
- **Motivation**: Whether nodes from router manufacturer X or Y are a better purchase might be decided by looking at their implemented routing methods. Knowing how to set parameters on already-purchased routing nodes might be put on more scientific ground.
- **Experiment**: Compare routers from ZYX and Cis. The former advertise their routers as "network-aware", whereas the latter brag about being "perfect for P2P networks" because each node doesn't need to know anything about the rest of the network.
- **Hypothesis**: Routers from ZYX will perform better at handling congestion than routers from Cis.
- **Independent variables**: (1) router type, (2) traffic, (3) network size.
- **Dependent variables**: (1) congestion, (2) congestion recovery, (3) routing efficiency.
- **Statistics**: MANOVA.

#### Linear Models: Regression Analysis

- **Purpose of regression analysis**: Discover a function that allows prediction of the values of the dependent variable Y based on the values of the independent variable X.
- **Scatterplot**: Shows the distribution of Y-values for given (sampled) X-values.
- **First-order linear function**: Y' = a + bX. Provides a single straight line that gets as close to all the points in the scatterplot as possible (given that it is straight).
- **Residual**: For each (x, y) point, the distance from the point to the line.
- **How do we find the line?** The least-squares criterion: we select the linear function that yields the smallest sum of squared residuals.

#### Linear Correlation

- **Given a linear function**: For a given X-score, the predicted Y-score is given by the line. In reality, however, the Y-score rarely falls straight on the line.
- **Need an estimate of error**: We must estimate how closely the real Ys (Y) follow the predicted Ys (Y').
- **The measure most commonly used**: The Standard Error of Estimate. (Walk-through video: https://www.youtube.com/watch?v=r-txC-dpI-E)
- **What it tells us**: How far, on average, real Ys fall from the line.
- **The smaller the Std. Err. of Est. is…**: …the better a predictor the line is.
- **Main limitation of linear models**: Assumes, a priori, a linear relationship.
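
A common form of the Standard Error of Estimate is the square root of the sum of squared residuals divided by n − 2 (two degrees of freedom are used up by estimating a and b). The pure-Python sketch below uses made-up points whose least-squares line is Y' = −0.4 + 1.6X; both the data and the fitted coefficients are illustrative.

```python
import math

def std_error_of_estimate(xs, ys, a, b):
    """sqrt( sum (Y - Y')^2 / (n - 2) ): how far, on average,
    the real Ys fall from the fitted line Y' = a + bX."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))

# Made-up points; their least-squares line is a = -0.4, b = 1.6.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 5.0]
print(std_error_of_estimate(xs, ys, a=-0.4, b=1.6))  # sqrt(0.6) ≈ 0.775
```

A value of 0 would mean every point lies exactly on the line; the larger the value, the worse the line is as a predictor.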
