Scales & Display of Data

Scales

When measuring variables, we have to use some scale to do so. This applies to both independent and dependent variables.
Types of scales:
Nominal
Ordinal
Interval
Ratio
Different scales require different treatment. The main things affected are the statistics we can compute and the way the data is displayed.





Scale Type: Description

Nominal: Also called “classificatory”, because it classifies. Example: monkey, dog, human.
Ordinal: Also called “ranking”. Example: tall, medium, short. It includes a “built-in” nominal scale, plus the measurements can be compared and ordered, e.g. from shortest to tallest.
Interval: A sequence of measurements with equal spacing. Example: degrees Celsius. It includes the nominal and ordinal scales “built-in”, plus a unit of measurement with an arbitrary zero (starting) point.
Ratio: All the properties of an interval scale, plus a fixed (non-arbitrary) zero point. Example: height, salary, weight.
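
As a minimal sketch of how the scale type limits which statistics make sense, the Python snippet below computes one typical summary per scale. All sample values and variable names are made up for illustration, and the explicit ordering of the height labels is an assumption we have to supply ourselves.

```python
from statistics import mean, mode

# Hypothetical sample data, one list per scale type (values are made up).
species = ["monkey", "dog", "human", "dog"]                   # nominal
height_rank = ["short", "tall", "medium", "medium", "tall"]   # ordinal
temps_celsius = [10.0, 20.0, 30.0]                            # interval
weights_kg = [40.0, 80.0]                                     # ratio

# Nominal: only counting and equality are meaningful, so use the mode.
most_common_species = mode(species)

# Ordinal: values can be ordered, so a median makes sense.
# The ordering of the labels is an assumption encoded in this dict.
order = {"short": 0, "medium": 1, "tall": 2}
ranked = sorted(height_rank, key=order.get)
median_height = ranked[len(ranked) // 2]  # odd-length list: middle element

# Interval: equal spacing makes differences and means meaningful...
mean_temp = mean(temps_celsius)
# ...but not ratios, because the zero point is arbitrary. The same two
# temperatures give a different ratio once expressed in kelvin.
celsius_ratio = 20.0 / 10.0
kelvin_ratio = (20.0 + 273.15) / (10.0 + 273.15)

# Ratio: a true zero makes ratios meaningful (80 kg is twice 40 kg).
weight_ratio = weights_kg[1] / weights_kg[0]

print(most_common_species, median_height, mean_temp)
print(celsius_ratio, round(kelvin_ratio, 3), weight_ratio)
```

The two temperature ratios disagree, which is why "twice as warm" is not a meaningful statement on an interval scale, while "twice as heavy" is meaningful on a ratio scale.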





Variables

Dependent variable: The (main) variable we want to measure.
Independent variable: A variable, other than the dependent variable, that enters into our measurements/experiment.
Active independent variable: An independent variable that we manipulate (for the purposes of the measurement).
Attribute variable: An independent variable that we cannot manipulate.
Continuous variable: A variable that can take any value within a range, e.g. time measured in minutes, hours or days.
Categorical variable: A variable that is not continuous; it takes one of a limited set of distinct values (categories).





Types of Graphs

Scatterplot: A.k.a. scattergram. The distribution of measurement points in n dimensions (2 is most common).
Line diagram (line chart): The standard “x-y plot”, where a line connects the points.
Bar chart: A set of bars indicates the value of one variable for each value of another variable.
Pie chart: For a total set of data (100%), the percentage distribution of a dependent variable over a fixed set of (categorical) variable values.
Histogram: A set of bars indicates the frequency of each value of a (categorical) variable, or of each range of values of a continuous one.
Cumulative graph: The cumulative frequency of a dependent variable over the values of another variable.
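
To make the difference between a histogram, a cumulative graph and a pie chart concrete, here is a small Python sketch that computes the numbers each of them would display. The data (`event_days`) is made up, and the text "plot" is only a stand-in for a real charting library.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: the day on which each observed event occurred.
event_days = [1, 1, 2, 4, 4, 4, 5]

# Histogram: frequency of each distinct value.
freq = Counter(event_days)
days = sorted(freq)
counts = [freq[d] for d in days]

# Cumulative graph: running total of the frequencies, in order of the variable.
cumulative = list(accumulate(counts))

# Pie chart: each frequency as a share of the total (100%).
total = sum(counts)
shares = [100 * c / total for c in counts]

# Crude text rendering: one '#' per event on that day.
for d, c, cum in zip(days, counts, cumulative):
    print(f"day {d}: {'#' * c:<3} count={c} cumulative={cum}")
```

The same counts feed all three displays; what changes is whether we show them as they are, as a running total, or as a percentage of the whole.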





Graphs: When to Use Which

Scatterplot: Good for continuous variables, to show the relationship between two variables.
Line chart: Drawing a line between the values indicates a relationship between successive measurements, which implies that the independent variable is ordered (ordinal or interval).
Bar chart: Use when the independent variable is categorical, or when we want to avoid implying that the order of the measurements or of the independent variable values matters.
Pie chart: Good for showing the distribution among a small, fixed number of variable values, e.g. votes for a small number of political parties.
Histogram: Can be used for both categorical and continuous variables.
Cumulative graph: Good for showing a relationship between counts/events and an ordinal or interval variable, e.g. displaying how events accumulate over time.
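
The rules of thumb above can be sketched as a small lookup function in Python. The function name `suggest_chart` and the `purpose` labels are hypothetical, introduced only for this illustration. Treating a ratio scale as both ordered and continuous is an assumption that follows from it having all the properties of an interval scale.

```python
# Assumption: ratio scales inherit the ordering and spacing of interval scales.
ORDERED = {"ordinal", "interval", "ratio"}
CONTINUOUS = {"interval", "ratio"}


def suggest_chart(independent_scale: str, purpose: str) -> str:
    """Suggest a chart type from the scale of the independent variable.

    `purpose` is a hypothetical label: "share", "frequency", "cumulative",
    "relationship" or "trend". Anything else falls back to a bar chart.
    """
    scale = independent_scale.lower()
    if scale not in ORDERED | {"nominal"}:
        raise ValueError(f"unknown scale: {independent_scale!r}")
    if purpose == "share":
        return "pie chart"          # parts of a 100% total
    if purpose == "frequency":
        return "histogram"          # works for any scale
    if purpose == "cumulative" and scale in ORDERED:
        return "cumulative graph"   # needs an ordered variable
    if purpose == "relationship" and scale in CONTINUOUS:
        return "scatterplot"        # two continuous variables
    if purpose == "trend" and scale in ORDERED:
        return "line chart"         # connecting points implies order
    return "bar chart"              # safe default for categorical data


print(suggest_chart("nominal", "trend"))
print(suggest_chart("interval", "trend"))
```

Note the fallback: asking for a trend over a nominal variable yields a bar chart, because connecting unordered categories with a line would suggest an order that does not exist.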





Examples





Edward Tufte's Six Grand Principles of Information Display

The First Grand Principle: Enforce Wise Visual Comparisons, i.e., force answers to the question “Compared with What?”
The Second Grand Principle: Show Causality. We are looking at information to understand mechanisms. Policy reasoning is about examining causality. Napoleon was defeated by the winter, not the opposing army, as shown by the temperature scale on the bottom of Minard's graph.
The Third Grand Principle: The World We Seek to Understand is Multivariate, as Our Displays Should Be. The Minard graph has six dimensions: size of the army, the two-dimensional route of the march, the direction of the march, the temperatures and the dates.
The Fourth Grand Principle: Completely Integrate Words, Numbers and Images. Don't let the accidents of the modes of production break up the text, images and data. Just because the artists, technical writers and database people work in different buildings doesn't mean reports should be disjoint with text, graphs and images in different boxes or on different pages.
The Fifth (most important) Grand Principle: Most of What Happens in Design Depends upon the Quality, Relevance and Integrity of the Content. To improve a presentation, get better content. If your numbers are boring you have the wrong numbers. Design won't help, it is too late.
Page 18 of Envisioning Information by Edward Tufte shows a book by Galileo published in 1613, which reported the discovery of sunspots and the rings of Saturn for the first time. He wrote in Italian, not Latin, because he wanted to reach a wider audience than the scientific elite. His tone of writing is wide-eyed, straightforward, undiplomatic and sardonic, and sounds a lot like the modern voice of Richard Feynman. The report of the discovery of sunspots has a simple drawing of the sun on each page to show daily observations. From these observations he learned that the sun was rotating, as the spots moved across the page and changed apparent shape at the edges due to foreshortening. It is easy to make comparisons between the left-hand and right-hand pages because they are within the eye span.
The Sixth Grand Principle: Information for Comparison Should be Put Side by Side, i.e., within the eye span, not stacked in time on subsequent pages, which is known as 'one damn thing after another', and also known as the computer interface. The computer interface is a low-resolution display device compared to paper, so we have a relentless sequentiality. The most common user question after a sequence of computer operations is “Where am I?” The lesson: get the biggest monitor of the highest resolution that you possibly can.

One of Tufte's students scanned Galileo's images and animated them so the sun of 1612 could be seen to rotate. At a couple of points in the animation the images skip forward because there was missing data due to clouds, or Galileo taking a day off.

A Jesuit rival of Galileo republished the sunspot data (see p17 of Envisioning Information). He used the single most effective tool of information design, the small multiple, which puts all 38 images within the eye span.




