Visualization and Data Presentation 2

The second lecture on visualization, held on October 20th, mainly concerned Edward R. Tufte's general theory of graphical display, as put forth in his book “The Visual Display of Quantitative Information” (VDQI).

You can download the presentation slides here.

Here are some additional resources:

Questions

In the lecture there were two questions I could not respond to adequately, and I said I would answer them on the wiki.

Pulsar plot

One question concerned what was being depicted in the following figure taken from Tufte's VDQI book:

I don't really know anything more than what it says in the caption, as I cannot access the 1975 article by Hankins and Rickett that Tufte took the figure from. I did look up some information about pulsars. The name is short for “pulsating star”, but they are actually rotating neutron stars that emit a beam of electromagnetic radiation, which reaches us as pulses at a fairly fixed interval. These pulses have a certain frequency spectrum that you could compare to a certain sound or color. Sound is a wave in matter, whereas the pulsar signal is electromagnetic, like light; as far as I can tell, the pulses studied here are radio waves, which have a much lower frequency than visible light. I'm not sure what the practical significance is, because the pulsar literature that I found with similar graphs isn't concerned with frequency, but with other things like the phase and longitude of the pulse. Perhaps I didn't look hard enough, but it is also possible that the field has changed focus since 1975 (which was only 8 years after the discovery of pulsars).

Each line in the graph presumably corresponds to a single pulse, which seems to be the general practice in the field. Apparently they are more interested in the differences between pulses, and less in how the magnitude of a certain frequency changes over time. If we use the color analogy again, this might make sense because the actual (perceivable) color depends on adding the entire frequency spectrum together, so the magnitude of any one frequency might not say much on its own (but this is just speculation on my part). The graph on top shows the average frequency spectrum / profile / “color”, and the graph on the side basically shows how the power of pulses varied over time.
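To make the two marginal graphs a bit more concrete, here is a minimal sketch of how I imagine they are derived from the stacked lines. The numbers are made up (this is not real pulsar data), the function names are my own, and I am assuming that the "power" of a pulse is simply the sum of its values, which may not be what Hankins and Rickett actually computed.

```python
# Made-up stand-in for the data behind the figure: each row is one pulse
# (one line in the stack), each column is one frequency channel.
pulses = [
    [0.0, 1.0, 3.0, 1.0],
    [0.0, 2.0, 4.0, 2.0],
    [1.0, 0.0, 2.0, 1.0],
]


def average_profile(rows):
    """Mean value per column: the curve drawn above the stacked lines."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def pulse_power(rows):
    """Sum of each row (assumed definition of power): the curve at the side."""
    return [sum(row) for row in rows]


top_curve = average_profile(pulses)   # one value per frequency channel
side_curve = pulse_power(pulses)      # one value per pulse
```

The point is just that the top graph collapses the stack over pulses, while the side graph collapses each line over frequency.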

Some resources on pulsars:

Rugplots

Another question asked about concrete examples of rugplots. As Tufte describes them, these consist of multiple dot-dash-plots connected by the dashes. Adjacent plots have one dimension in common and one that is different. This allows relevant dimensions to be juxtaposed and the relations between pairs of variables to be compared in a way that a small multiple of scatterplots wouldn't allow. Furthermore, with sparse data, it is possible to track single data points through multiple variable dimensions.
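The chaining of adjacent plots can be sketched in a few lines. This is only my reading of Tufte's description, not a drawing routine: the function names, the variable names and the values are all hypothetical, and the actual plotting is left out.

```python
def rugplot_panels(variables):
    """Pair each variable with the next one, so adjacent panels share an axis."""
    return list(zip(variables, variables[1:]))


def trace_point(record, variables):
    """Coordinates of a single data point in every panel of the chain."""
    return [(record[x], record[y]) for x, y in rugplot_panels(variables)]


# Hypothetical variable order and one hypothetical observation.
order = ["age", "height", "weight", "income"]
record = {"age": 30, "height": 175, "weight": 70, "income": 40}

panels = rugplot_panels(order)      # which two variables each panel shows
path = trace_point(record, order)   # where the observation lands in each panel
```

Because the second coordinate in one panel is the first coordinate in the next, a single observation can be followed along the shared dashes from panel to panel.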

This seemed like a nifty idea to me, but I have only been able to find one article that uses it: Studinger et al., “Determinants of Baroreflex Function in Juvenile End-Stage Renal Disease.” Figure 3 is a rugplot, and it seems like a natural way to depict stepwise regression, because there should usually be interesting relationships between successively added variables. However, I will admit that I don't fully understand from Table 5 why Figure 3 doesn't include more plots…

I will play around a bit with some of my own data sets and see if I can construct an illuminating rugplot.

I should also note that people apparently also use the term “rugplot” to refer to single dot-dash-plots, or even just to their “dash” parts. :-\ Furthermore, it appears that people have invented completely new chart types that they call rugplots but that have nothing to do with Tufte's. Eng & Salustri do cite Tufte and claim a relation to his rugplots, but I don't see it. The rugplots by Hyde, Jank & Shmueli are completely unrelated (JSTOR articles are only available if you're on the university network; here is a more accessible tech report). You don't necessarily need to learn about these different “rugplots”, because they are designed for fairly specific applications. If you ever need to design new chart types for your own specific applications, though, it can be interesting to see how these people went about doing that.