Strangers in Paris: census maps, layering, and data compression

Posted on 04/20/2011 by


I have been working on a blog series on census data visualizations for another site (it will be reproduced here once it is all up), which partly explains how little I have posted in the last couple of weeks. Working on it has also given me something to comment on: most of the good visualizations and maps of census data display multiple variables in one space.
This is kind of an obvious point for census maps, but it is not entirely trivial. Take The Atlantic’s interactive map ‘The 12 States of America’, which shows how median income has changed over the last 30 years. It first displays categorical information about the general demographics of an area (for example, ‘White and rural; sparse populations; an economic base of farming and agribusiness’ is termed ‘Tractor Country’) and then layers over it how the incomes of people in these areas have changed.

Now, layering is an important part of almost all spatial statistics. Where does it come from?
An early example of displaying multiple variables over a single spatial area is an 1896 print by Jacques Bertillon (1851-1922).

(Image via Infovis)

An epidemiologist, demographer, and statistician (he also created the precursor to today’s International Statistical Classification of Diseases), Bertillon used the areas of rectangles on a map to display three quantities at once: the population of each Paris neighborhood, the percentage of foreigners in it, and the absolute number of foreigners.
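How can one rectangle carry three numbers? A short derivation helps, assuming (as the map is usually described) that each rectangle's width is proportional to a neighborhood's population and its height to the share of foreigners living there. The scale factors $k_w$ and $k_h$ below are notation introduced here for illustration, not Bertillon's own. The area then encodes the third quantity by simple multiplication:

```latex
% N_i : population of neighborhood i   -> rectangle width  w_i = k_w N_i
% p_i : share of foreigners in it        -> rectangle height h_i = k_h p_i
% F_i = N_i p_i : absolute number of foreigners
A_i = w_i \, h_i = (k_w N_i)(k_h p_i) = k_w k_h \, (N_i \, p_i) = k_w k_h \, F_i
```

Because $k_w k_h$ is the same for every rectangle, comparing areas across neighborhoods compares the absolute number of foreigners directly, while widths and heights still show population and share.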
So how do we get from Bertillon to The Atlantic’s map? First of all, color helps, as does the interactive nature of the map. What is particularly useful, however, is how ‘The 12 States’ compresses information about various demographic features of an area (including race, population density, etc.) by grouping census areas according to those features, and then layers economic information over the resulting groups.
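To make the idea of compression concrete, here is a minimal Python sketch. The areas, the numbers, the thresholds in the hypothetical `categorize` function, and every category label other than ‘Tractor Country’ are made up for illustration; this is not how The Atlantic actually built its categories. The sketch only shows the two steps: collapse several demographic variables into one categorical label, then layer a single economic variable (here, the median percent change in income) over each category.

```python
from statistics import median

# Hypothetical toy records (not real census data):
# (area, people per sq. mile, share of jobs in farming, income then, income now)
AREAS = [
    ("A", 12, 0.30, 30000, 36000),
    ("B", 9, 0.25, 28000, 31000),
    ("C", 4000, 0.01, 40000, 52000),
    ("D", 5200, 0.00, 45000, 60000),
    ("E", 300, 0.05, 35000, 38000),
]

def categorize(density, farm_share):
    """Hypothetical rule compressing two demographic features into one label."""
    if density < 50 and farm_share >= 0.2:
        return "Tractor Country"
    if density > 1000:
        return "Urban"
    return "Suburban/Mixed"

def layer_income_change(areas):
    """Group areas by category, then attach the median % income change per group."""
    groups = {}
    for _name, density, farm_share, income_then, income_now in areas:
        pct_change = (income_now - income_then) / income_then * 100
        groups.setdefault(categorize(density, farm_share), []).append(pct_change)
    return {cat: round(median(changes), 1) for cat, changes in groups.items()}

print(layer_income_change(AREAS))
# {'Tractor Country': 15.4, 'Urban': 31.7, 'Suburban/Mixed': 8.6}
```

Instead of mapping density, farm employment, and income change as three separate layers, a map built this way needs only one color per category plus one number per category.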
In the end, then, I think we should pay attention to how well we are able to compress information in order to display more with less. Yet this is not always as easy as making your graphics more interactive.
I recently read a blog post by Drew Conway on ‘Five Rules for Data Visualization.’ His first rule in particular stood out to me: “The viz must be able to stand alone.” His example compares tipping by men and women in two charts.
As he writes,

Why is the chart on the right better? First, it has more explanatory value. By splitting the data into two parts we are able to see the x-axis shift for men, i.e., in general they are tipping on higher bills. Also, we are able to use color in a more valuable way; rather than using it to distinguish between sex we can use it to highlight outliers and note general trends. Next, by reducing the amount of data in each plot the information is conveyed more efficiently. Finally, it achieves our ultimate goal, which is always to provide more answers than questions.

When we layer large amounts of information with multiple variables, we should keep this lesson in mind. Census maps rarely place multiple images side by side, and even more rarely reduce the amount of data shown in order to support a conclusion.
Now, although I have been advocating for including more information in these kinds of graphics, I think the point is that one can reduce the amount of information displayed in a given visual unit by switching to a different, more data-compressed visual unit (like ‘Tractor Country’ and the other categories). This is more or less what Bertillon does with the layering on his map as well.

Bertillon, J. (1896). Cours élémentaire de Statistique administrative. Paris: Société d’Editions Scientifiques. (Map.)