Visualization vs. data mining

Posted on 02/02/2011 by


I recently read a post on Fell in love with data by Enrico Bertini, a researcher at U. Konstanz in Germany.  He makes the excellent point that while data mining and data visualization have historically been at odds, they can no longer stay separated due to the demands of the data-rich society we are creating today.

“I think it’s no mystery that in some way or another visualization and data mining have always been, and still are, somewhat in competition. The way I see it is that from the one hand dataminers see visualization as a too soft discipline, lacking of enough formalism and with the big original sin of having very poor evaluation methods in its toolbox. From the other hand visualizers think data mining is too rigid and narrowly focussed on a plethora of insignificant small deltas to algorithms that nobody will ever understand.”

Among other things, he suggests that data mining is simply too opaque, relying on black boxes and the ‘voodoo science’ of parameter-setting. (In my experience, algorithms seem to be chosen at random until something fits the data; researchers then work backwards to explain the results.) Visualization…well, it needs data mining techniques because there is just so much data.

(The amount of data storage, from The Economist)

I think the social sciences as a whole need to take heed of this discussion. Some sociologists routinely look at large datasets culled from even larger surveys, while others design fairly functional images of social networks in order to demonstrate the complexity of their subjects. The two never meet out in the open. Some of the more famous papers in large-N sociology and political science show a noticeable absence of visual representation, while image-heavy papers (admittedly, there are not as many) skimp on data presentation.

(A fairly typical data table from a fairly typical International Political Economy paper, “The Globalization of Liberalization: Policy Diffusion in the International Political Economy”)

I think some of my frustration comes from being exposed to a fair number of very aesthetically pleasing images from physics and CS people, who are trained to visualize data as a way of simplifying the representation of their subjects. Shouldn’t this be a part of social science training? Are there good data visualizations of large-N studies in the social sciences that I just don’t know about?