A “New Data Epistemology”

Posted on 05/17/2015


Does visualizing data change the way you think about that data?  I believe it does.  But so does the choice of what data we collect in the first place.  This will be a lengthy post, but stick with it.  In the end, I hope to propose what I term the "new data epistemology."  It is something I am still conceiving, an idea that is still inchoate, but I want to make this blog a place to explore these thoughts.  So let's get started.

Data enters the brain through the eyes; once it travels down the optic nerve, what the brain does with that data (where it is sent and how it is processed) depends on how the data is presented.  This changes our perception and how we act on that data.  In this way, how we show data fundamentally changes the meaning of the data itself.  This has been a focus of my blog in the past.

Yet it now makes me think of Harvard's Strategic Data Project.  Part of its mission is to "bring high-quality research methods and data analysis to bear on strategic management and policy decisions, [and] to transform the use of data in education to improve student achievement."  As a public school teacher and a researcher, I am always wary of 'data-driven improvements,' and many of my colleagues can relate.  I will conclude this post by arguing that privileging a certain kind of data, visualized in a certain kind of way, entrenches a deficit-based mindset that is both unjust and counterproductive to meaningful school improvement.

It is a really nice logo, though.

But for now, what interests me is a little more subtle.  While this mission focuses on how the data is used, it does not directly address the implications of what data is collected and the ways in which it is presented.  This can be generally stated as a problem with the epistemology (versus the ontology) of data.

But let's first get to what epistemology means in this context.  Epistemology refers to how knowledge is obtained, or more broadly, to the implications of obtaining knowledge one way instead of another.  Ontology, on the other hand, is concerned with the truth itself: what exists, rather than how we know it.  So we could say that ontology is concerned with what the truth is, and epistemology with the methods of figuring out that truth.  This, of course, ignores the larger philosophical taxonomy of the study of knowledge, but that is not really what I am looking at here.

What I see is that this notion of strategic data conceives the term 'strategic' in a very narrow sense: it attempts to use traditional data to influence policy strategically.  This is all right, I suppose, and it is a cause I may have supported in the past.  But what I think is more interesting is when we stray from the ontological to the epistemological.  Instead of gathering the same data and analyzing it a new way, what about looking at what data we are collecting and how, exactly, we are analyzing it?  What does this say about our notions of truth in the first place?

To get back down from the abstract, let's talk about student achievement data.  This is a major focus of the Harvard program, which has identified numerous gaps in achievement between low- and high-income students, between racial groups, and so on.  It has demonstrated that the experiences of low-income students leave them less prepared to graduate, to go to college, to succeed, and so forth.  Method after method applied to that data reveals a pattern of deficiencies among those students, some more finely grained than others, but all pointing to that 'gap.'

But to me, this says less about gaps or achievement than about the basic premise researchers are parameterizing.  Researchers across the country start from the premise that students are deficient in something and that policies are needed to fix that deficiency.  Where are the quantitative studies of students' strengths, resilience, and forms of knowledge not valued by the dominant culture?  Where are the hypotheses that ask not how the students are failing the system, but in precisely what ways the system is failing to leverage the assets of students?  And most importantly, where are the actual voices of those students, studying knowledge on their terms? [1]

Ultimately, what this means is that by valuing one research method, one unit of analysis, one analytical lens over another, we prioritize the strategic implementation of data predicated on flawed assumptions.  This is the classic danger of parameterization in the social sciences: no matter how well constructed your statistics may be, if they measure the wrong things, their conclusions may be precise but they are not valid.  And this is exactly what is going on in quantitative education research.  Parameterizing deficit-based thinking through statistics obfuscates the larger implications of what data we privilege in the first place.
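The precise-but-not-valid distinction can be sketched with a toy simulation (a hypothetical illustration, not drawn from any actual study): suppose the outcome we care about depends on an unmeasured asset, while a large sample lets us estimate a deficit-style proxy with great precision.  The tight error bars say nothing about whether we measured the right construct.

```python
import random

random.seed(0)

# Hypothetical illustration of "precise but not valid": with a large sample,
# we can estimate the mean of whatever we chose to measure very precisely,
# but if we measured the wrong construct, that precision buys no validity.
n = 100_000
measured = [random.gauss(0.5, 1.0) for _ in range(n)]    # the deficit metric we collect
unmeasured = [random.gauss(0.0, 1.0) for _ in range(n)]  # the asset we never collect

mean_measured = sum(measured) / n
std_error = 1.0 / n ** 0.5  # roughly 0.003: a very tight estimate

print(f"measured mean:   {mean_measured:.3f} +/- {std_error:.3f}")
print(f"unmeasured mean: {sum(unmeasured) / n:.3f}  (never enters the analysis)")
```

The point is only that sample size shrinks the error bars around whatever we chose to measure; it cannot tell us whether we chose the right thing to measure.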

OK, so let’s get back to the Harvard statement.  What I want to take note of is the difference between three things:

(1) applying new data collection and analytics methods in new ways,

(2) showing that data in new ways, and

(3) collecting and visualizing a new kind of data in a new kind of way.

The Harvard program does the first.  Many others do the second; the rise in Big Data startups is astonishing, and we should take note of how many emphasize visualization as a method of compressing their data.  What we need is a program that not only constructs new data, but strategically visualizes that new data: data based on phenomenological, ethnographic accounts of people's lived experiences; data based on the real knowledges of students.  And as a corollary, we also need a new type of visualization, one that does what visualizations should (compress data into patterns more effectively transmitted to the human brain) but that also serves to strengthen the meaning of the data.

I guess I should maybe rephrase my initial statement here about Harvard—every program strategically visualizes data, but few actually scientifically approach the epistemology of that visualization.  The way our brain processes data will influence our actions, and so the data we choose and the way we show it should be equally valued.  When we keep looking at standardized test data—over and over again, with new statistical tools and new policy recommendations—we lose something more than just our time.  We lose the perspective of participants in the system that produces those scores, and we lose the strategic power of presenting the data of their experiences in new ways.

I'm not sure what I want to call this.  You know, at one point I bought into the idea that big data, new data visualization, whatever you want to call it, was somehow revolutionary.  I looked at the historical, social, and institutional features of big data in this blog, and I thought that the ways in which we could present data could change our world.  And they can, but by doing so they are changing it through aggressively maintaining the status quo.

This makes me think of the show Silicon Valley, a satire of its eponymous location.  On the show, startups talk about 'changing the world' so frequently that it becomes a running gag.  I recently read an interview in the Washington Post with Catherine Rampell, who covers economics for the New York Times, in which she is asked whether Silicon Valley can actually change the world.  Her response is reproduced below:

Catherine: I think there’s a big difference between “changing the world” and “social justice.” Richard’s fictional file compression algorithm might indeed be capable of dramatically transforming the ways businesses operate and consumers spend their money. Those changes in turn could lead to major gains in economic growth, probably mostly enjoyed by rich countries like the United States, but likely with some spillover into developing countries too…But that’s not the same thing as saying Silicon Valley companies are focused on improving the lives of the poor, or helping liberate the oppressed, or eradicating disease or whatever. And indeed the tech industry, like the finance industry, has been criticized for luring the country’s best and brightest away from such worthy goals with big VC infusions offered in exchange for creating the next generation of sexting apps (which again, could change lots of people’s lives, but you’d be hard-pressed to argue that social welfare has improved as a result).

This hits the nail on the head.  Big data may change the world, but so does building a house, or cutting down a tree, or starting your car, or eating cake.  Some changes may be bigger than others, but they all change the world.  What matters is not that we change it, but how we change it.  And I think, like a lot of young people who grew up amid the explosion of technology, I made a significant mistake.  Big data is not just another type of data: the amount of data available to us today has already changed our everyday lives in ways we have not begun to fully comprehend.  It has changed the world.  And by now we know it has another, darker side: the loss of privacy and the growth of corporate control of the internet.  But there is another, more subtle loss here that I had not really comprehended.

Pronouncing big data revolutionary precludes the notion that it might actually be regressive.  Look simply at the Harvard Strategic Data Fellowship: fellows are required to be "entrepreneurial change agents."  I think the evangelical presumption that we should collect more data is a deeply conservative notion.  It is predicated on the assumption that the data we are collecting is the right data in the first place, and that if only we had more of it, we could make real changes.  What if we need a new kind of data?  And a new way of visualizing that new kind of data?

This is what I am proposing.  Data is not collected “for” or “against” social justice.  But it is collected and presented with presumptions about how our society works, and has implications for equity in our society.  To ignore the effects of big data and pronounce them somehow ‘socially sterile’ would be to ignore the reality that every teacher can see. A “new data epistemology” would focus on an awareness of the sources, methods, conclusions, and visualizations of data in an attempt to understand how the construction of those elements both reflects and changes current social structures.

It is social science, data science, social justice, graphic design…it is all thrown in together, and I’m not sure what it looks like yet, but I am sure we need it, and we need it now more than ever to counter the subtle yet insidious ‘strategizing’ of deficit-based data.




1. I should acknowledge the many studies by researchers throughout the country that look at this very thing.  I am thankful to be in a graduate program that has exposed me to numerous alternate narratives and approaches to school research; what bothers me is that these are not part of the process of 'strategizing' data, and thus few, if any, large institutions are pushing to act on their findings.