Week 2: Clean data, messy ethics

This week’s readings introduced several methodological and ethical questions with regard to digital history projects, which I’m interested in exploring more in the coming weeks. Much of this aligns with the concerns of critical librarianship and critical archives work, which (among other things!) address historic and ongoing structural inequities encoded in, and enacted by, cultural heritage institutions. In libraries, this might take the form of challenges to classification and subject headings; more broadly, these critical approaches seek to complicate the idea of “neutrality” in the acquisition, arrangement, and description of collections, as well as the decades or centuries of ideas and politics that shaped our systems.

I mention this because it resonates so strongly with Lara Putnam’s concerns about decontextualized research, the encroachment of commercially-inspired search algorithms into library catalogs and databases, and the choices made to collect or digitize and the ensuing “blind spots.”1 Putnam raises critical questions about the larger effect of the reliance on digital sources in transnational historical research, vis-à-vis local histories and knowledge, discovery through “side-glancing,” and the essential friction built into traditional research processes.2 Digital collections effectively decouple archives and sources from place-bound collections, and the growth of text-searchable materials and OCR has expanded the possibilities for researchers to engage questions outside of limited, ahistorical, or arbitrary boundaries of geography. And while the allure and real power of being able to conduct research across multiple archives and disparate sites cannot be discounted, what is lost when these sources are available at a click, from the comforts of a home office?

Putnam’s concern with this ease goes far beyond a longing for “doing things the old way,” and introduces crucial questions about the loss or elision of context, localized knowledge, and critical challenges that can happen with on-the-ground archival research. The openness and scale of digital collections (among which I would include web collections) can also obscure who is represented, or create a false sense of representation. More sources don’t necessarily mean more voices or stories are included, especially when digital collections are built on traditional archives that already tend to prioritize or privilege the elite. “Who wasn’t publishing papers or pamphlets, or wasn’t reading them, or was far from the people who did? […] All stand in the shadows that digital sources cast.”3

I was also taken with Putnam’s discussion of “experiential friction,” the “things [that] happen in archives and libraries and on the way to them.”4 In decoupling an archive from its physical location and histories, a digital collection has the effect of rendering sources and archives as inert, removed from any present-day concerns or politics that would otherwise be visible to an on-site visitor. This “drive-by” transnationalism is summed up bleakly:

It has become much easier for North-based historians to publish about places they have never been and may know very little about.

Lara Putnam, “The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast”5

I was thinking of this quote while exploring the Trans-Atlantic Slave Trade Database, and reflecting on my own experiences living and working in some of the cities named in those records. The scale and atrocity of the African slave trade is brought into sharp relief through the sheer numbers and details made available through the database, and not enough can be said of its value for ongoing research (not to mention the impact of naming over 90,000 enslaved persons).6 But what is lost when a researcher is encountering these numbers, names, and places outside the place where it happened, as clean and orderly data in a spreadsheet? What happens with the removal of the messy (literal) weight of documents and ledgers, or of doing research in a space named in honor of a pro-slavery politician and white supremacist? The relative ease and access of these sources doesn’t preclude historians or researchers from asking hard questions or addressing gaps in understanding or sources, but it can disguise those place-based conditions that can cast a different light on the sources and materials themselves.

Outside these issues of context, Putnam and Jonathan Blaney & Judith Siefring also introduce questions related to the economics of digital collections (or at least to the funding of these projects) that might be carried further. How will the perception of the ubiquity and totality of digital collections and sources affect funding for archival research in universities and granting bodies? How can historians advocate for funding of place-based research when it’s assumed that these sources are all available digitally, or can be digitized by request for a fraction of the price of travel and study on-site? Blaney and Siefring’s startling research on the non-citation of digital sources complicates this further; if historians ARE primarily using digital archives, why aren’t they citing them?7 Evidence of usage, whether through clicks or citations, is vital to continued development and funding of digital collections. So what happens when the value of physical, place-based research is depreciated by institutions, but the digital counterparts of these archives are under-recognized by the historians grown dependent on that access?

Moving beyond my comfort zone of the theoretical, this week was also a chance to get into the work of digital history. I was lucky enough to have some previous training on data cleaning with OpenRefine, but coming into it with a more open set of questions pretty quickly revealed my disciplinary inexperience! How does one handle missing spots of data, and how much do your methods have to account for these gaps? Should I be looking at this as an early modernist, and if so, what’s the difference between “Katherine” and “Catherine”? And why are there so many Johns?  
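One cautious way to approach the “Katherine”/“Catherine” question is to flag likely spelling variants for a human to review, rather than merging them automatically. The sketch below is only an illustration in Python's standard library, not OpenRefine's own clustering method: the function name `similar_pairs`, the sample names, and the 0.85 similarity threshold are all assumptions made for the example, and blank entries are counted separately so the gaps stay visible instead of being silently dropped.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.85):
    """Return (name_a, name_b, score) for distinct spellings that look alike.

    Nothing is merged or overwritten; the pairs are only candidates for review.
    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    # Drop blanks and surrounding whitespace, then keep each spelling once.
    distinct = sorted({n.strip() for n in names if n and n.strip()})
    pairs = []
    for a, b in combinations(distinct, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical sample values, not taken from the class dataset.
names = ["Katherine", "Catherine", "John", "John ", "", "Jon", "Margaret"]

# Count the gaps explicitly so they can be reported alongside the results.
missing = sum(1 for n in names if not n or not n.strip())
print("missing entries:", missing)

for a, b, score in similar_pairs(names):
    print(a, "/", b, score)
```

Whether a flagged pair such as “John” and “Jon” is one person, two people, or a period spelling convention remains a historical judgment; the code only narrows down where to look.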

Closing this week out with more questions than answers, as usual!

  1. Lara Putnam, “The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast,” American Historical Review 121, no. 2 (April 2016): 389. https://doi-org.mutex.gmu.edu/10.1093/ahr/121.2.377
  2. Putnam 380.
  3. Putnam 391.
  4. Putnam 395.
  5. Putnam 397.
  6. David Eltis, “Trans-Atlantic Slave Trade: Understanding the Database,” Slave Voyages: The Trans-Atlantic Slave Trade Database, https://www.slavevoyages.org/voyage/about, accessed Sept. 7, 2020.
  7. Jonathan Blaney and Judith Siefring, “A Culture of Non-Citation: Assessing the Digital Impact of British History Online and the Early English Books Online Text Creation Partnership,” Digital Humanities Quarterly 11, no. 1 (2017), http://www.digitalhumanities.org/dhq/vol/11/1/000282/000282.html.


  1. This quote you use by Putnam, “it has become much easier for North-based historians to publish about places they have never been and may know very little about,” reminded me of an argument which has been going on for much of the existence of the ethnomusicology field. We often discuss the insider vs. outsider dichotomy, and how that affects fieldwork and subsequent writing on the subject. Putnam is correct that digital history and initiatives have allowed people to “travel” for their research from their home office, but as others in class pointed out, sometimes we need to be in person at archives to see the scribble, the paper indentation, or to speak with an archivist whose expertise and institutional knowledge can make a huge difference in the research work. My concern is that for historians studying recent history, we cannot rely solely on the internet for our materials, and that visiting a site, an archive, and a group of people is a completely different and important research methodology.

  2. Robert Carlock

    I have also considered the consequences of decoupling archives from their local areas. It is a consideration that reflects how historians are taught to approach primary sources. When analyzing a source, it is important to recognize its context: when it was created, who created it, what biases may have influenced its creation. Since archives typically contain holdings relating to their local area (or, in the case of national archives, to the nation they reside in), historians usually have to encounter the locality in order to access the archives. Even if historians don’t fully immerse themselves, being able to experience an area provides important context that can alleviate some confusion, or provide insight into a local quirk. We are trained to consider the context for primary sources, and visiting archives forces us to physically encounter a small piece of that context.

    When framed alongside digital history, however, I think visiting archives loses some of its importance. I personally believe sacrificing the experience of the local culture might be worth it for the ability to “side-glance” history on a broader scale, at least for transnational (or even national) history. While micro- and local histories may still benefit from visiting archives directly, being able to create a broader base of evidence through digital access can create a more robust perception for a project.

    It is a complicated question. Should history be an intimate field, where historians should immerse and familiarize themselves with their content to the extent it feels personal? Or should historians aim for creating a more all-encompassing history that attempts to create a more inclusive history? Without time and resources, it seems impossible that it can be both in less than a lifetime, even for a small project.

    Like you said, more questions now than when we started!

  3. I’ll be honest, I’m biased against digital sources! And I believe many who did not grow up reading everything online are. For me, Blaney and Siefring seem to assume something almost nefarious for the reason why online citations are fewer than text sources. I think it echoes nicely with their initial OED/Wikipedia example…online sources are viewed as more unstable. As proof, the graveyard of broken links is enough to deter authors from citing sources that may not be as “permanent” as print. I view them interchangeably, and never thought about someone down the line researching and tracking citations in this way. With the new ways archived resources are becoming more permanent, I’d assume that increases in digital citations will follow.

  4. Like you, I feel like I’m left with more questions than when I began the readings last week. Putnam’s quote that you highlight, “It has become much easier for North-based historians to publish about places they have never been and may know very little about,” really sums up her point well. But still I wonder, how is this ultimately shaping the research we do? I like how you connected the quote to your thinking about the Trans-Atlantic Slave Trade Database. Although it is a comprehensive dataset, you pose an important question about what may be lost when this information is accessed and viewed outside the place where it happened. What might we as researchers misunderstand?

    I agree with your observations about the dataset we used with OpenRefine. It makes me question the methods other historians use to clean up messy data and how they address gaps and discrepancies like the ones we saw in our dataset. What methods are used, and are they ethical?

  5. What most stood out to me is how you brought up digital collections and the possibility that funding and grants will shrink as collections move online and in-person study seems less necessary. While digitization is extremely beneficial for seeing and learning about collections that would otherwise be hard to access because of location, it is still crucial to go in person and examine something that is a main focus of study. This is a really good point and I wish I had an answer or any insight on how this will continue to develop alongside technology.

    Citation also matters for digital collections. If so many websites and online articles go uncited even when they are the primary source of information, it raises the question of how often digital collections would be cited. Would they often be cited as though people were visiting them in person because that looks “neater and more reliable”?

    Also, yes! Why are there so many Johns?
