Week 10: text encoding and the historiographer

“It’s not a programming language. You’re not trying to learn C++ in a week. It’s basically XML, right? You remember that from library school, don’t you? They definitely talked about it in your digital preservation class. You should remember this.”

Reader, I did not.

For this week’s activity, I decided to try to mark up a page using the TEI, both as a personal challenge and to consider how the TEI might work for documents with more embedded and referenced images. However, “TEI Lite” was a bit of a beast, even though it seemed to start from a place of minimal assumed knowledge on the user’s part. Technical language is inescapable here, but I found that these tutorials (not unlike one of the articles we read this week) tended to over-explain some terms and ideas while leaving others vague or as an assumed kind of knowledge. I could feel my control over this starting to slip in the “Names, Dates, and Numbers” section as soon as referring strings versus regular elements were introduced. Some of this comes down to familiarity (or a lack of it) with the needs of digital manuscript access or analysis, and with the language used within TEI to distinguish the many, many elements within a document. So far, my professional work and research experience haven’t involved much work with original manuscripts or their digital surrogates (marked up or not). In principle, the ideas make sense and have clear and crucial uses, attested to by the TEI’s longevity, adoption, and volume of contributors and updates as much as by individual projects like Transcribe Bentham.1 When crossed with a less-than-stellar understanding of how attributes work in the guidelines, though, I felt even more adrift.

Given that, I chose just a single page from the art guidebook Art Treasures of America, published in 1880 by the art historian and critic Earl Shinn.2 This three-volume publication was Shinn’s attempt to document the “treasures” of art held within the U.S. (meeting “certain obvious criteria of merit and value”), and includes photogravure reproductions of select works, profiles of collectors, examinations of artists and individual works, and lists of items and their known locations. (The three volumes were digitized by the Frick Art Reference Library, and included in their Google Arts & Culture exhibition, “Documenting Art Collections in Gilded Age New York.”)

Page 23 from "Art Treasures of America," titled "The Collection of Mrs. A.T. Stewart." An etching by J. Veyrassat is pictured at the top, a reproduction of the painting "The Horse Fair" by Rosa Bonheur.
From The Art Treasures of America. This section documents the collections of a Mrs. A.T. Stewart.

I selected a page with a combination of common elements in the guides: a large reproduction of an artwork, a smaller illustrated image embedded in the text, and an accompanying essay. The “TEI Lite” guidelines for describing an image were straightforward enough, but figuring out how to describe the association of the smaller inset image with the essay was less clear, and will probably require looking at the larger TEI. Once I started reading the essay passage, I also realized that encoding the text would require more than just identifying structural and bibliographic features, or basic elements (speakers, quotes). The essay makes reference to specific artists included in the Art Treasures (that is, artists whose work is known to be held in the U.S.), but also notes someone with whom the artist had studied who isn’t necessarily referenced elsewhere in the volume. Place names (both vague and specific) are given, including “the Mediterranean” and “Morocco,” but here there’s a tension between the text and history: do these paintings represent those places, or are they assumed locations or inventions of the artist? (This example is timely, as we recently discussed the particular modes of Orientalism in Gilded Age artworks in another seminar; so how to treat this name?) The large facsimile at the top of the page references both the engraver (an often-overlooked position) and the original artist, and is the analytical focus of the text, whereas the smaller inset image takes more of a decorative role. And how might one differentiate a patron from a later collector? Here, Christopher Warren’s article on the ODNB was at the forefront of my mind.3 A work like the Art Treasures of America is a historiography as much as a historic document, an attempt to set the standards and narrative of American art at a formative moment. So how do you ensure that your work to encode doesn’t simply replicate those narratives, while also providing a useful framework for analysis to scholars?
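To make those encoding questions a little more concrete, here is a rough sketch of how the page’s features might be tagged, wrapped in a few lines of Python that check the fragment is well-formed and list what was tagged. Everything in it is an assumption rather than a recommended practice: the elements come from the full TEI P5 Guidelines rather than TEI Lite, the role values, the use of cert="low" to flag a doubtful place identification, the corresp link from the essay to the large figure, the xml:id values, and the image file names are all hypothetical, and the bracketed paragraph text is a placeholder, not Shinn’s essay.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical TEI fragment for the page. File names, xml:id values,
# role/cert/type values, and the placeholder paragraph are illustrative only.
SAMPLE = """\
<div xmlns="http://www.tei-c.org/ns/1.0" type="collection">
  <head>The Collection of Mrs. A.T. Stewart</head>
  <figure xml:id="fig-horse-fair">
    <graphic url="page23-top.jpg"/>
    <head>The Horse Fair</head>
    <figDesc>Etching by <persName role="etcher">J. Veyrassat</persName>
      after the painting by <persName role="artist">Rosa Bonheur</persName>.</figDesc>
  </figure>
  <p corresp="#fig-horse-fair">
    [placeholder for the essay text]
    <figure xml:id="fig-inset" place="inline" type="decorative">
      <graphic url="page23-inset.jpg"/>
    </figure>
    <placeName cert="low">the Mediterranean</placeName>
    <placeName cert="low">Morocco</placeName>
  </p>
</div>
"""


def tagged(root, tag, attr):
    """Return (text, attribute value) pairs for every TEI element named `tag`."""
    return [
        ("".join(el.itertext()).strip(), el.get(attr))
        for el in root.iter(f"{{{TEI_NS}}}{tag}")
    ]


# Parsing only confirms the fragment is well-formed XML;
# it does not validate it against a TEI schema.
root = ET.fromstring(SAMPLE)

people = tagged(root, "persName", "role")
places = tagged(root, "placeName", "cert")
figure_id = root.find(f"{{{TEI_NS}}}figure").get(XML_ID)
essay_target = root.find(f"{{{TEI_NS}}}p").get("corresp")

for text, value in people + places:
    print(f"{text}: {value}")
print(f"essay points to {essay_target}; large figure is #{figure_id}")
```

Even in a toy version like this, the interpretive choices show up as attribute values: deciding that Veyrassat is an "etcher" and Bonheur an "artist," or that “Morocco” deserves low certainty, is exactly the kind of judgment the encoding ends up recording.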

Screenshot of a TEI document showing markup for a page from "The Art Treasures of America"
Documentation of a futile effort.

For a text like this, I can see the real value of TEI’s extensibility and customization…but it might have to be a project that waits for a later day (or a TEI workshop).

  1. See Tim Causer and Valerie Wallace, “Building a Volunteer Community: Results and Findings from Transcribe Bentham,” Digital Humanities Quarterly 6, no. 2 (2012).
  2. Earl Shinn, The Art Treasures of America, Being the Choicest Works of Art in the Public and Private Collections of North America, vol. I (New York: G. Barrie, 1880).
  3. Christopher N. Warren, “Historiography’s Two Voices: Data Infrastructure and History at Scale in the Oxford Dictionary of National Biography (ODNB),” Journal of Cultural Analytics (2018).



  1. Terence V

    The TEI struggle is REAL. Even just looking at the “TEI Lite” how-to page basically felt like an impossible task. Where exactly is the specific tag that I need to mark this name/place/thing!? How am I supposed to account for [problem X/Y/Z]!? Encoding, as you’ve described as well, isn’t an easy task – but that difficulty gave me a greater appreciation for those who have mastered (or at least are proficient with) the skill.

    Also, thanks for the video. I’ve added it to my collection of “reaction” videos 😀

  2. Cassandra

    Yes, and a hearty AMEN: a TEI workshop is necessary. It seems best to try to master HTML first.

  3. Nicole Grewell

    Stephanie, I always enjoy reading what you have to say. It sounds like this module was really challenging, and your comments did not necessarily persuade me to want to pick this up. Nonetheless, you turned the struggle into an educational post that showcased what you were able to try. Since I haven’t done this module yet and also have no background in coding, I can’t offer much more feedback!

  4. I focused more on HTML during the optional text encoding module week. TEI and HTML are like cousins, two markup languages; however, I think HTML is the best for the web. I assume HTML, Twine, and Gephi are the most practical new things I will take from the Clio Wired class and use in my future work (well, never say never; what if it’s OpenRefine or OCR?).
    Going back to your blog post, I agree with the following statement: “I found that these tutorials …tended to over-explain some terms and ideas while leaving others vague or as an assumed kind of knowledge.”
    This week my optional module was about Network Analysis. Weingart’s Demystifying Networks, I think, fits your observation. The overexplaining sometimes made me want to say, “OK, OK, I got that part, what’s next?” And then sometimes I was like, “Hold on, how did he get from there to here?”
    Looks like I am not the only one who developed such a perception. 🙂
