Oh boy oh boy, is this week’s discussion near, dear, and feared in my heart! I was so happy to see Dr. Safiya Noble’s work on the reading list, as her contributions are crucial to information science/LIS studies, and have heavily informed my own approaches to teaching search and research in library settings (and specifically the need for critical interventions in artists’ and designers’ research processes). A few years ago, a colleague and I organized a workshop for cartoonists and illustrators that interrogated stereotypes and the use of Google Images in visual research and development. Beyond the tendency to find the same, stale images that everyone else might be drawing on (literally), we discussed how these searches tend to reinforce our preconceived ideas of people and things, with no counter-narrative to challenge what we see. In this case, our intent wasn’t necessarily to ask artists to stop using Google Images, but to consider the questions they’re asking of themselves when using these tools (in short: “doing research” is not enough; you need to understand how your biases and perspectives are going to interact with, and be amplified by, biased search systems, and how to read against those results).
There’s another kind of insidious creep of “the algorithm” that worries me, though, noted briefly in Sharon Block’s article.1 While Block focuses on the specifics of JSTOR, Matthew Reidsma’s Masked by Trust investigates the adoption and proliferation of Google-like search interfaces and algorithms by library discovery systems, which are usually the products of commercial vendors. These systems are largely driven by proprietary software, without any real transparency into their operation or into the company’s use of user data. Because these discovery systems are implemented by libraries (with all their perceived values of “neutrality” and work for a larger, public good), they’re assumed to operate from that same position of objectivity, and don’t receive the same critical scrutiny that commercial search engines like Google do. (And in case I haven’t said it here enough: libraries are not, and have never been, neutral.)
After reading Block’s article and her questioning of JSTOR’s (pretty troubling) handling of topic modeling, I was curious to see if our own new library discovery system provided any information on how it ranks searches and handles subject terms. The vendor provides a brochure that, while not extensive on technical specs, gives some useful information on its treatment of user searches.2 Things start off as expected: the system checks search terms against existing metadata fields first (title, author, subject headings), with some sophisticated use of synonym/word-stem expansion. The next steps start to feel a little uncomfortable, though. When users search, they’re offered autocomplete options (not unlike Google’s own suggestion feature), based on popular searches drawn from the system’s logs and scoped to content available through that library. Items are also ranked based on an item’s “academic significance”:
“The item’s academic significance is calculated from factors unrelated to the query, such as whether the item was published in a peer-reviewed journal, how many times it has been cited, and what type of material it is, for example a journal article is considered more significant than a newspaper article.”
Finally, materials are ranked based on publication date, with some further options to provide customization based on a user’s personally-set subject or disciplinary interests. At Mason, this means you can tailor your search to prioritize content from within your own field, filtering out the “noise” of irrelevant research or authors.
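To make the pipeline above concrete, here’s a toy sketch of how a ranking like this might combine a query-dependent field-match score with query-independent “significance” boosts and recency. Everything here is an assumption for illustration: the field weights, material-type boosts, and decay window are invented, and this is emphatically not Ex Libris’s actual (proprietary) algorithm.

```python
# Toy model of a discovery-layer ranking: field matches + query-independent
# "academic significance" + recency. All weights and names are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class Item:
    title: str
    subjects: list
    peer_reviewed: bool   # "significance" signals the brochure describes
    citation_count: int
    material_type: str    # e.g. "journal-article", "newspaper-article"
    pub_date: date

# Assumed per-type boosts: a journal article outranks a newspaper article.
TYPE_BOOST = {"journal-article": 1.0, "book": 0.8, "newspaper-article": 0.3}

def field_match_score(query: str, item: Item) -> float:
    """Score query terms against metadata fields, weighting title over subjects."""
    terms = query.lower().split()
    title_hits = sum(t in item.title.lower() for t in terms)
    subject_hits = sum(t in s.lower() for t in terms for s in item.subjects)
    return 2.0 * title_hits + 1.0 * subject_hits

def significance(item: Item) -> float:
    """Query-independent 'academic significance': type, peer review, citations."""
    score = TYPE_BOOST.get(item.material_type, 0.5)
    score += 0.5 if item.peer_reviewed else 0.0
    score += min(item.citation_count, 100) / 100  # cap citation influence
    return score

def rank(query: str, items: list, today: date) -> list:
    """Order items by combined match, significance, and recency scores."""
    def total(item):
        # Recency decays linearly to zero over roughly ten years.
        recency = max(0.0, 1.0 - (today - item.pub_date).days / 3650)
        return field_match_score(query, item) + significance(item) + recency
    return sorted(items, key=total, reverse=True)
```

Even in this crude form, the bias questions below fall out of the arithmetic: an older primary source or newspaper piece starts with a lower type boost, no peer-review bonus, and a recency penalty before the user’s query is even considered.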
There are a few things that we could investigate here, given this week’s concerns about ethics and bias. One of the most obvious (and probably unsurprising) issues from the start: this is not a system designed for historians, or users looking for primary source content, or… really, anything that might fall outside of traditional academic publishing paradigms. The presumed “standard” user seems to be a non-specialist who might not start with a subject database, but nonetheless is looking for a fast and direct way to connect with new research. In fact, these discovery systems are often the first way that new students or researchers connect with our library’s catalog and resources. For this reason, similarities to Google are probably intentional, and usually welcome: because the library system looks and acts like what new users are familiar with, they may be more likely to return and fully exploit the library’s resources.
But as with Google, how much do our users know about how this system works? How much do they care? When they log in as instructed by librarians (and as required to access full-text content), do they know if and how their searches are connected to their student data? Do librarians know whether users’ searches are connected to personally identifying data? One important feature of many older library catalog and circulation systems was their short-term memory; once a patron returned an item, it was wiped from the record. Searches, likewise, were not connected to anything that might identify a user. Librarians have fought (and been threatened with jail) to protect user privacy; your reading history is your own, even when the PATRIOT Act might try to say otherwise. How does that change in this new environment, though, when searches from the system logs are used to suggest results to other users?
Beyond concerns of data privacy, how do these kinds of systems reinforce disciplinary bubbles or cognitive/confirmation biases, surfacing only the kinds of results that users expect to see, without any troublesome interdisciplinary challenges? Or information hierarchies that privilege traditional systems and sources of publishing? What does it mean when the company building the discovery system is also providing much of the content? Even our vendor seems to recognize that there’s a concern about algorithms; the word is avoided entirely in its discussion of “intelligent ranking technology.”3 For librarians, this puts us in an awkward position when it comes to teaching students or new users, especially when those requests often come from faculty who might ask us to simply “teach them how to use the databases.” How can we support a learner’s critical engagement with our systems, while also supporting their immediate class or research needs? Block and Noble call attention to the issues in external systems, but how do we address this problem when it’s now built into the systems that manage and deliver our own library content?
(To try and wrap this ramble up in a spooky pre-Halloween analogy:
- Sharon Block, “Erasure, Misrepresentation and Confusion: Investigating JSTOR Topics on Women’s and Race Histories,” Digital Humanities Quarterly 14, no. 1 (2020).
- Ex Libris, “Primo Discovery: Search, Ranking, and Beyond,” Ex Libris, March 2015, accessed 16 October 2020.
- Ex Libris, “Primo Relevance Ranking: Technology Overview,” exlibrisgroup.com, ProQuest, accessed 16 October 2020.