Greetings, readers! In today’s post, we’re doing some library science and getting our hands dirty by digging into online cataloguing and data models. Don’t say I didn’t warn you!
I’ve just returned from the annual Schoenberg Symposium on Manuscript Studies in the Digital Age at the University of Pennsylvania. It was an inspiring gathering of manuscript scholars and digital humanists, thinking about how we can collaborate and facilitate each other’s work.
The theme of this year’s symposium was “Hooking Up” – in the context of the symposium, the term refers to the concept and practice of “linked data.”
Some of you may know that in addition to my work as Executive Director of the Medieval Academy of America and my manuscript research, I am a Professor of Library Science at the Simmons University School of Library and Information Science in Boston. In my annual class, “The Medieval Manuscript from Charlemagne to Gutenberg,” we spend a lot of time discussing the history of cataloguing and classification theory and thinking about how to apply those concepts to the modern digitization and cataloguing of medieval material. In the context of Library Science, “linked data” means forging digital connections between standardized referents in order to 1) avoid inefficient duplication of data entry and 2) ensure consistency.
For example, if you are cataloguing a manuscript of Chaucer’s Canterbury Tales (lucky you!), you have to make a choice about how to refer to the author. Are you going to call him “Chaucer,” “Geoffrey Chaucer,” or “Chaucer, Geoffrey”? The choice you make will have important implications for your library patrons. To make your online record “discoverable,” or easily findable by users, you have to use what are called “authorities,” standardized names and titles that are established, in the US, by the Library of Congress. There are several international authority files as well, brought together in a meta-authority file known as VIAF (the Virtual International Authority File).
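For the code-curious, here is a minimal Python sketch of what an authority lookup does in spirit. Everything in it is made up for illustration: the tiny `AUTHORITY_FILE` dictionary, the `authorized_heading` function, and the heading string itself are assumptions, not the contents of a real Library of Congress or VIAF record, and real authority files are not queried this way.

```python
# Hypothetical, hand-made authority file: one authorized heading mapped to
# the variant forms a cataloguer or patron might type. Illustrative only;
# this is not the text of a real Library of Congress authority record.
AUTHORITY_FILE = {
    "Chaucer, Geoffrey": {"chaucer", "geoffrey chaucer", "chaucer, geoffrey"},
}


def normalize(name):
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def authorized_heading(name):
    """Return the authorized heading for a name variant, or None if unknown."""
    key = normalize(name)
    for heading, variants in AUTHORITY_FILE.items():
        if key in variants:
            return heading
    return None


# Three different ways of typing the author resolve to one heading.
for typed in ["Chaucer", "Geoffrey Chaucer", "chaucer,  geoffrey"]:
    print(typed, "->", authorized_heading(typed))
```

The point of the sketch is simply that every variant leads to a single agreed-upon form, so a patron searching under any of them lands on the same records.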
That’s a major oversimplification of the concept of authorities, but it’s important background for what I really wanted to write about.
When I started this blog back in 2013, I wanted to use this space to explore the burgeoning world of online access to medieval manuscripts in North America. At that time, if manuscripts were being catalogued online at all, it was almost always as part of the library’s general online catalogue (known as an OPAC, or Online Public Access Catalogue) using the standard data model called MARC (MAchine-Readable Cataloging, also established by the Library of Congress – check out the Wikipedia entry for a brief introduction). MARC was developed in the 1960s specifically for printed books, NOT for handwritten documents and other unique materials. And so it doesn’t work very well for those rare, unique objects.
There are lots of reasons why MARC is problematic for cataloguing unique objects, but here’s one of the most important: the structure of a MARC record is incompatible with a unique object such as a medieval manuscript.
The conceptual framework underlying a MARC record is replication. If you’ve just purchased a paperback copy of the third edition of the Norton Critical Edition of Moby Dick, you’ll want to input it into your library’s OPAC. This paperback edition has 736 pages and measures 5.6 x 9.3 inches. A different edition of the work will have a different number of pages and different measurements, but EVERY copy of THIS edition will have the same number of pages and the same dimensions. So to input this edition into your database, all you have to do is visit the Library of Congress backend database and import the correct “Bibliographic Record” (“Bib Record” for short – the metadata for a particular edition of a particular work) into your OPAC. Once you’ve added your Bib Record, you then indicate how many copies of that edition are in your library and their call numbers (in “Item” records hanging off the Bib Record), and you’re done! Here’s an example of the third edition in the Yale University OPAC (called Orbis).
A Bib Record is by definition NOT unique. The Item in your library might have unique features (a bookplate or autograph, for example), but the Bib Record that holds that Item Record is not. It applies to every copy of that edition, no matter where those copies live. This is why the MARC structure is fundamentally at odds with manuscript cataloguing: every manuscript is unique. Each manuscript of The Canterbury Tales has a different number of leaves and different measurements from every other manuscript of the text (among other unique features). A Bib Record for a medieval manuscript can therefore only have one Item Record associated with it, which defeats the purpose of MARC architecture.
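If it helps to see the Bib/Item relationship as a data structure, here is a rough Python sketch. It is not real MARC and not how any OPAC stores its records; the class names `BibRecord` and `ItemRecord`, the library names, the call numbers, and the manuscript’s leaf count are all placeholders I’ve invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ItemRecord:
    """One physical copy held by one library."""
    library: str
    call_number: str
    note: str = ""  # copy-specific features, e.g. a bookplate


@dataclass
class BibRecord:
    """The description shared by every copy of an edition."""
    title: str
    extent: str
    items: List[ItemRecord] = field(default_factory=list)


# A printed edition: one shared description, many copies hanging off it.
# Library names and call numbers are placeholders.
moby_dick = BibRecord("Moby Dick (Norton Critical Edition, 3rd ed.)", "736 pages")
moby_dick.items.append(ItemRecord("Library A", "CALL-NO-1"))
moby_dick.items.append(ItemRecord("Library B", "CALL-NO-2", note="bookplate"))

# A manuscript: the description is true of exactly one object, so there is
# only ever one Item to attach. The leaf count is a made-up figure.
tales_ms = BibRecord("Canterbury Tales (manuscript)", "200 leaves (made-up figure)")
tales_ms.items.append(ItemRecord("Library C", "MS-PLACEHOLDER"))

print(len(moby_dick.items))  # copies sharing one description
print(len(tales_ms.items))   # the single unique object
```

The one-to-many structure earns its keep for the printed edition and does nothing at all for the manuscript, which is the mismatch in a nutshell.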
Because of this reproduce-ability, the central aggregator for MARC records, OCLC, automates the creation of lists of locations for each Item associated with a particular Bib Record in the aggregated catalogue, WorldCat. The WorldCat record for our edition of Moby Dick, for example, lists 108 locations in the Boston area. That’s super-helpful…if you or someone in your family happens to need this exact edition, you can easily find a copy at a library near you. However, this automation is a real problem where medieval manuscripts are concerned. For any particular manuscript, there simply cannot be more than one location. And yet, we find records like this one for a Book of Hours, listing FIFTY-NINE locations! This is an impossibility – the Bib Record represents a specific manuscript in a specific location, but the aggregator has mistakenly associated dozens of other Books of Hours with this one, because they have the same title. As a result, there is no way to know which actual manuscript this record represents. This record – which had ONE JOB TO DO – has failed. It has not allowed me to locate the manuscript.
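Here is a deliberately naive Python sketch of how that kind of failure can happen. To be clear, this is NOT OCLC’s actual matching logic, which I am not describing here; it assumes, for the sake of illustration, an aggregator that clusters records purely by title. The records and library names are invented.

```python
from collections import defaultdict

# Made-up records: (title, holding library). Each "Book of Hours" below is
# meant to be a different manuscript; the Moby Dick rows are copies of one edition.
records = [
    ("Book of Hours", "Library A"),
    ("Book of hours", "Library B"),
    ("Book of Hours ", "Library C"),
    ("Moby Dick", "Library A"),
    ("Moby Dick", "Library D"),
]


def cluster_by_title(recs):
    """Group holding libraries under a normalized title (the naive assumption)."""
    clusters = defaultdict(list)
    for title, library in recs:
        clusters[title.strip().lower()].append(library)
    return dict(clusters)


clusters = cluster_by_title(records)
print(clusters["moby dick"])      # helpful: two places to find the same edition
print(clusters["book of hours"])  # misleading: three different manuscripts merged
```

For the printed book, the merged list is exactly what a reader wants. For the Books of Hours, the same operation erases the one thing the record was supposed to tell us: which manuscript is where.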
It occurred to me today, though, that there is one situation where the MARC structure might be quite helpful for dealing with manuscripts: single leaves in different collections that were originally part of the same manuscript. I’ll use the Beauvais Missal as an example.
There are many Beauvais Missal leaves to be found in WorldCat. The problem is that you can’t easily find them. A search for “Beauvais Missal” and “Latin” retrieves nine records, one of which is a printed book. The eight remaining records are indeed Beauvais Missal leaves. They have eight different titles and six different dates, ranging from 1150 to 1450 (spoiler alert: it’s actually ca. 1290). Because I happen to know that Otto Ege assigned to this manuscript the exact date of 1285, I know that a record with the title “Missal” and the date “1285” is pretty likely to be from this manuscript as well: a search for “missal,” “Latin,” and “1285” finds ten records, nine of which are Beauvais Missal leaves (and one of which, at Loyola University Chicago, I didn’t know about until today! That makes 108…). A search for “Otto F. Ege” and “missal” retrieves additional records, including a few records for Ege’s “Fifty Original Leaves of Medieval Manuscripts” portfolios, in which Beauvais Missal leaves are no. 15. Finally, a search on THAT title finds even MORE records. Many of these results are duplicates, appearing in multiple results lists.
It shouldn’t be this difficult. You see where I’m going with this: if each Beauvais Missal leaf shared a common Bib Record, you, as the cataloguer, could import that Bib Record into your OPAC and hang your own Item (that is, your Beauvais Missal leaf) off of that Bib Record. But that would only be possible if someone somewhere was creating those standardized Bib Records in the Library of Congress database so that local OPAC cataloguers could find and import them. That seems an unlikely prospect.
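A small Python sketch shows the difference a shared record would make. All of the data is invented: the libraries, titles, and dates are placeholders rather than real catalogue entries, and the `parent` key is a hypothetical shared identifier standing in for a common Bib Record, not an actual MARC field.

```python
# Made-up local records for single leaves. Three come from the same
# dismembered manuscript; the fourth only looks like it does.
leaves = [
    {"library": "Library A", "title": "Missal", "date": "1285",
     "parent": "beauvais-missal"},
    {"library": "Library B", "title": "Leaf from a Latin missal", "date": "ca. 1300",
     "parent": "beauvais-missal"},
    {"library": "Library C", "title": "Beauvais Missal leaf", "date": "13th century",
     "parent": "beauvais-missal"},
    {"library": "Library D", "title": "Missal", "date": "1285",
     "parent": "some-other-missal"},
]


def find_by_title(records, phrase):
    """Libraries whose record title contains the phrase (case-insensitive)."""
    return [r["library"] for r in records if phrase.lower() in r["title"].lower()]


def find_by_parent(records, parent_id):
    """Libraries whose record hangs off the given shared parent record."""
    return [r["library"] for r in records if r["parent"] == parent_id]


print(find_by_title(leaves, "Beauvais Missal"))   # misses two of the three leaves
print(find_by_parent(leaves, "beauvais-missal"))  # finds all three, and only those
```

Title searching depends on every cataloguer guessing the same words; the shared parent does not, and it also keeps out the lookalike at Library D.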
All of this doesn’t mean you CAN’T use MARC for manuscripts, especially if you don’t have any other options. But you have to be aware of the limitations of the data model and the square-peg-round-hole-ness of stuffing manuscripts into MARC. In other words, proceed with caution. If you MUST put your manuscripts in your MARC-based OPAC, I recommend following the model developed by Yale University, whose records are also designed to be ingestible by Digital Scriptorium. Here’s a good example, chosen COMPLETELY at random (for the MARC cataloguers out there, the secret to discoverability and this jury-rigged interoperability has to do with the 500s, 650, and 690 fields – select MARC View for details).
The good news is that new models have developed in the years since I started this Manuscript Road Trip and are catching on. Many collections now use integrative systems such as CONTENTdm or LUNA, systems that integrate digital images with data models that are flexible and more appropriate for unique material like medieval manuscripts. Such systems may also be compatible with IIIF functionality, enabling image as well as linked-data interoperability. The records can also be ingested by WorldCat, as with this Beauvais Missal leaf belonging to Western Michigan University. The institutional LUNA-based record is here. Even though such records can be ingested by OCLC, they use a different architecture than MARC records, resolving the Bib/Item problem.
As more and more institutions migrate to image/data systems, especially those with IIIF functionality, we will see vast improvements in discoverability, access, and interoperability of online medieval and otherwise-unique material. Let’s get to work!