September Partner Spotlight: University of Utah

A view of the University of Utah’s J. Willard Marriott Library from the southeast.

Welcome to MWDL’s first partner spotlight! We’ll be writing one of these overviews for each of our partners over the coming months, so be on the lookout for an email from our metadata assistant, Keegan Dohm.

In late summer I met with Jeremy Myntti, Head of Digital Library Services at the U, to talk about the new projects, directions, and transitions underway at the Marriott Library. We discussed new data visualization projects, collection acquisitions, new mindsets for approaching data, and the books Jeremy recently edited (The Sudden Position Guide to Cataloging & Metadata and Digital Preservation in Libraries).

New Methods, New Mindset

The U’s Digital Library Services and Digital Matters departments have been developing several small pilot projects exploring the concept of “Collections as Data”. That’s the name given to a relatively new approach to digital collections and metadata, one that arose in response to the widespread digitization of records and the rise of computational research methods in the humanities over the past couple of decades.

A problem emerges, however. Because we began digitizing records long before computational methods became commonplace, our digital archives remain closer to the traditional library model. Libraries haven’t caught up with all the social scientists and historians turned programmers, so these researchers resort to reverse engineering our sites with ‘web scraping’ programs that download records one at a time, or else give up and look for other data sets. “Collections as Data” is about figuring out how to prepare and present these collections in ways that data visualization and analysis tools can work with.

In their first project, members of the U’s Digital Library Services and Digital Matters teams (Rebekah Cummings, Anna Neatrour, Rachel Wittmann, and Lizzie Callaway) went deep into collections of mining oral histories, a primary focus of many Utah collections. They struck gold with the project title, dubbing it “Text Mining Mining Texts”.

Word cloud of terms that appeared frequently in the collections of mining oral histories

The word cloud above visualizes a topic model produced from the text of a portion of the mining oral histories. Topic models can offer genuine insight into what was happening during these historical periods. For example, this one spurred the team to look into how ‘strike’ was used in the histories; they discovered that it referred not just to miners going on strike, but also to striking out racist real estate laws. Though only a test case, the project illustrates the benefits of making collections easier to access in bulk formats. A determined researcher with enough time might even trace generational variation in language by applying network analysis to the syntactic structures of each document and comparing them with more recently recorded interviews. This project, along with other Collections as Data projects, will be discussed in an article to be published in Information Technology and Libraries (ITAL) this December.
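For readers curious about what working with a collection “as data” looks like, here is a minimal Python sketch of two simple text-analysis steps: counting terms across transcripts (the kind of weighted word list a word cloud draws from) and pulling up each use of a keyword such as ‘strike’ with its surrounding words. It assumes the transcripts are already available as plain-text strings. It is not the team’s actual pipeline, and it does not fit a topic model, which is a separate probabilistic technique (latent Dirichlet allocation is a common one) that groups co-occurring words into topics. The stopword list and sample sentences are invented for illustration.

```python
import re
from collections import Counter

# A small, hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "we", "they", "it", "that", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(documents):
    """Count non-stopword terms across all documents (raw term counts, not a topic model)."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


def keyword_in_context(documents, keyword, width=4):
    """Return each occurrence of `keyword` with up to `width` tokens on either side."""
    hits = []
    for doc in documents:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = tokens[max(0, i - width):i]
                right = tokens[i + 1:i + 1 + width]
                # Uppercase the keyword so it stands out in the printed line.
                hits.append(" ".join(left + [tok.upper()] + right))
    return hits


if __name__ == "__main__":
    # Invented sample sentences; these are not quotes from the oral histories.
    docs = [
        "The miners went on strike for better wages at the mine.",
        "They voted to strike the clause from the real estate law.",
    ]
    print(term_frequencies(docs).most_common(3))
    for line in keyword_in_context(docs, "strike"):
        print(line)
```

A keyword-in-context listing like this is one quick way to check how a frequent term is actually being used before reading the full transcripts.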

New Collections

In the meantime, the team at the University of Utah continues to take on projects like this. Recently, Rachel Wittmann incorporated location metadata from the library’s brand-new Harold Stanley Sanders Matchbooks collection into an interactive map using ArcGIS. Rachel also wrote an excellent newsletter about the collection.

Check out the interactive version of this map here!
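A map like this starts from location metadata attached to each item. As a rough illustration, and not a description of how the matchbook map was actually built, the Python sketch below turns records carrying latitude and longitude into GeoJSON, an open format that web-mapping tools, ArcGIS Online among them, can import as a layer. The field names `title`, `lat`, and `lon` and the sample records are hypothetical.

```python
import json


def records_to_geojson(records):
    """Convert records with hypothetical 'title', 'lat', and 'lon' fields
    into a GeoJSON FeatureCollection of points.

    Records without coordinates are skipped. Note that GeoJSON orders
    coordinates as [longitude, latitude], not latitude first.
    """
    features = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # no location metadata, so nothing to plot
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": {"title": rec.get("title", "")},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    # Invented sample records for illustration; these are not real catalog entries.
    sample = [
        {"title": "Hypothetical matchbook A", "lat": 40.7608, "lon": -111.8910},
        {"title": "Hypothetical matchbook B", "lat": None, "lon": None},
    ]
    print(json.dumps(records_to_geojson(sample), indent=2))
```

Skipping records without coordinates, rather than guessing a location, keeps the map honest about which items actually carry place metadata.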

Ongoing Work

Alongside all these new approaches, the Special Collections and Digital Library Services teams are continuing the ongoing work of preserving and processing new and old collections. Of note, the Manuscripts Division of Special Collections was awarded a grant from the Utah State Archives to finish processing the materials in the Kennecott Copper Corporation records. That project was completed over the last couple of months, with the remaining 189 cartons of materials organized. These records give researchers access to the stories of the many ethnic communities who migrated to Utah over time, seeking opportunity in the mining industry. Anna Neatrour has now been awarded funding from the U’s Digital Matters to begin transcribing the text of these records to make them more accessible.

Another large undertaking that could eventually tie back into the Collections as Data concept is the captioning and transcription of the Audio Visual collections. Even for collections with only a few videos, this can be a daunting task, because timing captions to video is time-consuming. Jeremy Myntti and Molly Steed have been leading this project with funding from the Marriott Library’s Jumpstart Grant Program.

Thanks for reading our first partner spotlight, and be on the lookout for more spotlights in the coming months!