Resources

Public Collections Data from GLAM Institutions

Many institutions have made collections publicly available as bulk downloads or via APIs. A comprehensive listing is beyond the scope of this site, but here are a few high-profile examples:

The Carnegie Hall performance records as Linked Open Data
The New York Philharmonic performance history records
130,000 metadata records from the Museum of Modern Art’s collections
The Rijksmuseum has made hundreds of thousands of records and high-quality images from its collections available via an API
The Linked Jazz project, which also includes some exploratory visualizations

Tools for Extracting Collections Data

The goal of Collections Lab is to encourage research, learning, and experimentation with collections as data. To this end, we are collecting and sharing scripts that make it (relatively) easy for users to extract data from the digital collections of various institutions, including the Library of Congress Digital Collections and the Internet Archive. Find these on GitHub: https://github.com/digitalcollectionslab/scripts
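To give a sense of what this kind of extraction can look like, here is a minimal sketch in Python that queries the Internet Archive's public advanced search endpoint and pulls the matching records out of the JSON response. It is not taken from the Collections Lab scripts; the function names, the default field list, and the example query are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public search endpoint of the Internet Archive.
IA_SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(query, fields=("identifier", "title"), rows=50, page=1):
    """Build a search URL that asks for JSON output.

    The helper name and its defaults are illustrative, not an official client.
    """
    params = [("q", query)]
    # Each requested metadata field is passed as a repeated "fl[]" parameter.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return IA_SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_docs(payload):
    """Return the list of records from a decoded search response.

    Returns an empty list if the expected keys are missing.
    """
    return payload.get("response", {}).get("docs", [])


def fetch_docs(query, **kwargs):
    """Run the search over the network and return the matching records."""
    with urlopen(build_search_url(query, **kwargs), timeout=30) as resp:
        return extract_docs(json.load(resp))
```

Calling fetch_docs("collection:example_collection", rows=5) would request up to five records from a collection with that (hypothetical) identifier. Building the URL and parsing the response are kept separate from the network call so that each can be checked without going online.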

Crowdsourcing Collections Data

Crowdsourcing methods and platforms offer another way to generate or collect data from collections. GLAM institutions and scholars can call on academic communities and the general public to enhance collections data in order to facilitate discovery, research, and teaching. This might be as simple as adding descriptions to a formatted spreadsheet, or it may involve using a platform for crowdsourcing or scholar-sourcing descriptive contributions. Zooniverse is a popular platform for crowdsourcing the transcription and tagging of images of textual or pictorial material. Transkribus is another hosted solution for collaborative transcription, and it incorporates Handwritten Text Recognition.

Always Already Computational: Collections as Data

A grant-funded project involving many libraries and memory institutions, dedicated to “encouraging reuse of collections that support computationally-driven research and teaching.”
So far, this effort has been dedicated primarily to theoretical discussions, but it’s worth following.