UC DLF: Participatory Digital Collections

Below are the slides and speaking notes for our Collections Lab presentation on “Participatory Digital Collections: Opportunities for Enhancement, Experimentation, and Transformation,” presented at the 2018 UC DLF at UC Riverside.

Participatory digital collections and the Collections Lab

When we talk about “participatory collections,” we are often discussing the projects that facilitate user engagement through crowdsourcing of new data, such as transcriptions, tagging, and description. For this talk, we expand on this understanding of “participatory” to include all forms of engagement that encourage the creation and/or reuse of collections data for teaching and research, as well as efforts to contextualize or reinterpret collections through new interfaces and access points.

The UCLA Library Collections Lab is a space to promote more participatory collections and encourage the library and its users to consider our collections as data. Here, we experiment to enhance our collections and build out our “collections as data” program through both crowdsourcing, computational, and design approaches. We don’t currently have the open digital library system of our dreams (although we’re working on that) - until we do, the Collections Lab is a space for:

Collections Lab & BuildUCLA

Machine-learning experiments: Book Annotation Classification Engine

One of our BuildUCLA / Collections Lab experiments is the Book Annotation Classification Engine, a

With so many institutions coalescing around IIIF, we wanted to start thinking of other ways libraries and users could “act” on our digital collections in this sphere. The ability of IIIF to pull disparate collections together using IIIF manifests and viewers really opens up the possibilities for users to engage across multiple collections and institutions.

One of our current projects with UCLA’s Clark Library is the Annotated Books project, an NEH funded project to digitize, describe, and publish annotated books from the Clark collections. In addition to the book catalog data, the descriptions also include data about the annotations themselves. This got us thinking – how great would it be if we could get a machine to find all the annotations for the researchers and maybe even classify them? — and could we do this within the IIIF ecosystem?

Pete and I had just started working together on BuildUCLA, so a joint BuildUCLA / Collections Lab venture was inevitable. We recruited some talented computer science undergrads and a few colleagues to start experimenting. The resulting product will (hopefully) allow users to apply the trained models to any IIIF collections that have manifests enabled. These collections then can become platforms for machine-learning based discovery and extraction of latent collections data.

To make this work, we need loads of training data. I’m new to machine-learning, so did not initially realize how much training data was needed to get us started. For the training data, we started by making use of existing digital collections to create datasets.

The students have been busy with the collection, organization, and cleaning of collections data to prep for the work ahead. They are also experimenting with various approaches and are seeing some small successes while they learn more about machine-learning and computer vision approaches.

We are also employing crowdsourcing to build training data sets using the Zooniverse platform. We plan to reach out to communities that might be interested in book annotation and we invite anyone to contribute by tagging annotations on uploaded page images to help generate additional training data. Of course, the data will be made available for anyone to use.

To Pete…

Discussion: Now that we’ve talked about some of our projects and how we see these contributing to more participatory / dynamic digital collections, we’d like to move into some discussion on how we might set ourselves up for success. This often begins with selection: what criteria are we considering when we decide to digitize? Many of our digital collections have been curated/built on principles of “special-ness” - those high profile, unique, or beautiful items – or to increase access to hidden collections or provide surrogates for materials that get lots of traffic or are in need of preservation. These are great reasons to digitize; however, we’d like to discuss ways that we might extend these criteria to foster more participatory collections – What new criteria might we consider when selecting items or collections for digitization? And what criteria might we use when considering whether to adopt, build, enhance born-digital collections or those that were previously digitized? Here are a few points to get the conversation started:

Digital Collection Criteria to promote more participatory collections:

We might also discuss what it takes to encourage the types of user participatory engagement/behaviors that the panelists touched on in their presentations. For example: