ALA Annual 2011: Intellectual Access to Preservation Metadata

Intellectual Access to Preservation Metadata: Real-life tales of using PREMIS

Links to the presentations and business meeting minutes are available on ALA Connect.

The first speaker was Rebecca Guenther, from the Library of Congress, who spoke on “Understanding and Implementing the PREMIS data dictionary for Preservation Metadata.”

Guenther defined preservation metadata, which includes:

  • provenance
  • authenticity
  • preservation activity
  • technical environment
  • rights management

She provided a history of PREMIS, the de facto standard for preservation metadata: the data dictionary was first issued in 2005, PREMIS 2.0 followed in 2008, and a small revision (2.1) came out in January 2011.  PREMIS is designed to give a comprehensive view of the information needed to support digital preservation, with guidelines and recommendations to support its creation, management, and use.  It covers administrative metadata and the information needed to manage an object for preservation purposes.  It includes technical metadata, as well as information on actions done on an object and relationships to other objects, e.g., how compound objects are put together and how derivatives are identified from the original.  It includes rights metadata that is associated with preservation.  It fits into the OAIS reference model, mostly into preservation description information.

PREMIS:

  • is a common data model for organizing and thinking about digital preservation
  • is a checklist for core metadata when setting up a digital repository
  • can provide guidance for local implementation
  • is a standard for exchanging information packages between repositories

But it is NOT an out-of-the-box solution:

  • Semantic units, not metadata elements
  • It has no business rules
  • It does not include all technical metadata, relying on other schemas for format-specific metadata
  • It only includes preservation rights, not access rights

Guenther showed a diagram of the PREMIS data model:

  • Intellectual entities, e.g., a book, photograph, or website, that have one or more digital representations
  • Objects: files, representations, and bitstreams, e.g., a file holding a chapter of a book. Put together, they make up an intellectual entity
  • File and representation: these can be the same thing, or a representation can bring together many files so that they can be understood as a whole
  • Events are what document digital provenance.  The PREMIS data dictionary has a list of event types, e.g., an ingest event or a migration event.
  • Agents are the people, organizations, or software programs associated with an event.
  • Rights statements satisfy preservation rights documentation, e.g., what preservation action can be undertaken.
  • A lot of technical metadata is under object characteristics.
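As an illustration only (it was not part of the presentation), the relationships among objects, events, and agents in the data model can be sketched in Python. The class and field names below are simplified, hypothetical stand-ins for PREMIS semantic units, not the data dictionary's own definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    identifier: str
    name: str
    agent_type: str  # e.g. "person", "organization", "software"


@dataclass
class PremisObject:
    identifier: str
    category: str  # "representation", "file", or "bitstream"
    format_name: str = ""
    # Identifiers of related objects, e.g. the original a derivative came from
    related_object_ids: List[str] = field(default_factory=list)


@dataclass
class Event:
    identifier: str
    event_type: str  # e.g. "ingest", "migration"
    date_time: str
    agent_ids: List[str] = field(default_factory=list)
    object_ids: List[str] = field(default_factory=list)


def events_for(object_id: str, events: List[Event]) -> List[Event]:
    """Return the events linked to one object: its recorded provenance."""
    return [e for e in events if object_id in e.object_ids]


# Hypothetical sample data: a TIFF master and a JPEG derivative made from it.
tool = Agent("agent-1", "Example converter", "software")
master = PremisObject("obj-1", "file", "TIFF")
derivative = PremisObject("obj-2", "file", "JPEG", related_object_ids=["obj-1"])
history = [
    Event("evt-1", "ingest", "2011-06-25T10:00:00", [], ["obj-1"]),
    Event("evt-2", "migration", "2011-06-25T10:05:00", ["agent-1"], ["obj-1", "obj-2"]),
]
print([e.event_type for e in events_for("obj-1", history)])
```

Linking by identifier rather than by nesting mirrors the way the entities in the diagram point to one another, so an event can refer to several objects and agents at once.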

Current state of PREMIS

  • De facto preservation metadata standard, and mandated by the country of Spain.
  • Some implementation fairs have talked about use and ways to improve
  • Editorial committee sets standards and goals, using feedback, and includes international membership.
  • Current activities: integration with other standards like METS, new documentation and tools, and an upcoming release of a draft PREMIS OWL ontology
  • See the implementation registry

What does it mean to implement PREMIS?

  • Keeping preservation metadata as defined in the PREMIS data dictionary, regardless of the form or names used locally, so long as they are mapped to PREMIS
  • Approaches to implementation can be phased, e.g., implement only objects at first, then implement other parts
  • It’s not designed for people to fill in by hand
  • You don’t have to control all levels of objects, maybe just files
  • DO plan to track actions on objects for preservation purposes
  • METS is useful as an exchange package, and PREMIS fits into this

Tools

  • Tools that generate PREMIS are listed on the PREMIS website
  • http://id.loc.gov has three different PREMIS controlled vocabularies available

In conclusion, PREMIS is a critical piece of digital preservation infrastructure. It is international, cross-domain, and created by consensus, and provides a building block for a successful digital preservation strategy.  The data dictionary is focused on implementation. Preservation metadata will be crucial for the future, even if it doesn’t help access today.

The second speaker was Peter Van Garderen, President of Artefactual Systems, Inc., who spoke on “PREMIS in Archivematica.”

Van Garderen spent some time describing his company’s product: Archivematica.  Archivematica is an open-source digital preservation system.  It is in the alpha stage, with clients in Canada and the U.S. testing it, and requires significant technical support.  It is designed to help with day-to-day processing and electronic records accessioning, and is built around microservices: each performs a small digital curation task on a set of files, and together they handle the digital preservation process. The workflow uses a watch directory process, so complex workflows can be chained together and processing jobs can be siloed to different clients. Archivists and librarians monitor the objects as they go through processing.  It’s based on the OAIS reference model.  Tools related to digital forensics are still being tested.

Because it is based on the OAIS reference model, the Archivematica workflow focuses on generating SIPs from objects from the outside world and creating AIPs for archival storage and DIPs for dissemination. It can identify files that are in “at risk” formats, and create a “best bet” file format for the AIP.  It keeps both the original object and the normalized preservation copy, with all of the technical and descriptive metadata about the object. Ultimately Van Garderen wants an interoperable AIP structure so that packages can be interchanged between systems. A tool called ACE checks the AIPs for stability and bit-rot.
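The general idea behind checking stored files for bit-rot can be sketched as a checksum comparison. This is a minimal illustration in Python, assuming a SHA-256 digest was recorded at ingest; it is not a description of how ACE or Archivematica actually works, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest):
    """Return True if the file still matches the digest recorded earlier."""
    return sha256_of(path) == recorded_digest
```

A repository would typically run a check like this on a schedule and record each result as a fixity event, so that a mismatch leaves a trace in the object's provenance.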

Archivematica and PREMIS

As Priscilla Caplan pointed out, PREMIS is useful for repository design, evaluation, and exchange of AIPs between repositories (Priscilla Caplan, Understanding PREMIS). It supports authenticity by establishing integrity and identity.  It maintains the chain of custody, keeps records secure, documents activities, and describes the records.  To do this, metadata must be stored in a standardized format. Within Archivematica, the semantic units of PREMIS are managed as SQL metadata during ingest and are output as XML. Archivematica uses a BagIt package for managing files and can generate PREMIS records in METS. Archivematica uses the events controlled vocabularies.
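To give a sense of what event metadata output as XML might look like, here is a minimal Python sketch that builds one PREMIS event element using the PREMIS 2.x namespace. It is an illustration only: the helper names, identifier type, and sample values are assumptions, the output has not been validated against the PREMIS schema, and it does not reflect Archivematica's actual code.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"
ET.register_namespace("premis", PREMIS_NS)


def _sub(parent, tag, text=None):
    """Append a child element in the PREMIS namespace (hypothetical helper)."""
    element = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
    if text is not None:
        element.text = text
    return element


def build_event(event_id, event_type, date_time, object_id):
    """Build a PREMIS event element linked to a single object."""
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    identifier = _sub(event, "eventIdentifier")
    _sub(identifier, "eventIdentifierType", "local")  # assumed identifier type
    _sub(identifier, "eventIdentifierValue", event_id)
    _sub(event, "eventType", event_type)
    _sub(event, "eventDateTime", date_time)
    link = _sub(event, "linkingObjectIdentifier")
    _sub(link, "linkingObjectIdentifierType", "local")
    _sub(link, "linkingObjectIdentifierValue", object_id)
    return event


# Hypothetical sample values
sample = build_event("evt-1", "ingestion", "2011-06-25T10:00:00", "obj-1")
print(ET.tostring(sample, encoding="unicode"))
```

In a METS document, an element like this would normally sit inside a digital provenance section, which is one way PREMIS "fits into" METS as an exchange package.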

Archivematica hasn’t implemented rights metadata yet, but the team believes that by using the rights extension almost any rights can be expressed, including usage rights and access restrictions. This area should expand as new changes (PREMIS 3.0) arrive that say more about what you CAN’T do as well as what you CAN do. The AIPs are in XML, and Archivematica will index them so that everything is searchable. Archivematica uses the PREMIS performance checklist to make sure it conforms to PREMIS.

As an open-source product, Archivematica is still evolving, and Van Garderen encouraged people to explore it.

The final speaker was Andrew Hart, University of North Carolina at Chapel Hill, who spoke on “UNC PREMIS in the Carolina Digital Archive.”

Hart began with background about the Carolina Digital Repository. It is based in the library, but in partnership with campus information technology and the School of Information and Library Science.  It’s a repository in a very broad sense, and is designed to handle a wide range of objects, including individual photos, complex datasets, images of human remains with complex rights issues, and digital objects managed by the library. It is operational and holds a mixture of dark and public content.  Hart described the architecture as being like a snowball rolling downhill: raw information is wrapped in multiple layers of information added by the repository, and then that information is wrapped as well.

Hart displayed a diagram of the repository’s underlying structure. It uses Fedora with an underlying iRODS grid.  A lot of the work at UNC is figuring out how to have Fedora talk to the iRODS grid.  PREMIS is accommodated nicely in Fedora, but there are challenges pushing information up from iRODS.

The PREMIS elements in use focus on the event entities.  In the repository, the metadata that can currently be exported is MODS, Dublin Core, and PREMIS Events metadata.  It uses the identities as defined in id.loc.gov.

One major challenge that Hart sees is how to put the PREMIS information to work. Just having the information doesn’t mean you know what to do with it, and knowing what to do doesn’t mean doing it, but you need to start somewhere and keep developing.  PREMIS isn’t an end in and of itself, but a series of steps along a long path without end.  How does a problematic event in PREMIS catalyze action? Hart would like to create a PREMIS report that reflects the full range of what PREMIS does, and would like to automate processes to make sure that information is documented and reported on regularly.
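As a rough idea of what a first step toward such a report could look like, the Python sketch below tallies recorded events by type and outcome and pulls out the ones that did not succeed. The record layout (dictionaries with "event_type", "outcome", and "object_id" keys) and the outcome values are assumptions made for illustration, not the Carolina Digital Repository's actual data.

```python
from collections import Counter


def summarize_events(events):
    """Count events by (type, outcome) and list those needing attention."""
    counts = Counter((e["event_type"], e["outcome"]) for e in events)
    problems = [e for e in events if e["outcome"] != "success"]
    return counts, problems


# Hypothetical event records
records = [
    {"event_type": "ingest", "outcome": "success", "object_id": "obj-1"},
    {"event_type": "fixity check", "outcome": "success", "object_id": "obj-1"},
    {"event_type": "fixity check", "outcome": "failure", "object_id": "obj-2"},
]
counts, problems = summarize_events(records)
for event in problems:
    print(f"Needs review: {event['event_type']} on {event['object_id']}")
```

Run on a schedule, a summary like this could feed a notification or a work queue, which is one way a problematic event might prompt action rather than sit unread.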

More information can be found at the Carolina Digital Repository blog.

Reported by Kristin Martin

Kristin Martin is the Metadata Blog Coordinator for the Metadata Interest Group. She is the Acting Electronic Resources Librarian and Metadata Librarian at the University of Illinois at Chicago.