ALA Midwinter 2012: Intellectual Access to Preservation Metadata

Metadata for Datasets and Dataset Management: Tools and Approaches for Access and Preservation

Joan Starr, California Digital Library

Intellectual Access to Preservation Metadata Interest Group

ALA Midwinter Meeting, Jan. 21, 2012

The presentation began with an overview of the California Digital Library (CDL), which serves the entire University of California and provides services such as licensing agreements, union bibliographic tools, and data curation and management tools. CDL shares resources and services and provides solutions for managing digital assets.

Requirements for dataset description

  • Start by asking the researchers: what are their requirements? Different domains are starting to come forward with their requirements.
  • New description needs are primarily in the area of access: tracking the impact of research, laying the groundwork for reuse, and ensuring fairness, accountability, and transparency.
  • Libraries have their own needs: we want to ensure the preservation of our institutions' scholarly assets.
  • Libraries have an ally, because research funders now require a data management plan.

How do we describe datasets?

  • There is an astounding amount of variety among research domains
    • Persistent identifiers that locate the data: this requirement is universal
    • Access date and time are very important for dynamic datasets, where data may arrive in streams of millions of points per second
    • Recommendations are still being developed
    • For persistent identifiers: DataCite
      • Founded two years ago by a group of libraries
      • Mission is to help you find, access, and reuse data
      • Organization is run by the members
      • Assigns digital object identifiers (DOI) to datasets
      • DOIs have traditionally been assigned to scholarly papers
      • DOIs are governed by the International DOI Foundation, which ensures the uniqueness of DOIs.
      • What is a persistent identifier?
        • It is an alphanumeric string that never changes, is associated with the location of an object, and may optionally carry other metadata
        • DOIs can be expressed in their native form or as an actionable link
        • The string always stays the same, regardless of where the data is moved, as long as the location recorded behind the link is updated
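To make the native/actionable distinction and the "same string, new location" idea concrete, here is a minimal Python sketch. It assumes the current https://doi.org/ resolver prefix (links from this period often used http://dx.doi.org/), uses a hypothetical DOI under DataCite's 10.5072 test prefix, and models resolution with a toy in-memory table rather than the real DOI infrastructure.

```python
# Minimal sketch: DOI native form vs. actionable link, plus a toy resolver.
# Assumptions: https://doi.org/ as the resolver prefix; the DOI used below is
# a hypothetical example under DataCite's 10.5072 test prefix.

RESOLVER = "https://doi.org/"
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def native_to_link(doi: str) -> str:
    """Turn 'doi:10.5072/X' (or bare '10.5072/X') into an actionable URL."""
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    return RESOLVER + doi


def link_to_native(url: str) -> str:
    """Turn a resolver URL back into the native 'doi:' form."""
    for prefix in KNOWN_PREFIXES:
        if url.lower().startswith(prefix):
            return "doi:" + url[len(prefix):]
    raise ValueError(f"not a recognized DOI link: {url}")


class ToyResolver:
    """In-memory stand-in for a resolver: identifier -> current location."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, identifier: str, target: str) -> None:
        if identifier in self._targets:
            raise ValueError(f"{identifier} is already registered")
        self._targets[identifier] = target

    def update_target(self, identifier: str, new_target: str) -> None:
        # The identifier never changes; only the location behind it does.
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = new_target

    def resolve(self, identifier: str) -> str:
        return self._targets[identifier]


if __name__ == "__main__":
    doi = "doi:10.5072/EXAMPLE-DATASET"  # hypothetical identifier
    resolver = ToyResolver()
    resolver.register(doi, "https://repo-a.example.org/data/42")
    # The dataset moves; the DOI stays the same and only its target is updated.
    resolver.update_target(doi, "https://repo-b.example.org/datasets/42")
    print(native_to_link(doi))    # https://doi.org/10.5072/EXAMPLE-DATASET
    print(resolver.resolve(doi))  # https://repo-b.example.org/datasets/42
```

The toy resolver does no case normalization; real DOIs are case-insensitive, so a production lookup would need to account for that.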

EZID and Metadata

  • A tool for creating DOIs, managing them over time, and managing their metadata
  • URL is http://n2t.net/ezid
  • Under the Help tab, you can create test DOIs and test ARKs without an account (ARKs are recommended until the research is ready to be published, because they are less permanent than DOIs)
  • There is an API, and many users use it to batch-ingest items.
  • The DataCite required elements are based on the Dublin Core Metadata Element Set
  • A universal redesign and automated link checking are underway
  • The “gory details” of the DataCite Metadata Schema, v. 2.2:
    • A small required set of citation elements
      1. Identifier – currently only DOIs are accepted
      2. Creator
      3. Title
      4. Publisher – the entity that makes the data available to the community of researchers, so it might be a distributor, data repository, traditional publisher, university repository, etc.
      5. PublicationYear
  • DataCite wants to remain domain-agnostic; it's hard to please everyone, but the schema can be used by all
  • An optional descriptive set of 12 elements adds further information
    • Domains can add more information here
    • Supports a number of external fields
    1. Subject (with schema attribute)
    2. Contributor (with type & name identifier attributes)
    3. Date (with type attribute)
    4. Language
    5. ResourceType (with description attribute)
    6. AlternateIdentifier (with type attribute)
    7. RelatedIdentifier (with type & relation type attributes)
    8. Size
    9. Format
    10. Version
    11. Rights
    12. Description (with type attribute)
    • v. 2.3 will add IsIdenticalTo

The importance of Data Management Planning

  • A lifecycle approach is needed, as practiced at CDL
  • CDL Curation and Publishing Services is creating tools to support this
  • Data management needs to happen throughout the entire research lifecycle
  • Identifiers should be assigned as early as possible; this allows researchers to keep track of their data
  • Managing versioning: any time there is a major version change, the dataset needs to be reregistered. The challenge is deciding what counts as a major version change, since different domains may define it differently. Also challenging is how to handle minor version changes: should these be reregistered too?
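One way to make a versioning policy explicit is to encode it. This sketch assumes hypothetical "major.minor" version strings and treats the minor-change question, which the talk left open, as a per-domain policy flag. It is an illustration, not a CDL or DataCite rule.

```python
from enum import Enum

# Hypothetical policy sketch: version strings are assumed to look like "2.1";
# what counts as "major" is domain-specific and is simply assumed here.


class MinorChangePolicy(Enum):
    REREGISTER = "reregister"            # register minor changes as well
    UPDATE_IN_PLACE = "update_in_place"  # keep the identifier, update metadata only


def parse_version(version: str) -> tuple[int, int]:
    """Parse 'major[.minor[...]]' into (major, minor); extra parts are ignored."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def needs_new_identifier(old: str, new: str, policy: MinorChangePolicy) -> bool:
    """Decide whether a version change should trigger reregistration."""
    old_major, old_minor = parse_version(old)
    new_major, new_minor = parse_version(new)
    if new_major != old_major:
        return True  # major change: reregister the dataset
    if new_minor != old_minor:
        return policy is MinorChangePolicy.REREGISTER
    return False


if __name__ == "__main__":
    print(needs_new_identifier("1.4", "2.0", MinorChangePolicy.UPDATE_IN_PLACE))  # True
    print(needs_new_identifier("1.4", "1.5", MinorChangePolicy.UPDATE_IN_PLACE))  # False
    print(needs_new_identifier("1.4", "1.5", MinorChangePolicy.REREGISTER))       # True
```

Keeping the minor-change rule as an explicit parameter lets each domain record its own answer instead of burying it in ad hoc decisions.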

More links to materials are available in the slides (will be posted on ALA Connect soon)

Posted by Kristin Martin


Kristin Martin is the Metadata Blog Coordinator for the Metadata Interest Group. She is the Acting Electronic Resources Librarian and Metadata Librarian at the University of Illinois at Chicago.
This entry was posted in ALA Midwinter 2012.
