ALA Midwinter 2012: Intellectual Access to Preservation Metadata

Metadata for Datasets and Dataset Management: Tools and Approaches for Access and Preservation

Joan Starr, California Digital Library

Intellectual Access to Preservation Metadata Interest Group

ALA Midwinter Meeting, Jan. 21, 2012

The presentation began with an overview of the California Digital Library (CDL), which serves all of the University of California. It provides services through licensing agreements, union bibliographic tools, and data curation and management tools. It also shares resources and services and provides solutions for managing digital assets.

Requirements for dataset description

Start by asking the researchers: what are their requirements? Different domain areas are starting to come forward with their own requirements.
New descriptions are primarily in the area of access: track impact of research, lay groundwork for reuse, ensure fairness, accountability, and transparency.
Libraries have their own needs: we want to ensure the preservation of our institutions' scholarly assets.
Libraries have an ally because research funders now require a data management plan.

How do we describe datasets?

There is an astounding amount of variety among different research domains
- Locator persistent identifiers: this is universal
- Access date and time are very important for dynamic datasets, where data could be arriving in streams of millions of points per second
- Recommendations are still being developed
- For persistent identifiers: DataCite
  - Founded two years ago by a group of libraries
  - Mission is to help you find, access, and reuse data
  - Organization is run by the members
  - Assigns digital object identifiers (DOI) to datasets
  - DOIs have been traditionally assigned to scholarly papers
  - DOIs are governed by the International DOI Foundation, which governs the uniqueness of DOIs.
  - What is a persistent identifier?
    - It looks like an alphanumeric string that never changes, that is associated with the location of an object, and it may optionally have some other metadata
    - DOIs can be described in their native form or as an actionable link
    - The string will always stay the same, regardless of where the data is moved, as long as the metadata behind the link is updated
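The two forms of a DOI, and the way the string stays fixed while the location behind it changes, can be sketched in a few lines of Python. This is a minimal illustration, not how any resolver is implemented: the DOI `10.5072/FK2EXAMPLE`, the repository URLs, and the in-memory `registry` dictionary are all made up for the example, and the function names are hypothetical. The only outside fact assumed is that `https://doi.org/` resolves DOIs.

```python
import re
from urllib.parse import quote

# Loose shape check for a DOI in native form: "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def to_actionable_link(doi, resolver="https://doi.org/"):
    """Turn a DOI in native form into a link a browser can follow."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]  # drop the "doi:" label if present
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI in native form: {doi!r}")
    return resolver + quote(doi, safe="/")


# Toy stand-in for the metadata kept behind the identifier (hypothetical).
registry = {"10.5072/FK2EXAMPLE": "http://repository-a.example.org/datasets/42"}


def move_dataset(doi, new_location):
    """Update the stored location; the identifier itself is untouched."""
    registry[doi] = new_location


link_before = to_actionable_link("doi:10.5072/FK2EXAMPLE")
move_dataset("10.5072/FK2EXAMPLE", "http://repository-b.example.org/archive/42")
link_after = to_actionable_link("doi:10.5072/FK2EXAMPLE")

# The citable link is identical before and after the move.
assert link_before == link_after
```

Only the registry entry changes when the data moves, which is why a citation that uses the DOI keeps working as long as that entry is maintained.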

EZID and Metadata

Tool to let you create DOIs and manage them over time, and manage metadata
URL is http://n2t.net/ezid
There is a help tab where, without an account, you can create test DOIs and test ARKs (using ARKs is recommended until research is ready to be published, because they are less permanent than DOIs)
There is an API, and many users use the API to batch ingest items.
DataCite required elements are based on the Dublin Core Metadata Set
A universal redesign and automated link checking are underway
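For the batch use of the API mentioned above, the following Python sketch prepares one request body per dataset without contacting any server. It assumes the plain-text "name: value" (ANVL-style) bodies with percent-escaping described in EZID's API documentation. The element names `_target` and `datacite.*`, the helper names, and the sample records are assumptions for illustration and should be checked against the current documentation before use.

```python
import re


def escape(text):
    """Percent-encode characters that would break a 'name: value' line."""
    return re.sub(r"[%:\r\n]", lambda m: "%%%02X" % ord(m.group(0)), text)


def to_anvl(metadata):
    """Serialize one record as newline-separated 'name: value' lines."""
    return "\n".join(
        f"{escape(name)}: {escape(value)}" for name, value in metadata.items()
    )


def batch_bodies(records):
    """Build one request body per record, ready to send one by one."""
    return [to_anvl(record) for record in records]


# Hypothetical records; element names are assumed, not taken from the talk.
records = [
    {
        "_target": "http://repository.example.org/datasets/1",
        "datacite.creator": "Doe, Jane",
        "datacite.title": "Stream gauge readings: site A",
        "datacite.publisher": "Example University Repository",
    },
    {
        "_target": "http://repository.example.org/datasets/2",
        "datacite.creator": "Roe, Richard",
        "datacite.title": "Stream gauge readings: site B",
        "datacite.publisher": "Example University Repository",
    },
]

for body in batch_bodies(records):
    print(body)
    print("---")
```

Keeping serialization separate from sending makes it easy to inspect or log every body before any identifier is actually created.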
The "gory details" of the DataCite Metadata set, V. 2.2
- small required set = citation elements
  1. Identifier: currently only accepts DOI
  2. Creator
  3. Title
  4. Publisher: the entity that makes the data available to the community of researchers, so it might be a distributor, data repository, traditional publisher, university repository, etc.
DataCite wants to remain domain agnostic. It's hard to please everyone, but the schema can be used by all
Optional descriptive set of 12 elements that add additional information
- domains can add more information here
- Provide support for a number of external fields
1. Subject (with schema attribute)
2. Contributor (with type & name identifier attributes)
3. Date (with type attribute)
4. Language
5. ResourceType (with description attribute)
6. AlternateIdentifier (with type attribute)
7. RelatedIdentifier (with type & relation type attributes)
8. Size
9. Format
10. Version
11. Rights
12. Description (with type attribute)
- v. 2.3 will add: IsIdenticalTo
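Because the required set doubles as the citation elements, a record can be checked for completeness and turned into a citation string in one pass. The sketch below uses only the four required elements listed in these notes. The lowercase field names, the sample record, and the citation layout are assumptions made for illustration, not a format prescribed by DataCite.

```python
# Required elements as listed in these notes (field names are hypothetical).
REQUIRED_ELEMENTS = ("identifier", "creator", "title", "publisher")


def missing_required(record):
    """Return the required elements that are absent or blank."""
    return [
        name
        for name in REQUIRED_ELEMENTS
        if not str(record.get(name) or "").strip()
    ]


def format_citation(record):
    """Build a simple citation from the required elements only."""
    missing = missing_required(record)
    if missing:
        raise ValueError("missing required elements: " + ", ".join(missing))
    return "{creator}. {title}. {publisher}. doi:{identifier}".format(**record)


# Made-up record; "version" stands in for any optional element.
record = {
    "identifier": "10.5072/FK2EXAMPLE",
    "creator": "Doe, Jane",
    "title": "Stream Gauge Readings",
    "publisher": "Example University Repository",
    "version": "1.0",
}

print(format_citation(record))
print(missing_required({"title": "Untitled", "creator": " "}))
```

Optional elements are simply ignored by the citation, which mirrors the split between a small required set and a larger descriptive set.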

The importance of Data Management Planning

Need a lifecycle approach, as done at CDL
CDL Curation and Publishing Services is creating tools to support this
Data management needs to be carried through the whole lifecycle
Identifiers should be assigned as soon as possible, which allows researchers to keep track of their data
Managing versioning: any time there is a major version change, the dataset needs to be reregistered. The challenge is: what is a major version change? Different domains may define this differently. Also challenging is how to deal with minor version changes: should these be reregistered too?
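One way to make the versioning question concrete is to write the policy down as a small function. The sketch below assumes datasets carry "major.minor" version strings, which is only one possible convention and is not something the talk prescribed. The `reregister_minor` switch stands in for the open question of whether minor changes should also be reregistered, and all names are hypothetical.

```python
def parse_version(version):
    """Split a 'major.minor' string into integers; a missing minor is 0."""
    major, _, minor = version.partition(".")
    return int(major), int(minor or 0)


def needs_reregistration(old, new, reregister_minor=False):
    """Decide whether a new version should be registered again.

    A change in the major number always triggers reregistration.
    A change in the minor number does so only if the domain's
    policy (reregister_minor) says it should.
    """
    old_major, old_minor = parse_version(old)
    new_major, new_minor = parse_version(new)
    if new_major != old_major:
        return True
    return reregister_minor and new_minor != old_minor


print(needs_reregistration("1.4", "2.0"))
print(needs_reregistration("1.4", "1.5"))
print(needs_reregistration("1.4", "1.5", reregister_minor=True))
```

What counts as "major" still has to be decided by each domain. The function only records that decision once it has been made.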

More links to materials are available in the slides (will be posted on ALA Connect soon)

Posted by Kristin Martin
