Metadata Interest Group Meeting
ALA Midwinter Meeting
Jan. 9, 2011 8-10 AM
Links to the actual presentations will be provided when they are available.
The first speaker was Corey A. Harper, who presented “Linked Library Data: 2010-2011 Update.” His talk was designed to provide an update on linked data activity since ALA Annual 2010. Coverage included:
- Preconference announcement of the W3C incubator group
- National library activities in linked data
- Dublin Core 2010 in Pittsburgh
- Archives and museums activities in linked data
- Authority work
- IFLA work
- Linkypedia – not quite linked data but illustrates why the linked data movement is so important
- Vision for using library linked data
Harper began with the four linked data principles, as announced by Tim Berners-Lee:
1. Use URIs as names for things
2. Use http URIs so people can look up the names
3. When someone looks up a URI, give back information
4. Include links to related URIs
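The four principles can be illustrated without a network connection. The sketch below is a toy model, not a real linked data client: a Python dictionary stands in for the web, mapping hypothetical http URIs under example.org to the descriptions a server would return, and a breadth-first crawl follows links from one description to related URIs. The predicate URIs come from the real Dublin Core terms, FOAF, and RDFS vocabularies; everything else is invented for illustration.

```python
from collections import deque

# Hypothetical in-memory "web": each http URI (principles 1 and 2) maps to the
# description a server would return when it is looked up (principle 3).
# All example.org URIs are placeholders, not a real dataset.
WEB = {
    "http://example.org/book/1": [
        ("http://purl.org/dc/terms/title", '"An Example Book"'),
        ("http://purl.org/dc/terms/creator", "http://example.org/person/42"),
    ],
    "http://example.org/person/42": [
        ("http://xmlns.com/foaf/0.1/name", '"Jane Example"'),
        ("http://xmlns.com/foaf/0.1/based_near", "http://example.org/place/7"),
    ],
    "http://example.org/place/7": [
        ("http://www.w3.org/2000/01/rdf-schema#label", '"Example City"'),
    ],
}


def look_up(uri):
    """Simulate dereferencing a URI: return its description, or [] if unknown."""
    return WEB.get(uri, [])


def follow_links(start):
    """Breadth-first crawl from start, following object values that are http
    URIs (principle 4: descriptions link to related URIs)."""
    seen = {start}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        for _predicate, obj in look_up(uri):
            # Literals are quoted strings; only http URIs are followed.
            if obj.startswith("http://") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


print(sorted(follow_links("http://example.org/book/1")))
# ['http://example.org/book/1', 'http://example.org/person/42', 'http://example.org/place/7']
```

In practice the look-up step would be an HTTP GET with content negotiation, which this sketch deliberately leaves out.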
Once this announcement was made, linked data began to grow exponentially, which Harper illustrated by showing how the linked data cloud has expanded. Libraries really came on board between 2009 and 2010. The cloud has now grown so large that it is no longer maintained manually; instead, it is generated automatically and published by the Comprehensive Knowledge Archive Network.
Libraries know the value of their bibliographic and authority data, and these have been the first areas libraries have looked at for publishing linked data. Linked data provides a standardized set of mechanisms for exposing data so that it works with other data on the web, and so that libraries can in turn use non-library data. Examples of library data vocabularies that are available include:
- Bibliographic ontology
- RDA (Resource Description and Access)
- FRBR (Functional Requirements for Bibliographic Records) – both official and unofficial (some bibliographic data has been developed outside of libraries)
- ISBD (International Standard Bibliographic Description)
- VIAF (Virtual International Authority File) and MADS (Metadata Authority Description Schema)
Harper then provided an update on which entities have been publishing linked data since April 2010. His list included national libraries such as the German National Library, the Hungarian National Library, and the British Library, which has just made its data available as an RDF download. Getting the data out there is the big first step for libraries.
There has also been a lot of authority work done in relation to linked data. VIAF has been revamped in how it publishes linked data, based on work at the DCMI 2010 conference. It now clusters authority records and offers different views of the same data. Harper showed an example of how Bob Dylan is represented both as a subject heading and as a FOAF (Friend of a Friend) name, so the data is available in both a library version and a FOAF representation. At http://id.loc.gov, the MARC code lists for countries, geographic areas, and languages have been added. There have also been early efforts to manage precoordinated subject headings by expressing MADS in RDF (MADS/RDF), which is currently an open draft for public comment. A mirror of some of the Library of Congress data at http://lcsubjects.org is also trying to manage precoordinated headings.
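To make the idea of two views of one entity concrete, here is a minimal sketch that writes N-Triples for a subject-heading concept and a FOAF person. Linking the two with foaf:focus is an assumption on our part (FOAF defines that property for relating a SKOS concept to the thing itself, but this is not necessarily what VIAF does), and both example.org URIs are hypothetical placeholders rather than real VIAF or id.loc.gov identifiers.

```python
# Namespaces of the real SKOS and FOAF vocabularies.
SKOS = "http://www.w3.org/2004/02/skos/core#"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    return f"<{value}>"


def literal(text):
    # Escape backslashes and double quotes as N-Triples requires.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def dual_view(concept_uri, person_uri, heading, name):
    """Return N-Triples lines describing one entity twice: as a subject-heading
    concept (library view) and as a FOAF person, linked with foaf:focus."""
    triples = [
        (iri(concept_uri), iri(RDF_TYPE), iri(SKOS + "Concept")),
        (iri(concept_uri), iri(SKOS + "prefLabel"), literal(heading)),
        (iri(concept_uri), iri(FOAF + "focus"), iri(person_uri)),
        (iri(person_uri), iri(RDF_TYPE), iri(FOAF + "Person")),
        (iri(person_uri), iri(FOAF + "name"), literal(name)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


for line in dual_view(
    "http://example.org/concept/dylan",  # hypothetical URI
    "http://example.org/person/dylan",   # hypothetical URI
    "Dylan, Bob, 1941-",
    "Bob Dylan",
):
    print(line)
```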
At the World Wide Web Consortium (W3C), a new “incubator group” has been created to address library linked data: the W3C Library Linked Data Incubator Group (LLD XG). Incubator groups are discussion groups that set strategic directions on a specific issue, usually producing a report with recommendations to the W3C. The group's members are researchers, consultants, and librarians, including representatives of many national libraries. As part of its work, the group has collected over 50 use cases to find out what kinds of applications library linked data could support: publishing bibliographic data, handling authority data, meeting archival needs, etc. They are mining the use cases for functional requirements and design patterns, with a report due in Summer 2011. All of the group's deliberations are public and anyone is welcome to follow along: http://www.w3.org/2005/Incubator/lld/wiki/Main_Page.
Harper next gave a brief overview of initiatives and activity related to RDA. All RDA elements, roles, and vocabularies have been registered in the Open Metadata Registry and are represented in SKOS. IFLA's FRBR and ISBD elements are also all registered. IFLA is reviewing and consolidating all of the FRBR reports to reconcile conflicts and update FRBR, and is working to represent it in RDF. Ultimately, RDA aims to be multilingual.
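As a rough illustration of how SKOS supports a multilingual vocabulary, the sketch below stores one skos:prefLabel per language (SKOS permits at most one preferred label per language tag) and picks a label according to a reader's language preferences. The concept URI, the labels, and the `pref_label` helper are hypothetical, not entries from the Open Metadata Registry.

```python
# Hypothetical concept: the URI and labels are placeholders for illustration.
CONCEPT = {
    "uri": "http://example.org/vocab/carrier/volume",
    # At most one skos:prefLabel per language tag, as SKOS requires.
    "prefLabel": {"en": "volume", "de": "Band", "fr": "volume"},
}


def pref_label(concept, preferred_langs, default="en"):
    """Return the preferred label in the first available preferred language,
    else the label in the default language, else None."""
    labels = concept["prefLabel"]
    for lang in preferred_langs:
        if lang in labels:
            return labels[lang]
    return labels.get(default)


print(pref_label(CONCEPT, ["es", "de"]))  # Band
print(pref_label(CONCEPT, ["es"]))        # volume
```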
- The Open Metadata Registry has continued to grow. It was formerly the National Science Digital Library Registry, but its scope has grown beyond that. The Open Metadata Registry provides a vocabulary service and allows users to take URIs and assign different views of the same data, allowing the registry to become international in scope.
- Linked Open Copac and Archives Hub (LOCAH) is a UK-based project funded by JISC. It is working to make EAD data from the Archives Hub and bibliographic data (MODS) from Copac (both themselves JISC-funded services) available as linked data. (Thanks to Pete Johnson for clarifying.)
- The Europeana project and its Europeana Data Model (EDM) aim to represent museum objects from across Europe. EDM builds on the OAI-ORE (Open Archives Initiative Object Reuse and Exchange) model. The project is trying to aggregate the different descriptions of digital surrogates and link them to the actual resource in question.
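The aggregation pattern behind EDM can be sketched as plain triples. This is a simplified model under stated assumptions, not a complete EDM record: it uses the EDM terms edm:ProvidedCHO, edm:aggregatedCHO, edm:isShownBy, edm:isShownAt, and edm:WebResource together with ore:Aggregation, while every example.org URI and both helper names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EDM = "http://www.europeana.eu/schemas/edm/"
ORE = "http://www.openarchives.org/ore/terms/"


def aggregation_triples(aggregation, cho, shown_by, shown_at):
    """Describe one aggregation that ties a cultural heritage object (CHO) to a
    digital representation of it (isShownBy) and to a web page showing it
    in context (isShownAt)."""
    return [
        (cho, RDF_TYPE, EDM + "ProvidedCHO"),
        (aggregation, RDF_TYPE, ORE + "Aggregation"),
        (aggregation, EDM + "aggregatedCHO", cho),
        (aggregation, EDM + "isShownBy", shown_by),  # e.g. a digital image
        (aggregation, EDM + "isShownAt", shown_at),  # e.g. the provider's page
        (shown_by, RDF_TYPE, EDM + "WebResource"),
    ]


def surrogates_of(triples, cho):
    """Return the surrogate URIs recorded for a CHO, across all aggregations."""
    aggregations = {s for s, p, o in triples
                    if p == EDM + "aggregatedCHO" and o == cho}
    surrogate_props = {EDM + "isShownBy", EDM + "isShownAt"}
    return sorted(o for s, p, o in triples
                  if s in aggregations and p in surrogate_props)


# Two hypothetical providers describing the same painting.
graph = aggregation_triples(
    "http://example.org/agg/museum-a", "http://example.org/cho/painting-1",
    "http://example.org/a/painting-1.jpg", "http://example.org/a/item/1",
) + aggregation_triples(
    "http://example.org/agg/archive-b", "http://example.org/cho/painting-1",
    "http://example.org/b/scan-1.tif", "http://example.org/b/record/1",
)
for uri in surrogates_of(graph, "http://example.org/cho/painting-1"):
    print(uri)
```

Keeping each provider's description in its own aggregation, all pointing at one CHO URI, is what lets separate descriptions of surrogates be gathered around the same object.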
Harper ended his talk with a description of Linkypedia: http://linkypedia.inkdroid.org/. Although it is not linked data, it is an alpha project Ed Summers has built in his spare time. Summers is harvesting the external links used as references in Wikipedia articles, and many of those citations point to library, museum, and archives information. Linkypedia is designed to show which articles cite a particular source (e.g., how many Wikipedia articles cite the NARA website?). Soon libraries will be able to enter their own site to find out which articles cite it. An additional set of views shows what other citations appear in the same article. The principles behind the project (topical hubs, aboutness, shared interest, and ways to link cultural heritage community information together) are the same principles that underlie linked data.
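Linkypedia's core question (which articles cite a given site?) reduces to grouping harvested links by host name. The sketch below assumes the harvest is already available as a mapping from article titles to cited URLs; the data, the `www.` normalization rule, and the function names are illustrative assumptions, not how Linkypedia itself is implemented.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical harvest: article title -> external links cited in that article.
HARVEST = {
    "Article A": ["http://www.archives.gov/research/x", "http://www.loc.gov/item/1"],
    "Article B": ["https://www.archives.gov/exhibits/y"],
    "Article C": ["http://example.com/page"],
}


def normalize_host(name):
    """Lower-case a host name and drop a leading 'www.' (an assumed rule)."""
    name = name.lower()
    return name[4:] if name.startswith("www.") else name


def host_of(url):
    # urlparse(...).hostname is already lower-cased and excludes any port.
    return normalize_host(urlparse(url).hostname or "")


def articles_citing(harvest, site):
    """Titles of articles with at least one link to the given site."""
    site = normalize_host(site)
    return sorted(title for title, links in harvest.items()
                  if any(host_of(url) == site for url in links))


def co_cited_hosts(harvest, site):
    """Count the other hosts cited in the same articles as the given site."""
    site = normalize_host(site)
    counts = Counter()
    for title in articles_citing(harvest, site):
        counts.update({host_of(url) for url in harvest[title]} - {site})
    return counts


print(articles_citing(HARVEST, "archives.gov"))     # ['Article A', 'Article B']
print(co_cited_hosts(HARVEST, "www.archives.gov"))  # Counter({'loc.gov': 1})
```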
Some discussion following the formal presentation noted the following:
- Publishers are also working with linked data, for example the New York Times, BBC and Reuters. Large publishers and news agencies are starting to get into this space.
- Rhonda Marker noted a connection between linked data and the National Science Foundation's new requirement for data management plans. As a result, libraries need to be more explicit about rights management and about the relationships among data sets, journal articles, and other related materials. Linked data principles should help manage those relationships. Harper responded that Dublin Core has a new working group looking at metadata provenance (the history and change history of the metadata itself), which may help with managing datasets by documenting how they have been curated.
The second speaker was Oliver Pesch of EBSCO Information Services (filling in for Mike Giarlo), who spoke on “Institutional Identifiers: NISO I2 Working Group.”
Pesch began with some history of the working group. It was founded in 2008, is chaired by Grace Agnew of Rutgers University and Pesch, and is composed of members from all sectors of the library supply chain. The mission of the I2 Working Group is to create a robust, scalable, interoperable standard to uniquely identify institutions and describe the relationships between them. The standard's requirements are that it be lightweight to manage, reusable by business-sector registries, and interoperable with legacy applications.
Pesch explained why such an identifier is important. The identity of an institution is critical to any information model, and the identifier needs to be global, interoperable, unambiguous, and unique. The Working Group is trying to develop a central registry that would assign the identifier, store core metadata to identify the institution, provide look-up services to check whether an institution has already been identified, and offer an API for programmers. Distributed business applications would then be able to use this data. Pesch showed a chart of how various registration agencies could use the central registry, with each registration agency developing its own business applications.
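The registry functions described above (assign an identifier, store core metadata, look up whether an institution is already identified) could be prototyped in memory along these lines. This is a hypothetical sketch, not the I2 design: the `I2-` identifier format, the record fields, and the case- and whitespace-insensitive name matching are all assumptions.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Sequence


@dataclass
class Institution:
    """Core metadata for one institution (hypothetical field set)."""
    identifier: str
    name: str
    variant_names: List[str] = field(default_factory=list)


def _normalize(name: str) -> str:
    # Case- and whitespace-insensitive matching key (an assumption).
    return " ".join(name.lower().split())


class Registry:
    """In-memory stand-in for a central registry: assigns identifiers,
    stores core metadata, and answers look-ups."""

    def __init__(self, prefix: str = "I2-") -> None:
        self._records: Dict[str, Institution] = {}
        self._serial = count(1)
        self._prefix = prefix

    def lookup(self, name: str) -> Optional[Institution]:
        """Return the record whose name or variant name matches, if any."""
        key = _normalize(name)
        for record in self._records.values():
            names = [record.name, *record.variant_names]
            if key in {_normalize(n) for n in names}:
                return record
        return None

    def register(self, name: str, variant_names: Sequence[str] = ()) -> Institution:
        """Assign a new identifier unless the institution is already registered."""
        existing = self.lookup(name)
        if existing is not None:
            return existing  # avoid minting a duplicate identifier
        identifier = f"{self._prefix}{next(self._serial):06d}"
        record = Institution(identifier, name, list(variant_names))
        self._records[identifier] = record
        return record

    def get(self, identifier: str) -> Optional[Institution]:
        """Resolve an identifier to its record (the API's read path)."""
        return self._records.get(identifier)


registry = Registry()
rutgers = registry.register(
    "Rutgers University", ["Rutgers, The State University of New Jersey"])
print(rutgers.identifier)  # I2-000001
```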
There is a draft list of metadata elements for identifying institutions, including a main identifier, variant identifiers, affiliated institutions, etc. In choosing the identifier, the Working Group looked at many existing identifiers for institutions, e.g., OCLC symbols, MARC codes, SAN (Standard Address Number), ONIX, etc. The closest match they found is the International Standard Name Identifier (ISNI), which provides public identification of any entity involved in content creation, production, management, and distribution chains. The identifier itself is 16 characters long: 15 digits followed by a check character. ISNI is working with VIAF to leverage ISNI in the VIAF authority files. An alternative is to identify an institution with a RESTful URL, built on a base URL and resolved with an HTTP GET request.
Pesch ended the presentation with some scenarios for how the registry could be used for ILL requests and for ordering subscriptions to different parts of an institution. More information about the I2 Working Group is available on NISO’s site: http://www.niso.org/workrooms/i2.
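The ISNI check character can be verified locally. To our understanding, ISNI (ISO 27729) uses the ISO/IEC 7064 MOD 11-2 algorithm, the same one used by ORCID identifiers, so the sketch below follows that scheme; the sample values are used only for illustration and do not identify institutions.

```python
DIGITS = "0123456789"


def isni_check_char(base15: str) -> str:
    """ISO/IEC 7064 MOD 11-2 check character for the first 15 digits."""
    if len(base15) != 15 or not all(c in DIGITS for c in base15):
        raise ValueError("expected exactly 15 ASCII digits")
    total = 0
    for c in base15:
        total = (total + int(c)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as "X", so the check character may be a letter.
    return "X" if result == 10 else str(result)


def is_valid_isni(isni: str) -> bool:
    """Check a 16-character ISNI, ignoring spaces and hyphens."""
    compact = isni.replace(" ", "").replace("-", "").upper()
    if len(compact) != 16 or not all(c in DIGITS for c in compact[:15]):
        return False
    return isni_check_char(compact[:15]) == compact[15]


print(is_valid_isni("0000 0002 1825 0097"))  # True
print(is_valid_isni("0000 0002 1825 0098"))  # False
```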
Posted by Kristin Martin