Recent Trends in Catalog Architecture: ALCTS Catalog Form and Function Interest Group

Apologies for being a tardy blogger, but the good news is that by now all of the PowerPoint slides for these fine presentations are posted and linked from the CFFIG wiki page.

ALCTS Catalog Form and Function Interest Group
Recent Trends in Catalog Architecture
Saturday, Jan. 16, 2010, 10:30 a.m.

Chair Richard Guajardo introduced four presenters who described applications that draw metadata from the ILS and other sources for use in discovery interfaces. The presentations varied, but all concerned the architecture and functionality of multiple layers: “what happens (or needs to happen) in between” to transform, combine, and synchronize metadata.

LENS: Catalog records and Additional Data Sources in the Aquabrowser Implementation at the University of Chicago, presented by Frances McNamara, University of Chicago.

This was a technical overview of what happens between metadata sources and the Aquabrowser discovery interface.

McNamara described the aggregation of resources as “stone soup”: in addition to 5.7 million MARC records from the catalog, they combine SFX and Metalib exports, HathiTrust records, EAD finding aids, Dublin Core records for digital image collections, crawls of the library website, and other sources, plus enhancements (summaries, tables of contents, etc.), “user lists” from the discovery system, and item availability information from the catalog (updated dynamically).

  • Everything is transformed into a common format in an “interim database”.
  • Merging of records for print and electronic versions takes place (identifiers such as OCLC numbers and ISSNs in the bib records are important for this); a rough sketch of this kind of identifier-based merge follows this list.
  • U. of Chicago is able to avoid synchronization issues by recreating the database nightly.
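
To make the identifier-based merging concrete, here is a minimal sketch in Python of how records that share an OCLC number (or, failing that, an ISSN) might be collapsed into a single entry that keeps both print and electronic holdings. This is only an illustration of the general technique, not Aquabrowser’s or Chicago’s actual merge logic, and the record structure is invented for the example.

    # A minimal sketch of identifier-based merging, not the actual Aquabrowser logic:
    # records that share an OCLC number (or, failing that, an ISSN) are collapsed
    # into one entry that keeps both the print and electronic holdings.
    from collections import defaultdict

    def merge_by_identifier(records):
        """records: list of dicts with optional 'oclc', 'issn', and a 'holdings' list."""
        groups = defaultdict(list)
        for rec in records:
            # Prefer the OCLC number as the match point; fall back to ISSN, then to a
            # unique key so records without identifiers pass through unmerged.
            key = rec.get("oclc") or rec.get("issn") or id(rec)
            groups[key].append(rec)

        merged = []
        for group in groups.values():
            combined = dict(group[0])                    # start from the first record
            for other in group[1:]:
                combined["holdings"] = combined.get("holdings", []) + other.get("holdings", [])
            merged.append(combined)
        return merged

    # A print record and an electronic record for the same title collapse to one entry.
    print_rec = {"oclc": "123456", "title": "Example Journal", "holdings": ["print"]}
    e_rec = {"oclc": "123456", "title": "Example Journal", "holdings": ["online"]}
    print(merge_by_identifier([print_rec, e_rec]))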

Unfortunately there was time for only one or two questions between presentations; in hindsight it would have been interesting to learn which metadata services Aquabrowser supports out of the box and whether local modifications or locally developed tools were used.

To Fix A Leaky Sink: Envisioning The Potential of Discovery Layers, presented by Joshua P. Barton & Lucas Wing Kau Mak, Michigan State University

This presentation was more of a thought piece about metadata architecture and strategy for “next gen catalogs” as they move toward “one stop shop” discovery interfaces, based on some challenges Michigan State is encountering with its implementation of Innovative’s Encore discovery layer.

They’re trying to think beyond their former approach of “everything has to be in the (ILS) catalog” toward an architecture where the discovery layer(s) accept metadata from wherever it resides, rather than having repetitive metadata created for different tools (as happened when some of the image collections for which they already had Dublin Core metadata were loaded into the ILS).

Some downsides mentioned include loss of control, with no say over normalization or controlled vocabulary; much also depends on the discovery layer vendor and on which services they offer and to what extent. De-duplication does not occur in the Encore system, and the duplicated image-set metadata is difficult to handle separately from the rest of the image repository metadata. Getting metadata straight from outside sources would also mean facilitating connections between the discovery layer and those outside systems.

This presentation gave a useful overview of factors to consider when changing the metadata architecture behind a “next gen” discovery layer, but it didn’t offer firm conclusions. It was interesting that they identified both authority control and mapping of subject headings as features that need to be part of the architecture of “next gen” discovery systems.

Automated Metadata Repurposing Using eXtensible Catalog Software, presented by Jennifer Bowen, University of Rochester River Campus

Bowen prefaced her presentation with a comment on the previous one: “I have ideas that could address some problems Joshua and Lucas talked about.” eXtensible Catalog is a set of open-source software tools developed with funding from the Mellon Foundation and contributions from partner institutions; the XC Foundation will be launched next month to maintain it. Although the toolkit is still under active development, many tools are already available for free download; download links and more information can be found at http://www.extensiblecatalog.org/.

XC software currently provides three types of services that can be downloaded and used individually: Connectivity (tools to gather metadata from source systems and, if necessary, transform it to make it available via OAI-PMH, plus a separate NCIP toolkit for circulation status metadata), Metadata Management services, and a User Interface based on Drupal, with plans for a learning management system module. There are also plans to add authority control features to the Metadata Management module.
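
As an illustration of the kind of exchange the connectivity layer automates, here is a minimal OAI-PMH ListRecords harvesting loop in Python. This is not the XC toolkit’s code; the repository URL in the usage comment is hypothetical, and the real tools add error handling, incremental (from/until) harvesting, and metadata transformation on top of this basic protocol exchange.

    # A bare-bones OAI-PMH ListRecords harvest that follows resumption tokens.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"

    def harvest(base_url, metadata_prefix="oai_dc"):
        """Yield <record> elements from an OAI-PMH repository."""
        token = None
        while True:
            params = {"verb": "ListRecords"}
            if token:
                params["resumptionToken"] = token           # continue a partial list
            else:
                params["metadataPrefix"] = metadata_prefix   # only on the first request
            url = base_url + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as response:
                root = ET.fromstring(response.read())
            for record in root.iter(OAI + "record"):
                yield record
            token_element = root.find(".//" + OAI + "resumptionToken")
            token = token_element.text if token_element is not None else None
            if not token:
                break

    # Usage (the repository URL is hypothetical):
    # for record in harvest("http://example.org/oai"):
    #     do_something_with(record)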

Bowen’s presentation focused on the middle layer, metadata management, which is designed to allow scheduling of a sequence of operations on batches of metadata. The initial set of services is designed to work on MARC metadata, but transformations for any other kind of metadata could be built using XSLT.

Services include normalization, transformation (all records are transformed to a common XC schema), de-duplication and aggregation.

  • Among the 20 normalization functions for MARCXML are language code validation and normalization of OCLC numbers (a rough sketch of this kind of step appears after this list).
  • The transformations include conversion of records into “FRBR sets”.
  • The user interface provides a facet panel for navigating the setup of metadata services and operations.
  • Metadata is retained for staff review of record sets and error reports.
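
To give a sense of what one of these normalization steps involves, here is a rough Python equivalent of OCLC number cleanup applied to a single MARCXML record. This is only a sketch of the general idea; it is not the XC service’s code, and the regular expression and output form used here are my own assumptions rather than XC’s actual normalization rules.

    # Illustrative OCLC-number normalization on MARCXML (not XC's implementation).
    import re
    import xml.etree.ElementTree as ET

    MARC = "{http://www.loc.gov/MARC21/slim}"

    def normalize_oclc_numbers(record):
        """Rewrite 035 $a values like 'ocm01234567' or '(OCoLC)ocn987654321'
        into the consistent form '(OCoLC)1234567'."""
        for field in record.iter(MARC + "datafield"):
            if field.get("tag") != "035":
                continue
            for sub in field.iter(MARC + "subfield"):
                if sub.get("code") != "a" or not sub.text:
                    continue
                match = re.search(r"(?:OCoLC|ocm|ocn|on)\)?\s*0*(\d+)", sub.text)
                if match:
                    sub.text = "(OCoLC)" + match.group(1)
        return record

    marcxml = """<record xmlns="http://www.loc.gov/MARC21/slim">
      <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">ocm01234567</subfield>
      </datafield>
    </record>"""
    rec = normalize_oclc_numbers(ET.fromstring(marcxml))
    print(ET.tostring(rec, encoding="unicode"))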

This well-thought-out tool set can be extended by developers and, in addition to supporting its own discovery interface, could provide the metadata management layer needed between the catalog and other metadata sources on the one side and the discovery tools used by libraries on the other.

Equality of Retrieval – Levelling the Metadata, presented by Aaron Wood, University of Calgary

Wood’s presentation focused on an issue arising from the University of Calgary’s implementation of a metadata aggregation service (Summon, from Serials Solutions): how to keep the local institution’s collections (print and digital) from being marginalized in search results when they are combined with a much larger body of full-text resources (licensed journal articles, etc.).

In the University of Calgary’s case, fewer than 2.5 million metadata records from the ILS, institutional repository, digital collections, archives (EAD), and museum make up only a small fraction of the total of 225 million records. Relevance ranking is based on word frequency, and the much larger full-text article records (averaging 15 KB, versus about 1.5 KB for a MARC record) skew the results even further.
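
A toy example makes the skew easier to see. The snippet below is in no way Summon’s actual ranking (the comment from Serials Solutions’ Andrew Nagy below also notes that several counterbalancing algorithms are used); it only scores a short catalog record and a long full-text document for the same term, first by raw term frequency and then normalized by document length, with made-up text standing in for the real records.

    # A toy illustration (not Summon's algorithm) of why raw term-frequency scoring
    # favors long full-text records over short MARC records, and how dividing by
    # document length pulls the scores back toward parity.
    def term_frequency(text, term):
        return text.lower().split().count(term.lower())

    marc_record = "Glaciers of the Canadian Rockies : a field guide"      # ~1.5 KB in reality
    full_text = ("glaciers " * 40) + ("unrelated filler words " * 400)    # ~15 KB in reality

    for name, doc in [("MARC record", marc_record), ("full-text article", full_text)]:
        tf = term_frequency(doc, "glaciers")
        length = len(doc.split())
        print(f"{name:18} raw tf = {tf:3}   length-normalized = {tf / length:.4f}")

    # Raw counts rank the long article far above the catalog record even though the
    # short record is entirely "about" glaciers; normalizing by length reverses that.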

The University of Calgary is aiming to get better representation of records for its local resources through enhancement of the indexed terms that appear in facets.

  • They have improved on the basic mapping (to Summon’s internal MODS format) to capture more data from their MARC records; a rough sketch of this kind of mapping follows this list.
  • Working with Dublin Core remains a challenge – it’s difficult to handle controlled vocabulary terms; they are looking at qualified Dublin Core and other options.
  • Wood foresees a need to draw from richer resources elsewhere and to merge data for print and full text, where available, to create an “Uber-record” optimized for discovery; this kind of service may not be possible for an individual institution, but would be something libraries could exert pressure to make happen.
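
As a rough illustration of the kind of mapping enhancement described above, the sketch below pulls a few additional MARC fields into facet-ready values. The specific tags (655 genre/form, 651 geographic, 650 topical) and the facet field names are my own examples; the fields Calgary actually added and Summon’s internal format are not spelled out in the presentation.

    # Illustrative mapping of extra MARC datafields to facet-ready fields.
    import xml.etree.ElementTree as ET

    MARC = "{http://www.loc.gov/MARC21/slim}"

    # Which MARC datafield/subfield pairs feed which facet fields (example choices).
    FACET_MAP = {
        ("655", "a"): "genre_facet",       # genre/form terms
        ("651", "a"): "geographic_facet",  # geographic subject terms
        ("650", "a"): "subject_facet",     # topical subject terms
    }

    def extract_facets(marcxml):
        record = ET.fromstring(marcxml)
        facets = {}
        for field in record.iter(MARC + "datafield"):
            for sub in field.iter(MARC + "subfield"):
                target = FACET_MAP.get((field.get("tag"), sub.get("code")))
                if target and sub.text:
                    facets.setdefault(target, []).append(sub.text.strip(" ."))
        return facets

    sample = """<record xmlns="http://www.loc.gov/MARC21/slim">
      <datafield tag="651" ind1=" " ind2="0"><subfield code="a">Alberta.</subfield></datafield>
      <datafield tag="655" ind1=" " ind2="7"><subfield code="a">Field guides.</subfield></datafield>
    </record>"""
    print(extract_facets(sample))   # {'geographic_facet': ['Alberta'], 'genre_facet': ['Field guides']}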

One Response to Recent Trends in Catalog Architecture: ALCTS Catalog Form and Function Interest Group

  1. Andrew Nagy says:

    It’s great to see Aaron talking about Summon with the Catalog Form and Function interest group. The implementation at University of Calgary has been quite incredible considering all of the different institutional repositories from their collections that have been ingested into Summon to offer discovery of not only the University’s subscription content, but also their cultural collections. To clarify, the relevancy ranking within Summon is based on term frequencies, as Aaron said, but many other algorithms are used as well, such as field weightings and inverse term frequencies to counterbalance any skewing from the full text content. Additionally, each record in Summon has a static rank that allows for balancing records from one collection against another to ensure all records are treated equally.
