Metadata Interest Group, Sunday, June 26, 2011
Rather than specific presentations, the Metadata Interest Group organized a series of concurrent discussions at round tables, each focusing on a different topic related to metadata. A summary of the discussion at each of the tables follows. I sat at the Metadata Quality Control table, so I have the most detail about that table.
Metadata Quality Control
Discussion covered a wide range of areas that touched on the topic. First, we defined quality control, in general, as making sure a record is free of errors in its descriptive metadata and that its technical metadata is correctly recorded. Many libraries use students to input metadata, so the discussion first centered on quality control for student-generated metadata. Student work is generally reviewed and checked more frequently than cataloger work. At Baylor, both student workers and catalogers are encouraged to take ownership of their own quality control. A database maintenance librarian checks catalogers' work and answers questions. To reduce errors, catalogers and students create metadata in one shift and review it for quality control at the beginning of their next shift. It can be difficult to keep an eye on everyone's work when many people in many places are creating metadata. The University of North Texas has a system that randomly selects metadata records for quality control review, but it is a homegrown system and not able to be shared with others.
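As a rough illustration of the random-checking approach UNT described, the sketch below draws a reproducible sample of records for a human reviewer and flags obvious gaps. All record contents, field names, and the "required fields" policy here are invented for illustration; they are not details of UNT's actual system.

```python
import random

# Hypothetical records: each is a dict of descriptive metadata fields.
records = [
    {"id": "rec001", "title": "Campus photographs, 1952", "creator": "Unknown"},
    {"id": "rec002", "title": "", "creator": "Smith, Jane"},             # missing title
    {"id": "rec003", "title": "Oral history interview", "creator": ""},  # missing creator
    {"id": "rec004", "title": "Annual report, 1978", "creator": "Office of the President"},
]

REQUIRED_FIELDS = ("title", "creator")  # assumed local requirements

def sample_for_review(records, rate=0.5, seed=42):
    """Randomly select a proportion of records for human QC review."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = max(1, round(len(records) * rate))
    return rng.sample(records, k)

def obvious_problems(record):
    """Return the required fields that are empty or missing in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]

for rec in sample_for_review(records):
    problems = obvious_problems(rec)
    if problems:
        print(f"{rec['id']}: missing {', '.join(problems)}")
```

Automated checks like `obvious_problems` can only catch structural gaps; the point of the random sample is that a person still reads the selected records for errors no script can see.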
Quality control is also necessary for catalog clean-up and the implementation of new discovery tools. For example, Saint Joseph's University discovered that they needed to update MARC records to make facets work in their new implementation of Summon. At Penn State, as the library tries to build search interfaces that run across all collections, they have had to do retrospective work to bring older metadata up to current standards. The Triangle Research Libraries Network is running a pilot project to put digital collections in SearchTRLN, their consortial catalog, and is finding a number of inconsistencies between institutions. This also leads to an unresolved question: do we want digital collections in the catalog?
Maintaining metadata is another challenge. Structures set up for maintenance in a MARC environment are still being experimented with in non-MARC metadata. Most libraries do not create name authority records when cataloging in non-MARC metadata. The University of Alabama uses Schematron to make sure technical metadata is correct. They have experimented with MADS in a separate file and would like to put VIAF URIs directly into the MODS records for ease of updating. North Carolina State University has a name authority file for its Electronic Resource Management system, but no good way to update ERMS records using the authority file. There are problems syncing metadata that moves between systems, such as metadata extracted from the Archivists' Toolkit to generate MARC records. If the MARC record is updated in OCLC, there isn't an automated way to enter the changes back into the Archivists' Toolkit. The eXtensible Catalog Metadata Services Toolkit will review MARC records and generate a list of problems with them, but the records must be corrected using another program. Some libraries have been experimenting with embedded metadata as a way to transport metadata from one system to another, but no one had a production system in place.
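Schematron expresses rules like "every name should carry an authority URI" as XPath assertions over the XML. As a rough standard-library illustration of that rule-based idea (not Alabama's actual rules; the record fragment below is invented, though the element names and the `valueURI` attribute follow MODS conventions):

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# A simplified, illustrative MODS-like fragment, not a complete record.
SAMPLE = f"""
<mods xmlns="{MODS_NS}">
  <name>
    <namePart>Austen, Jane</namePart>
  </name>
  <name valueURI="http://viaf.org/viaf/102333412">
    <namePart>Dickens, Charles</namePart>
  </name>
</mods>
"""

def names_missing_uri(mods_xml):
    """Schematron-style rule: every <name> should carry a valueURI
    (e.g., a VIAF URI) so authority updates can be automated."""
    root = ET.fromstring(mods_xml)
    failures = []
    for name in root.findall(f"{{{MODS_NS}}}name"):
        if "valueURI" not in name.attrib:
            part = name.findtext(f"{{{MODS_NS}}}namePart", default="(no namePart)")
            failures.append(part)
    return failures

print(names_missing_uri(SAMPLE))  # only the first name lacks a valueURI
```

In production this check would be one `<sch:assert>` in an actual Schematron schema run by a validator, but the logic is the same: walk every name and report the ones without a URI to update.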
The National Agricultural Library gets a lot of metadata from publishers and would like some way to be alerted to new data and to claim a record set that isn't received on time and complete. MarcEdit is one tool for formatting and cleaning up MARC metadata.
The group also discussed electronic theses and dissertations (ETDs). At NCSU, students submit ETD metadata in DSpace; it is transformed to MARC using XSLT and cleaned up. Several universities use LDAP to pull information into the records, including department, student name, and advisor name. Baylor enhances records with keyword subject headings (not LCSH). Records are sent to OCLC, and the OCLC number is inserted back into DSpace. The University of Alabama transformed data from ProQuest into MODS. Some libraries, including Baylor, perform authority control. The University of Arkansas fully catalogs ETDs using the student transmittal form, which supplies a student birthdate that is entered into an authority record.
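The transformation step in these ETD workflows is essentially a crosswalk from repository fields to MARC tags. A minimal Python sketch of that kind of mapping follows; the Dublin Core field names, MARC tag choices, and sample values are assumptions for illustration (NCSU's real workflow uses XSLT, and a production script would build genuine MARC records with a library such as pymarc):

```python
# Hypothetical Dublin Core record as harvested from a DSpace ETD submission.
dc_record = {
    "dc.title": "A Study of Something",
    "dc.contributor.author": "Doe, Jane",
    "dc.contributor.advisor": "Roe, Richard",
    "dc.date.issued": "2011",
}

# Assumed crosswalk from DC fields to MARC tag/subfield pairs.
CROSSWALK = {
    "dc.contributor.author": ("100", "a"),
    "dc.title": ("245", "a"),
    "dc.date.issued": ("264", "c"),
    "dc.contributor.advisor": ("500", "a"),  # advisor as a note: a hypothetical local choice
}

def dc_to_marc_fields(dc):
    """Map a flat DC record to (tag, subfield, value) triples, sorted by tag."""
    fields = []
    for dc_field, value in dc.items():
        if dc_field in CROSSWALK and value:
            tag, sub = CROSSWALK[dc_field]
            fields.append((tag, sub, value))
    return sorted(fields)

for tag, sub, value in dc_to_marc_fields(dc_record):
    print(f"{tag} ${sub} {value}")
```

The clean-up the group mentioned happens after this step: a mechanical crosswalk carries typos, inconsistent name forms, and missing fields straight through, which is why the records still get human review before loading.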
The discussion also covered the quality of metadata for non-collection images, e.g., photographs of events held at the library. Libraries have tried different approaches to promulgating metadata standards outside of the library. The University of Virginia created a guide for faculty creating their own metadata, Penn State manages non-collection images within a content management system, and NCSU provided a list of controlled vocabularies that could be used. Also at NCSU, work has been done on a data normalization project for GIS metadata. There were not enough resources to perform complete quality control, so they chose a middle ground of keywords and tried to clean them up into a "sort-of" controlled vocabulary, identifying synonyms, singular vs. plural forms, acronyms, etc.
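That "sort-of" controlled vocabulary work amounts to normalizing free-text keywords against a locally built variant table. A toy sketch of the idea, with an invented synonym table and keywords (not NCSU's actual data or policy):

```python
# Hypothetical synonym/variant table built by reviewing the raw keywords.
SYNONYMS = {
    "gis": "geographic information systems",               # expand acronym
    "geographic information system": "geographic information systems",
    "maps": "map",                                         # collapse plural to singular
    "usgs": "united states geological survey",             # expand acronym
}

def normalize_keyword(raw):
    """Lowercase, trim, and map known variants to a preferred term."""
    term = raw.strip().lower()
    return SYNONYMS.get(term, term)

raw_keywords = ["GIS", "Maps", " USGS ", "hydrology"]
print(sorted({normalize_keyword(k) for k in raw_keywords}))
```

The middle-ground nature of the approach shows in the last line: unknown terms like "hydrology" pass through unchanged rather than being rejected, so the vocabulary stays only "sort-of" controlled.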
In summary, we are still lacking systematic ways to perform quality management behind the scenes with our current systems.
Metadata Schema Selection
Most people in this group use DSpace or CONTENTdm, and their schema choices are dictated by the software. Within ARTstor's Shared Shelf, users can use any metadata schema they want, but that is unusual. Questions to ask when selecting a schema: How will you arrange this collection? What kinds of information do you need? What content management system do you have? What kind of work will it take to make the schema play well with it? Who will be doing the work of manipulating and correcting metadata, and what kind of background will they have? At one institution, collection curators had a large say in which schema was chosen because they already had well-developed descriptions of their objects. It is also important to identify specific needs in collections and make sure that the attributes of certain objects are brought out. When receiving metadata in a locally developed schema, be wary of fields implemented or entered in such a way that they won't make sense beyond the local context; e.g., yes/no flags in a local metadata implementation are not helpful outside of their collection.
The group discussed the following questions: when handling text, video, and audio, do you treat them in a like manner or break them out by material type? Rights and access issues are especially problematic for video, audio, and data sets. In one interesting example, Rice University faculty didn't want their recitals shared because they felt they had played poorly. The group spent a long time discussing data sets, as they are a huge unknown that no one quite knows how to deal with. Data-management mandates from the National Science Foundation and the National Institutes of Health will require developments in this area. Discipline affects which metadata elements researchers want to share and disseminate with their datasets. In general, it is best to have metadata people involved from the beginning of a project so they can create the best metadata.
Metadata Creation and Migration Tools
These two tables covered much of the same material, so their reports are merged into one. Electronic theses and dissertations were also discussed here. The University of Virginia merges information from students and professors into ETD descriptions. Texas A&M discussed using Vireo and moving MODS records into their ILS, but it took a lot of hand-holding to get the records into shape. Diacritics are a problem when moving from one system to another. Libraries are looking for workflows for embedding metadata into files, and ImageMagick was suggested as a tool. The group also discussed holdings data: how do we share our holdings without all keeping local copies of the complete records? Holdings data in the ILS is typically in the MARC Format for Holdings Data (MFHD) and not accommodated by vendors. What kinds of tools do we need for metadata, and what tools would benefit a wide range of institutions? Current tools are closely tied to local situations and not easily shared. The question of how we help build an infrastructure where we can share things in the cloud came up, with a suggestion that the Digital Library Federation or the Research Libraries Group might be a place to start these conversations.
As a migration strategy, linked data can match up data from different silos, avoiding time-consuming migration. Participants also raised the possibility of migrating MARC data into whatever new metadata schema arises from the Library of Congress's Bibliographic Framework Initiative.
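The matching idea rests on shared identifiers: if two silos both record the same URI (a VIAF URI, for example), records can be paired without converting either silo's schema. A toy sketch of that join, with entirely invented records and identifiers:

```python
# Two hypothetical silos whose records share nothing but a VIAF URI.
catalog = [
    {"id": "bib1", "name": "Austen, Jane", "viaf": "http://viaf.org/viaf/102333412"},
]
digital_collections = [
    {"id": "dc9", "creator": "Jane Austen", "viaf": "http://viaf.org/viaf/102333412"},
    {"id": "dc10", "creator": "Anonymous", "viaf": None},  # no identifier, cannot be linked
]

def link_by_uri(left, right, key="viaf"):
    """Pair records across silos that share the same identifier URI."""
    index = {r[key]: r for r in left if r.get(key)}
    return [(index[r[key]]["id"], r["id"]) for r in right if r.get(key) in index]

print(link_by_uri(catalog, digital_collections))  # → [('bib1', 'dc9')]
```

Note that the two silos use different name forms ("Austen, Jane" vs. "Jane Austen") yet still link, which is exactly the advantage over string matching; the cost is that records without identifiers, like the second one above, stay unlinked.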
This concluded the discussion portion and was followed by the business meeting.
Reported by Kristin Martin