Workflow Tools for Automating Metadata Creation and Maintenance

Saturday, July 11th, from 10:30 a.m. – 12:00 noon. Sponsored by ALCTS, co-sponsored by LITA.

As digital projects become less peripheral and more integral to library
operations, institutions must begin to address the implications of this
change. With the increasing amount of digital content libraries are
expected to create and maintain, data curation has emerged as a key
objective. Intended for librarians who are involved with the
development and management of metadata, this session will present
examples of current work and discussion opportunities for
collaborative development of tools among institutions.

Slides available at ALA Conference Materials Archive.

Herding Cats
Ann Caldwell, Coordinator, Digital Production Services, Brown University.

“Herding cats” is a fitting title. As Caldwell explained, she spends much of her time working with faculty. Besides one-on-one meetings, she recently worked with the entire engineering division to support its re-accreditation process. During this process, a set of tools was developed to let faculty and other users easily contribute digital objects to Brown’s repository. (In the accreditation case, the “cats” may also refer to the digital objects, “materials needed to be deposited for the accreditation team: syllabus/outline, website, homework, lab reports and graded student work, project and graded student work, exams and graded student work and student assessments…”) The toolset includes a file-uploading folder system, a MODS editor, and a file-tracking system.

Caldwell emphasized two problems in dealing with digital objects at Brown and the tools developed to tackle them. The first is keeping track of materials. The solution is “Project Manager,” a previously developed system that can track the engineering accreditation materials as well. This sophisticated tool “tracks projects, equipment, software, users, as well as processes.” The second problem is metadata creation, which poses no difficulty for digital services staff but does for faculty and bibliographers. They wanted a user-friendly metadata editing interface that hides the XML encoding. The result is a MODS editor: it lists a set of fields, flags required ones, adds restrictions to names (e.g., defining the type as personal, corporate, or conference, and the role) and to dates, and allows viewing the raw XML. To support file deposit before metadata is added, they built a file-uploading folder system. Professors can create communities and have their own personal folders. An example of a community might be all the classes a professor teaches.
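To picture what the editor produces behind its form-based interface, here is a minimal sketch in Python that builds a skeletal MODS record with a typed name, a role, and a date using only the standard library. The `make_mods` helper and all field values are invented for illustration; this is not Brown's actual code.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def make_mods(title, name, name_type, role, date):
    """Build a minimal MODS record: title, a typed name with a role, a date."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    ti = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(ti, f"{{{MODS_NS}}}title").text = title
    # The editor restricts name type to personal / corporate / conference.
    nm = ET.SubElement(mods, f"{{{MODS_NS}}}name", type=name_type)
    ET.SubElement(nm, f"{{{MODS_NS}}}namePart").text = name
    ro = ET.SubElement(nm, f"{{{MODS_NS}}}role")
    ET.SubElement(ro, f"{{{MODS_NS}}}roleTerm", type="text").text = role
    oi = ET.SubElement(mods, f"{{{MODS_NS}}}originInfo")
    ET.SubElement(oi, f"{{{MODS_NS}}}dateCreated", encoding="w3cdtf").text = date
    return mods

record = make_mods("Lab Report 3", "Doe, Jane", "personal", "creator", "2009-05-01")
print(ET.tostring(record, encoding="unicode"))  # the "raw XML" view
```

Hiding this serialization behind a form is exactly the point of the editor: faculty fill in fields, and the XML is generated for them.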

She discussed the project workflow, including both behind-the-scenes and user-facing processes. The first half can be summarized as: authentication; the user uploads a file; the item is marked “item digitized” and recorded in the tracking system; JHOVE validates the file and a MIX record is created; the tracking system marks “metadata created”; a bundle folder is created; and the MODS record is saved to that bundle folder. The bundle is then ready to load. An API program automates file uploading and allows querying of the folder. Next, the bundle is scripted into a METS record. They are currently developing a Fedora ingester “that will suck in the METS record and spit out a FoXML record” to create the Fedora object. Finally, the system will “detect new Fedora object and automatically update SOLR index”.
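The first half of that pipeline can be sketched as a short Python script. Everything here is hypothetical: the function names are invented, and the real tools (JHOVE, the MODS editor, the METS scripting, Fedora, Solr) are replaced with stand-ins that only update a tracking log.

```python
import hashlib

def validate_file(data: bytes) -> dict:
    """Stand-in for JHOVE validation and MIX record creation (just a checksum here)."""
    return {"valid": len(data) > 0, "checksum": hashlib.md5(data).hexdigest()}

def deposit(filename: str, data: bytes, tracking: dict) -> dict:
    """Walk one uploaded file through the first half of the pipeline."""
    mix = validate_file(data)                 # JHOVE + MIX stand-in
    tracking[filename] = ["item digitized"]   # recorded in the tracking system
    mods = "<mods/>"                          # MODS record saved to the bundle
    tracking[filename].append("metadata created")
    bundle = {"file": filename, "mix": mix, "mods": mods}
    # A real workflow would next script the bundle into a METS record,
    # ingest it into Fedora as a FOXML object, and refresh the Solr index.
    return bundle

log = {}
bundle = deposit("lab_report.pdf", b"report text", log)
print(log["lab_report.pdf"])  # ['item digitized', 'metadata created']
```

The tracking dictionary plays the role of the “Project Manager” status markers; each step records its state so staff can see where any item sits in the pipeline.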

Caldwell indicated that, given the project’s success, the Engineering Department will continue using the system to deposit its digital materials for future accreditation, and other departments may try the same in the future.

Using Schematron for Analyzing Conformance to Best Practices for EAD, TEI, and MODS (and some other thoughts on workflow tools)
Jenn Riley, Metadata Librarian, Indiana University Digital Library Program.

Riley works with systems designers and programmers. She helps design systems, and one of her visions is to make metadata creation systems work well. Her experiment is an example of the kind of tool that could improve workflows in a larger environment.

She first provided some context on why they implemented Schematron to analyze file conformance against guidelines. She suggested that one of the biggest challenges in text encoding is metadata consistency, a component of quality. While it is relatively easy to make data-centric XML (e.g., MODS, DC) consistent, because the fields are predefined and you decide what to put in them, it is much harder to encode document-centric XML (e.g., EAD, TEI) consistently: in the latter case, the text already exists and needs to be marked up. Indiana University Libraries does a great deal of TEI and EAD encoding, working with the XML directly rather than using Archivists’ Toolkit. Some tools already help achieve consistency, such as schema validation in an XML editor, tag libraries, XML templates, examples, workflow documentation, and guidelines. In developing the Schematron plug-in, they drew inspiration from the RLG EAD report card, which takes EAD guidelines and defines them in a machine-readable way; it is an online tool, downloadable to a server or desktop, that reports problems in XML documents against those guidelines.

Riley elaborated on the Schematron plug-in at Indiana University Libraries and how it works. They added Schematron checks to the XML editor Oxygen so that files can be checked against their local guidelines; the Schematron technology is wrapped as a Java plug-in in Oxygen, which they call the XTF Validator. It performs only one additional layer of validation. She showed some of the errors and warnings the validator produces and noted that corrected expressions can be copied and pasted into the original XML file. In her TEI poem sample, the plug-in reported that a page break needs an id attribute and that the id must match a certain pattern.

She further explained how the Schematron technology works and how to implement the package. Schematron is an XML assertion language: it asserts what an XML document should look like. A schema is organized into patterns; patterns contain rules, and each rule has a context (e.g., an EAD header). Assertions live within rules. A user defines rules and tests, written in XPath, and failing tests generate error reports. The Schematron website is at http://www.schematron.com, where the software can be downloaded. It runs under XSLT 1.0 and 2.0 processors as a set of stylesheets run in sequence, producing a Schematron validator file. Running an instance document against the validator file produces an XML report, which can also be rendered as HTML pages, and several reports can be combined at the repository level.
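A toy version of this pattern/rule/assert model can be sketched in Python. This is not how Schematron is implemented (a real validator is an XSLT pipeline); it only mimics the idea of selecting context nodes with XPath and running assertions over them, using the TEI page-break example from the talk. The rule list and the id pattern are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Each rule pairs an XPath context with an assertion (a test plus an error
# message), mirroring Schematron's pattern -> rule -> assert structure.
RULES = [
    (".//pb",  # context: every TEI page break
     lambda el: re.fullmatch(r"p\d+", el.get("id") or "") is not None,
     "pb must have an id attribute matching 'p' followed by digits"),
]

def validate(doc: str) -> list:
    """Run each rule over its context nodes and collect failed assertions."""
    root = ET.fromstring(doc)
    errors = []
    for context, test, message in RULES:
        for el in root.findall(context):
            if not test(el):
                errors.append(message)
    return errors

sample = "<text><pb id='p1'/><pb/></text>"
print(validate(sample))  # one error: the second pb lacks an id
```

The output plays the role of the XML error report; real Schematron would render it with stylesheets, and reports for many files could be aggregated at the repository level.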

After describing Indiana University’s experiment with Schematron, Riley placed it in a larger context with an example from DLF Aquifer, which collects MODS records from different institutions and maintains guidelines for those MODS files. They came up with levels of adoption (and requirements) and made the guidelines machine-readable. An interface using Schematron technology was created for contributors to check their records.

Finally, Riley discussed some general issues related to metadata workflows and tools, for example: what should a new workflow look like? She emphasized automating, streamlining, and validating. She suggested that tools should be configurable, modular, connected with other tools, and sharable among different institutions and environments. She also touched on related issues such as the usability of cataloging tools and user interfaces.

This entry was posted in ALA Annual 2009. Bookmark the permalink.
