Intellectual Access to Preservation Metadata Interest Group Meeting
June 23, 2012, 4:00-5:30 PM
ALA Annual Conference
The Metadata Interest Group hosted two presentations on embedding and extracting metadata from digital objects. Full presentations, when available, are posted on ALA Connect: http://ala12.scheduler.ala.org/node/386.
Chris Lacinak
President, AudioVisual Preservation Services
Editing and Embedding Audio-Visual Metadata with MetaEdit
Building upon a foundation laid by the Federal Agencies Digitization Guidelines Initiative (FADGI), Lacinak presented two tools, AVI MetaEdit and BWF MetaEdit, built by a team at AudioVisual Preservation Services for embedding, editing, and exporting technical metadata for AVI (video) and BWF (audio) files.
Lacinak likened embedded metadata (metadata stored inside the same file, or container, that also stores the audiovisual essence to which the metadata refers) to a wallet. That is to say, embedded metadata isn’t analogous to your resume or bio; like your ID, it provides the critical information you need in case of catastrophe.
Embedded metadata already exists in born-digital files. Its significance to preservation is multifold, from providing a means to verify authenticity and integrity to, in some cases, providing some form of backup. Lacinak quipped, “To ignore it is to effectively de-catalog.” Still, he acknowledged how easy it is to ignore the embedded metadata already present in text documents, audio, and video as we manage digital files in both our personal and professional lives.
Take, for example, a photo of a scene on Air Force One. Its filename isn’t descriptive, but the metadata embedded in the digital object itself includes the name of the photographer, GIS information, and camera settings. The question becomes: how do we make this embedded metadata work for us?
Files are encapsulated in file formats, which provide structure to the data composing a digital object. In the case of A/V file formats, both the WAV and AVI file formats rely upon the RIFF file structure, which is composed of “chunks,” including an INFO list chunk (various metadata elements) and a data chunk (the audio itself).
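To make the chunk layout concrete, here is a minimal Python sketch (not shown in the presentation) that walks the top-level chunks of a RIFF stream. The names `list_riff_chunks`, `make_chunk`, and `build_demo_wave` are illustrative assumptions, as is the tiny in-memory demo file. The sketch relies only on the general RIFF layout: a 12-byte header, then chunks made of a four-character ID, a little-endian 32-bit size, and a payload padded to an even length.

```python
import io
import struct


def make_chunk(chunk_id, payload):
    """Build one RIFF chunk: 4-byte ID, 32-bit LE size, payload, pad byte if odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def build_demo_wave():
    """Build a tiny, made-up WAVE file in memory (fmt, LIST/INFO, and data chunks)."""
    fmt = make_chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16))
    info = make_chunk(b"LIST", b"INFO" + make_chunk(b"INAM", b"Demo\x00"))
    data = make_chunk(b"data", b"\x00\x00" * 4)
    body = b"WAVE" + fmt + info + data
    return b"RIFF" + struct.pack("<I", len(body)) + body


def list_riff_chunks(stream):
    """Return (form_type, [(chunk_id, size, payload_offset), ...]) for top-level chunks."""
    header = stream.read(12)
    if len(header) < 12:
        raise ValueError("file too short to be RIFF")
    riff_id, riff_size, form_type = struct.unpack("<4sI4s", header)
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    end = 8 + riff_size  # declared end of the RIFF container
    chunks = []
    while stream.tell() + 8 <= end:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            break  # file ends before the declared size
        chunk_id, size = struct.unpack("<4sI", chunk_header)
        chunks.append((chunk_id.decode("ascii"), size, stream.tell()))
        stream.seek(size + (size & 1), 1)  # skip payload plus pad byte
    return form_type.decode("ascii"), chunks


if __name__ == "__main__":
    form, chunks = list_riff_chunks(io.BytesIO(build_demo_wave()))
    print(form)
    for chunk_id, size, offset in chunks:
        print(repr(chunk_id), size, offset)
```

In this demo the INFO metadata appears as a `LIST` chunk sitting alongside the `data` chunk, which is why a tool can rewrite one without touching the other.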
Lacinak presented two tools, BWF MetaEdit and AVI MetaEdit, to help people who work with A/V digital content manipulate embedded metadata in the INFO list chunk to serve professional needs. AVI MetaEdit is for use with (video) AVI files, while BWF MetaEdit is for use with (audio) Broadcast Wave Format files.
Although the two tools differ in some respects, Lacinak emphasized their shared foundation and features for the purposes of the presentation. From a design perspective, both applications are built to support the description recommended by the FADGI guidelines, and both are informed by the concept of PREMIS and PBCore events. From an implementation perspective, both applications are free and open source, work across platforms, offer both command-line and graphical interfaces, and support both single-file and batch processing.
Through an optional spreadsheet-like interface, users can import CSV or XML to embed metadata in a file, edit existing embedded metadata, and extract embedded metadata. Other key features include the following:
- Validation: the ability to check integrity at the more granular level of file “chunks” rather than the full file. Hash values can be stored externally or internally as a new chunk.
- Process history: a set of metadata elements that capture the “process history” of a digital object; particularly well suited to reformatted digital objects, but usable for born-digital objects as well.
- Quality control: the ability to read files that are illegal without writing illegal files, and the ability to detect file truncation (as from an incomplete download or FTP session).
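The value of chunk-level validation and truncation detection can be illustrated with a short Python sketch. This is not how BWF MetaEdit or AVI MetaEdit is implemented. The function names (`data_chunk_md5`, `is_truncated`, `make_chunk`, `make_wave`), the choice of MD5, and the made-up sample files are all assumptions for illustration. The point is that a hash of the data chunk alone stays stable when embedded metadata is edited, whereas a whole-file hash does not.

```python
import hashlib
import struct


def make_chunk(chunk_id, payload):
    """Build one RIFF chunk, padding the payload to an even length."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def make_wave(title, audio):
    """Build a made-up, minimal RIFF/WAVE byte string with an INFO title and audio."""
    info = make_chunk(b"LIST", b"INFO" + make_chunk(b"INAM", title))
    body = b"WAVE" + info + make_chunk(b"data", audio)
    return b"RIFF" + struct.pack("<I", len(body)) + body


def data_chunk_md5(riff_bytes):
    """Hash only the payload of the top-level 'data' chunk."""
    if len(riff_bytes) < 12 or riff_bytes[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    pos = 12
    while pos + 8 <= len(riff_bytes):
        chunk_id, size = struct.unpack_from("<4sI", riff_bytes, pos)
        start = pos + 8
        if chunk_id == b"data":
            payload = riff_bytes[start:start + size]
            if len(payload) < size:
                raise ValueError("data chunk is truncated")
            return hashlib.md5(payload).hexdigest()
        pos = start + size + (size & 1)  # skip payload plus pad byte
    raise ValueError("no data chunk found")


def is_truncated(riff_bytes):
    """True if the file is shorter than the size declared in its RIFF header."""
    if len(riff_bytes) < 8:
        return True
    declared_total = struct.unpack_from("<I", riff_bytes, 4)[0] + 8
    return len(riff_bytes) < declared_total


if __name__ == "__main__":
    audio = b"\x01\x02\x03\x04"
    before = make_wave(b"Title A\x00", audio)
    after = make_wave(b"Another, longer title\x00", audio)  # metadata edited
    print(data_chunk_md5(before) == data_chunk_md5(after))  # audio unchanged
    print(hashlib.md5(before).hexdigest() == hashlib.md5(after).hexdigest())
    print(is_truncated(before), is_truncated(before[:-2]))
```

Comparing the declared RIFF size with the actual byte count is only one simple truncation check; a real tool would likely inspect individual chunk sizes as well.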
For more information, see the FADGI Guidelines, BWF MetaEdit on SourceForge, AVI MetaEdit on GitHub, or NARA’s Digitization Services Products and Services page.
Joan DaShiell
Product Manager for Digitization Services, Backstage Library Works
Discover the Technical Metadata in your Still Image Digital Files
Tipping his hat to the second presentation, Lacinak observed, “At every point in the chain of the still image life cycle, people are aware of need for embedded metadata, starting with photographers who are rights holders. The A/V community sees the still image community as a beacon of hope!”
Dashiell began her presentation by posing five reporter's questions about metadata: Who should capture it? What data? Where is it? Why do we want it? How do we get to it? She discussed these questions at a more granular level for each of the five kinds of metadata, from the perspective of a digitization vendor for libraries.
Descriptive metadata is typically captured prior to digitization by the library working with Backstage Library Works in the form of Dublin Core, MODS, and/or MARC.
Structural metadata, such as the relationship between files in complex digital objects, is established during the digitization process. Backstage Library Works uses the CCS Doc Works system to create a METS file and push or pull metadata into the file. In addition to containing modules for descriptive and administrative metadata, a METS wrapper describes structure by pointing to, for example, the TIFF files and ALTO files that make up the still image and OCR text of a digitized issue of a newspaper.
Technical metadata is often found embedded in the digital file itself, as TIFF, EXIF, or XMP tags. Dashiell noted that one practice at Backstage Library Works is to embed the file name as a tag as well, because file names are so easily corrupted across operating systems. Other kinds of technical metadata include the following:
- Object info: identifier, file size, format, compression, fixity, etc.
- Image info: height, width, color-related data, etc.
- Capture info: device dependent (scanner, camera)
- Assessment info: spatial metrics, image color encoding, target data
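As a rough illustration of the "object info" category above, the following Python sketch gathers an identifier, file size, format guess, and fixity value for a file's bytes. It is not drawn from the presentation or from any Backstage Library Works workflow. The function name `object_info`, the dictionary keys, the choice of SHA-256, and the sample identifier are all assumptions. Format detection is limited to the standard TIFF byte-order signatures.

```python
import hashlib

# The two standard TIFF signatures: little-endian ("II") and big-endian ("MM").
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def object_info(identifier, content):
    """Collect basic object-level technical metadata for a file's bytes."""
    if content[:4] in TIFF_SIGNATURES:
        detected_format = "image/tiff"
    else:
        detected_format = "unknown"
    return {
        "identifier": identifier,
        "file_size": len(content),
        "format": detected_format,
        "fixity_algorithm": "SHA-256",
        "fixity_value": hashlib.sha256(content).hexdigest(),
    }


if __name__ == "__main__":
    # A made-up 8-byte TIFF header (signature plus first IFD offset), not a full image.
    sample = b"II*\x00" + b"\x08\x00\x00\x00"
    for key, value in object_info("demo-0001", sample).items():
        print(key, value)
```

Image, capture, and assessment info would require parsing the TIFF, EXIF, or XMP tags themselves, which is beyond this sketch.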
Administrative metadata, including preservation metadata, is typically created after digitization and ingest to a repository.
…
After the conclusion of the two presentations, the Q&A session turned into a lively audience discussion. One attendee, a preservation specialist whose purview included both physical and digital preservation, compared embedded metadata to microfilm targets. Another attendee likened the file and embedded metadata, associated METS object, and system management features such as randomly generated unique IDs in a mediated storage system to “Dante-esque” concentric circles of preservation spaces. High stakes indeed!