FAST and not so FAST faceting in digital collections

My library recently migrated away from a vendor-based digital asset management system to a homegrown system built with open source components. For additional background, check out this article in D-Lib, which addresses an aspect of the migration. We also recently published an article focusing on the tools and processes we used to migrate our metadata. While we did do some metadata clean-up prior to migration, there’s still a great deal of work to do with metadata remediation and enhancement after the migration. In our new system, one of the features I was particularly eager to try out was the ability to add custom facets for different collections. I developed a workflow for a part-time student to work on enhanced faceting. We’ve been experimenting with adding FAST headings to many of our oral history collections such as Interviews with Jews in Utah and the Carbon County Oral Histories. Right now, the default display shows all the facets in place for a collection, but showcasing just the top facets by record count with an option to expand is part of the future development plan for the system.

There are a few collections where relying on FAST headings alone didn’t make sense, and I thought I would highlight them in this post and ask what other people might be doing with custom facets for their digital collections. One of the developers at the library, Alan Witkowski, implemented custom faceting for our Sanborn maps collection, where patrons can browse by year and by location. A librarian, Jessica Colbert, recently completed metadata enhancements in our Football Videos collection, which blends FAST headings for the teams with high-interest facets specific to that collection, such as “Away Games” and “Losses”.
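For those curious about the mechanics, collection-specific faceting in systems like ours often comes down to which facet fields the search index is asked to return. Here is a hedged sketch of requesting facet counts from a Solr index, the kind of open source component such a system might be built on; the core name and the subject_fast and facet_local field names are illustrative assumptions, not our production schema:

```python
import requests

# Hypothetical Solr core and field names, for illustration only.
SOLR_URL = "http://localhost:8983/solr/digital_collections/select"

params = {
    "q": "*:*",
    "fq": 'collection:"Football Videos"',            # limit to one collection
    "rows": 0,                                       # facet counts only, no documents
    "facet": "true",
    "facet.field": ["subject_fast", "facet_local"],  # FAST headings + local facets
    "facet.mincount": 1,
    "facet.limit": 10,                               # just the top facets by record count
    "wt": "json",
}

resp = requests.get(SOLR_URL, params=params)
resp.raise_for_status()

# Solr returns each facet field as a flat [value, count, value, count, ...] list.
for field, pairs in resp.json()["facet_counts"]["facet_fields"].items():
    print(field, list(zip(pairs[::2], pairs[1::2])))
```

Capping facet.limit like this is essentially the “top facets with an option to expand” behavior described above.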

Just having the ability to easily create custom facets for digital collections when we weren’t able to do that before is opening up new possibilities for digital collections at the University of Utah. For those of you who have added FAST headings to your digital collections, have you also run into situations where you wanted to add some additional faceting terms? What were your strategies for doing so? Feel free to share here, in a future guest post or in comments on this blog!

Anna Neatrour is the Digital Initiatives Librarian at the University of Utah J. Willard Marriott Library. You can find her on twitter as @annaneat.

Posted in General

Twice the Metadata is Twice the Fun!

Join the ALCTS Metadata Interest Group for exciting programming at both our annual program and annual meeting!

Annual Program

The ALCTS Metadata Interest Group will be sponsoring Metadata Migrations: Managing Methods and Mayhem on Sunday, June 25th, from 3-4 pm in Room W185bc. During this time, come hear experiences from the front lines, with presentations from Maggie Dickson, Metadata Architect at Duke University Libraries, and Gretchen Gueguen, Data Services Coordinator at DPLA. We look forward to seeing you all in Chicago. Don’t forget to add this event to your ALA Conference Scheduler.

Maggie Dickson
Metadata Architect
Duke University Libraries

Title: Looking Back, Moving Forward: Remediating 20+ Years of Digital Collections Metadata

Abstract: In 2015, DUL began the process of migrating its digital collections to the Duke Digital Repository, a Fedora/Hydra/Blacklight-based platform. In preparation for this migration, we undertook a large-scale analysis and remediation of metadata describing approximately 112,000 items, created over the course of twenty years, by many different people, and using many different schemas and standards (or not). We formed a task group to make decisions, identify and engage stakeholders, and guide the workflow. This involved reviewing existing properties and values and evaluating the adoption of standards and vocabularies, with an eye toward linked open data and sharing our resources with the DPLA and beyond. The remediation itself (which at the time of this proposal is ongoing) is being completed using OpenRefine, scripting, and many good old spreadsheets. This presentation will describe the process, its challenges and successes, and future directions.

Gretchen Gueguen
Data Services Coordinator
Digital Public Library of America

Title: The Never-Ending Migration

Abstract: What if all you did was migrate metadata from one system to another? In a sense, that is what metadata mapping at DPLA is like. The first 2.5 million records were harvested and mapped in 2013 from 500 initial partners. Since then DPLA’s collection has grown to nearly 15 million records from more than 2000 contributing institutions. Since the project relies on metadata harvesting and synchronization, metadata is continually being harvested and mapped. This presentation will explore the tools and techniques that DPLA uses to analyze and map metadata from a variety of standard and bespoke metadata formats into a normalized application profile. Recently DPLA has been developing a new open source tool that can be used by anyone to harvest, map, and analyze metadata from common data sources such as OAI feeds. Work on the creation of these tools as well as data quality efforts at DPLA will be reviewed.

Annual Meeting

Join the ALCTS Metadata Interest Group in Chicago for our meeting at ALA Annual 2017 at McCormick Place, Room W102A, Sunday, June 25, 8:30 AM – 10:00 AM. We will have a presentation by the ALCTS/LITA Metadata Standards Committee on evaluating metadata standards, followed by our business meeting and election. Please join us!

Evaluating Metadata Standards – Principles into Practice

Jenn Riley, Lauren Corbett, and Erik Mitchell will present on the Metadata Standards Committee’s work applying the principles (http://metaware.buzz/2016/08/04/principles-for-evaluating-metadata-standards/) to an example standard, the NISO Standards Tag Suite (STS). The principles were developed in 2016 to give metadata communities a common tool for exploring standards design. The team will discuss the process for identifying standards to evaluate, their approach to reviewing standards, and the outcomes, lessons learned, and next steps for the metadata principles.

Executive Committee Elections

The ALCTS Metadata Interest Group has the following offices open for election:

  • Vice-Chair/Chair Elect (Vice-Chair 2017-2018, Chair 2018-2019)
  • Program Co-Chair (2017-2019)
  • Secretary (2017-2019)

Terms are two years and begin following ALA Annual 2017. Officers must be able to commit to attending both ALA Midwinter and ALA Annual during their terms.

Elections will be held during the Metadata Interest Group meeting on Sunday, June 25th, 8:30 am to 10:00 am, McCormick Place W179b.

Anyone interested in standing for election to one of these offices is invited to get in touch with Mike Bolam (mrbst20@pitt.edu) and/or Liz Woolcott (liz.woolcott@usu.edu) prior to ALA. Please feel free to contact us if you have any questions or wish to announce your intent to run in advance. Additional nominations will be taken prior to the election at the meeting.

For more information on the roles and responsibilities of the positions, see our announcement on ALA Connect: http://connect.ala.org/node/266318.

Posted in ALA Annual 2017, Conferences

Reminder: ALCTS Virtual Preconference, June 6-7

Join ALCTS for “Diverse, Inclusive, and Equitable Metadata,” a virtual program in two sessions:

  • Session 1, Outreach and Inclusivity in Digital Libraries and Institutional Repositories, Tuesday, June 6, 2017, 1:00 p.m. – 2:00 p.m. CT
  • Session 2, Metadata Creation and Remediation in Zine and Digital Library Collections, Wednesday, June 7, 2017, 1:00 p.m. – 2:00 p.m. CT

For more information, or to register, visit http://www.ala.org/alcts/events/ac/2017/vc.

Posted in ALCTS Virtual Preconferences

Metadata Librarian’s Little Helper: OpenRefine Reconciliation Services

This is the third in our series of follow-up posts by Midwinter Lightning Talk presenters.


When our archive opened to the public two years ago, our catalog of nearly 5,000 records was findable by keyword search and little else. The data was devoid of authorities and controlled vocabularies and had not been compiled into finding aids. Migrating from our legacy records management software to ArchivesSpace involved a great deal of cleanup, and authority reconciliation proved to be the most challenging part. Reconcile-csv, a reconciliation service that works with OpenRefine, helped with this task.

Since our legacy metadata lacked authority control, the issues were predictable: corporate and personal names were inconsistent, acronym-filled, and sometimes included related terms in parentheses. The records also did not link to each other, so the same name could be formed differently in an authority record, a collection record, and an accession record. Cleaning up the names was going to be a large task, since it meant addressing inconsistencies everywhere a name appeared. Thankfully, OpenRefine’s reconciliation support can automate some of this work: by plugging in a service URL, OpenRefine will match data in your spreadsheet against a controlled vocabulary on the web, such as the Library of Congress (LC) Authorities.
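Under the hood, that reconciliation is a simple HTTP exchange defined by OpenRefine’s reconciliation service API. A minimal Python sketch of the exchange follows, using a placeholder endpoint (LC-capable services exist, but their URLs change, so substitute a real one):

```python
import json
import requests

# Placeholder: substitute the URL of an actual reconciliation service.
SERVICE_URL = "http://example.org/reconcile"

def reconcile(name):
    """Send one name to a reconciliation service; return its candidate matches."""
    queries = {"q0": {"query": name}}
    resp = requests.post(SERVICE_URL, data={"queries": json.dumps(queries)})
    resp.raise_for_status()
    return resp.json()["q0"]["result"]  # [{id, name, score, match}, ...]

for candidate in reconcile("Smith, Michael"):
    # "id" carries the authority identifier/URI -- the value that lets
    # cleaned records incorporate linked data, as described below.
    print(candidate["score"], candidate["match"], candidate["id"], candidate["name"])
```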

Our plan was to export batches of 100 names from our catalog’s name authority file, which consisted of 1,700 names, and reconcile them against LC Authorities in OpenRefine. Another great thing about reconciliation services is that they can pull in other values from the controlled vocabularies, such as URIs. In doing this, our cleaned authority records could incorporate linked data.

OpenRefine’s reconciliation service is an amazing feature, but our metadata was in such rough shape that this stage took longer than anticipated. The name reconciliation matched or suggested matches for half of our names, and the remaining names either did not exist in LC Authorities, or they were so messy that no match could be found. Also, the names that were matched had to be evaluated for accuracy to make sure that our Michael Smith is the same person as the matched Michael Smith. On average, the reconciliation service took two minutes to run on 100 records, but the evaluation stage took an hour.

Our name cleanup also required us to standardize local names according to RDA. Since the bulk of our collection consists of university records, we had many variant spellings of university departments and offices. With an intern’s help, I made these edits in our name database, and added them, with a unique identifier, to a Google sheet that was to become our local name authority.

Now our name authority records were clean and ready to be imported into ArchivesSpace. But we were far from finished – all of the names within our collection and accession records were still a mess. Did this mean that we had to repeat the entire reconciliation process for these other record types? It did not, thanks to Reconcile-csv, a free reconciliation service developed by Open Knowledge Labs.

Two documents were the by-products of the work we had just done: (1) a list of names from our catalog matched to LC authorities with their identifiers, and (2) a list of RDA-formed local names. Putting these together essentially gave us a master CSV of authority-controlled names. Now we could use Reconcile-csv, which matches data against a local CSV file. So, instead of matching our remaining messy names against the millions of entries in LC Authorities and then manually cross-referencing names against our local name authority, we simply matched against our master spreadsheet of 800 authoritative (local and LC) names that had been reconciled and evaluated for accuracy. This time, our match rate was higher and more accurate than with the LC Authorities reconciliation. As a result, our evaluation stage took significantly less time – 15 minutes per 100 records instead of an hour.
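As an illustration, merging the two by-products into one master CSV could be as simple as the following sketch (the file and column names are assumptions, not the actual files from this project):

```python
import csv

# Assumed inputs: lc_matched.csv with columns name,lc_uri and
# local_names.csv with columns name,local_id. All names are hypothetical.
rows = []
with open("lc_matched.csv", newline="", encoding="utf-8") as f:
    rows += [{"name": r["name"], "id": r["lc_uri"], "source": "lc"}
             for r in csv.DictReader(f)]
with open("local_names.csv", newline="", encoding="utf-8") as f:
    rows += [{"name": r["name"], "id": r["local_id"], "source": "local"}
             for r in csv.DictReader(f)]

with open("master_names.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "id", "source"])
    writer.writeheader()
    writer.writerows(rows)  # the ~800 authoritative names used for matching
```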

The Open Knowledge Labs website offers simple three-step instructions for downloading and running Reconcile-csv from the command line. It behaves like any other OpenRefine reconciliation service: you can specify the column to be matched, view matches and suggested matches, and pull in other values from your master spreadsheet, like URIs. Reconcile-csv is great for metadata that requires a lot of authority reconciliation, since compiling a set of authoritative terms will make that reconciliation go much faster. For the same reason, it’s really helpful for subject term reconciliation as well.
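For readers who want to try it, launching the service boils down to running the jar against the master CSV and pointing OpenRefine at the local endpoint. A sketch via Python’s subprocess module (the jar version and default port are taken from the project’s documentation and may have changed):

```python
import subprocess

# Roughly: java -Xmx2g -jar reconcile-csv-0.1.2.jar <csv> <search column> <id column>
proc = subprocess.Popen([
    "java", "-Xmx2g", "-jar", "reconcile-csv-0.1.2.jar",
    "master_names.csv",  # the master spreadsheet of authoritative names
    "name",              # column to fuzzy-match against
    "id",                # column returned as the match identifier
])

# While this runs, add http://localhost:8000/reconcile in OpenRefine as a
# standard reconciliation service; call proc.terminate() when finished.
```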

Resources:

Reconcile-csv: http://okfnlabs.org/reconcile-csv/
Reconcile-csv in GitHub: https://github.com/okfn/reconcile-csv
Reconcilable data sources for OpenRefine: https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources
How to use OpenRefine reconciliation services: https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation
Slides: https://goo.gl/3N0Gw3

Greer Martin, Discovery & Metadata Librarian, Illinois Institute of Technology

Posted in ALA Midwinter 2017, Conferences

Metadata Interest Group Meeting at ALA Annual 2017

Join the ALCTS Metadata Interest Group in Chicago for our meeting at ALA Annual 2017 at McCormick Place, Room W102A, Sunday, June 25, 8:30 AM – 10:00 AM. We will have a presentation by the ALCTS/LITA Metadata Standards Committee on evaluating metadata standards, followed by our business meeting and election. Please join us!

Evaluating Metadata Standards – Principles into Practice

Jenn Riley, Lauren Corbett, and Erik Mitchell will present on the Metadata Standards Committee’s work applying the principles (http://metaware.buzz/2016/08/04/principles-for-evaluating-metadata-standards/) to an example standard, the NISO Standards Tag Suite (STS). The principles were developed in 2016 to give metadata communities a common tool for exploring standards design. The team will discuss the process for identifying standards to evaluate, their approach to reviewing standards, and the outcomes, lessons learned, and next steps for the metadata principles.


Posted in ALA Annual 2017, Conferences

Metadata Migrations: Managing Methods and Mayhem

Are you preparing to migrate out of a legacy system? Do you have questions about metadata remediation, repurposing, or enhancement? Of course you do, and we are here to help. During ALA Annual in Chicago, the ALCTS Metadata Interest Group will be sponsoring Metadata Migrations: Managing Methods and Mayhem on Sunday, June 25th, from 3-4 pm in Room W185bc. During this time, come hear experiences from the front lines, with presentations from Maggie Dickson, Metadata Architect at Duke University Libraries, and Gretchen Gueguen, Data Services Coordinator at DPLA. We look forward to seeing you all in Chicago. Don’t forget to add this event to your ALA Conference Scheduler.

Title: Looking Back, Moving Forward: Remediating 20+ Years of Digital Collections Metadata

Presenter: Maggie Dickson, Metadata Architect, Duke University Libraries

Abstract: In 2015, DUL began the process of migrating its digital collections to the Duke Digital Repository, a Fedora/Hydra/Blacklight-based platform. In preparation for this migration, we undertook a large-scale analysis and remediation of metadata describing approximately 112,000 items, created over the course of twenty years, by many different people, and using many different schemas and standards (or not). We formed a task group to make decisions, identify and engage stakeholders, and guide the workflow. This involved reviewing existing properties and values and evaluating the adoption of standards and vocabularies, with an eye toward linked open data and sharing our resources with the DPLA and beyond. The remediation itself (which at the time of this proposal is ongoing) is being completed using OpenRefine, scripting, and many good old spreadsheets. This presentation will describe the process, its challenges and successes, and future directions.

Title: The Never-Ending Migration

Presenter: Gretchen Gueguen, Data Services Coordinator, Digital Public Library of America

Abstract: What if all you did was migrate metadata from one system to another? In a sense, that is what metadata mapping at DPLA is like. The first 2.5 million records were harvested and mapped in 2013 from 500 initial partners. Since then DPLA’s collection has grown to nearly 15 million records from more than 2000 contributing institutions. Since the project relies on metadata harvesting and synchronization, metadata is continually being harvested and mapped. This presentation will explore the tools and techniques that DPLA uses to analyze and map metadata from a variety of standard and bespoke metadata formats into a normalized application profile. Recently DPLA has been developing a new open source tool that can be used by anyone to harvest, map, and analyze metadata from common data sources such as OAI feeds. Work on the creation of these tools as well as data quality efforts at DPLA will be reviewed.

Posted in ALA Annual 2017, Conferences

Metadata Interest Group: Call for Nominations 2017

The ALCTS Metadata Interest Group has the following offices open for election:

  • Vice-Chair/Chair Elect (Vice-Chair 2017-2018, Chair 2018-2019)
  • Program Co-Chair (2017-2019)
  • Secretary (2017-2019)

Terms are two years and begin following ALA Annual 2017. Officers must be able to commit to attending both ALA Midwinter and ALA Annual during their terms.

Elections will be held during the Metadata Interest Group meeting on Sunday, June 25th, 8:30 am to 10:00 am, McCormick Place W179b.

Anyone interested in standing for election to one of these offices is invited to get in touch with Mike Bolam (mrbst20@pitt.edu) and/or Liz Woolcott (liz.woolcott@usu.edu) prior to ALA. Please feel free to contact us if you have any questions or wish to announce your intent to run in advance. Additional nominations will be taken prior to the election at the meeting.

For more information on the roles and responsibilities of the positions, see our announcement on ALA Connect: http://connect.ala.org/node/266318.


Posted in ALA Annual 2017, Conferences

Using MarcEdit to Retool Existing MARC Records of Paper Maps for Use in an Online Geoportal

This is the second in our series of follow-up posts by Midwinter Lightning Talk presenters.


The Michigan State University Libraries recently joined the Big Ten Academic Alliance Geoportal, a consortial online discovery tool for maps and geographic data. While the principal focus of the geoportal’s map-based interface is access to geospatial data for use in GIS applications, the geoportal also accommodates map-based discovery of digital scans of paper maps. Contributing our scanned paper maps to the geoportal requires submission of records suitable for the generation of ISO 19115-compliant metadata. To accomplish this, we devised a MarcEdit workflow using our existing MARC records for paper maps to create new MARC records for digital maps — which could then be delivered as MARCXML records to the geoportal staff, who used them to generate the ISO 19115 metadata for display in the geoportal. An additional benefit of the workflow was the creation of new MARC records for the digital scans, for use in our own library catalog.

We opted to start with MARC records for paper maps that had already been cataloged and scanned. The first step in our workflow was deciding which MARC fields could be programmatically edited using the paper-based record as a starting point, and which fields would require human review with manual entry.

Examples of programmatic changes included:

  • changing the 300$a field to “1 online resource”
  • changing some coding in the fixed fields
  • changing the 338 field’s carrier type to “online resource”
  • adding 655_7 “Digital Maps.”

Examples of manual edits applied after new records were generated in MarcEdit included:

  • conversion to RDA standards, including spelling out abbreviations and removing brackets in titles
  • removal of FAST headings so as to trigger OCLC’s process for automated re-analysis and re-application of FAST headings
  • miscellaneous punctuation and formatting issues.

Some fields, such as a 776 linking back to the original paper-based record, could be created programmatically for the new scan-based record, but required human review afterward. Our complete spreadsheet of changes can be viewed here.
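For those who script rather than use MarcEdit task lists, the programmatic changes listed above could also be expressed with pymarc. A minimal sketch, assuming pymarc 5 and placeholder file names (the 655 $2 source code shown is likewise an assumption, not necessarily our production coding):

```python
from pymarc import MARCReader, Field, Subfield

with open("paper_maps.mrc", "rb") as infile, \
     open("digital_maps.mrc", "wb") as outfile:
    for record in MARCReader(infile):
        # 300$a: replace the physical extent with "1 online resource"
        for f300 in record.get_fields("300"):
            f300.delete_subfield("a")
            f300.add_subfield("a", "1 online resource", 0)
        # 338: change the carrier type to "online resource"
        for f338 in record.get_fields("338"):
            f338.delete_subfield("a")
            f338.add_subfield("a", "online resource", 0)
        # 655 _7: add the genre heading for digital maps
        record.add_field(Field(
            tag="655",
            indicators=[" ", "7"],
            subfields=[Subfield(code="a", value="Digital Maps."),
                       Subfield(code="2", value="fast")],
        ))
        # pymarc.record_to_xml(record) would yield the MARCXML
        # serialization delivered to the geoportal staff.
        outfile.write(record.as_marc())
```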

As a result of this project, MSU Libraries now has 44 maps represented in the geoportal. An example geoportal record may be viewed here, and its corresponding record in the MSU Libraries catalog may be viewed here. We are happy with our initial results, although in hindsight we would have adhered to the PCC Provider Neutral Guidelines, and we have modified our procedure to do so in the future. The MSU Map Library staff are also pleased with our results, and we are excited to apply our workflow to additional records.

Tim Kiser, Special Materials Catalog Librarian
Nicole Smeltekop, Special Materials Catalog Librarian

Posted in ALA Midwinter 2017, Conferences

Overcoming the Challenges of Implementing Standardized Metadata Practices in a Digital Repository

This is the first in our series of follow-up posts by Midwinter Lightning Talk presenters.


Many of you probably are, or have worked with, Metadata Librarians. So, what is it that a Metadata Librarian can normally do?

  • Perform authority control of entities, including personal and corporate names.
  • Freely use controlled terms from thesauri and subject lists, including FAST terms, as well as keywords.
  • Freely add a field, delete a field, change a field label, and adjust its mapping.
  • Adopt standard controlled vocabularies, including Library of Congress linked data values.
  • Follow cataloging or metadata guidelines such as those of Dublin Core, DACS, RDA, or MODS.

Ideally, she or he can also add linked data URIs, such as those for personal and corporate names or places, or verify the linked data URIs added by the system.

However, in working with some repositories, trying to implement these normal or standardized practices can be a challenge. For example, some repositories don’t provide an authority control option. Some use the name that the author enters for his or her account; they allow controlled names to be added as additional fields, but those names are not indexed and do not display under “browse by names.” Some repositories prefer keywords or their own taxonomy to subjects; they don’t display thesaurus terms or subjects as a facet or as a “browse by” option. Furthermore, catalogers may lose control of the fields; in some hosted solutions, a cataloger needs to route any field-change request to the vendor representative through a repository administrator. Another issue is that harvesting, for example to the OCLC Digital Gateway, can be inadequate.

You may wonder why all these changes have come about. One reason seems to be that the focus has shifted from librarians to users: some repositories promote researchers assigning keywords to their own materials more than catalogers assigning subjects. Some repositories and libraries have replaced traditional methods with other approaches; for example, they maintain their own taxonomies, they adopt research identifiers such as the Open Researcher and Contributor ID (ORCID), and some libraries prefer institutional knowledge-management names to LC authorized names. Essentially, some systems are designed more for authors than for catalogers or librarians. We may also have heard some folks say things like: users don’t like Library of Congress subject terms, especially those with subdivisions, or authors don’t like controlled names. Are these more myth or fact? It’s worth considering. This post will not address the various endeavors going on in the library community, such as BIBFRAME led by LC, but will focus on the digital repository arena in general. Will metadata librarians and catalogers remain strong stakeholders in the field of information description and access, along with users, authors, vendors, and information service or subject librarians?

For a Metadata Librarian, what are some of the responses to these challenges? The most important of all is probably working with other librarians, including those from Digital Initiatives and Special Collections, as well as subject librarians and authors. Some practical approaches include:

  • establishing templates for different types of collections in the digital repository;
  • adding additional fields for controlled terms, such as names, alongside the fields with uncontrolled or local values;
  • adding additional fields for linked data URIs;
  • duplicating or tweaking values for certain fields to get them harvested;
  • accepting author metadata while keeping the option of metadata review open; and
  • making more requests to the vendor representative.

My final reflections: the definition of “standardized metadata practices” changes over time and across a landscape of many systems and stakeholders; it is not yet clear whether a blend of traditional and changed practices is a good approach, or whether we are gaining or losing anything by moving away from traditional methods; and finally, it is beneficial when librarians work together, and with vendors, to make careful, practical, and conscientious choices.

Slides are available at: connect.ala.org/files/2017MidWinter_Metadata_Deng.pptx.

Sai Deng, Metadata Librarian, University of Central Florida Libraries

Posted in ALA Midwinter 2017, Conferences

Coming Soon: More on Metadata Lightning Talks

Beginning next week, watch for posts written by our very own Metadata Lightning Talk presenters at ALA Midwinter. They promise to be both interesting and informative!

Posted in ALA Midwinter 2017, Conferences