ALA Annual 2019 Presentation and Q&A

Hello ALCTS Metadata Interest Group blog! Rick Fitzgerald and Grace Thomas here – we are librarians at the Library of Congress and recently gave a talk about describing web archives at the ALA Annual ALCTS Metadata Interest Group Meeting!

We want to thank Anna Neatrour and the ALCTS Metadata Interest Group board for having us, as well as everyone who took time out of their busy conference schedule to attend our talk. Web archives are becoming increasingly prevalent in libraries, archives, and scholarly research, so we are excited about the interest in our work. Anna and Tillay invited us to share our slides for anyone who missed the session or would like to review them.

Additionally, we wanted to share some of the questions asked at the end of the session, for anyone who wasn’t able to attend. Please forgive us for paraphrasing the questions and also for paraphrasing our own answers!

Q&A:

Q: Quality review for web archives is challenging because of the scale. How does your program approach it?

A: We can’t look at everything, so we do as much as we can. This is an issue throughout the web archiving community and there have been efforts to explore automated quality review, including a workshop held by the International Internet Preservation Consortium (IIPC) this year dedicated solely to brainstorming quality review solutions. If those tactics advance into some kind of software, we would love to implement it, but for now we look at reports from the crawler and click through as much as we can.

Q: You mentioned datasets, and retaining provenance information and scope notes is an issue in the community. How does your program handle this?

A: This is also a community-wide issue, and we have varying levels of provenance information. First, our curatorial data in Digiboard is the record of selection for a particular URL (in it, the selecting librarian must assign the URL to a collection and provide a justification). Second, once a URL is approved to go to crawl, our team assigns scopes telling the crawler what it is allowed to crawl as part of this URL (for example, social media or CDNs which might host embedded content found through the main URL) and what it is restricted from crawling (we don’t want to crawl all of social media, perhaps just a particular related profile or page). Third, we have the crawl logs from the crawler, which contain very rich metadata showing the path of how a certain page came to be captured: for example, response codes from the server at the time of crawl, the MIME type of the crawled resource, the capture timestamp, and the size of the resource. Since we do not have a legal mandate to crawl, our (very) complicated permissions process makes releasing the crawl logs publicly impossible right now. However, with time, perhaps more of this provenance data can become part of publicly released datasets. For now, check out the ones we have publicly available here.
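To make the idea of a scope more concrete, here is a minimal sketch in Python of an allow-list check for a single seed URL. It is an illustration only, not how the Library's crawler is actually configured: the Scope class, the in_scope method, and all of the example URLs are hypothetical, and the matching rule (host plus path prefix) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlsplit


def _host_and_path(url: str) -> str:
    """Reduce a URL to 'host/path' so http and https forms compare equal."""
    parts = urlsplit(url)
    return (parts.netloc.lower() + parts.path).rstrip("/")


@dataclass
class Scope:
    """Hypothetical scope for one seed: the seed itself plus extra allowed prefixes."""
    seed: str
    also_allowed: List[str] = field(default_factory=list)

    def in_scope(self, url: str) -> bool:
        """Return True if url falls under the seed or one of the allowed prefixes."""
        target = _host_and_path(url)
        for prefix in [self.seed, *self.also_allowed]:
            base = _host_and_path(prefix)
            # Match the prefix itself or anything beneath it, but not a
            # sibling path that merely starts with the same characters.
            if target == base or target.startswith(base + "/"):
                return True
        return False


# Hypothetical example: a seed, a CDN hosting its embedded content,
# and one related social media profile (not the whole social media site).
scope = Scope(
    seed="https://example.org/",
    also_allowed=[
        "https://cdn.example.net/",
        "https://social.example/examplelibrary",
    ],
)
```

With this scope, a page on the seed site, an image on the CDN, and a post under the one related profile would all be allowed, while any other profile on the same social media site would be left out.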

Q: Is Digiboard open-source?

A: Unfortunately, no. Digiboard is a home-grown tool that works specifically for our scale (tens of thousands of URLs crawled at varying frequencies), complicated permissions process, complicated selection process (with more than 200 potential selecting librarians), organizational structure, and our current method of quality review. If you wish to begin web archiving, there are subscription-based services which take care of all behind-the-scenes technical work (maintaining and running the crawler, indexing the content, maintaining the indexes, maintaining a version of the Wayback Machine and specific access points for collections, etc.). Many national and regional libraries, archives, and university libraries throughout the world successfully use these kinds of services to perform web archiving!

Q: How will the sidecar records relate to the minimal records?

A: The sidecar MODS XML files will sit on the same server as the minimal MODS XML files (as separate files). During the ETL (Extract, Transform, Load) process that converts the information from MODS XML into the Library’s Solr index for loc.gov, the two files will be matched on identical ID numbers and merged into the pages you see on https://www.loc.gov/websites/.
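As a rough illustration of that merge step, here is a minimal sketch in Python that flattens two MODS files into field lists and combines them only when their identifiers match. It is not the Library's ETL code: the function names, the sample records, the identifier value, and the assumption that the ID lives in a top-level identifier element are all hypothetical. Only the MODS namespace URI is real.

```python
import xml.etree.ElementTree as ET


def mods_to_fields(xml_text):
    """Flatten the top-level elements of a MODS record into {name: [text, ...]}."""
    root = ET.fromstring(xml_text.strip())
    fields = {}
    for child in root:
        name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        text = "".join(child.itertext()).strip()
        if text:
            fields.setdefault(name, []).append(text)
    return fields


def merge_records(minimal_xml, sidecar_xml):
    """Merge a sidecar record into a minimal record that has the same identifier."""
    minimal = mods_to_fields(minimal_xml)
    sidecar = mods_to_fields(sidecar_xml)
    if minimal.get("identifier") != sidecar.get("identifier"):
        raise ValueError("records do not share the same identifier")
    merged = {name: list(values) for name, values in minimal.items()}
    for name, values in sidecar.items():
        target = merged.setdefault(name, [])
        for value in values:
            if value not in target:  # avoid repeating the shared identifier
                target.append(value)
    return merged


# Hypothetical sample records; the identifier and field choices are made up.
MINIMAL = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <identifier>example0001</identifier>
  <titleInfo><title>Example Site</title></titleInfo>
</mods>
"""
SIDECAR = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <identifier>example0001</identifier>
  <abstract>A short description added later.</abstract>
</mods>
"""

merged = merge_records(MINIMAL, SIDECAR)
```

The merged dictionary stands in for the single document that would be sent to an index. Raising an error on mismatched identifiers is a design choice for the sketch, and a real pipeline might instead skip or log unmatched sidecar files.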

For more information about the backlog we released last year, please see the Library of Congress Signal blog post: More Web Archives, Less Process, written by Grace. Also, if you are interested in getting updates on our work as we write about it, or any other digital library news from the Library of Congress, bookmark The Signal!

For any other questions, please do not hesitate to send us an email; you can find our addresses at the end of the slide deck. Thank you again for giving us a platform to share our work and best of luck with future interest group activities!

Posted in ALA Annual 2019 | Leave a comment

CC:DA Liaison Blog Post

After my recent appointment as the Metadata Interest Group’s liaison to the Committee on Cataloging: Description and Access (CC:DA), I reached out to outgoing liaison Jessica Hayden to inquire about her experience. One of her principal recommendations was to use Metadata Interest Group (MIG) channels like the blog to grow awareness of RDA revision proposals and other CC:DA business relevant to MIG members, and to use these channels to get feedback and ideas from a broader segment of stakeholders within the community of metadata librarians.

But first a bit of background.  CC:DA “is the body within the American Library Association responsible for developing official ALA positions on additions to and revisions to RDA: Resource Description and Access.” For the past two years, changes to the RDA standard have been frozen because of the RDA Toolkit Restructure and Redesign (commonly known as the 3R Project), which will not only change the look, feel, and functionality of the RDA Toolkit, but also incorporate much of the IFLA Library Reference Model (LRM) into the text and structure of RDA.  In April, a “stabilized” English version of the new RDA text was released as part of the Beta RDA Toolkit. Neither this stabilized text nor the Beta RDA Toolkit will supplant the current version until the RDA Steering Committee (RSC) declares the 3R Project complete, which in all likelihood will not happen before the end of 2019.  In the meantime, the focus of the 3R Project will shift to translations and the development of policy statements, such as LC/PCC-PS.

For now, addition and revision proposals to RDA are still frozen, although minor revision proposals, such as error or typo corrections, may be submitted. However, the release of the stabilized text provides an excellent opportunity to review what will likely represent a major revision to RDA. The incorporation of the LRM should be of especial interest to our community at the Metadata Interest Group, as this stabilized text represents a further shift towards an entity-based approach, with many ramifications for linked data implementation. If you have ideas for revisions and additions, I’d love to hear them; feel free to contact me directly at trm2151 [at] columbia [dot] edu.

Timothy R. Mendenhall

Metadata librarian at Columbia University Libraries, performing both traditional MARC cataloging and non-MARC work in the digital library collections. Participant in the PCC.  Formerly a processing archivist and still active as an art cataloger at the Frick Art Reference Library.

Posted in Standards and Guidelines | Leave a comment

Nominations are open for leadership roles in the ALCTS Metadata Interest Group

Announcement:

The ALCTS MIG seeks nominations (self-nominations welcomed) for the following offices:

  • Vice-Chair/Chair Elect (Vice-Chair 2019-2020, Chair 2020-2021)
  • Program Co-Chair (2019-2021)
  • Secretary (2019-2021)

These positions are held for two years, and attendance at ALA Annual and ALA Midwinter is expected. Service duties begin July 1 and run through June 2021.

Posted in ALA Annual 2019 | Leave a comment

Program Announcement: ALCTS Metadata Interest Group Meeting at ALA Annual 2019

Please join the ALCTS Metadata Interest Group at the ALA Annual Conference in Washington, D.C., for a presentation and Q&A on the Library of Congress Web Archiving Program during our regularly scheduled meeting on Sunday, June 23, 2019, 9:00-10:00AM, in the Archives Room of the Marriott Marquis. The speakers are Rick Fitzgerald, Acquisitions and Bibliographic Access Directorate of the Library of Congress, and Grace Thomas, Digital Collections Specialist for the Library of Congress. We are looking forward to seeing you all in Washington, D.C. Be sure to add this event to your ALA Conference Scheduler.

The Library of Congress Web Archiving program, in existence since 2000, has had varying approaches over the years related to creating metadata records and making them accessible. While the program has been in a state of continuous evolution since its inception, the past six years have seen significant advancement in how the Library conceptualizes a web archive, and is therefore able to describe it.

This re-conceptualization was integral for assigning appropriate permissions to archived web content and indeed led to more accurate description. However, the implementation was part of an overall system migration and was lengthy, resulting in a backlog of over 4,000 undescribed, inaccessible web archives. Additionally, with the Library’s 2017 Digital Collecting Plan, which calls for an expansion of web archiving, the Library’s event and thematic web archive collections have only continued increasing in number, depth, and breadth.

Their talk will explain the recent history of web archives description at the Library of Congress and the process of implementing a new model of description; outline the current model, represented publicly as Metadata Object Description Schema (MODS) records incorporated into the loc.gov Solr index; share the process and triumph of clearing the backlog; set the Library’s work among that of other institutions participating in web archiving and the description of web archives; and finally look toward the future by describing research projects they hope will enhance that description.

Speaker Bios

Rick Fitzgerald is a Librarian in the Acquisitions and Bibliographic Access Directorate of the Library of Congress. He has been the primary cataloger for the Library of Congress Web Archives since 2010, and has taken an active role in many of its transitions over the past several years.

Grace Thomas became a Digital Collections Specialist for the Library of Congress Web Archiving Team in August 2016. Currently, many of her tasks revolve around streamlining ways to facilitate description of and access to resources in the Library’s 1.7 petabyte web archive.

Posted in ALA Annual 2019 | 1 Comment

Panel Program on Crowdsourcing Metadata Presentations Available

Metadata Interest Group Meeting Panel Program: Crowdsourcing Metadata

At the ALCTS Metadata Interest Group meeting during ALA Midwinter Meeting in Seattle, on Sunday, January 27, 2019, there was a panel presentation on crowdsourcing metadata.

 

Crowdsourcing metadata: the revolutionary cataloging interface and how it can help your library expose and promote hidden collections

This presentation draws on recently completed original research (in press, to be published in the Journal of Library Metadata, vol. 18, issue 2) to analyze and explain the automated quality control features of Zooniverse’s crowdsourcing metadata platform. The results, it is argued, are truly revolutionary. Case studies are cited to demonstrate successful use of the platform by major institutions including the University of Oxford, the Folger, the Imperial War Museum and the Huntington. Similar initiatives based on proprietary platforms designed by the Smithsonian and the National Archives are also noted. Clearly, therefore, crowdsourcing of metadata to expose and promote hidden collections is a significant and rapidly growing development in libraries, archives and museums. We conclude with a description of an experimental project at Cal State Fullerton designed to ascertain whether such developments can succeed at a local institutional level. Finally, an invitation is extended for attendees to consider whether such initiatives may also be implemented for the benefit of their own respective institutions and users.

Samuel T. Barber

Cataloging & Metadata Librarian

California State University, Fullerton

Slides

 

Wisdom of the Crowd

Utah State University Libraries’ Cataloging and Metadata Unit has successfully used several methods to engage the public in metadata creation for USU’s Digital History Collections. Most, if not all, of the techniques we have tested have yielded positive results and have improved the relevancy and accuracy of our descriptive metadata. During this presentation we will discuss different tools and techniques we have used to foster communication between our metadata specialists and the communities they serve, as well as approaches that were tried and did not yield the results we were hoping for. Attendees will be able to see what has been done at Utah State University, take those ideas, and create innovative approaches to collecting crowdsourced metadata at their own institutions.

Slides

Becky Skeen

Special Collections Cataloging Librarian

Utah State University

 

Andrea Payant

Metadata Librarian

Utah State University

 

About the Metadata Interest Group

The ALCTS Metadata Interest Group provides a broad framework for information exchange on current research developments, tools, and activities affecting networked information resources and metadata; coordinates and actively participates in the development and review of standards concerning networked resources and metadata in conjunction with the division’s committees and sections, other units within ALA, and relevant outside agencies; develops programs and fosters and sponsors education and training opportunities that contribute to and enhance an understanding of networked resources and metadata, their identity, content, technology, access, control, and use; and plans and monitors activities using the association’s strategic and tactical plan as a framework.

Posted in ALA Midwinter 2019 | Leave a comment

Midwinter 2019 Meeting Minutes Available

The minutes from the Metadata Interest Group’s 2019 Midwinter program and business meeting are now available.

Additional information about Midwinter is coming soon, including slides from the presentations at the MIG program!

Thank you to Rachel Tillay (blog coordinator) for taking notes during the business meeting and to Wendy Robertson (secretary) for putting together these minutes!

 

Posted in ALA Midwinter 2019 | Leave a comment

Last Reminder: ALCTS Metadata Interest Group Meeting at ALA Midwinter 2019

Please join the ALCTS Metadata Interest Group at the ALA Midwinter Meeting in Seattle for a panel presentation on crowdsourcing metadata during our regularly scheduled meeting on Sunday, January 27, 2019, 8:30-10:00AM, in Room 3A of the Washington State Convention Center. Speakers Samuel T. Barber, from California State University, Fullerton, and Becky Skeen & Andrea Payant, from Utah State University, will present on approaches to crowdsourcing metadata, followed by Q&A. We look forward to seeing you all in Seattle. Be sure to add this event to your ALA Conference Scheduler.

For more information, see the presenter bios and abstracts: https://wp.me/p70q4y-66

Posted in ALA Midwinter 2019 | Leave a comment

Crowdsourcing Metadata Panel Presentation Reminder and Bios

Join the ALCTS Metadata Interest Group at the ALA Midwinter Meeting in Seattle for a panel presentation on crowdsourcing metadata during our regularly scheduled meeting on Sunday, January 27, 2019, 8:30-10:00AM, in Room 3A of the Washington State Convention Center. Speakers Samuel T. Barber, from California State University, Fullerton, and Becky Skeen & Andrea Payant, from Utah State University, will present on approaches to crowdsourcing metadata, followed by Q&A. We look forward to seeing you all in Seattle. Be sure to add this event to your ALA Conference Scheduler.

Presenters and abstracts:

Samuel T. Barber

Cataloging & Metadata Librarian

California State University, Fullerton

Bio: Following a semi-successful semi-professional career as a musician, recording artist and sound engineer, Samuel T. Barber emigrated from the UK to the USA in 2013 with a newly-acquired MILS from the University of Strathclyde, Scotland, and a renewed desire to contribute to the advancement of librarianship and information science. Past cataloging and metadata work at Glasgow University, the Autry National Center and Cal State Fullerton has included exposing hidden collections formerly owned by luminaries ranging from the Scots psychiatrist R.D. Laing to Los Angeles author, ethnographer, historian and ‘celebrity librarian’ Charles Fletcher Lummis. His main research interest is the crowdsourcing of descriptive metadata and related interfaces, which he believes merit the term ‘revolutionary cataloging interfaces’. Samuel’s favorite book is ‘What’s going on’ by Marvin Gaye, and yes, he’s fully aware that this is a record and not a book.

Crowdsourcing metadata: the revolutionary cataloging interface and how it can help your library expose and promote hidden collections

This presentation draws on recently completed original research (in press, to be published in the Journal of Library Metadata, vol. 18, issue 2) to analyze and explain the automated quality control features of Zooniverse’s crowdsourcing metadata platform. The results, it is argued, are truly revolutionary. Case studies are cited to demonstrate successful use of the platform by major institutions including the University of Oxford, the Folger, the Imperial War Museum and the Huntington. Similar initiatives based on proprietary platforms designed by the Smithsonian and the National Archives are also noted. Clearly, therefore, crowdsourcing of metadata to expose and promote hidden collections is a significant and rapidly growing development in libraries, archives and museums. We conclude with a description of an experimental project at Cal State Fullerton designed to ascertain whether such developments can succeed at a local institutional level. Finally, an invitation is extended for attendees to consider whether such initiatives may also be implemented for the benefit of their own respective institutions and users.

Andrea Payant

Metadata Librarian

Utah State University

Bio: Andrea Payant is the Metadata Librarian at USU’s Merrill-Cazier Library. She has been working for USU Libraries for over ten years. She received her MLIS from San Jose State University. Her research interests include metadata quality benchmarks, research data management, crowdsourcing metadata, and technical services outreach.

Becky Skeen

Special Collections Cataloging Librarian

Utah State University

Bio: Becky Skeen is the Special Collections Cataloging Librarian at Utah State University’s Merrill-Cazier Library. She coordinates a majority of Special Collections cataloging projects for her library and helps with the creation of metadata for special collections’ digitized materials.

 

Wisdom of the Crowd: Successful ways to engage the public in metadata creation

Utah State University Libraries’ Cataloging and Metadata Unit has successfully used several methods to engage the public in metadata creation for USU’s Digital History Collections. Most, if not all, of the techniques we have tested have yielded positive results and have improved the relevancy and accuracy of our descriptive metadata. During this presentation we will discuss different tools and techniques we have used to foster communication between our metadata specialists and the communities they serve, as well as approaches that were tried and did not yield the results we were hoping for. Attendees will be able to see what has been done at Utah State University, take those ideas, and create innovative approaches to collecting crowdsourced metadata at their own institutions.

 

About the Metadata Interest Group

The ALCTS Metadata Interest Group provides a broad framework for information exchange on current research developments, tools, and activities affecting networked information resources and metadata; coordinates and actively participates in the development and review of standards concerning networked resources and metadata in conjunction with the division’s committees and sections, other units within ALA, and relevant outside agencies; develops programs and fosters and sponsors education and training opportunities that contribute to and enhance an understanding of networked resources and metadata, their identity, content, technology, access, control, and use; and plans and monitors activities using the association’s strategic and tactical plan as a framework.

Posted in ALA Midwinter 2019 | Leave a comment

Program Announcement: ALCTS Metadata Interest Group Meeting at ALA Midwinter 2019

Join the ALCTS Metadata Interest Group at the ALA Midwinter Meeting in Seattle for a panel presentation on crowdsourcing metadata during our regularly scheduled meeting on Sunday, January 27, 2019, 8:30-10:00AM, in Room 3A of the Washington State Convention Center. Speakers Samuel T. Barber, from California State University, Fullerton, and Becky Skeen & Andrea Payant, from Utah State University, will present on approaches to crowdsourcing metadata, followed by Q&A. We look forward to seeing you all in Seattle. Be sure to add this event to your ALA Conference Scheduler.

Presenters and abstracts:

Continue reading

Posted in ALA Midwinter 2019 | Leave a comment