Combine Metadata Harvester: Aggregate ALL the data! – Michigan Technology Community News

Combine harvesting wheat field. (Rick Crowley / Cereal Harvest in Somerset / CC BY-SA 2.0)

The Digital Public Library of America displays over 36 million records. A large share come from ‘Content Hubs’ such as the Smithsonian and HathiTrust, but millions more are ingested from a wide range of smaller institutions across America. “The technologies we use to transform and validate XML records, like XSLT, are well-established and highly reliable, but software for handling records at this scale, and performing mass transformation and validation operations, is a little harder to come by,” writes Esty Thomas of the U-M Library I.T. Division in a recent blog post.

Thomas’s work involves using and extending Combine, software originally developed by Wayne State University Libraries. Combine offers flexibility and repeatability to users handling diverse streams of metadata. According to Thomas, the Library is working to incorporate several different types of ingestion processes, so that pulling in a spreadsheet of metadata from a very small local history museum is as easy as fetching records from selected U-M collections.
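
To give a rough sense of what "pulling in a spreadsheet of metadata" can involve, here is a minimal Python sketch that turns spreadsheet rows into small XML records and applies one simple validation check. It is not Combine's code or API. The function names, the sample rows, the assumption that column headers match Dublin Core element names, and the "every record needs a title" rule are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Dublin Core element namespace; using it here is an illustrative choice.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def rows_to_records(csv_text):
    """Turn each spreadsheet row into a small XML record.

    Assumes (hypothetically) that column headers are Dublin Core
    element names such as title, creator, date.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.Element("record")
        for column, value in row.items():
            if column is None:
                continue  # extra cells with no header are ignored
            value = (value or "").strip()
            if value:
                tag = f"{{{DC_NS}}}{column.strip().lower()}"
                ET.SubElement(record, tag).text = value
        records.append(record)
    return records


def is_valid(record):
    """Toy validation rule (an assumption): a record needs a dc:title."""
    title = record.find(f"{{{DC_NS}}}title")
    return title is not None and bool(title.text)


# Made-up sample data standing in for a small museum's spreadsheet.
SAMPLE = """title,creator,date
Main Street looking north,Unknown,1912
,Jane Doe,1930
"""

if __name__ == "__main__":
    for rec in rows_to_records(SAMPLE):
        print(is_valid(rec), ET.tostring(rec, encoding="unicode"))
```

In this sketch the second sample row has no title, so it fails the check. A production pipeline would more likely run XSLT transformations and schema validation, as the quote above suggests, across millions of records rather than a handful.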