The Inter-university Consortium for Political and Social Research (ICPSR) is an international consortium of more than 810 academic institutions and research organizations. They provide leadership and training in data access, curation, and methods of analysis for the social science research community. Their mission is to advance and expand social and behavioral research, act as a global leader in data stewardship, and provide rich data resources and responsive educational opportunities for present and future generations.
In partnership with Census Bureau Economist Katie Genadek, ICPSR’s Trent Alexander, Research Professor, and David Bleckley, Senior Data Project Manager, needed to digitize about 500 million United States census records. The data are stored on 250,000 reels of microfilm and microfiche, which contain images from the 1960, 1970, 1980, and 1990 decennial census forms. The forms from 1850–1950 have already been digitized and linked by IPUMS (formerly the Integrated Public Use Microdata Series) and the National Bureau of Economic Research (NBER), and the censuses from 2000 forward were created digitally and linked by the Census Bureau. The 1960–1990 census data were also digitized, but the names on those forms were never transcribed because there was no statistical need to invest the resources into digitizing and storing them. Without the names, those records cannot be linked to the other years’ data. When the project is done, researchers will have access to a massive, complete, multigenerational collection of linked census records from 1850–2020.
This project required methods and funding sufficient to deal with the massive scale and security requirements.
Scanning the reels
They purchased 25 new, state-of-the-art, high-speed microfilm scanners, which was a huge investment. Dozens of technicians at the Census Bureau worked daily for several years to scan the 250,000 reels. Before scanning, many of the reels needed to be prepared for use with the scanners, mainly because film from some years lacked the 2-foot “leader” and “trailer” film necessary for the reader to begin and complete spooling without missing images. Technicians first needed to unwind and rewind tens of thousands of microfilm reels to add that leader and trailer film, then check each reel and rewind it in the correct direction before it could be scanned.
With that volume of winding, they didn’t want to burn out the motors on the Census Bureau’s new microfilm readers or occupy time on their new digital scanners. Instead, they hunted down old motorized film winders, microfilm viewers, and microfilm scanners that they could use for all the winding.
Obtaining the used hardware
They used online resources like eBay and government surplus property dispositions to find the equipment. They also reached out to their networks of colleagues in libraries and archives to see if they had equipment they no longer wanted after transitioning from microfilm to digital storage. Their favorite, and the most durable, is a Hollywood Film Company rewinder from the University of Illinois Libraries’ property disposition. It has been a project workhorse and is still in use at the Census Bureau. Someday, it will have a permanent home at ICPSR.
Hollywood Film Company rewinder (Photo courtesy of University of Illinois Libraries)
Current status and future plans
As of this writing, they have completed all of the preparation and scanning. That work took more than four years and generated approximately 4 PB of data, which are stored at the Census Bureau office in Jeffersonville, Indiana. The scanned records include both handwritten responses, such as respondents’ names, and “bubbled” responses, such as fields for race and age. They are using state-of-the-art handwriting recognition, optical character recognition (OCR), and optical mark recognition (OMR) software to recover both respondent names and some bubbled information.
Their focus now is mostly on handwriting recognition and record linkage, and they aim to have a linked dataset available to researchers by the end of 2026. For more information, please visit their website or email info@censusdigitization.org.