Research reproducibility in data science and the role of IT staff

A significant challenge facing a wide variety of disciplines is the ability to reproduce research results. Researchers across U-M are working together to develop best practices that promote reproducible data science, and Michigan IT staff play an important role in this effort.

“Many IT staff members don’t normally get involved directly in specific research projects, so the reproducibility issue might not seem like a concern for them,” says Jing Liu, managing director of the Michigan Institute for Data Science (MIDAS). “But this issue is important for pretty much anyone who is involved in research in some way, regardless of whether they are faculty, staff, or students.”

For example, IT staff involved in data collection, management, and storage need to consider whether the data documentation is clear and whether the storage is set up correctly, so that other researchers can access the data and know exactly what it is. “There can’t be misinterpretation of what the data actually represents,” explains Liu.

Similarly, staff involved in data analysis need to consider whether their code, workflow, hardware, and statistical assumptions can be shared with and understood by others, and whether the same analysis will run, and produce the same results, on different hardware. “There are many such considerations that staff researchers and those who support researchers should be aware of and take very seriously,” says Liu.
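As a small illustration of the analysis-side considerations Liu describes, here is a minimal Python sketch. It is not a MIDAS tool or recommended workflow, and the function names `capture_environment` and `run_analysis` are hypothetical. It shows two common habits: fixing the random seed so that a randomized analysis gives the same numbers on each run, and recording the software and hardware the analysis ran on alongside its results. Recording the Python version matters because Python only guarantees that a seeded `random()` sequence stays the same across versions. Other methods such as `gauss()` may change, so the same seed could give different results on a different installation.

```python
import json
import platform
import random
import sys


def capture_environment(seed: int) -> dict:
    """Collect basic facts about the software and hardware an analysis ran on."""
    return {
        "python_version": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),  # CPU architecture, e.g. x86_64 or arm64
        "random_seed": seed,
    }


def run_analysis(seed: int) -> dict:
    """Hypothetical stand-in for a real analysis: summarize a seeded random sample."""
    # A private, seeded generator keeps results independent of any other code
    # that also uses the global random state.
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return {"mean": sum(sample) / len(sample)}


if __name__ == "__main__":
    seed = 42
    record = {
        "results": run_analysis(seed),
        "environment": capture_environment(seed),
    }
    # In practice this record would be saved as a file next to the results,
    # so others can see exactly what produced them.
    print(json.dumps(record, indent=2))
```

Running the script twice on the same machine prints the same mean. Comparing the `environment` section across machines is one simple way to start explaining any differences between runs.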

To learn more, visit the MIDAS Reproducibility Hub for tools, methods, online presentations, and tutorials.