Big data, small footprint

March 27, 2020

At a time when we’re relying on the internet to an unprecedented degree in our daily lives, a team of U-M researchers led by Mosharaf Chowdhury and Harsha Madhyastha has found a way for tech companies, banks and health systems to squeeze more capacity out of our existing infrastructure. A change to the design of big-data software tool Apache Spark could enable the world’s biggest users of computing power to crunch through massive tasks up to 16 times faster while lightening their burden on the internet.

Spark is an open-source software framework that serves as a task manager, coordinating vast networks of individual computers so they work together as a single machine on big computing tasks. When Spark was built a decade ago, most of this work took place within large data centers. But today, it is increasingly used to coordinate machines that are spread across the globe and linked by the internet.
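
To give a sense of what that coordination looks like in practice, here is a minimal, illustrative PySpark sketch (not from the researchers' work): a simple word count whose processing Spark automatically splits across whatever machines the cluster provides, whether they sit in one data center or are scattered across the internet. The file name and the word-count task are assumptions chosen only for illustration.

    # Illustrative sketch: Spark divides this job into tasks and schedules
    # them across the cluster's worker machines, acting as the task manager
    # described above. Runs unchanged on one laptop or thousands of nodes.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    # "logs.txt" is a placeholder input file for this example.
    lines = spark.read.text("logs.txt").rdd.map(lambda row: row[0])

    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    # Each partition is processed on a different worker; Spark gathers
    # the partial results back into a single answer.
    for word, count in counts.take(10):
        print(word, count)

    spark.stop()

The same script scales because Spark, not the programmer, decides which machine handles which slice of the data; that scheduling layer is what the U-M team's design change targets when the machines are connected over the internet rather than a data-center network.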