New data science computing platform

Advanced Research Computing - Technology Services

Advanced Research Computing – Technology Services (ARC-TS) is pleased to announce an expanded data science computing platform, giving all U-M researchers new capabilities to host structured and unstructured databases, and to ingest, store, query, and analyze large datasets.

The new platform features a flexible, robust, and scalable database environment and a set of data pipeline tools that can ingest and process large amounts of data from sensors, mobile devices, wearables, and other sources of streaming data. The platform leverages the advanced virtualization capabilities of ARC-TS’s Yottabyte Research Cloud (YBRC) infrastructure and is supported by U-M’s Data Science Initiative, which launched in 2015. YBRC was created through a partnership between Yottabyte and ARC-TS announced last fall.

The following functionalities are immediately available:

  • Structured databases: MySQL/MariaDB and PostgreSQL.
  • Unstructured databases: Cassandra, MongoDB, InfluxDB, Grafana, and Elasticsearch.
  • Data ingestion: Redis, Kafka, RabbitMQ.
  • Data processing: Apache Flink, Apache Storm, Node.js, and Apache NiFi.

Other types of databases can be created upon request.

These tools are offered to all U-M researchers free of charge, provided their usage stays within certain limits. Large-scale users who outgrow the no-cost allotment may purchase additional YBRC resources. All interested parties should contact hpc-support@umich.edu.

At this time, the YBRC platform accepts only unrestricted data. The platform is expected to accommodate restricted data within the next few months.

ARC-TS also operates a separate data science computing cluster, built on the latest Hadoop components, that is available to researchers. This cluster will also be expanded in the near future.

Author: Dan Meisler, Advanced Research Computing

Dan is the communications manager for Advanced Research Computing. You can reach him at dmeisler@umich.edu.