Microway supplied Milwaukee School of Engineering (MSOE) with an Nvidia® DGX™ POD-based supercomputer for education and applied research. This supercomputer forms the centerpiece of the university’s new computer science program and will support an expansion of deep learning and AI education designed to permeate the institution.
DGX POD is a reference architecture that provides a blueprint for designing large-scale data center infrastructure that can support modern AI development. It is based on the Nvidia DGX SATURNV AI supercomputer, which powers internal Nvidia AI research and development used in autonomous vehicles, robotics, graphics, high-performance computing (HPC), and other domains.
As an experienced cluster integrator and Nvidia Partner Network Elite DGX partner, Microway’s role was essential to delivering a complete solution that was operational on day one. Microway experts carefully designed the system, storage, and network architecture, then reviewed that design with MSOE IT personnel and Nvidia solutions architects to meet MSOE’s specific AI education and computer science needs.
The cluster design includes three racks of DGX servers, high-speed storage, 100G networking, and management servers, along with Nvidia NGC deep learning containers and the DGX software stack, deployed and managed with DeepOps. It features three DGX-1 AI systems with Nvidia V100 Tensor Core GPU accelerators; 20 Microway NumberSmasher Xeon + Nvidia T4 GPU teaching compute nodes; and access to NGC, which provides an online registry of software stacks optimized for deep learning, machine learning, and HPC applications, as well as pre-trained models and model training scripts. Also included in the deployment are high-performance storage arrays and a larger general-purpose storage pool from storage partner NetApp.
Microway’s design and integration experts worked closely with the MSOE team to ensure the custom DGX POD-based configuration met user needs. After many weeks of intensive integration and stress testing, Microway delivered and installed the cluster fully integrated and ready to run. Thorough testing verified not only system functionality and stability but also performance, including analysis of GPU throughput, local NVMe cache throughput, and network storage throughput. The teams worked together to customize storage, networking, and cluster software.
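For illustration, the snippet below is a minimal sketch of the kind of GPU throughput spot-check that can complement formal acceptance testing. It assumes PyTorch and a CUDA-capable GPU are available (for example, inside an NGC container); the matrix size and iteration count are arbitrary illustrative values, not those used in Microway’s testing.

```python
# Minimal GPU throughput spot-check (illustrative sketch, not Microway's test suite).
# Assumes PyTorch and a visible CUDA GPU, e.g. inside an NGC PyTorch container.
import time
import torch

def matmul_tflops(n=8192, iters=50):
    # Allocate two large half-precision matrices on the GPU.
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    # One n x n matrix multiply performs roughly 2 * n^3 floating-point operations.
    return 2 * n**3 * iters / elapsed / 1e12

if __name__ == "__main__":
    print(f"Sustained matmul throughput: {matmul_tflops():.1f} TFLOPS")
```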
This configuration gives undergraduate students at MSOE supercomputer access in the classroom, helping train the next generation of the AI workforce. Traditional supercomputers require that users be familiar with command-line interfaces and workload managers. The DeepOps install Microway provided to MSOE allows a student to access the “ROSIE” cluster in a web browser and start a DGX-1 or T4 GPU deep learning session with the click of a button.
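Once such a session is running, a student’s code can use the assigned GPU directly. The following is a minimal, hypothetical sketch of a first exercise, assuming the session provides a PyTorch environment (such as an NGC deep learning container); the tiny model and random data are placeholders, not MSOE course material.

```python
# Illustrative first exercise in a browser-launched GPU session (hypothetical sketch).
# Assumes a PyTorch environment, e.g. an NGC deep learning container.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Running on:", torch.cuda.get_device_name(0) if device == "cuda" else "CPU")

# Tiny classifier and a random batch, just to confirm the GPU works end to end.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 784, device=device)
y = torch.randint(0, 10, (64,), device=device)

for step in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss = {loss.item():.4f}")
```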
“We are extremely pleased by the opportunity to work with Nvidia and MSOE on this significant new education and applied research facility,” said Eliot Eshelman, vice president of Strategic Accounts and HPC Initiatives at Microway. “Microway’s expertise, combined with Nvidia’s DGX POD architecture, enabled us to deliver a new type of cluster that melds the best of HPC with the latest developments in deep learning. In addition to enabling new research, this cluster simplifies student usage for studies of data analytics, AI, and computer science.”