Deploy a solution that delivers the compute resources needed to run compute-intensive AI workloads on existing HPC clusters.
Base and Plus Configurations for Intel Select Solution for HPC & AI Converged Clusters [Magpie]
| Ingredient | Intel Select Solutions for HPC & AI Converged Clusters [Magpie] | Intel Select Solutions for HPC & AI Converged Clusters [Magpie] Plus Configuration |
|---|---|---|
| **Workload Domain (Minimum 4-Compute-Node Configuration)** | | |
| Platform | Dual-socket server platform | Dual-socket server platform |
| Processor | 2 x Intel® Xeon® Gold 6126 processor (2.60 GHz, 12 cores, 24 threads), Intel® Xeon® Gold 6226 processor (2.70 GHz, 12 cores, 24 threads), or a higher model number Intel® Xeon® Scalable processor | 2 x Intel® Xeon® Gold 6252 processor (2.10 GHz, 24 cores, 48 threads) or a higher model number Intel Xeon Scalable processor |
| Memory | 192 GB | 192 GB |
| Boot Drive | 240 GB Intel® SSD Data Center (Intel® SSD DC) S3520 SATA 3.0, 6 Gbps, or equivalent | 240 GB Intel SSD DC S3520 SATA 3.0, 6 Gbps, or equivalent |
| Storage | HPC parallel file system (470 megabits per second [Mbps] per client) | HPC parallel file system (470 Mbps per client) |
| Messaging Fabric | Intel® Omni-Path Host Fabric Interface (Intel® OP HFI) Adapter 100 Series | Intel OP HFI Adapter 100 Series |
| Management Network Switch | 10 GbE switch | 10 GbE switch |
| Batch Scheduler | Open source Magpie on SLURM | Open source Magpie on SLURM |
| Software | Linux* operating system, Intel® Cluster Checker 2019, OpenHPC**, Intel® Omni-Path Fabric (Intel® OP Fabric) Software, Intel® Parallel Studio XE 2019 Cluster Edition**, Apache Spark, TensorFlow, Horovod | Linux operating system, Intel Cluster Checker 2019, OpenHPC**, Intel OP Fabric Software, Intel Parallel Studio XE 2019 Cluster Edition**, Apache Spark, TensorFlow, Horovod |
| **Management Domain** | | |
| Management Network | Integrated 10 GbE** | Integrated 10 GbE** |
| Firmware and Software Optimizations | Intel® Hyper-Threading Technology (Intel® HT Technology) enabled, Intel® Turbo Boost Technology enabled, XPT prefetch enabled | Intel HT Technology enabled, Intel Turbo Boost Technology enabled, XPT prefetch enabled |
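To illustrate how the batch-scheduler row above fits together, the following is a minimal sketch of a Slurm batch script for a 4-node distributed TensorFlow/Horovod training run on a cluster like the one specified. The partition name, module setup, and training script are placeholders, not part of the specification; in practice, Magpie ships its own Slurm submission-script templates for Spark and TensorFlow jobs, which should be adapted rather than written from scratch.

```shell
#!/bin/bash
# Illustrative only: placeholder names throughout; Magpie's own templates
# are the supported way to launch these workloads under Slurm.
#SBATCH --job-name=resnet50-train
#SBATCH --nodes=4                  # minimum workload-domain size from the table
#SBATCH --ntasks-per-node=2        # e.g., one rank per socket on a dual-socket node
#SBATCH --time=04:00:00
#SBATCH --partition=compute        # placeholder partition name

# Launch one Horovod rank per Slurm task across the fabric.
# train_resnet50.py is a hypothetical training script.
srun python train_resnet50.py --batch-size 64
```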
Minimum Performance Standards

| Algorithm/Test | Training/Inference | Using SLURM 4-Node Cluster (images/sec) |
|---|---|---|
| ResNet50* Int8 | Inference | 6,300 |
| ResNet50 | Training | 400 |
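The floors above are expressed in images per second, so a measured run can be checked with a simple throughput calculation. The helper below is a hypothetical illustration, not part of the solution's verification tooling (Intel Cluster Checker fills that role).

```python
# Hypothetical check: does a measured 4-node run meet the minimum
# performance standards from the table above?
MIN_IMAGES_PER_SEC = {"inference": 6300, "training": 400}

def meets_minimum(images: int, seconds: float, mode: str) -> bool:
    """Return True if measured throughput reaches the table's floor."""
    return images / seconds >= MIN_IMAGES_PER_SEC[mode]

# Example: 2,000,000 inference images in 300 s is ~6,667 images/sec.
print(meets_minimum(2_000_000, 300, "inference"))  # True
```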
Business Value of Choosing a Plus Configuration Over a Base Configuration |
The Plus configuration's increased compute capability shortens the time needed to train an AI model, and Intel® Deep Learning Boost (Intel® DL Boost) accelerates AI inferencing for faster time to insights.
|
**Recommended, not required |