High Performance Computing (HPC) is prevalent today across many industries, including financial services, life sciences, higher education research, manufacturing, and energy. More and more businesses are deploying HPC workloads in the cloud to take advantage of its elasticity, scalability, and availability. Job schedulers are critical for HPC applications given the nature of these workloads that create, process, and tear down thousands, and sometimes millions, of vCPU and network resources with TB to PB of storage capacity. Job scheduling tools lead to improved operational efficiency and a degree of certainty that a particular HPC job, which can run for hours to weeks, will complete successfully.
IBM Spectrum LSF is used extensively in the manufacturing and semiconductor industries to manage Electronic Design Automation (EDA) workloads. Dynamically running workloads both on-premises and in the cloud, also known as cloud bursting, is becoming a more common way to address capacity and provisioning time constraints within data centers and lets enterprises take advantage of virtually unlimited resources.
However, the challenge with cloud bursting is integrating and maintaining operational consistency across on-premises and cloud environments. IBM Spectrum LSF, in combination with Google Cloud, addresses this problem head on.
Google Cloud is excited to announce, in collaboration with IBM, enhanced capabilities for IBM Spectrum LSF that enable organizations to integrate their on-premises job scheduling scripts with resources deployed in Google Cloud. Customers are now able to fully leverage Google Cloud’s highly scalable and secure Compute Engine, networking, and storage infrastructure.
The LSF-Google Cloud resource connector patch supports key Google Cloud differentiators including Local SSDs, GCE instance templates, Preemptible VMs, and more:
- Bulk API support – Deploy large fleets of VM instances in a matter of seconds.
- Instance Templates – Simplify VM configuration by creating reusable templates.
- All Machine Types – Supports all GCE VM families and machine types, including Custom Machine Types.
- GPUs – Attach up to 16 GPUs per instance, including the largest A2 instances with up to 16 NVIDIA A100 GPUs.
- Preemptible VMs – Preemptible VMs are provisioned from excess Compute Engine capacity and are a significant way to save money on GCE resources.
- Local SSD – Attach up to 9 TB of NVMe SSD per instance.
- Hyperthreading – Supports the “threads-per-core” option in GCE, which allows per-VM, hypervisor-level Hyperthreading configuration (when supported by Instance Templates).
- Images – Supports custom disk images, including full support for Windows, for all attached Persistent Disks.
- Placement Policies – Control where the instances are physically located relative to each other within a zone for lower latency.
- Labels – Supports GCE Labels, which can be used for management of firewall rules, tracking billing, etc.
- Minimum CPU Platform – Supports the ability to specify a minimum CPU Platform for your Virtual Machines.
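To make a few of the options above concrete, here is a minimal Python sketch of how they might appear together in the `properties` section of a Compute Engine instance template. The field names follow the Compute Engine REST API as we understand it. The helper `build_instance_properties`, the machine type, the label, and the CPU platform value are illustrative assumptions, and none of this shows the LSF resource connector's own configuration format. See the IBM documentation for that.

```python
# Illustrative sketch only: builds a dict shaped like the "properties" section
# of a Compute Engine instance template. The helper name and all example
# values are hypothetical. Boot disk and network interfaces are omitted.

LOCAL_SSD_GB = 375  # size of a single Local SSD partition


def build_instance_properties(machine_type, preemptible=True, threads_per_core=1,
                              min_cpu_platform=None, local_ssd_count=0, labels=None):
    """Return a template-style properties dict for the given options."""
    props = {
        "machineType": machine_type,
        "scheduling": {
            "preemptible": preemptible,
            # Preemptible VMs cannot restart automatically or live-migrate.
            "automaticRestart": not preemptible,
            "onHostMaintenance": "TERMINATE" if preemptible else "MIGRATE",
        },
        # threadsPerCore=1 turns off simultaneous multithreading for the VM.
        "advancedMachineFeatures": {"threadsPerCore": threads_per_core},
        "labels": dict(labels or {}),
        # One SCRATCH disk entry per Local SSD partition, attached over NVMe.
        "disks": [
            {
                "type": "SCRATCH",
                "interface": "NVME",
                "autoDelete": True,
                "initializeParams": {"diskType": "local-ssd"},
            }
            for _ in range(local_ssd_count)
        ],
    }
    if min_cpu_platform:
        props["minCpuPlatform"] = min_cpu_platform
    return props


if __name__ == "__main__":
    props = build_instance_properties(
        "n2-standard-32",                      # example machine type
        local_ssd_count=24,
        min_cpu_platform="Intel Cascade Lake",  # example platform name
        labels={"scheduler": "lsf"},            # example label
    )
    total_gb = len(props["disks"]) * LOCAL_SSD_GB
    print(f"Local SSD capacity: {total_gb} GB")  # 24 x 375 GB = 9000 GB
```

The 9 TB Local SSD figure corresponds to 24 partitions of 375 GB each. How many partitions a given machine type accepts varies, so check the Compute Engine documentation before relying on a specific combination.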
This improvement to the IBM LSF Resource Connector was developed by IBM in coordination with Google. Find out more about the new supported features and their operation in the official IBM Spectrum LSF Resource Connector Documentation. You can also find additional documentation and download the software in the IBM Spectrum Computing Community. If you have further questions, you can contact IBM, or Google Cloud Sales.
Special thanks to Annie Ma-Weaver, Mark Mims, and Wyatt Gorman for their contributions.