HPC Network Engineer

Vacancy Overview

Application Open:

Full-Time

MBZUAI is seeking a highly skilled HPC Network Engineer to design, implement, and operate the high-performance networking infrastructure that underpins the university’s research computing environment.

This role is critical to ensuring reliable, low-latency, and high-bandwidth connectivity across GPU and CPU clusters, parallel storage systems, and research platforms supporting large-scale AI/ML and robotics workloads. The position focuses on network architecture, optimization, monitoring, and troubleshooting for HPC environments, enabling researchers to operate at scale while maintaining performance, resilience, security, and compliance across all HPC facilities.

Key Responsibilities:

HPC Network Architecture & Engineering

Design, deploy, and maintain high-performance network architectures for HPC clusters, GPU servers, CPU nodes, and parallel storage systems.
Configure and optimize high-speed interconnects, including InfiniBand, RoCE, and high-speed Ethernet (25/100/200GbE+), to support low-latency and high-throughput workloads.
Design network topologies optimized for MPI traffic, NCCL collectives, and large-scale data transfers.
Integrate networking solutions with parallel file systems such as Lustre, BeeGFS, or GPFS.

Network Operations, Monitoring & Troubleshooting

Monitor network performance, capacity, and availability across all HPC facilities.
Diagnose and resolve complex network issues affecting compute, storage, and distributed training workloads.
Implement performance monitoring, alerting, and diagnostics using HPC-specific networking tools.
Ensure maximum uptime and performance for research computing resources.

Security, Compliance & Reliability

Implement and maintain network security controls aligned with data center and institutional standards.
Ensure compliance with internal policies, safety requirements, and regulatory obligations.
Develop preventive maintenance procedures and support disaster recovery and resilience planning for network infrastructure.

Upgrades, Capacity Planning & Innovation

Plan and execute network upgrades, expansions, and technology refreshes with minimal disruption to research activities.
Support capacity planning and forecasting for growing AI/HPC workloads.
Evaluate emerging networking technologies relevant to AI and HPC (e.g., SmartNICs, CXL, GPUDirect RDMA).

Documentation & Collaboration

Develop and maintain detailed network documentation, architecture diagrams, configuration records, and operational procedures.
Collaborate with HPC system engineers, storage architects, MLOps, and research teams to ensure end-to-end system performance.
Provide expert-level support and guidance on network-related issues to internal stakeholders.

Professional Experience Required

Essential:

Minimum 5 years of experience in network engineering, with at least 3 years in HPC or research computing environments.
Extensive hands-on experience with high-performance networking technologies such as InfiniBand, Omni-Path, RoCE, or high-speed Ethernet.
Proven expertise configuring and troubleshooting network infrastructure for parallel file systems (e.g., Lustre, GPFS, BeeGFS).
Strong understanding of data-center networking concepts, including routing, switching, VLANs, RDMA, and network security.
Experience designing networks optimized for MPI workloads and large-scale distributed AI training.
Proficiency with network monitoring and diagnostic tools in HPC environments.
Ability to work in a demanding, service-oriented environment with strong organization, communication, and collaboration skills.

Preferred:

Experience with software-defined networking (SDN) in HPC contexts.
Professional certifications such as CCNP, CCIE, or equivalent.
Experience supporting HPC environments in academic or research institutions.
Exposure to GPU-centric networking architectures and NVIDIA networking technologies.
