HPC Engineer – IFM

Vacancy Overview

Application Open:

Full-Time

The Institute for Foundation Models (IFM) at MBZUAI operates some of the world’s largest AI supercomputing environments, supporting frontier AI research and foundation model development across thousands of GPUs.

We are seeking an HPC Engineer to join our growing infrastructure team. This role is suitable for recent graduates and early-career engineers who are passionate about Linux systems, large-scale computing, distributed systems, and AI infrastructure.

Key Responsibilities
• Support the operation and maintenance of large-scale GPU computing clusters.
• Assist researchers with job submission, troubleshooting, and resource utilization.
• Monitor cluster health, performance, and availability.
• Troubleshoot Linux, hardware, storage, networking, and software issues.
• Support Slurm administration and user management.
• Assist with cluster deployment, upgrades, and validation.
• Develop scripts and automation tools.
• Maintain technical documentation and operational procedures.
• Participate in incident response and operational support.
• Collaborate with researchers, vendors, and internal teams.

Academic Qualifications
• Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, Software Engineering, Information Technology, Mathematics, Physics, or a related discipline.

Professional Experience Required
Preferred:
• Linux administration experience.
• Python, Bash, Go, or C/C++ programming.
• Networking fundamentals.
• Cloud platforms (Azure, AWS, GCP).
• Containers (Docker, Apptainer, Enroot).
• Git and software development workflows.
• AI/ML infrastructure exposure.
• HPC, distributed systems, or research computing experience.
