Application Open:
Full-Time
MBZUAI is seeking a High-Performance Computing (HPC) Lead for the Institute for Agriculture & AI (IAAI). The HPC Lead will be responsible for the effective management, optimization, and governance of the Institute’s HPC resources that support research, model development, and deployment activities. This role ensures secure, efficient, and equitable allocation of computational resources for internal teams and approved external partners, while maintaining high standards of operational excellence, system reliability, and compliance with institutional policies and governance requirements.
Key Responsibilities
HPC Infrastructure Management
- Oversee the design, operation, and continuous enhancement of the Institute’s high-performance computing (HPC) infrastructure, including compute clusters, storage systems, networking, and associated software environments supporting AI research.
- Ensure high availability, performance optimization, scalability, and resilience of HPC resources to meet the evolving demands of large-scale AI model training, simulation, and experimentation.
Resource Allocation and Infrastructure Governance
- Design and oversee transparent, fair, and efficient allocation frameworks for HPC and computational resources, including scheduling, quota management, and usage monitoring.
- Ensure that resource allocation aligns with institutional priorities, approved research programs, and governance decisions, while enabling responsible and equitable access.
Partner Support and Enablement
- Ensure that internal research teams and approved external partners are effectively supported in accessing and using HPC resources.
- Oversee onboarding, training, and enablement mechanisms to maximize the effective use of computational infrastructure for AI research, development, and innovation.
Operational Excellence and Reliability
- Maintain operational excellence across the Institute’s technical and computational platforms by ensuring robust system monitoring, incident management, performance tuning, and capacity planning.
- Drive continuous improvement of operational processes to ensure reliability, efficiency, and responsiveness to user needs.
Security, Compliance, and Data Governance
- Ensure that all HPC operations and associated research activities comply with MBZUAI’s information security standards, data governance policies, and applicable regulatory requirements.
- Oversee the implementation of appropriate access controls, auditing mechanisms, and safeguards to protect sensitive data, models, and intellectual property.
Research Enablement and Cross-Functional Coordination
- Work closely with research leadership, data engineers, technical teams, and operations functions to align infrastructure capabilities with scientific objectives and emerging research needs.
- Provide strategic input into infrastructure roadmaps, technology investments, and capacity planning to support the Institute’s long-term growth and impact.
Academic Qualifications Required
- Master’s degree in Computer Science, Computational Science, Data Science, Applied Mathematics, or a related field.
- A PhD is desirable but not mandatory for candidates with equivalent senior experience.
Professional Experience Required
Essential:
- Minimum of eight (8) to ten (10) years of progressive experience managing high-performance computing (HPC) environments, large-scale computing infrastructure, or advanced research computing systems.
- Demonstrated experience supporting AI-driven and data-intensive workloads, implementing transparent resource allocation and governance frameworks, and leading multidisciplinary technical teams.
- Experience in implementing security protocols and compliance measures for HPC environments to safeguard sensitive research data.
Preferred:
- Experience within academic, research, or research-intensive institutional environments is highly desirable, particularly where HPC infrastructure underpins large-scale AI research and innovation.
- Familiarity with cloud-based HPC platforms and experience managing hybrid or multi-cloud environments.