Application Open:
Full-Time
Job Purpose:
MBZUAI is seeking a Research Scientist for the Institute of Foundation Models (IFM) team to develop foundational world models for accurate physical simulation, collaborating closely with engineering and data teams on large-scale training challenges. The role includes designing scalable data annotation pipelines, developing rigorous performance benchmarks, optimizing inference for real-time interaction, and advancing multimodal training systems. Expertise in visual tokenization, quantitative evaluation methods, and scaling laws for video pretraining is highly desirable. Candidates should hold a postgraduate degree in a related field and have a proven research track record.
Key Responsibilities:
- Develop a foundational world model that accurately simulates the physical world.
- Collaborate with engineering and data teams to tackle key challenges in training the world model on large-scale clusters.
- Develop metrics and evaluation benchmarks to better assess model performance.
- Design and implement a scalable and efficient data annotation pipeline to ensure high-quality labeled data for training and evaluation.
- Optimize inference efficiency to enable real-time interaction.
Areas of Focus:
- Scalable Training Systems: Develop and optimize infrastructure for training multimodal LLMs and video diffusion models at massive scale.
- Efficient Data Pipelines: Build scalable video data pipelines and annotation frameworks to support high-quality training data.
- Inference Optimization: Enhance inference efficiency through optimization and distillation techniques to enable real-time interaction.
- Visual Tokenization: Develop methods for discretizing visual features into tokens for improved model representation.
- Quantitative Evaluation: Establish rigorous benchmarks to assess physical accuracy, controllability, and intelligence.
- Scaling Laws for Video Pretraining: Investigate scaling law principles to guide efficient video pre-training strategies.
Academic Qualifications:
- MSc or PhD in Machine Learning or Computer Science, or equivalent industry experience.
Professional Experience:
- Experience in large-scale model training (LLMs or Diffusion Models) on large clusters.
- Hands-on experience with state-of-the-art video generative models (e.g., Sora, Veo 2, MovieGen, CogVideoX).
- Experience building and optimizing large-scale video data pipelines.
- Experience in accelerating diffusion model inference for improved efficiency.
- Exceptional problem-solving and troubleshooting skills to tackle complex technical challenges.
- Strong systems and engineering expertise in deep learning frameworks such as PyTorch.
- Strong communication and collaboration skills for effective cross-functional teamwork.
- Ability to navigate ambiguity and drive projects in rapidly evolving research areas.
- Research contributions to top-tier conferences or journals (e.g., ICML, ICLR, NeurIPS, ACL, CVPR, COLM), with published work in relevant domains.