Application Open:
Full-Time
MBZUAI is seeking Research Engineers to support the development of next-generation AI systems capable of predicting cellular state changes in response to molecular perturbations, including small-molecule drugs, peptides, proteins, genetic edits, and RNA-based interventions. These models play a critical role in accelerating target identification, therapeutic discovery, precision cell engineering, and high-throughput drug screening. The engineer will contribute to building, training, and validating foundation-model architectures that learn complex biological representations and accurately forecast cellular responses across multimodal data types.
Key Responsibilities
AI Model Development for Cellular Response Prediction
- Design, train, and evaluate deep learning and foundation-model architectures to predict cell state transitions following molecular perturbations.
- Develop models capable of learning from multimodal biological datasets, including transcriptomics, proteomics, imaging, and perturbation-response profiles.
- Implement representation learning techniques to capture cellular heterogeneity, gene regulatory behavior, and molecular interaction patterns.
- Build generative or simulation-based models to forecast dose–response effects, off-target activities, and phenotypic shifts.
Molecular Perturbation Modeling
- Integrate molecular descriptors, sequence features, and structural representations (e.g., SMILES, protein sequences, CRISPR guides, RNAi constructs) into perturbation models.
- Construct predictive systems to estimate the functional impact of genetic edits, RNA interference, or protein-level interventions.
- Develop embedding spaces linking molecular features to cellular phenotypes for enhanced interpretability and screening.
Multimodal Data Fusion
- Create fusion pipelines combining sequencing data, high-content imaging, and experimental metadata into unified model inputs.
- Implement attention mechanisms, contrastive learning, and cross-modal alignment methods for robust biological inference.
- Build scalable preprocessing and feature-engineering workflows to support large-scale multi-omics datasets.
Model Optimization & Deployment
- Build and optimize computational pipelines for high-performance training and inference on MBZUAI and cloud compute infrastructure.
- Deploy trained models for real-time or batch biological prediction, ensuring reproducibility, traceability, and performance monitoring.
- Contribute to benchmarking efforts, dataset curation, model evaluation methodologies, and scientific reporting.
Research Support & Knowledge Transfer
- Collaborate closely with faculty, postdocs, and research teams to align model development with experimental or scientific objectives.
- Document code, data processing pipelines, and model specifications to support cross-team reuse and long-term sustainability.
- Provide technical guidance on AI methodologies for biological modeling across Research Division initiatives.
Project Statistics & Reporting
- Track project milestones, risks, and deliverables in accordance with internal reporting templates.
- Prepare technical reports, dashboards, and progress summaries for leadership review.
- Translate modeling outputs into actionable insights for scientific and experimental partners.
Other Duties
- Perform all other duties as reasonably directed by the line manager that are commensurate with these functional objectives.
Academic Qualification Required
- Bachelor’s degree in Computer Science, Computational Biology, Bioinformatics, Electrical Engineering, or a related field.
- A postgraduate degree in AI for biology, computational life sciences, or a related discipline is strongly preferred.
Professional Experience Required
Essential:
- Minimum 2 years of experience in machine learning or deep learning development (e.g., PyTorch or TensorFlow).
- Experience with distributed/parallel training or inference.
- Strong proficiency in Python and scientific computing libraries.
- Experience working with sequencing data, multi-omics datasets, or biological data processing.
- Knowledge of deep learning architectures relevant to biological modeling (transformers, graph neural networks, variational autoencoders, diffusion models).
- Experience training models on large datasets and optimizing them for research-grade performance.
Preferred:
- Experience with single-cell data, CRISPR screens, or perturbation-response profiling.
- Background in computational drug discovery or systems biology.
- Experience with foundation models, representation learning, or multimodal data fusion.
- Familiarity with cloud compute environments, high-performance computing, and MLOps workflows.
- Experience with parallel/distributed ML frameworks such as Megatron, DeepSpeed, or Ray.
- Working knowledge of biological assays and experimental pipelines.