Application Open:
Full-Time
Job Purpose:
The Data Engineer designs, builds, and maintains scalable data pipelines and integration workflows that underpin MBZUAI’s enterprise data platform on Snowflake. Working in close collaboration with the Senior Data & Analytics Specialist, the role focuses on ingesting data from institutional source systems, transforming it into reliable curated datasets, and supporting the data warehouse environment that drives institutional reporting, analytics, and decision-making.
The Data Engineer applies engineering best practices to ensure pipeline reliability, data quality, and platform performance, contributing directly to MBZUAI’s growing data and analytics capability within the IT Applications unit.
Key Responsibilities:
Data Pipeline Development & Operations
- Design, develop, and maintain ETL/ELT pipelines to ingest, transform, and load data from institutional source systems (Banner ERP, HR, Finance, Student Information, Research) into the Snowflake data warehouse.
- Implement and manage orchestration workflows using tools such as Apache Airflow or equivalent, ensuring timely and reliable data delivery.
- Monitor pipeline health, troubleshoot failures, and implement automated alerting and recovery mechanisms.
- Optimize pipeline performance for efficiency, scalability, and cost-effectiveness.
Snowflake Data Warehouse & Platform Support
- Support the operation and evolution of the Snowflake cloud data warehouse under the technical direction of the Senior Data & Analytics Specialist.
- Build and maintain curated data layers (staging, integration, presentation) following agreed architectural patterns and naming conventions.
- Assist in capacity planning, warehouse sizing, performance tuning, and credit/cost monitoring within Snowflake.
- Manage Snowflake objects, including databases, schemas, roles, and access grants, in line with the platform’s RBAC model.
Data Modeling & Transformation
- Implement dimensional models, conformed dimensions, and transformation logic using modern ELT frameworks (e.g., dbt).
- Write efficient, well-documented SQL and Python code for data transformation, enrichment, and business logic.
- Ensure data models support downstream reporting, analytics, and integration requirements effectively.
Data Quality & Documentation
- Implement and maintain data quality checks, validation routines, and anomaly detection within pipelines.
- Maintain clear documentation of data flows, schemas, transformation logic, and pipeline dependencies.
- Collaborate with source system owners to resolve data quality issues and agree on data contracts.
Engineering Best Practices & CI/CD
- Follow Git-based version control workflows for all pipeline and transformation code.
- Contribute to CI/CD pipelines for automated testing and deployment of data engineering artifacts.
- Participate in code reviews and adhere to team coding standards and design patterns.
Collaboration & Communication
- Work closely with the Senior Data & Analytics Specialist on priorities, architecture decisions, and delivery planning.
- Collaborate with Institutional Research and business data owners to understand data requirements and translate them into pipeline specifications.
- Coordinate with IT Applications (including the Integrations Lead and WSO2 middleware team), IT Infrastructure, and IT Security to ensure secure, reliable data integration.
Capability Building
- Stay current with emerging data engineering tools, technologies, and Snowflake platform features.
- Contribute to internal knowledge sharing, process documentation, and team learning.
Other Duties
- Carry out all other duties as reasonably directed by the line manager that are commensurate with these functional objectives.
Academic Qualification:
Bachelor’s degree in Computer Science, Information Systems, Data Engineering, or a related field.
Professional Experience:
Essential
- 3–5 years of experience in data engineering, analytics engineering, or related roles with increasing responsibility.
- Solid hands-on experience with SQL and at least one programming language (Python preferred) for data pipeline development.
- Experience building and maintaining ETL/ELT pipelines using orchestration tools (e.g., Apache Airflow or equivalent).
- Working experience with Snowflake. Familiarity with Snowflake features such as Snowpipe, Streams, Tasks, Time Travel, and RBAC is expected.
- Familiarity with dimensional modeling concepts and modern transformation frameworks (e.g., dbt).
- Experience with Git-based version control and CI/CD practices for data engineering workflows.
- Understanding of data quality concepts, testing approaches, and documentation practices.
Preferred
- Experience with Azure data services (ADLS Gen2, Azure Monitor, Key Vault) or comparable cloud services.
- Exposure to BI tools such as Tableau or Power BI and understanding of how data models support reporting.
- Experience in a higher education, research, or public sector environment.
- Familiarity with data governance and data security best practices.
- Experience with containerization (Docker) and infrastructure-as-code concepts.
- Exposure to Palantir Foundry or similar enterprise data platforms.