Application Open:
Full-Time
Job Purpose:
The Principal Data Architect will lead the design and implementation of scalable, reliable, intelligent, and high-performance distributed systems that support large-scale data services across the entire university. Collaborating with cross-functional teams and data owners, the role will identify issues in the data ecosystem, analyze user requirements, and build comprehensive data management solutions and data governance frameworks. The Principal Data Architect will establish engineering best practices, define system quality standards, and lead the architecture of data hardware and software. This role offers the opportunity to advance MBZUAI’s development, drive innovation, and ensure scalability as the university grows.
Key Responsibilities:
Data Architecture and Infrastructure:
- Own the architecture of data hardware (HW) and software (SW) solutions, ensuring they meet business and technical requirements.
- Design and implement data pipelines, ETL processes, and data integration workflows.
- Optimize data storage, retrieval, and processing for performance and cost-efficiency.
Development of Data Systems:
- Support the implementation of industry-leading distributed systems that are flexible, reliable, scalable, stable, robust, and extensible.
- Build high-performance storage and computing systems to support massive core data and large-scale products.
- Develop big data systems for various purposes, including real-time reporting, growth analysis, multi-dimensional analysis, and AI-based services.
- Review and provide feedback on end-user requests, translating them into architecture while ensuring accuracy and clarity.
Troubleshooting and Maintenance:
- Guide engineers in troubleshooting production systems, identifying application code issues, and ensuring timely resolution.
- Guide engineers in monitoring system performance, diagnosing bottlenecks, and implementing optimizations to enhance efficiency.
Engineering Best Practices:
- Establish and promote solid design principles and best engineering practices for both technical and non-technical stakeholders.
- Provide input on, follow, and evangelize code quality guidelines and standards.
- Conduct code reviews to ensure adherence to best practices, maintainability, and scalability.
Collaboration and Communication:
- Work closely with data scientists, analysts, and business stakeholders to understand data requirements and deliver actionable insights.
- Collaborate with cross-functional teams to integrate data solutions into broader systems and workflows.
- Communicate complex technical concepts effectively to non-technical stakeholders.
Innovation and Continuous Improvement:
- Stay updated with emerging technologies, tools, and trends in data engineering and big data.
- Propose and implement innovative solutions to improve data processing, storage, and analysis capabilities.
- Continuously optimize data systems to handle increasing volumes of data and evolving business needs.
Data Security and Compliance:
- Implement data security best practices to protect sensitive information and ensure compliance with regulations (e.g., GDPR, CCPA).
- Conduct regular audits and vulnerability assessments to maintain data integrity and security.
- Ensure data systems adhere to industry standards and organizational policies.
Testing and Quality Assurance:
- Define and implement testing frameworks for data pipelines and systems to ensure reliability and accuracy.
- Collaborate with QA teams to identify and resolve data-related issues.
- Ensure data quality through validation, cleansing, and transformation processes.
DevOps and Deployment:
- Collaborate with DevOps teams to ensure smooth deployment and integration of data systems.
- Implement logging, monitoring, and alerting systems to ensure data system health and performance.
- Manage CI/CD pipelines for data engineering workflows.
Mentorship and Leadership:
- Mentor data engineers and team members, fostering the skill development and growth needed to meet the university’s needs.
- Lead technical discussions and contribute to strategic decision-making for data engineering initiatives.
- Drive the adoption of best practices and innovative technologies within the team.
Other Duties:
- Perform all other duties as reasonably directed by the line manager that are commensurate with these functional objectives.
Academic Qualification:
- Master’s degree in Computer Science, Data Science, or a related field.
- A PhD is preferred.
Professional Experience:
Essential
- Minimum of 10–12 years of experience in one or more programming languages such as Java, C++, or Python.
- Minimum of 8 years of proven experience as a Data Engineer developing complex, high-quality data, software, and AI application systems.
- Minimum of 5 years in senior or principal roles working on large-scale data systems.
- Experience in petabyte-level data processing is a plus.
- Strong understanding of data platform concepts, including Data Lake, Data Warehouse, ETL, Big Data Processing, Real-time Processing, Scheduling, Monitoring, Data Governance, and Task Governance.
- Proficiency in Big Data technologies such as Hadoop, MapReduce, Hive, Spark, Metastore, Flume, Kafka, Flink, Elasticsearch, and data platforms such as Snowflake.
- Experience architecting data systems for complex business problems, including data warehousing, data ingestion, query optimization, data segregation, ETL, and ELT, as well as cloud platforms such as AWS (Redshift, EC2, S3) and Azure.
- Expertise in optimizing columnar and distributed data processing systems and infrastructure.
- Proficient in applying best-practice Design Patterns and Design Principles to software architecture and algorithms.
- Experience building enterprise software architectures such as Microservices, SOA, and MVC.
- Hands-on experience with monitoring, alerting, and logging tools like Prometheus, New Relic, Datadog, ELK stack, and distributed tracing.
- Strong knowledge of testing methodologies, including unit tests, component tests, and integration tests.
- Expertise in database technologies, including MySQL, PostgreSQL (knowledge of normal forms, ACID, isolation levels, index anatomy), and NoSQL databases like MongoDB and Redis.
- Proficiency in managing Linux environments.
- Proficiency with code versioning tools such as Git (including GitFlow) and SourceTree.
- Experience with build process management and continuous integration.
- Experience working with modern software development methodologies such as Scrum, Kanban, and XP.
Preferred
- 12+ years of industry experience.
- Experience in higher education or research institutions, with an understanding of core research facility operations.
- Proficiency in data analytics for process optimization and continuous improvement.
- Strong English proficiency; fluency in additional languages is a plus.
- Familiarity with data handling in Snowflake.
- Proven success implementing and optimizing Higher Education Reference Models (HERM) or similar frameworks.