Role Overview
We are looking for a Senior Technology Engineer to drive platform stability, automation, and operational excellence within the Data Science Platform (DSP).
This is not a support role. It is a hands-on engineering role in which you own automation, orchestration, and reliability across a hybrid cloud ecosystem (OpenShift + AWS/Azure/GCP).
You will be the backbone of DSP operations: if things break, scale poorly, or require manual intervention, that’s your problem to eliminate permanently.
Requirements
Key Responsibilities
Platform Engineering & Operations
- Own end-to-end technical operations of DSP infrastructure
- Ensure high availability, performance, and scalability of platform services
- Monitor system health, troubleshoot issues, and implement permanent fixes (not patchwork)
Automation & Orchestration
- Design and implement automation frameworks to eliminate manual processes
- Build CI/CD pipelines and automate deployment workflows
- Drive infrastructure-as-code (IaC) adoption using tools such as Terraform and Ansible
Container & Cloud Platform Management
- Manage and optimize OpenShift / Kubernetes environments
- Work across multi-cloud (AWS, Azure, GCP) infrastructure
- Ensure efficient resource utilization and cost optimization
MLOps / Data Platform Support
- Enable smooth ML model deployment and lifecycle management
- Support tools like OpenShift AI, SageMaker, or similar platforms
- Ensure reproducibility and reliability of data science workflows
Monitoring & Reliability
- Implement monitoring using Prometheus, Grafana, and the ELK stack
- Define SLAs and SLOs, and ensure the platform meets its reliability standards
- Drive proactive incident prevention (not reactive firefighting)
Collaboration & Governance
- Work closely with Data Scientists, DevOps, and Platform teams
- Ensure adherence to security, compliance, and governance standards
- Act as a technical SME for DSP operations
Mandatory Skills (Non-Negotiable)
- Strong experience in OpenShift / Kubernetes
- Hands-on experience in multi-cloud environments (AWS/Azure/GCP)
- Expertise in automation (Terraform, Ansible, Jenkins, GitOps)
- Strong knowledge of CI/CD pipelines and DevOps practices
- Experience in Python or scripting (Bash/Shell)
- Experience with monitoring tools (Prometheus, Grafana, ELK)
Good to Have
- Experience in MLOps / AI platforms (OpenShift AI, SageMaker, Bedrock)
- Exposure to LLM deployment / inference platforms (vLLM, Triton, etc.)
- Knowledge of data pipelines and big data ecosystems
- Banking or financial services experience
Experience Required
- 6–10 years of relevant experience in Platform Engineering / DevOps / Cloud Engineering
- Proven experience managing enterprise-scale platforms