Requirements:
- Experience: 5+ years
- Strong experience in DevOps or Site Reliability Engineering (SRE) roles.
- Strong knowledge of Docker, Kubernetes, Terraform, and CI/CD pipelines.
- Hands-on experience with AWS, Azure, or other cloud platforms.
- Familiarity with GPU infrastructure and ML workloads is a plus.
- Good understanding of monitoring and logging systems (Prometheus, Grafana).
- Ability to collaborate with ML teams for optimized inference and deployment.
- Strong troubleshooting and problem-solving skills in high-scale environments.
- Knowledge of infrastructure security best practices, cost optimization, and performance tuning.
- Exposure to vector databases and AI/ML deployment pipelines is highly desirable.
Responsibilities:
- Maintain and manage Kubernetes clusters, AWS/Azure environments, and GPU infrastructure for high-performance workloads.
- Design and implement CI/CD pipelines for seamless deployments and faster release cycles.
- Set up and maintain monitoring and logging systems using Prometheus and Grafana to ensure system health and reliability.
- Support vector database scaling and model deployment for AI/ML workloads.
- Collaborate with ML engineering teams to optimize inference performance and resource utilization.
- Ensure high availability, security, and scalability of infrastructure across multiple environments.
- Automate infrastructure provisioning and configuration using Terraform and other IaC tools.
- Troubleshoot production issues and implement proactive measures to prevent downtime.
- Continuously improve deployment processes and infrastructure reliability through automation and best practices.
- Participate in architecture reviews, capacity planning, and disaster recovery strategies.
- Drive cost optimization initiatives for cloud resources and GPU utilization.
- Stay current with emerging technologies in cloud-native computing, AI infrastructure, and DevOps automation.
Qualifications
Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
About the Company
👋🏼 We're Nagarro
We are a Digital Product Engineering company that is scaling in a big way! We build products, services, and experiences that inspire, excite, and delight. We work at scale — across all devices and digital mediums, and our people exist everywhere in the world (17500+ experts across 39 countries, to be exact). Our work culture is dynamic and non-hierarchical. We are looking for great new colleagues. That is where you come in!