OSL Group Limited is seeking an experienced DevOps/Site Reliability Engineer to join our dynamic team. In this role, you will be responsible for designing, implementing, and maintaining a highly scalable and reliable infrastructure to support our rapidly growing cloud-based digital assets platform.
Responsibilities:
- Maintain and optimize a large-scale AWS environment of more than 1,000 server instances
- Develop and manage Infrastructure as Code (IaC) solutions using Terraform for provisioning and managing cloud resources
- Design and implement containerization strategies using Kubernetes for deploying and orchestrating microservices
- Automate deployment and configuration management processes using Ansible
- Write custom scripts and tools in Python and Bash to enhance monitoring, alerting, and incident response
- Design and maintain GitLab CI/CD pipelines to automate application deployment and infrastructure provisioning
- Implement robust disaster recovery and high-availability strategies to ensure service uptime and resilience
- Analyze cloud spending and implement cost-optimization measures to improve infrastructure efficiency
Requirements:
- 5-10 years of experience as a DevOps or Site Reliability Engineer
- Experience in designing and managing GitLab CI/CD pipelines
- Expertise in container orchestration with Kubernetes
- Proficiency in Infrastructure as Code (IaC) using Terraform
- Experience in configuration management with Ansible
- Strong programming skills in Python and Bash
- Familiarity with cloud platforms, specifically AWS
- AWS certification (preferred)