We are seeking a hands-on Platform Engineer to support a high-performance computing platform used by computational scientists in Research & Development. This role focuses on AWS infrastructure, DevOps automation, container platforms, and high-throughput storage, with heavy use of infrastructure as code. You will own cloud and HPC infrastructure end-to-end and work closely with scientists and engineers to deliver scalable, reliable, and automated platform solutions.
Key Responsibilities:
- Design, build, and operate scalable, high-performance cloud infrastructure on AWS
- Manage infrastructure as code using Terraform, Terragrunt, and CloudFormation
- Build immutable infrastructure with Packer
- Develop and maintain CI/CD pipelines using GitLab CI/CD
- Operate containerized workloads across:
  - Amazon EKS
  - Docker on EC2
  - Singularity (Apptainer) for HPC workloads
- Configure systems using Ansible
- Design and operate high-throughput cloud and HPC storage solutions
- Monitor, troubleshoot, and optimize platforms for performance, reliability, and cost
- Document architectures and operational best practices
Required Skills:
- Strong hands-on experience with AWS
- Deep experience with Terraform / Terragrunt; working knowledge of CloudFormation
- Experience with GitLab CI/CD
- Experience with containers: Kubernetes (EKS) and Docker, plus familiarity with Singularity (Apptainer)
- High-performance storage experience (e.g., FSx for Lustre, Weka, or similar)
- Experience with Packer for AMI and image creation
- Strong Linux and networking experience, with working knowledge of Ansible for configuration management
- Python/Bash scripting, plus strong communication and documentation skills
- Proficiency with Git
Nice to Have / Preferred Skills:
- Experience with HPC, scientific computing, or data-intensive platforms
- Familiarity with Go or other languages used for automation
- Cloud security best practices
- Observability tools (Prometheus, Grafana, CloudWatch)
Qualifications:
- 5+ years in DevOps, Platform Engineering, or SRE
- Bachelor’s degree or equivalent practical experience
- Strong problem-solving and communication skills
- Comfortable working independently in a fast-moving, collaborative environment
Additional Information:
- This role is 100% remote.