Infrastructure Engineer
Location: Remote
About ValidMind
ValidMind empowers financial services organizations to bring more trust and transparency to the world’s AI/ML/LLM models. With the rapid evolution of AI, increased regulatory scrutiny, and a lack of fit-for-purpose tooling, financial services’ Model Risk Management (MRM) and AI Governance functions are under enormous pressure to ensure compliance. We are passionate about helping these organizations seamlessly and confidently test, validate, and document their AI models while ensuring compliance with domestic and international AI and model risk regulations.
Overview
We’re looking for a skilled Infrastructure Engineer to design, build, and maintain reliable, scalable infrastructure that supports our engineering teams and product delivery. You’ll be responsible for managing cloud environments, implementing infrastructure-as-code practices, and ensuring high availability and observability of our systems.
What You'll Do & Your Impact:
- Design, deploy, and manage infrastructure using Docker, Kubernetes, and Terraform to support production and development environments.
- Manage cloud infrastructure on a major provider, preferably AWS (experience with GCP or Azure also considered).
- Implement monitoring and observability solutions using tools such as Datadog, Splunk, Prometheus, or Grafana to ensure system reliability and performance.
- Collaborate closely with backend and full-stack engineers to support continuous integration, delivery, and deployment pipelines.
- Participate in the on-call rotation, respond to incidents, and help drive post-incident reviews and reliability improvements.
- Automate operational tasks using scripting languages such as Bash and Python.
- Maintain and improve security and compliance practices within infrastructure and deployment processes.
- Document infrastructure designs, processes, and procedures to promote transparency and knowledge sharing across the team.
Who You Are & What Makes You Qualified:
- 3+ years of professional experience in infrastructure, DevOps, or SRE roles.
- Strong experience with containerization (Docker) and orchestration (Kubernetes) in production environments.
- Proven experience with Terraform or other infrastructure-as-code tools.
- Hands-on experience with AWS (EC2, ECS/EKS, S3, IAM, CloudWatch, etc.) or another major cloud platform.
- Proficiency in monitoring and logging tools (e.g., Datadog, Splunk, Prometheus, ELK stack).
- Comfortable writing automation scripts in Bash and Python.
- Experience supporting CI/CD pipelines and deployment workflows.
- Strong communication skills and ability to collaborate effectively with cross-functional teams.
- Willingness to participate in an on-call rotation and help improve system reliability and response processes.
Nice-to-Have(s):
- Familiarity with service mesh or networking within Kubernetes.
- Experience with security best practices in cloud and containerized environments.
- Understanding of GitOps workflows (e.g., ArgoCD, Flux).
- Knowledge of performance tuning, capacity planning, and cost optimization in cloud environments.
Why Join Us
- Opportunity to have a direct impact on the stability and scalability of core systems.
- Collaborative engineering culture with strong ownership and autonomy.
- Exposure to a modern tech stack and opportunities for professional growth.
At ValidMind, we create the most efficient solution for organizations to automate testing, documentation, and risk management for AI and statistical models. Working here means being at the forefront of AI risk management, but it’s also more personal than that: we promote an inclusive culture where we value your ideas and creativity. We want you to have a sense of ownership over your work, to build mutual trust with your peers, and to feel supported in everything you do. As a VC-backed company in the early stages of growth, we offer ample room to grow.