WitnessAI is the unified AI security and governance platform enterprises trust to govern and protect all AI activity. We provide complete, network-level visibility into every AI interaction, whether it comes from employees or autonomous agents, even in native apps where legacy tools are blind. Unlike security that relies on outdated keyword matching, our platform understands intent, enabling intelligent policies that stop novel threats like prompt injection while empowering productivity. As the confidence layer for enterprise AI, WitnessAI transforms security from a bottleneck into the enabler of your AI strategy.
High-impact individual contributor role at the intersection of cloud infrastructure, developer experience, AI-augmented tooling, and platform security. You will own the architecture and operations of a large-scale multi-tenant SaaS platform across AWS and GCP, partner directly with product engineering teams on new service design, and work autonomously from ambiguous problem to production solution.
Design and operate 50+ EKS/GKE clusters, multi-tenant compute, autoscaling, and cluster lifecycle management across AWS and GCP.
Own Infrastructure-as-Code (Terraform) for multi-account, multi-region environments spanning 200+ repos and services.
Architect and run end-to-end CI/CD pipelines (Harness, GitHub Actions, ArgoCD) with supply chain security, SBOM, and progressive delivery.
Build and operate large-scale observability stacks — metrics, logs, distributed traces — with OpenTelemetry across all clusters.
Embed DevSecOps controls: secret management, image signing (Cosign/Chainguard), OPA/Kyverno policy-as-code, and compliance automation.
Create AI/LLM-powered internal tools for platform operations, incident triage, drift detection, and CI/CD automation.
Operate SQL (Aurora/PostgreSQL) and NoSQL (DynamoDB, ClickHouse, Elasticsearch) platforms at scale, including DR and schema lifecycle.
Partner with dev teams from architecture review through production launch; produce ADRs, runbooks, and engineering standards.
10+ years in DevOps, Platform Engineering, or SRE in cloud-native SaaS environments.
Expert AWS (EKS, RDS/Aurora, IAM, VPC, Cost Management) with solid GCP experience.
Production Terraform at scale: modules, state, drift detection, multi-account patterns.
Advanced Kubernetes: RBAC, network policy, GitOps (ArgoCD/Flux), operators, and resource management.
Strong Go and/or Python — able to build and ship production-grade internal tooling.
Hands-on experience building tools with AI/LLM APIs integrated into engineering workflows.
Production SQL proficiency and NoSQL platform operations experience.
Demonstrated design and operation of large-scale, multi-cluster observability solutions.
DevSecOps: vulnerability management, supply chain security, compliance frameworks (SOC 2 / ISO 27001).
Self-directed: scopes ambiguous problems, drives to solution, and delivers independently.
Regulated industry background (SaaS-based fintech, healthtech, AI infrastructure).
GPU infrastructure or AI model-serving experience (vLLM, SageMaker).
Open source projects you have developed or maintained that are used by others.
Hybrid work environment
Competitive salary and equity
Health, dental, and vision insurance
401(k) plan
Opportunities for professional development and growth
Generous vacation policy
$234,000-$257,000 (The exact salary will be determined based on the selected candidate’s location, qualifications, experience, and relevant skills.)