Clockwork.io

Senior Product Manager - AI Observability

Clockwork.io Palo Alto, California Today
product

About Clockwork Systems

Clockwork.io – Software-Driven Fabrics to increase GPU cluster utilization

Clockwork Systems was founded by Stanford researchers and veteran systems engineers who share a vision for redefining the foundations of distributed computing. As AI workloads grow increasingly complex, traditional infrastructure struggles to meet the demands of performance, reliability, and precise coordination. Clockwork is pioneering a software-driven approach to AI fabrics by delivering cross-stack observability to catch and quickly resolve problems, workload fault tolerance to keep jobs running through failures, and performance acceleration that dynamically routes and paces traffic to avoid congestion.

To learn more, visit www.clockwork.io.

About the Role

As Senior Product Manager for AI Observability, you will lead the product strategy and execution for Clockwork’s cross-stack observability solution, which helps customers detect slow or failing workloads and precisely correlate them with underlying infrastructure issues. You’ll work at the forefront of the emerging AI market, bringing world-first observability technologies to life.

What You Will Do

  • Define and drive product strategy and roadmap for Clockwork’s AI Observability portfolio, covering Fleet Audit (pre-flight validation), Fleet Observability (to uncover and solve fabric issues in real time), and AI Workload Observability (to identify workload issues and correlate them to the underlying infrastructure).
  • Develop a deep understanding of customer pain points and workflows by working directly with customers, and crisply translate them into compelling, differentiated product requirements.
  • Drive end-to-end rapid execution - write PRDs, set priorities, unblock teams, make tradeoffs, and ensure high-quality releases.
  • Partner cross-functionally with engineering, sales, and marketing to shape the product, ship reliably, and communicate clear value to technical customers.
  • Be the voice of the product internally.

What We’re Looking For

  • 7+ years of Product Management experience with at least some time working in the observability space.
  • Strong experience with modern observability stacks: metrics, logs, traces, OpenTelemetry, Prometheus/Grafana. Familiarity with GPU observability tooling (e.g., NVIDIA DCGM, Nsight) and experience with MLOps and LLMOps ecosystems is a plus.
  • Strong technical depth in Kubernetes, SLURM, AI training and related components (e.g., PyTorch, NCCL), GPU clusters, and RDMA networking (InfiniBand and RoCE).
  • Excellent product leadership - clear writing, crisp tradeoffs, strong prioritization, and the ability to collaborate effectively with highly technical engineering teams.
  • Customer empathy and discovery strength - able to identify high-impact pain points and convert them into compelling product strategy and execution.
  • A builder mindset that is energized by early-stage products, rapid iteration, customer closeness, and shipping market-changing solutions.

Enjoy

  • Challenging projects.
  • A friendly and inclusive workplace culture.
  • Competitive compensation.
  • A great benefits package.
  • Catered lunch.

Clockwork Systems is an equal opportunity employer. We are committed to building world-class teams by welcoming bright, passionate individuals from all backgrounds. All qualified applicants will receive consideration for employment without regard to race, color, ancestry, religion, age, sex, sexual orientation, gender identity or expression, national origin, disability, or protected veteran status. We believe diversity drives innovation, and we grow stronger together.
