Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting fundamental limits in power, capacity, and cost with today's homogeneous, vertically integrated infrastructure. Gimlet addresses this by decoupling AI workloads from the underlying hardware. Our platform intelligently partitions workloads into components and schedules each component onto the hardware that best fits its performance and efficiency needs. This approach enables heterogeneous systems spanning multi-vendor and multi-generation hardware, including the latest emerging accelerators, and unlocks step-function improvements in performance and cost efficiency at scale.
On top of this foundation, Gimlet is building a production-grade neocloud for agentic workloads. Customers use Gimlet to deploy and manage their workloads through stable, production-ready APIs, without having to reason about hardware selection, placement, or low-level performance optimization.
Gimlet works with foundation labs, hyperscalers, and AI-native companies to power real production workloads built to scale to gigawatt-class AI datacenters.
Gimlet Labs is seeking a Member of Technical Staff focused on compilers. In this role, you will work on the core compilation and lowering infrastructure that transforms high-level AI workloads into efficiently executable programs across diverse, cutting-edge hardware. You will design and implement compiler systems that partition workloads, lower them through multiple IRs, and target a range of execution runtimes and accelerators.
This is a role for engineers who enjoy building real systems, working close to hardware, and translating emerging AI models and execution patterns into production-ready infrastructure.
Responsibilities:
- Design and implement compiler pipelines that transform high-level AI workloads into executable programs across heterogeneous hardware
- Develop and evolve multi-level IRs spanning graph-level, tensor-level, and kernel-level representations
- Implement partitioning and lowering strategies that map workload components to the appropriate execution runtimes and accelerators
- Support both ahead-of-time and JIT compilation paths, including dynamic shapes and runtime specialization
- Integrate new model architectures, ops, and execution patterns into the compiler stack
- Work closely with runtime, kernel, and systems engineers to ensure correctness, performance, and scalability across the stack
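To make the partitioning and lowering responsibilities above concrete, here is a minimal illustrative sketch: a tiny graph-level IR is split into backend-specific partitions by a per-op affinity table, and each partition is then "lowered" to a backend-tagged program. All names, the affinity table, and the single-pass greedy strategy are invented for illustration and do not reflect Gimlet's actual stack.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str   # instance name, e.g. "qk_matmul"
    kind: str   # op category used for placement

# Hypothetical affinity table: which backend best serves each op kind.
AFFINITY = {
    "matmul": "gpu",
    "attention": "accelerator",
    "elementwise": "cpu",
}

def partition(graph: list[Op]) -> list[tuple[str, list[Op]]]:
    """Greedily group consecutive ops that map to the same backend."""
    parts: list[tuple[str, list[Op]]] = []
    for op in graph:
        backend = AFFINITY.get(op.kind, "cpu")
        if parts and parts[-1][0] == backend:
            parts[-1][1].append(op)  # extend the current partition
        else:
            parts.append((backend, [op]))  # open a new partition
    return parts

def lower(backend: str, ops: list[Op]) -> str:
    """Stand-in for lowering a partition through per-backend IRs
    to an executable form; here it just emits a tagged program string."""
    return f"{backend}:[" + ",".join(op.name for op in ops) + "]"

graph = [
    Op("qk_matmul", "matmul"),
    Op("softmax_av", "attention"),
    Op("bias_add", "elementwise"),
    Op("gelu", "elementwise"),
]
programs = [lower(b, ops) for b, ops in partition(graph)]
print(programs)  # one lowered program per backend partition
```

A production system replaces the affinity table with cost models and the string "lowering" with real multi-level IR rewrites, but the pipeline shape (partition, then lower each component for its target) is the same.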
Qualifications:
- Strong software engineering fundamentals
- Experience building or working on compiler systems or compiler-adjacent infrastructure
- Comfort reasoning about execution, memory, and performance across hardware boundaries
- Experience with compiler frameworks such as LLVM, MLIR, TVM, XLA, or similar
- Experience designing or working with multi-level IRs
- Familiarity with hardware-specific lowering for GPUs, accelerators, or vectorized architectures
- Experience interfacing with execution runtimes, launch APIs, or memory allocators
- Software development experience in C++ and Python