Nearly every disease will become treatable in our lifetimes. Mandolin is building the clinical and financial infrastructure to get groundbreaking treatments to patients faster, powered by AI agents.
Mandolin partners closely with the largest healthcare institutions in the US, covering more than $10B in drug spend across the country. We're backed by Greylock, SV Angel, Maverick, SignalFire, and the founders of Vercel, Decagon, and Yahoo.
Our copilots handle ~80% of clinic back-office workload today. Reaching 99% means handling the long-tail edge cases that bury teams in rework, pushing the latest foundation models to their accuracy and cost limits, and proving every improvement with airtight, regulator-ready evaluation.
We need an AI Engineer who has already built this bridge from "impressive demo" to "lights-out production." You will own the systems that serve, monitor, and improve our models in the field—turning Mandolin from a helpful copilot into a true autopilot where work closes itself and clinicians only handle the exceptions.
Model serving & inference. Deploy and operate LLMs and VLMs for real-time inference using vLLM, SGLang, or equivalent runtimes. Tune KV caching, batching strategies, and speculative decoding to hit latency and cost targets.
ML pipeline ownership. Build and maintain HIPAA-compliant pipelines—data capture → training runs → inference → human-in-the-loop feedback—end to end.
Evaluation & telemetry. Design evaluation harnesses and telemetry systems that surface model degradation, edge-case failures, and business impact before and after every deploy. Use customer and model telemetry to close the feedback loop continuously.
Performance debugging. Diagnose and resolve ML workflow bottlenecks—identifying whether issues are IO-bound, memory-bound, or compute-bound across GPU, CPU, and serverless footprints.
Infrastructure & reliability. Work with distributed systems (Kubernetes, queues, workers, load balancing) to keep inference services fault-tolerant, scalable, and observable.
Model strategy. Select and integrate SOTA models for vision, language, document parsing, and OCR. Apply fine-tuning, RAG, or quantization where ROI justifies it.
Product integration. Pair with forward-deployed engineers to turn field discoveries into new datasets, metrics, and rapid model iterations.
Production experience deploying and serving LLMs or VLMs—familiar with inference runtimes (vLLM, SGLang, or similar), KV caching, and speculative decoding.
White-box understanding of transformer-based models: tokenization (image and text), autoregressive generation, temperature scaling, and sampling techniques.
Hands-on experience with document parsing, OCR models, or structured data extraction from unstructured inputs.
Ability to debug ML system bottlenecks and reason clearly about IO vs. memory vs. compute tradeoffs.
Experience with distributed systems fundamentals—Kubernetes, message queues, workers, load balancing—sufficient to own a production inference stack.
Track record building telemetry and evaluation frameworks: using real customer data to measure model performance and using model-level signals to debug edge cases.
Proficiency in Python and comfort working across the ML stack, from data pipelines to serving infrastructure.
Healthcare, claims processing, or complex form-extraction experience.
Familiarity with fine-tuning techniques (LoRA/PEFT) or retrieval-augmented generation (RAG).
Experience on cloud ML stacks—Vertex AI, AWS SageMaker, or Kubernetes-native ML workflows.
Open-source contributions, peer-reviewed research, or public technical writing.