We’re looking for an AI Researcher focused on model distillation to help us push the frontier of efficient, high-performance models. You’ll work on turning large, expensive models into smaller, faster, and more deployable systems—while maintaining or improving quality.
This role is ideal for someone who enjoys publishing research, working close to real systems, and seeing their ideas move from papers → code → production.
Responsibilities
Design and evaluate model distillation techniques (teacher–student training, self-distillation, layer-wise distillation, representation matching, etc.; a minimal sketch follows this list)
Research tradeoffs between model size, latency, memory, and accuracy
Develop novel distillation approaches for:
Large language models
Long-context or specialized architectures
Inference-constrained environments
Run large-scale experiments and ablations; analyze results rigorously
Collaborate with engineers to productionize research outcomes
Write and submit research papers to top-tier venues (NeurIPS, ICML, ICLR, COLM, etc.)
Contribute to internal research notes, technical blogs, and open-source projects when appropriate
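For a concrete flavor of the core technique, here is a minimal teacher–student distillation sketch in PyTorch. The toy models, temperature, and loss weighting are illustrative placeholders, not our actual setup:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft-target term: KL between temperature-softened teacher and student
        # distributions, scaled by T^2 so its gradients are on the same scale
        # as the hard-label term (standard practice from Hinton et al., 2015).
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Toy stand-ins for a large teacher and a small student (sizes arbitrary).
    teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
    student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)

    x = torch.randn(32, 128)              # dummy batch
    labels = torch.randint(0, 10, (32,))  # dummy hard labels
    with torch.no_grad():                 # the teacher only supplies soft targets
        teacher_logits = teacher(x)

    optimizer.zero_grad()
    loss = distillation_loss(student(x), teacher_logits, labels)
    loss.backward()
    optimizer.step()

Layer-wise and representation-matching variants replace the logit-level KL with losses on intermediate activations; the surrounding training loop looks much the same.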
Required
Strong background in machine learning research
Hands-on experience with model distillation or closely related topics (compression, pruning, quantization, representation learning)
Publication experience (conference or journal papers, workshop papers, or arXiv preprints)
Solid understanding of deep learning fundamentals (optimization, training dynamics, generalization)
Fluency in PyTorch (or an equivalent framework) and comfort running research-grade experiments
Ability to clearly communicate research ideas, results, and limitations
Nice to Have
Experience distilling large language models
Experience with efficiency-focused research (latency, memory, throughput)
Experience with long-context models or non-Transformer architectures
Open-source contributions in ML or research tooling
Prior startup or applied research experience
What We Offer
Real ownership of research direction at a Series A-stage startup
Strong support for publishing and open research
Tight feedback loop between research and real-world deployment
Access to meaningful compute and production-scale problems
Small, highly technical team with deep ML and systems expertise
Who Might Be a Good Fit
ML researchers transitioning from academia to industry
Research engineers with published work in model efficiency
PhD graduates, postdocs, and industry researchers who still want to publish