We think conversational AI agents will deliver all professional services in India. We started with astrology. We're a small group of engineers, designers, and product folks building at the intersection of conversational AI and domain expertise. Making an AI agent sound human-like is hard. Making an AI an expert in a domain is also hard. We're doing both together.
We're backed by Accel, Arkam Ventures, and Weekend Fund.
Frontier models are incredible at English. They're not incredible at Indic languages, and our users speak Hindi, Hinglish, Tamil, Telugu, and a dozen others. Our agent needs to be fluent, accurate, and domain-expert-level across all of them.
The Applied AI Research team owns two problems. First: finetuning models for our use case, improving accuracy, fluency, and domain understanding in Indic languages where foundation models fall short. Second: making those models work reliably in production, through prompt engineering, context management, retrieval strategies, and the systems that turn a capable model into a domain expert.
You'll bridge the gap between research and production. Some weeks you're running finetuning experiments to improve Hindi response quality. Other weeks you're redesigning how context flows through a multi-turn conversation. The through-line is the same: make the model better at doing what our users need, in the language they think in.
Finetune models for Indic language performance: improving fluency, accuracy, and domain understanding in Hindi, Hinglish, and other Indian languages
Build and manage finetuning pipelines: data curation, training runs, evaluation, and deployment of fine-tuned models
Work with the team on prompt engineering and context management: designing how the model receives and reasons over information across multi-turn conversations
Design retrieval strategies that get the right domain data to the model at the right time
Run experiments on model behavior: how finetuning, context structures, prompt formulations, and tool designs affect output quality across languages
Collaborate with Evaluation to measure what actually matters, especially for subjective, language-dependent quality
Stay current with frontier model capabilities and figure out how to exploit new features the day they ship
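To make "retrieval strategies" concrete: the core loop is ranking candidate domain documents against the live conversation and handing the best ones to the model. Below is a deliberately minimal, stdlib-only sketch of that idea using bag-of-words overlap; the corpus strings and function names are illustrative, not our actual system, and production retrieval would layer embeddings, language-aware tokenization, and metadata filters on top.

```python
from collections import Counter

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by token overlap with the query; return the top k."""
    query_tokens = Counter(query.lower().split())

    def score(doc: str) -> int:
        doc_tokens = Counter(doc.lower().split())
        # Count query tokens (with multiplicity) that also appear in the doc.
        return sum(min(c, doc_tokens[t]) for t, c in query_tokens.items())

    return sorted(documents, key=score, reverse=True)[:k]

# Toy astrology-domain corpus for illustration only.
corpus = [
    "Mangal dosha is assessed from the position of Mars in the birth chart.",
    "Sade sati refers to the transit of Saturn over the natal moon.",
    "Context windows limit how much history a model can attend to.",
]
print(retrieve("what is sade sati saturn transit", corpus, k=1))
```

The interesting design questions start where this sketch ends: code-switched queries, transliteration variants, and deciding when not to retrieve at all.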
Experience with LLM finetuning: you've trained or fine-tuned models, managed datasets, and evaluated results
Understanding of Indic NLP challenges: tokenization, code-switching (Hinglish), script diversity, and where current models fail
Experience with LLMs in production: prompt engineering, context management, retrieval-augmented generation
You understand the difference between a demo and a production system: you've fought with context windows, hallucinations, and inconsistent model behavior
Strong engineering skills: you ship code, not just papers. Your research runs in production.
Experimental rigor: you design experiments, control variables, and know when results are significant
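One concrete facet of the Indic tokenization gap, shown with nothing but the standard library: byte-level BPE tokenizers operate on UTF-8 bytes, and Devanagari code points each encode to 3 bytes versus 1 for ASCII, so Hindi text starts with a built-in size penalty before any merges apply. Real fertility measurements would use a specific model's tokenizer; this is just the underlying byte arithmetic.

```python
# Devanagari (U+0900-U+097F) encodes to 3 UTF-8 bytes per code point;
# ASCII letters encode to 1. Byte-level vocabularies trained mostly on
# English therefore tend to split Hindi into many more tokens.
pairs = [
    ("hello", "नमस्ते"),         # greeting
    ("astrologer", "ज्योतिषी"),  # domain term
]
for english, hindi in pairs:
    e_bytes = len(english.encode("utf-8"))
    h_bytes = len(hindi.encode("utf-8"))
    print(f"{english!r}: {e_bytes} bytes | {hindi!r}: {h_bytes} bytes "
          f"({h_bytes / len(hindi):.0f} bytes per code point)")
```

Hinglish makes this worse, not better: a single utterance mixes a 1-byte-per-character script with a 3-byte-per-character one, often inside the same word when users transliterate.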
Experience finetuning on Indic language data or multilingual corpora
In-depth experience with Claude, Gemini, or other frontier model APIs
Familiarity with our stack: Elixir, TypeScript/Bun, PostgreSQL, NATS
Published work or substantial projects in applied NLP, multilingual models, or knowledge-grounded generation
You've read our whitepapers (Realtime Context Engine, Context Splitting) and have thoughts on what we got wrong
We care about craft obsessively. Your work gets questioned, pulled apart, and rebuilt, not because we're harsh, but because everyone here holds each other to a standard most places don't bother with. We work out of a hacker house in Vasant Kunj, and we strongly encourage everyone to be in the office.
If that sounds like the only way you'd want to work, let's talk.