Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.
The VLM team builds vision-language models that run on-device, under tight latency and memory constraints, without sacrificing quality. We have released four best-in-class models and we're just getting started. This role blends research and implementation: you'll design experiments, run them, and turn the results into shipped models.
Minimum qualifications:
Hands-on experience in training or evaluating VLMs with demonstrated experimental rigor.
Experience turning research ideas into robust, maintainable implementations, not one-off prototypes.
Proficiency in Python and experience with distributed training frameworks (DeepSpeed, FSDP, Megatron-LM, etc.).
M.S. or Ph.D. in Computer Science, Mathematics, or a related field; or equivalent industry experience.
This role is for you if you have experience in some of the following:
Building or optimizing multimodal training or data pipelines.
Multimodal post-training experience (SFT, preference optimization, RL-style methods).
Dataset design and data quality expertise (scoring, filtering, dedup, long-tail mining).
Prior open-source contributions (models, benchmarks, eval tooling).
Published research at top AI conferences (NeurIPS, ICML, CVPR, ECCV, ICLR, ACL, etc.).
Experience with computer vision or visual representation learning.
What you'll do:
Ship a capability end-to-end. Example: lead visual grounding from task spec through data curation, training recipe, ablations, evaluation, integration into the final run, and open-weight release.
Improve reasoning through RL, preference methods, and better attribution to visual evidence.
Push the quality-efficiency frontier through token efficiency and encoder/connector design. Exemplary outcome: a connector that cuts vision tokens without quality loss.
Build data pipelines that measurably improve model quality: synthetic generation, filtering, dedup, and diagnostics, from captioning to reasoning tasks.
Scale VLM infrastructure and raise the team's bar: multi-node pipelines, reproducible experiments, shared tooling, and hiring.
Success in your first year looks like:
Our VLMs are SOTA across all major benchmarks.
You own a major workstream (video understanding, data quality, or encoder architecture) end-to-end.
At least one model has shipped to production with your direct contribution.
What we offer:
Full ownership: You own your work from architecture to deployment.
Compensation: Competitive base salary with equity in a unicorn-stage company.
Health: We pay 100% of medical, dental, and vision premiums for employees and dependents.
Financial: 401(k) matching up to 4% of base pay.
Time Off: Unlimited PTO plus company-wide Refill Days throughout the year.