What is the salary for this Distributed Training Engineer role?

Salary information is not publicly listed for this position. Apply directly to discuss compensation with Periodic Labs.

Where is this Distributed Training Engineer position located?

This position at Periodic Labs is listed as Remote.

How do I apply for this Distributed Training Engineer job at Periodic Labs?

Click the 'Apply' button on this page to be redirected to Periodic Labs's application portal. Make sure to have your resume ready and tailor your application to highlight relevant experience.

What is Periodic Labs?

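Periodic Labs is an AI + physical sciences lab building state-of-the-art models to make novel scientific discoveries. It is actively hiring for Engineering roles. Visit the company page to see all open positions and learn more about working at Periodic Labs.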

Distributed Training Engineer

Periodic Labs | Remote | Posted 1 day ago

engineering

About Periodic Labs

We are an AI + physical sciences lab building state-of-the-art models to make novel scientific discoveries. We are well funded and growing rapidly. Team members are owners who identify and solve problems without boundaries or bureaucracy. We eagerly learn new tools and new science to push forward our mission.

About the role

You will optimize, operate, and develop large-scale distributed LLM training systems that power AI scientific research. You will work closely with researchers to bring up, debug, and maintain mid-training and reinforcement learning workflows. You will build tools and directly support frontier-scale experiments to make Periodic Labs the world’s best AI + science lab for physicists, computational materials scientists, AI researchers, and engineers. You will contribute to open-source large-scale LLM training frameworks.

You might thrive in this role if you have experience with:

Training on clusters with ≥5,000 GPUs
5D parallel LLM training
Distributed training frameworks such as Megatron-LM, FSDP, DeepSpeed, TorchTitan
Optimizing training throughput for large-scale Mixture-of-Experts models