Astera is a private foundation on a mission to steer science and technology toward an abundant future. We believe the coming years will bring an era of unprecedented scientific and technological advancement as exponential progress in AI converges with key advances in other fields to dramatically accelerate innovation. This inflection point provides an unparalleled opportunity to fundamentally rethink the institutions, systems, and tools that drive scientific progress.
Unlike projects at traditional non-profit research organizations, projects supported by Astera operate like high-velocity startups, allowing us to focus on ambitious goals, match structure to problem, and attract strong technical talent and leadership. You can read more about our mission, vision, and programming here.
We are seeking a scientist to join the diffUSE Project, which focuses on developing next-generation protein representations that bridge dynamic structural biology to downstream functional applications. The diffUSE Project is an ambitious initiative designed to advance our understanding of protein dynamics by building the experimental methods, computational models, and global infrastructure needed to capture molecular motion at scale. Our goal is to establish dynamic structural biology as a foundational pillar of modern science, as transformative and indispensable as the Protein Data Bank has been for static structures.
In this role, you will build ensemble-aware protein representations that can be integrated with downstream tools such as protein language models (PLMs), LLMs, or other functional predictors. You will design and maintain large-scale bioinformatic pipelines, manage complex datasets, develop metrics for dynamics, and fine-tune or architect ML models to capture sequence-structure-function relationships. A key part of the role involves synthesizing diverse data sources to improve biological relevance. You will work closely with experimental collaborators to ground computational insights in real biological systems.
Build ensemble-aware protein representations that integrate PLM and LLM embeddings with experimentally derived structural heterogeneity for functional prediction
Design, develop, and maintain large-scale bioinformatic pipelines capable of processing and managing complex, high-dimensional datasets
Fine-tune or architect ML models to capture sequence-structure-function relationships, with a focus on dynamic and conformational features
Synthesize diverse data sources spanning evolutionary history, binding affinity, allostery, and functional annotations to improve model performance and biological relevance
Collaborate closely with experimental partners to ground computational representations in real biological measurements and ensure models are continuously refined against experimental ground truth
Contribute to the broader diffUSE infrastructure, helping establish community-wide standards and tools for dynamic structural biology
PhD in bioinformatics, computational biology, machine learning, or a related field.
Strong understanding of protein structure and function.
Demonstrated experience building large bioinformatic pipelines and managing high-dimensional datasets.
Proficiency in fine-tuning or modifying ML models (e.g., transformer-based architectures).
Familiarity with protein language models and structure prediction models (e.g., ESM, AlphaFold) is a plus.
Collaborative, team-oriented mindset with the ability to drive research questions from conception to execution.
The posted salary range is based on location in the Bay Area. The successful candidate will receive a competitive compensation package, commensurate with their experience and location.