We are seeking a Senior Machine Learning Engineer / Platform Engineer to design and build a production-grade agentic workflow platform. This role sits at the intersection of LLM systems engineering, distributed platforms, and applied ML, with a strong emphasis on orchestration, reliability, and extensibility. You will be responsible for architecting and implementing agent-based workflows that integrate large language models, retrieval systems, structured knowledge, and external APIs—designed for robustness, observability, and real-world business use.
Responsibilities
- Design and implement multi-agent and single-agent workflows using orchestration patterns and tools, context engineering, memory management, and guardrail strategies.
- Design RAG pipelines incorporating vector search, hybrid retrieval, and citation tracking.
- Implement knowledge graph–backed reasoning, including ontologies, entity resolution, and graph-based context construction.
- Design evaluation frameworks for agent task completion correctness, quality, cost, and latency.
- Develop and deploy machine learning models, focusing on production readiness, scalability, and performance.
- Collaborate with data scientists to transition experimental models into robust, production-grade applications.
- Integrate with collaboration platforms (e.g., Teams, alerting systems) for intelligent distribution of insights.
- Implement and manage CI/CD pipelines to automate deployment, testing, and monitoring of models.
- Architect and deploy systems on AWS, leveraging compute, storage, and security services.
Qualifications
- Bachelor’s or master’s degree in computer science, engineering, or a related field.
- 6+ years of experience in software engineering, ML engineering, or platform engineering.
- Strong proficiency in writing production-grade Python, and experience with Claude Code or Cursor.
- Hands-on experience with LLM-based systems, including:
  - LangChain / LangGraph
  - Model Context Protocol (MCP)
  - LangSmith
  - Claude or comparable frontier models
  - AWS AgentCore or comparable agentic frameworks
- Solid understanding of RAG architectures, embeddings, and vector search.
- Experience designing and consuming APIs (REST and/or async/event-driven).
- Strong cloud engineering experience on AWS.
- Knowledge of how to fine-tune frontier models for domain-specific knowledge.
- Experience with distillation, quantization, and small language models is a plus.
- Experience deploying traditional machine learning models into production environments using MLOps tools and best practices.
- Knowledge of distributed systems, large-scale model optimization, and API development.
- Exceptional ability to work on a team, especially a dynamic, innovative “tiger team” developing early-stage proof-of-concept (PoC) systems.
- Strong understanding of container orchestration and cloud-native application design.
- Ability to work in dynamic environments, handling rapid experimentation and iterative development.
Additional Information
Personal Characteristics
- A self-motivated individual who thrives on seeing the results of their work and its impact on the business
- Strong communication skills, both verbal and written
- A keen sense for the art of the possible
- Proven ability to be flexible and work hard, both independently and collaboratively
- Methodical and organized - in general, in experimental design, and in code!
- Attention to detail with strong analytical, mathematical, and problem-solving skills
- An interest in learning about the energy commodities space
- Resourceful and able to think creatively and adapt in a dynamic and energetic environment
- Team player, with an open, non-political style and a high level of personal integrity
- Desire to be a thought-partner in a fast-growing team, and make an impact at a business that sits at the heart of the world’s energy flows
This role is located in Houston, TX, and is in office five days a week.
All your information will be kept confidential according to EEO guidelines.