Bobyard builds AI systems that automate takeoffs for contractors, saving them dozens of hours per project. Delivering this reliably at scale requires production-grade ML infrastructure, deployment systems, and cloud architecture that do not break under real customer usage.
You will have very high autonomy in designing, executing, and iterating on our infrastructure. We are a startup, and we move fast. You will be the person responsible for turning research models into reliable production systems and building the foundation that allows engineering to ship quickly and safely. We look for world-class engineers who think in systems, take ownership of reliability and cost, and can go heads down to build durable infrastructure.
Design and maintain ML deployment and model serving infrastructure
Build end-to-end pipelines for model packaging, inference, monitoring, and scaling
Implement infrastructure-as-code across all cloud resources (target state: Terraform)
Own CI/CD pipelines, release processes, and deployment automation
Manage GPU provisioning, utilization, and cloud cost optimization
Build monitoring, alerting, and observability across services
Work closely with ML and fullstack engineering to ship production systems
Contribute to product development (React + Django) when infrastructure priorities allow
Strong PyTorch knowledge, including an understanding of speed and memory bottlenecks and of inference optimization
Comfortable managing GPU services (AWS, GCP, etc.), model containers, versioning, and scaling
Experience owning infrastructure on a small team or at a startup
Cloud-native and pragmatic — chooses simple, reliable solutions
High ownership mindset — you don’t wait to be told what to fix
Cost-aware and disciplined about cloud spend
Full-stack capable — can ship features in React or Django when needed
Fast learner who can navigate unfamiliar systems and tools quickly
Passion for building foundational systems that enable product velocity
This is a full-time, in-person role in the SF Bay Area. Learning rate and ownership are vital factors. If you can build the infrastructure that our models and customers depend on at the speed and quality the market demands, or can prove that you will acquire that ability fast enough, we would love to work with you.