Robots and Pencils

AI Engineer – L4

Calgary, AB (Remote-Friendly)

At Robots & Pencils, we build meaningful, scalable digital products by blending strategy, design, and engineering. We are seeking a Level 4 AI Engineer to build production LLM applications for an enterprise client as part of a long-term, delivery-focused engagement.

You will own the AI stack end-to-end, including RAG pipelines, prompt engineering, and evaluation frameworks. This is a hands-on role: you will write production code, tune prompts, build evaluation and observability systems, and iterate based on real user feedback.

There is a working proof of concept in place. Your responsibility is to make it production-ready and extend it with intelligent, reliable features that operate at enterprise scale.

 

What You’ll Do

AI & LLM Application Delivery

· Build, optimize, and evolve RAG pipelines, including retrieval strategies, chunking, and re-ranking

· Develop prompts and guardrails for domain-specific LLM applications

· Implement hallucination detection, mitigation, and fact-checking mechanisms

· Build embeddings-based search and recommendation features

· Validate AI features with real users and iterate based on qualitative and quantitative feedback

Evaluation, Monitoring & Reliability

· Set up and maintain LLM evaluation frameworks to measure quality, relevance, and reliability

· Implement observability and monitoring for production AI systems

· Monitor live AI systems and resolve quality, accuracy, and performance issues

· Continuously improve AI outputs based on evaluation data and user behavior

Platform & System Integration

· Work closely with product and engineering teams to integrate AI into user-facing features

· Build and maintain backend services in Python

· Integrate with vector databases to support retrieval and semantic search workflows

· Ensure AI solutions meet enterprise requirements for security, scalability, and maintainability

Delivery & Collaboration

· Collaborate with cross-functional partners across product, engineering, and design

· Operate effectively in environments with evolving requirements and ambiguity

· Communicate clearly with technical and non-technical stakeholders

· Take ownership of delivery outcomes from experimentation through production

 

Required Skills & Experience

· 8+ years of professional software engineering experience, with 4+ years focused on applied AI/ML or data-driven systems in production environments

· 3+ years building and operating production AI systems

· Strong hands-on experience with LLM applications, including RAG, prompt engineering, and evaluation

· Experience implementing hallucination detection and mitigation techniques

· Proficiency in Python

· Experience working with vector databases (Weaviate, Pinecone, or similar)

· Experience with LLM evaluation frameworks (Langfuse, Weights & Biases, or custom solutions)

· Production experience using Claude and/or GPT APIs

· Strong understanding of embeddings and semantic search

· Comfortable working with ambiguity and iterating on unclear problems

· Bachelor’s degree in Computer Science, Engineering, Data Science, or a related technical field, or equivalent practical experience

· Advanced degree (Master’s or PhD) in a relevant field preferred

 

Nice to Have

· Experience with Azure AI services, including Azure OpenAI and Cognitive Services

· Experience with document processing (PDF extraction, OCR)

· Exposure to audio or speech processing (e.g., Whisper or similar tools)

· Experience building enterprise B2B software

· Experience with ML classification and model training

 

Tech Stack

· LLMs: Claude (Anthropic), Azure OpenAI

· Vector Database: Weaviate

· Backend: Python

· Infrastructure: Azure

· Evaluation & Observability: Langfuse or similar

 

How You Work

· You are hands-on and delivery-focused, writing code and owning outcomes

· You balance speed with quality in production environments

· You communicate clearly and collaborate effectively across disciplines

· You take ownership of ambiguous problems and drive them to resolution

· You prioritize reliability, maintainability, and real-world impact

 

Why Robots & Pencils

· Real production impact, not a POC that sits on a shelf

· Exposure to the full AI lifecycle: RAG, LLM applications, evaluation, classification, and monitoring

· End-to-end ownership of the AI stack and technical decision-making

· A small, senior team with direct access to enterprise clients
