What is the salary for this AI Evaluation Engineer role?

Salary information is not publicly listed for this position. Apply directly to discuss compensation with Weekday AI.

Where is this AI Evaluation Engineer position located?

This is an on-site position at Weekday AI located in Pune, Maharashtra, India, Asia.


AI Evaluation Engineer

Weekday AI · Pune, Maharashtra, India · 1 day ago

engineering

This role is for one of Weekday's clients.

We are seeking an AI Evaluation Engineer to evaluate, validate, and ensure the quality of AI/ML systems working with complex, real-world data. This role focuses on assessing component mapping, retrieval-augmented generation (RAG) based Q&A systems, and feature extraction from structured and unstructured sources such as repair records, catalogs, free-text inputs, and technical documentation.

This is a hands-on engineering role centered on designing custom evaluation frameworks, datasets, and automated pipelines (including LLM-as-a-judge approaches) to measure quality, detect regressions, and support release readiness. While domain training will be provided, strong ownership in building evaluation intuition and maintaining high-quality test datasets is essential.

Requirements

Key Responsibilities

AI Evaluation & Quality Assurance

Evaluate ML and LLM outputs using defined metrics, benchmarks, and acceptance criteria.
Design and maintain automated evaluation pipelines to assess model accuracy, consistency, and reliability.
Develop and own high-quality evaluation datasets, golden test cases, and benchmarks.

Testing & Release Validation

Execute evaluation-driven smoke tests and regression tests prior to releases.
Track quality metrics and provide clear go/no-go signals for production deployments.
Detect regressions and unexpected model behavior across releases and data changes.

Analysis & Insights

Analyze evaluation results to identify trends, inconsistencies, and failure patterns.
Provide actionable insights to improve model performance and system behavior.

System & API Validation

Validate AI services at the API level for correctness, robustness, and stability.
Monitor system performance, latency, and error rates under production-like workloads.

Cross-Functional Collaboration

Work closely with ML, backend, and product teams to define expected AI behavior.
Ensure evaluation coverage aligns with real-world use cases and business requirements.

Skills & Experience

Core Skills

Strong proficiency in Python for evaluation scripting and automation.
Solid understanding of Machine Learning and AI systems, including LLM-based workflows.
Experience with data analysis to interpret evaluation metrics and model outputs.

Nice to Have

Experience with LLM evaluation frameworks or LLM-as-a-judge techniques.
Familiarity with RAG pipelines, NLP systems, or large-scale data processing.
Experience building CI/CD-style evaluation or testing pipelines for AI systems.

Skills

Python · Machine Learning · Artificial Intelligence · Data Analytics

About Weekday AI

At Weekday (backed by YC; also Product Hunt #1 product of the day), we are building the next frontier in hiring. We have built the largest database of white-collar talent in India and have built outreach tools on top of it to generate highest response ...

Category: Engineering
