Design a Continuous Eval Pipeline for an Enterprise RAG Product

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

Design (and partially build) a continuous-eval pipeline for a RAG system: (1) a structured eval set with at least 50 queries grouped by query class; (2) automated scoring (LLM-as-judge plus a smaller exact-match component) for answer accuracy, citation correctness, and hallucination rate; (3) a dashboard view (Streamlit or similar) showing scores over the last N deploys; (4) an alerting threshold definition for when to block a deploy. Build a working slice on around 200 public legal-policy documents (e.g., EU regulations from EUR-Lex). Produce a 3-page customer-facing commitment document plus an internal engineering proposal.
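
The alerting threshold in item (4) can be sketched as a small gate function that compares a candidate deploy against fixed floors and against the last N deploys. This is a minimal sketch under stated assumptions, not a prescribed design: the `DeployScores` fields, the `gate_deploy` name, and every threshold value below are hypothetical placeholders that a real pipeline would take from the customer commitment document.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DeployScores:
    """Aggregate eval scores for one deploy, each in the range 0.0 to 1.0."""
    deploy_id: str
    answer_accuracy: float
    citation_correctness: float
    hallucination_rate: float


# Assumed thresholds, chosen only for illustration.
MIN_ACCURACY = 0.85
MIN_CITATION = 0.90
MAX_HALLUCINATION = 0.05
MAX_ACCURACY_DROP = 0.03  # allowed drop versus the mean of the last N deploys


def gate_deploy(candidate: DeployScores,
                history: list[DeployScores],
                n: int = 5) -> list[str]:
    """Return the reasons to block the deploy; an empty list means it may ship."""
    reasons = []
    # Absolute floors and ceilings.
    if candidate.answer_accuracy < MIN_ACCURACY:
        reasons.append("answer accuracy below floor")
    if candidate.citation_correctness < MIN_CITATION:
        reasons.append("citation correctness below floor")
    if candidate.hallucination_rate > MAX_HALLUCINATION:
        reasons.append("hallucination rate above ceiling")
    # Relative check: regression against the mean of the last n deploys.
    recent = history[-n:]
    if recent:
        baseline = mean(d.answer_accuracy for d in recent)
        if baseline - candidate.answer_accuracy > MAX_ACCURACY_DROP:
            reasons.append("answer accuracy regressed versus recent deploys")
    return reasons
```

Returning a list of reasons rather than a single boolean lets the same result feed both the deploy block and the dashboard, so a blocked deploy always shows why it was blocked.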

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Design and build a working slice of a continuous-eval pipeline for an enterprise RAG product, plus a customer-facing commitment document.

Earning criteria — what you'll demonstrate

Design an eval set with realistic query-class coverage for RAG
Combine LLM-as-judge with deterministic checks for honest scoring
Build a continuous-eval pipeline architecture
Translate eval commitments into customer-facing prose
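
One way to combine LLM-as-judge with deterministic checks is to route short factual answers through exact match, send free-form answers to the judge, and always verify citations with a set comparison. The sketch below assumes hypothetical `EvalItem` and `SystemOutput` records and treats the judge as an injected callable; in a real pipeline that callable would wrap an LLM call, which is deliberately left out here. Hallucination scoring is not shown and would likely also sit behind the judge.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalItem:
    """One eval-set entry; field names are illustrative."""
    query_class: str
    question: str
    reference_answer: str
    expected_citations: frozenset[str]  # document IDs the answer may cite
    exact_match: bool = False           # True for short factual answers


@dataclass(frozen=True)
class SystemOutput:
    answer: str
    citations: frozenset[str]


# A judge receives (question, reference_answer, answer) and returns True
# when the answer is acceptable. Assumed interface, injected by the caller.
Judge = Callable[[str, str, str], bool]


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def score_item(item: EvalItem, out: SystemOutput, judge: Judge) -> dict[str, bool]:
    """Score one answer with the deterministic check where possible."""
    if item.exact_match:
        accurate = normalize(out.answer) == normalize(item.reference_answer)
    else:
        accurate = judge(item.question, item.reference_answer, out.answer)
    # Citations pass only if at least one is given and all are expected.
    citations_ok = bool(out.citations) and out.citations <= item.expected_citations
    return {"accurate": accurate, "citations_ok": citations_ok}


def accuracy_by_class(items: list[EvalItem],
                      outputs: list[SystemOutput],
                      judge: Judge) -> dict[str, float]:
    """Fraction of accurate answers per query class."""
    by_class: dict[str, list[bool]] = defaultdict(list)
    for item, out in zip(items, outputs):
        by_class[item.query_class].append(score_item(item, out, judge)["accurate"])
    return {cls: sum(hits) / len(hits) for cls, hits in by_class.items()}
```

Keeping the judge behind a plain callable means it can be replaced with a stub in tests, and its verdicts can be compared against human labels before its scores are trusted.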

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

AI Measurement and Evaluation

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Design

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Engineer

Building the working slice end-to-end is the AI engineer's bread and butter at any RAG-shipping team.

This challenge sharpens

retrieval-augmented-generation
python
llm-evaluation

Prompt Engineer

LLM-as-judge prompt design with validation is exactly the prompt engineer's contribution to a serious eval pipeline.

This challenge sharpens

llm-evaluation
continuous-evaluation
stakeholder-communication

One more thing

You can put a credential on your CV by Friday.
