Ship a Streaming RAG Endpoint with Caching and Fallbacks

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You will build a FastAPI service exposing a single POST /chat endpoint that:

(1) streams tokens via Server-Sent Events;
(2) caches responses for identical (system_prompt, query, retrieved_context) tuples in Redis with a 24-hour TTL;
(3) falls back from the primary LLM to a secondary provider on a rate-limit or 5xx error, then to a small open-weight model running locally if both fail;
(4) attributes token cost to each request and writes it to a Postgres table.

Provide pytest integration tests covering the happy-path, cache-hit, primary-down, and both-providers-down scenarios. Deliver code, tests, and a 3-page runbook covering alerts and an incident playbook.
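The fallback order in requirement (3) can be sketched as a generator that walks an ordered list of providers. This is a minimal illustration, not the required design: `ProviderError`, `is_retryable`, and `stream_with_fallback` are hypothetical names, and each provider is assumed to be a callable that takes a prompt and yields tokens. The sketch also assumes that once any token has been emitted it is safer to fail than to switch providers mid-stream, since the client would otherwise receive a spliced answer.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error carrying an HTTP-like status code from a provider."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider failed with status {status}")
        self.status = status


def is_retryable(err: ProviderError) -> bool:
    # The brief names two fallback triggers: rate limiting (429) and any 5xx.
    return err.status == 429 or 500 <= err.status < 600


# Assumption: a provider is any callable that maps a prompt to a token iterator.
Provider = Callable[[str], Iterator[str]]


def stream_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Iterator[str]:
    """Yield tokens from the first provider that succeeds, in list order."""
    last_error: Optional[ProviderError] = None
    for _name, provider in providers:
        emitted = False
        try:
            for token in provider(prompt):
                emitted = True
                yield token
            return  # stream finished cleanly, no fallback needed
        except ProviderError as err:
            # Do not fall back after partial output or on non-retryable errors.
            if emitted or not is_retryable(err):
                raise
            last_error = err
    raise RuntimeError("all providers failed") from last_error
```

Passing the providers as `[("primary", ...), ("secondary", ...), ("local", ...)]` reproduces the primary, secondary, local order. The both-providers-down test scenario then reduces to making the first two callables raise a 429 or 5xx before they yield anything.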

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Ship a streaming, cached, fallback-capable RAG endpoint with per-request cost tracking and a clear runbook.
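For the streaming part, it may help to see what a Server-Sent Events frame looks like on the wire. The sketch below uses only the standard library. The helper names `sse_event` and `token_stream`, the `{"token": ...}` payload shape, and the closing `done` event are illustrative assumptions, not part of the brief.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one SSE frame: optional event name, data lines, blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def token_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE frame, then signal completion."""
    for token in tokens:
        # JSON-encoding keeps newlines inside a token from breaking the frame.
        yield sse_event(json.dumps({"token": token}))
    # Assumed convention: a final named event tells the client to stop reading.
    yield sse_event("[DONE]", event="done")
```

In a FastAPI handler, a generator like this could be returned through `StreamingResponse(token_stream(tokens), media_type="text/event-stream")`.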

Earning criteria — what you'll demonstrate

Implement Server-Sent Event streaming for LLM responses
Design a multi-provider fallback chain
Cache LLM responses safely with the right key composition
Track and attribute per-request LLM cost
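
On the caching criterion, "the right key composition" mostly means that every input that can change the answer is part of the key, and that field boundaries cannot blur. Below is a minimal sketch under some assumptions: `retrieved_context` is taken to be an ordered list of text chunks, and the `chat:v1` prefix and the `cache_key` helper are hypothetical names.

```python
import hashlib
import json
from typing import Sequence

CACHE_TTL_SECONDS = 24 * 60 * 60  # the 24-hour TTL from the brief


def cache_key(
    system_prompt: str,
    query: str,
    retrieved_context: Sequence[str],
    prefix: str = "chat:v1",  # assumed namespace; bump it to invalidate old entries
) -> str:
    """Derive a deterministic key from the full (prompt, query, context) tuple."""
    # Serializing as a JSON array keeps field boundaries unambiguous, so
    # ("ab", "c") and ("a", "bc") cannot produce the same key the way plain
    # string concatenation would.
    payload = json.dumps(
        [system_prompt, query, list(retrieved_context)],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"
```

With redis-py, the response could then be stored with something like `client.set(cache_key(...), response_text, ex=CACHE_TTL_SECONDS)`. Note that chunk order is part of the key here, so a retriever that returns the same chunks in a different order would miss the cache. Whether to sort the chunks first is a design decision this sketch leaves open.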

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

LLM Application Development

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

AI Engineer
AI Engineering

AI Engineer

Shipping a production LLM endpoint with streaming, caching, and fallback is the day-one work of AI engineers at any AI-product startup.

This challenge sharpens

llm-api-integration
streaming
fallback-design

MLOps Engineer

Owning the cost-tracking schema and the runbook bridges directly into MLOps work on inference platforms.

This challenge sharpens

cost-tracking
fallback-design
response-caching

Machine Learning Engineer

The fallback chain plus the local open-weight model brings together application engineering and ML deployment in one project.

This challenge sharpens

fallback-design
llm-api-integration
fastapi

One more thing

You can put a credential on your CV by Friday.
