Code

Ship a Streaming RAG Endpoint with Caching and Fallbacks

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You will build a FastAPI service exposing a single POST /chat endpoint that:

  • streams tokens to the client via Server-Sent Events (SSE)
  • caches responses in Redis with a 24-hour TTL, keyed on the (system_prompt, query, retrieved_context) tuple so that only identical requests are served from cache
  • falls back from the primary LLM provider to a secondary provider on a rate-limit error or a 5xx response, and to a small open-weight model running locally if both providers fail
  • attributes token cost to each request and writes it to a Postgres table

Provide pytest integration tests covering four scenarios: happy path, cache hit, primary provider down, and both providers down. Deliver the code, the tests, and a 3-page runbook covering alerts and an incident playbook.
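The fallback requirement can be sketched as an ordered list of providers tried in turn. This is a minimal sketch, assuming each provider (primary, secondary, local model) is wrapped behind a plain `prompt -> text` callable that raises `ProviderError` with an HTTP-style status code; `Provider`, `ProviderError`, `AllProvidersFailed`, `complete_with_fallback`, and the `failing` test double are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised by a provider wrapper; carries an HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain failed with a retryable error."""


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # assumed interface: prompt -> completion text


def is_retryable(err: ProviderError) -> bool:
    # Rate limit (429) or any 5xx moves the request to the next provider.
    return err.status_code == 429 or 500 <= err.status_code <= 599


def complete_with_fallback(providers: Sequence[Provider], prompt: str) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, text) from the first success."""
    failures = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as err:
            if not is_retryable(err):
                raise  # e.g. a 400 is a bad request; another provider will not fix it
            failures.append(f"{provider.name} ({err})")
        # Exceptions other than ProviderError are not caught and propagate unchanged.
    raise AllProvidersFailed("all providers failed: " + "; ".join(failures))


def failing(status_code: int) -> Callable[[str], str]:
    """Test double: a provider call that always fails with the given status."""
    def call(prompt: str) -> str:
        raise ProviderError(status_code, "simulated")
    return call


if __name__ == "__main__":
    chain = [
        Provider("primary", failing(429)),
        Provider("secondary", failing(503)),
        Provider("local", lambda prompt: "answer from local model"),
    ]
    print(complete_with_fallback(chain, "What is SSE?"))
    # → ('local', 'answer from local model')
```

Returning the provider name alongside the text gives the cost-attribution step something to record. Non-retryable errors such as a 400 are re-raised rather than passed down the chain, since a malformed request will fail on every provider. When streaming, the fallback decision has to be made before the first token reaches the client; once one provider's tokens are on the wire, switching providers would splice two different answers together.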

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Ship a streaming, cached, fallback-capable RAG endpoint with per-request cost tracking and a clear runbook.
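Streaming comes down to framing each generated token as an SSE event. This is a minimal sketch of that framing, following the event-stream format in the HTML specification: every line of the payload gets its own `data:` field, and a blank line ends the event. `format_sse`, `sse_stream`, and the `[DONE]` sentinel on a `done` event are illustrative choices, not requirements of the brief.

```python
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Event as text ready to write to the response body."""
    # SSE treats CRLF, CR and LF all as line breaks, so normalise to LF before splitting.
    # (The client rejoins data lines with LF, so a CRLF inside a token arrives as LF.)
    normalised = data.replace("\r\n", "\n").replace("\r", "\n")
    lines = [] if event is None else [f"event: {event}"]
    # One "data:" field per line; the client strips exactly one space after the colon,
    # so tokens with leading whitespace (common in LLM output) survive intact.
    lines.extend(f"data: {part}" for part in normalised.split("\n"))
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def sse_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap an LLM token iterator as SSE frames, then emit an end-of-stream event."""
    for token in tokens:
        yield format_sse(token)
    # Hypothetical sentinel so the client knows the answer is complete and can close.
    yield format_sse("[DONE]", event="done")
```

In FastAPI, the generator can be returned as `StreamingResponse(sse_stream(tokens), media_type="text/event-stream")` from `fastapi.responses`; a browser `EventSource` then receives one `message` event per token, plus the `done` event if it registers a listener for it.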

Earning criteria — what you'll demonstrate

  • Implement Server-Sent Event streaming for LLM responses
  • Design a multi-provider fallback chain
  • Cache LLM responses safely with the right key composition
  • Track and attribute per-request LLM cost
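The caching criterion turns on how the key is built. This is a minimal sketch, assuming the retrieved context arrives as an ordered list of chunk strings: hash a canonical JSON encoding of the (system_prompt, query, retrieved_context) tuple from the brief. `build_cache_key`, `CACHE_KEY_PREFIX`, and the `chat:v1:` namespace are hypothetical names, not part of the challenge spec.

```python
import hashlib
import json
from typing import Sequence

CACHE_TTL_SECONDS = 24 * 60 * 60  # the brief's 24-hour TTL
CACHE_KEY_PREFIX = "chat:v1:"  # hypothetical namespace; bump "v1" to invalidate after a prompt-template change


def build_cache_key(system_prompt: str, query: str, retrieved_context: Sequence[str]) -> str:
    """Deterministic Redis key for one (system_prompt, query, retrieved_context) tuple."""
    # JSON-encode the tuple instead of concatenating strings, so ("ab", "c") and
    # ("a", "bc") cannot collide. Chunk order is kept because it changes the prompt.
    payload = json.dumps(
        [system_prompt, query, list(retrieved_context)],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return CACHE_KEY_PREFIX + digest


if __name__ == "__main__":
    key = build_cache_key("You answer from the context.", "What is SSE?", ["chunk one", "chunk two"])
    print(key)  # "chat:v1:" followed by 64 hex characters
```

With redis-py, a lookup would be `r.get(key)` and a miss would be stored with `r.set(key, answer, ex=CACHE_TTL_SECONDS)`. One design question the sketch leaves open: if the fallback chain can serve an answer from the local model, you may want to add the serving model to the key (or skip caching degraded answers) so a fallback response is not replayed for 24 hours.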

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Engineer

Shipping a production LLM endpoint with streaming, caching, and fallback is the day-one work of AI engineers at any AI-product startup.

This challenge sharpens

  • llm-api-integration
  • streaming
  • fallback-design

MLOps Engineer

Owning the cost-tracking schema and the runbook bridges directly into MLOps work on inference platforms.

This challenge sharpens

  • cost-tracking
  • fallback-design
  • response-caching

Machine Learning Engineer

The fallback chain plus the local open-weight model brings together application engineering and ML deployment in one project.

This challenge sharpens

  • fallback-design
  • llm-api-integration
  • fastapi

One more thing

You can put a credential on your CV by Friday.