Overview
What this challenge is about.
You will build a FastAPI service that exposes a single POST /chat endpoint. The endpoint must:
- stream tokens via Server-Sent Events;
- cache responses for identical (system_prompt, query, retrieved_context) tuples in Redis with a 24-hour TTL;
- fall back from the primary LLM to a secondary one on a rate-limit or 5xx error, then to a small open-weight model running locally if both fail;
- attribute per-request token cost and write it to a Postgres table.
Provide pytest integration tests covering the happy-path, cache-hit, primary-down, and both-providers-down scenarios. Deliver the code, the tests, and a 3-page runbook covering alerts and the incident playbook.
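As a starting point for the caching requirement, here is a minimal sketch of one way to derive a Redis key from the (system_prompt, query, retrieved_context) tuple. The function name `cache_key`, the `chat:v1:` prefix, and the choice to pass the context as a list of strings are assumptions for illustration and are not part of the brief.

```python
import hashlib
import json

# 24-hour TTL from the brief, expressed in seconds.
CACHE_TTL_SECONDS = 24 * 60 * 60


def cache_key(system_prompt: str, query: str, retrieved_context: list[str]) -> str:
    """Build a deterministic cache key from the full request tuple (hypothetical helper)."""
    # JSON-encode the fields as a list so the boundaries between them stay
    # unambiguous: ("ab", "c") and ("a", "bc") must not produce the same key.
    payload = json.dumps(
        [system_prompt, query, retrieved_context],
        ensure_ascii=False,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # The "chat:v1:" prefix is an assumed namespace; bumping the version is one
    # way to invalidate old entries after a prompt or model change.
    return f"chat:v1:{digest}"
```

With redis-py, the cached response could then be stored with something like `client.set(cache_key(...), response_text, ex=CACHE_TTL_SECONDS)`. Hashing all three fields matters because a key built from the query alone would serve a stale answer whenever the retrieved context changes.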
The Brief
What you'll do, and what you'll demonstrate.
Ship a streaming, cached, fallback-capable RAG endpoint with per-request cost tracking and a clear runbook.
Earning criteria — what you'll demonstrate
- Implement Server-Sent Event streaming for LLM responses
- Design a multi-provider fallback chain
- Cache LLM responses safely with the right key composition
- Track and attribute per-request LLM cost
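To make the fallback criterion concrete, the following is a rough sketch of a provider chain that streams tokens and moves to the next provider on a rate-limit (429) or 5xx error. `ProviderError`, `stream_with_fallback`, and the three stub providers are hypothetical names standing in for real client code; a real provider SDK will raise its own exception types.

```python
import asyncio
from typing import AsyncIterator, Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error raised by a provider client; carries the HTTP status."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"provider failed with status {status_code}")
        self.status_code = status_code


def should_fall_back(exc: ProviderError) -> bool:
    # A rate limit (429) or any 5xx moves the request to the next provider.
    return exc.status_code == 429 or 500 <= exc.status_code < 600


Provider = Callable[[str], AsyncIterator[str]]


async def stream_with_fallback(
    prompt: str, providers: Sequence[Provider]
) -> AsyncIterator[str]:
    """Yield tokens from the first provider in the chain that starts streaming."""
    last_error = None
    for provider in providers:
        emitted = False
        try:
            async for token in provider(prompt):
                emitted = True
                yield token
            return
        except ProviderError as exc:
            # Once tokens have reached the client, switching providers would
            # splice two answers together, so the error is re-raised instead.
            if emitted or not should_fall_back(exc):
                raise
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


# Stub providers for illustration only; real ones would call an LLM API.
async def primary_down(prompt: str) -> AsyncIterator[str]:
    raise ProviderError(429)
    yield ""  # unreachable; marks this function as an async generator


async def secondary_down(prompt: str) -> AsyncIterator[str]:
    raise ProviderError(503)
    yield ""  # unreachable; marks this function as an async generator


async def local_model(prompt: str) -> AsyncIterator[str]:
    for token in ["local", " ", "answer"]:
        yield token


async def collect(prompt: str, providers: Sequence[Provider]) -> list[str]:
    return [token async for token in stream_with_fallback(prompt, providers)]
```

Calling `asyncio.run(collect("hi", [primary_down, secondary_down, local_model]))` exercises the both-providers-down path and returns the tokens from the local stub. Falling back only before the first token is a deliberate simplification in this sketch; a mid-stream failure needs its own policy, which is worth documenting in the runbook.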
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Shipping a production LLM endpoint with streaming, caching, and fallback is typical early work for AI engineers at AI-product startups.
This challenge sharpens
- llm-api-integration
- streaming
- fallback-design
MLOps Engineer
Owning the cost-tracking schema and the runbook bridges directly into MLOps work on inference platforms.
This challenge sharpens
- cost-tracking
- fallback-design
- response-caching
Machine Learning Engineer
The fallback chain plus the local open-weight model brings together application engineering and ML deployment in one project.
This challenge sharpens
- fallback-design
- llm-api-integration
- fastapi