AI & Data
Generative AI & LLMs Challenges
Generative AI & LLMs challenges put you inside the work of building with large language models. You'll develop skills in prompt patterns, few-shot prompting, chain-of-thought, and LLM API integration, learning how these models behave before you scale them.
From there you'll handle the harder edges — RAG architectures, vector database basics, fine-tuning, and prompt versioning — putting LLM guardrails and LLM evaluation around every deployment the way AI teams actually do. Each challenge you solve earns a verified credential you can share with recruiters.
- Code · Intermediate · New
Build a BM25 + Embeddings Hybrid Search for a Legal-Tech Document Portal
Stand up an OpenSearch cluster with BM25 indexing on the 2.4M-document corpus. Generate dense embeddings (you choose the model; justify cost and quality trade-offs) and index th…
- Information Retrieval
- BM25
- Vector Database Basics
Data Mining and Information Retrieval
- Analysis · Beginner · New
Run an A/B Test on Two System Prompts for a Sales Email Assistant
You will (1) design the A/B test (random assignment by rep_id, 50/50 split, 2-week duration), (2) instrument three primary metrics: reply rate (event-based), average tokens per …
- Prompt Evaluation
- A/B Testing
- Metric Design
LLM Application Development
- Analysis · Intermediate · New
Cut Latency and Cost on a High-Volume Summarization Service
You receive 30 days of anonymized request logs (prompt token counts, completion token counts, latencies, models used). Profile the cost and latency distribution, then design and…
- FinOps & Cost Optimization
- Latency Optimization
- Prompt Compression
LLM Application Development
- Code · Intermediate · New
Prototype Constitutional-AI Style Guardrails for an Internal Chatbot
Author a 'constitution' of 15 to 20 principles tailored to internal research use (no IP leakage, no off-label medical claims, no personnel-data fishing, etc.). Implement a criti…
- Constitutional AI
- Alignment Techniques
- LLM Evaluation
AI Safety and Alignment
Practice your coursework on real scenarios.
Every challenge is shaped from real industry context — not generic exercises. The work mirrors what your degree prepares you for.
Why Ewance
- Code · Intermediate · New
Wire a Knowledge Graph into a Pharma RAG Assistant
You receive: 100 internal benchmark questions with reference answers; a 50,000-document anonymized RAG index; a curated drug-target-disease KG (~80,000 triples) loaded into a tr…
- KG-Grounded RAG
- SPARQL
- Entity Linking
Knowledge Graphs and Semantic Web
- Code · Intermediate · New
LoRA Fine-Tune a 7B LLM for Legal-Clause Extraction
You receive a curated extraction dataset (2,000 train, 500 val, 500 test contracts with span-level labels across 12 clause types) and a fine-tunable 7B base model (e.g., Llama-3…
- Fine Tuning
- Parameter Efficient Tuning
Fine-Tuning Large Language Models
- Code · Intermediate · New
Build an Internal-Tools Agent for a Mid-Cap Enterprise
You receive OpenAPI specs for 4 mock internal APIs and 30 reference question-answer pairs spanning easy lookups and multi-tool chains. Build the agent using an LLM tool-use fram…
- AI Agents
- Tool Use
- Agent Evaluation
AI Agents and LLM-Based Agents
- Design · Intermediate · New
Design and Pitch an LLM-Powered Tutoring Product
As a 4-person team, deliver: (1) a product concept anchored in Jobs-to-be-Done (when X, I want Y so I can Z); (2) a Figma prototype of the full flow; (3) a partially functional …
- Product Design
- User Research
- LLM Evaluation
AI Software Engineering Group Project
- Browse challenges
Explore role
Product Manager
Ship product that solves real user problems. Combine user research, prototyping, and stakeholder alignment to turn ambiguous briefs into measurable wins — the role at the centre of modern software teams.
- Design · Intermediate · New
Design a Continuous Eval Pipeline for an Enterprise RAG Product
Design (and partially build) a continuous-eval pipeline for a RAG system: (1) a structured eval set with at least 50 queries grouped by query class; (2) automated scoring (LLM-a…
- Continuous Evaluation
- LLM Evaluation
- RAG Architectures
AI Measurement and Evaluation
- Design · Intermediate · New
Instrument a Model Monitoring Stack from Scratch
Pick the priority product (we recommend the customer-service RAG assistant, around 40k queries/day). Define monitoring signals: input drift (Evidently/NannyML), output quality (LLM…
- Model Monitoring
- Data Drift Detection
- LLM Evaluation
ML Engineering and Production ML
- Code · Intermediate · New
Instruction-Tune a Small Model for an Edtech Tutor
You receive a 1.5B base model (e.g., SmolLM-1.7B or Qwen-1.8B), permission to use 2 hours of a rented A100, and a curated seed of around 5,000 math-tutoring dialogues. Augment w…
- Instruction Tuning
- Fine Tuning
- Dataset Curation
Fine-Tuning Large Language Models
- Research · Intermediate · New
Run an Alignment Probe on a Coding Assistant
You will design 240 probe prompts across 3 classes: (1) over-refusal (innocuous coding asks the model should fulfill), (2) insecure code patterns (asks where the model should wa…
- Red Team Operations
- Alignment Evaluation
- LLM Evaluation
Large Language Models
Build a verifiable portfolio.
Submissions become evidence. Reviewers with shipping experience score against a rubric; the result becomes a credential anyone can verify.
- Code · Intermediate · New
AI-Driven Sales Lead Scoring for a B2B SaaS Scale-Up
You will receive a sample dataset of 200 leads with fields like company size, industry, email open rates, and website visits. Using AI tools, you must craft prompts to generate …
- Prompt Patterns
- Lead Scoring
- Data Analysis
Data-Driven Prototyping with AI
- Code · Beginner · New
Build a Math Intelligent-Tutoring Assistant for High Schoolers
You receive: a curated set of 40 algebra problems with worked solutions, the company's pedagogy rubric ('hint, don't reveal' principle), and a baseline 'just answer' chatbot for…
- Intelligent Tutoring
- Prompt Patterns
- AI Agents
AI in Education and Learning Analytics
- Code · Intermediate · New
Build a Vector-Search Backend for an Enterprise AI Knowledge Assistant
You receive a corpus of around 20,000 PDFs (mixed scanned and digital) totalling around 30 GB and a labeled retrieval set of 200 queries with human-judged ground-truth passages.…
- RAG Architectures
- Vector Database Basics
- Word Embeddings
Data Engineering and Big Data Systems
- Code · Intermediate · New
Ship an MVP RAG Knowledge Assistant for a Climate-Tech Startup
As a 4-person team across a 6-week sprint, ship: (1) an ingestion pipeline for around 4,000 mixed PDFs and markdown files; (2) a vector store with documented chunking strategy; …
- RAG Architectures
- Software Engineering for AI
- Vector Databases
AI Software Engineering Group Project
- Design · Beginner · New
Chain-of-Thought for High-School Math Tutoring
You receive 80 practice problems across 4 topics (linear equations, factoring, systems of equations, quadratics), each with the correct answer and an expected age-appropriate ex…
- Chain-of-Thought
- Zero-Shot Prompting
- Few-Shot Prompting
Prompt Engineering
- Code · Intermediate · New
Fine-Tune a Diffusion Model for an E-commerce Product Studio
You receive 1,200 curated product + lifestyle images across 6 product categories, a brand-style guide, and the company's current studio cost per image (around EUR 18). Fine-tune…
- Diffusion Models
- Stable Diffusion
- DreamBooth
Generative AI
- Research · Intermediate · New
Design a Capability Evaluation for an Open-Weights Coding Model
Pick a recent open-weights coding model (e.g., a Qwen, DeepSeek, or Llama variant). Design an evaluation set of around 40 coding tasks across 4 buckets: standard benign coding, …
- Capability Evaluation
- Safety Evaluation
- LLM Evaluation
AI Safety and Alignment
- Code · Intermediate · New
Build a Domain Instruction-Tuning Recipe for a Legal Coach
You will source instruction data from three streams: ~3,000 synthetic paralegal Q&A generated by a frontier model (anonymized prompts), ~1,500 curated examples from public legal…
- Instruction Tuning
- Fine Tuning
- Data Curation
Large Language Models
- Design · Senior · New
Design an Eval Suite for a Multimodal Brainstorming Assistant
You receive (1) the assistant's current API, (2) a list of 6 launch user-personas, and (3) the product team's quality target ('beat the previous model on 4 of 6 personas'). Desi…
- LLM Evaluation
- Multimodal Evaluation
- Safety Evaluation
Generative AI
- Code · Intermediate · New
Fine-Tune a Transformer for Customer-Support Triage at an Enterprise AI Vendor
You receive 240,000 labeled support tickets across 14 queues, with English, Bahasa Indonesia, and Tagalog. Fine-tune a multilingual transformer encoder (XLM-RoBERTa-base is a st…
- Hugging Face Transformers
- Fine Tuning
- Multilingual NLP
Deep Learning
- Research · Intermediate · New
Audit a Public LLM Benchmark for Validity Threats
Choose one open LLM benchmark (e.g., MMLU, GPQA, BIG-Bench-Hard, MATH). Read the benchmark paper plus at least three follow-up critiques. Audit (1) data contamination risk again…
- Benchmark Evaluation
- Data Contamination Analysis
- Annotation Methodology
AI Measurement and Evaluation
- Code · Intermediate · New
Extract Skills and Roles from Job Postings for a Recruiter Tool
You receive 30,000 anonymized job postings and a labelled 1,000-posting benchmark with (skill, role, seniority) spans. Fine-tune a small token classifier (e.g., DeBERTa-v3-base)…
- Information Extraction
- Token Classification
- ESCO Taxonomy
Linguistic Engineering and Language Technologies
How it works
From brief to credential, in six steps.
Step 01
Browse challenges aligned to your studies.
Step 02
Accept the one that fits your goals.
Step 03
Work through it with AI Copilot guidance.
Step 04
Submit for structured evaluation.
Step 05
Earn a verified credential.
Step 06
Add it to LinkedIn with one click.
Industry teams behind a decade of practitioner briefs
Hiring from this pool?
Sponsor a challenge and meet candidates through actual work.
Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.