Evaluation

If you want to apply your evaluation skills, every challenge here lets you practice them on a real industry brief.

Recommended Challenges


Code · Beginner · New
Open-Domain QA over Product Documentation
You receive a snapshot of the documentation (Markdown) and 120 real support questions with the URLs of pages containing the answer. Build an open-domain QA pipeline: chunk the d…
- Open-Domain QA
- Passage Retrieval
- Reading Comprehension
Question Answering and Conversational Systems
Code · Beginner · New
Build a Wake-Word Detector for a Smart-Speaker Startup
You receive a small public Japanese-speech dataset, 30 hours of recorded wake-phrase utterances from 50 volunteers, and 200 hours of background-noise recordings. Train a lightwe…
- Keyword Spotting
- Speech Recognition
- On-Device ML
Speech Recognition and Spoken Language Processing
Code · Beginner · New
Image Search for a DTC Furniture Retailer's App
Use a pretrained vision-embedding model (CLIP ViT-B/32 or DINOv2-small). Index a catalog of around 1,500 furniture images. Curate a small evaluation set of around 50 user-style …
- Image Embeddings
- Vision Transformers
- Image Search
Computer Vision (Undergraduate)
Research · Beginner · New
Build an Accessibility Checklist for a Voice Health Assistant
You receive 20 audio samples spanning accents and speech patterns, the assistant's published dialog state machine, and a list of current voice prompts. Audit the assistant for i…
- Accessibility (WCAG 2.2)
- Interaction Design
- Evaluation
Human-Computer Interaction for AI Systems
Practice your coursework on real scenarios.
Every challenge is shaped from real-world context — not generic exercises. The work mirrors what your degree prepares you for.
Why Ewance
Analysis · Beginner · New
Spectral Clustering for an Urban-Mobility Operator's Network
You receive 6 months of anonymized O-D trip data (around 4 million trips, around 8,000 virtual stations), the current 9 hand-drawn zones, and the operations team's KPIs (rebalan…
- Spectral Methods
- Spectral Clustering
- Graph Laplacian
Machine Learning on Graphs
Research · Beginner · New
Hyperparameter Search via CMA-ES for a Pharma QSAR Model
You receive a labeled QSAR dataset (around 25,000 compounds, regression on a binding-affinity target), a fixed feature pipeline (Morgan fingerprints + descriptors), and the team…
- CMA-ES
- Metaheuristics
- Hyperparameter Optimization
Evolutionary Computation and Metaheuristic Search
Code · Beginner · New
Prototype a Multimodal Visual-Question-Answering Demo
You will use a small open-source vision-language model (e.g., LLaVA-1.5-7B or PaliGemma) and prompt-engineer it for the warehouse-VQA task. Build a Gradio web demo. Construct a …
- Vision-Language Models
- Multimodal Perception
- Prompt Patterns
Machine Perception
Code · Beginner · New
Build a Video-Question-Answering Demo on a Budget
Pick the model (Video-LLaVA, VideoChat2, or LLaVA-Video) and justify on the A10G budget. Build a Streamlit demo: upload video, ask question, get answer with cited frame timestam…
- Video-Language Models
- Multimodal Fusion
- Streamlit
Multimodal Machine Learning
Explore role
Product Manager
Ship product that solves real user problems. Combine user research, prototyping, and stakeholder alignment to turn ambiguous briefs into measurable wins — the role at the centre of modern software teams.
Code · Beginner · New
Build a Multilingual Text-Mining Dashboard for Hotel Reviews
You receive 200,000 sampled reviews across 9 languages plus an English-only labeled benchmark of 1,000 reviews for sentiment and aspect (rooms, food, staff, value, location). Bu…
- Multilingual NLP
- Sentiment Analysis
- Aspect Extraction
Linguistic Engineering and Language Technologies
Research · Beginner · New
Curate a Domain Lexicon for a Climate-Tech NLP Stack
You receive 5,000 policy documents and a benchmark of 200 documents with manually tagged domain terms. Curate a lexicon of ~1,500 terms with (1) canonical English form, (2) Swah…
- Lexical Resources
- Named Entity Recognition
- spaCy
Linguistic Engineering and Language Technologies
Code · Beginner · New
Plan Safe Paths for a Last-Mile Sidewalk Robot
You receive 4 hours of recorded sidewalk traversals with annotated pedestrian tracks, occupancy grids, and a map of the pilot neighborhood. Implement a sampling-based planner (R…
- Motion Planning
- Sampling-Based Planning
- Cost Function Design
Robot Perception and Autonomy
Code · Beginner · New
Build Semantic Search for an Internal Engineering Wiki
You receive a Confluence XML export (~12k pages, ~80 MB of text) and a hand-labeled benchmark of 50 internal queries with ground-truth doc IDs. Chunk and embed the corpus with a…
- Embedding Models
- Vector Database Basics
- pgvector
Vector Databases and Embeddings
Build a verifiable portfolio.
Submissions become evidence. Reviewers with shipping experience score against a rubric; the result becomes a credential anyone can verify.
Strategy · Beginner · New
Plan a Self-Improving Sales-Research Agent
Build the v0 agent: given a company URL, it gathers 5 fact bullets (recent news, headcount range, tech stack hints, hiring patterns, a recent leadership change) and drafts a 4-l…
- AI Agents
- Agent Design
- A/B Testing & Experimentation
AI Agents and LLM-Based Agents
Code · Beginner · New
Semantic Segmentation for a Solar-Panel Inspection Drone
Use a publicly available solar-panel dataset (or the PV-Defect-Detection dataset). Fine-tune a small U-Net or SegFormer-tiny on panel/no-panel pixel-level segmentation. Evaluate…
- Semantic Segmentation
- CNN Classification
- Transfer Learning
Computer Vision (Undergraduate)
Analysis · Beginner · New
Cost-Optimize an Embedding Pipeline for a Customer Support Knowledge Base
You receive: (a) the current pipeline (full re-embed on any article change, OpenAI text-embedding-3-large, 3,072 dims) with one month of cost logs, (b) a sample of 5,000 article…
- Embedding Models
- FinOps & Cost Optimization
- Change Detection
Vector Databases and Embeddings
Code · Beginner · New
Segment Solar Panels in Aerial Imagery for an Energy Audit Startup
You receive 600 labeled 1024x1024 orthophoto tiles (panel masks) and 1,000 unlabeled tiles. Train a segmentation model (U-Net or DeepLabV3+ baseline), validate at 0.85 IoU on a…
- Semantic Segmentation
- U-Net
- Aerial Imagery
Image Processing and Computational Imaging
Research · Beginner · New
Evaluate a Generative AI Image Tool with a Within-Subjects Study
You will write a study protocol, recruit 20 participants (a Discord callout is fine), counterbalance the two conditions, and run 45-minute sessions over Zoom. Collect three meas…
- Experimental Design
- User Study
- Within-Subjects Design
Human-Computer Interaction for AI Systems
Design · Beginner · New
Privacy-Preserving Crowd-Density Estimator for Transit Stations
Use a public crowd-counting dataset (e.g., ShanghaiTech or JHU-CROWD) to train a small crowd-density estimator (CSRNet or similar). Wrap it in an on-device pipeline (Python is f…
- Crowd Counting
- Scene Understanding
- Privacy by Design
Visual Intelligence and Visual Reasoning

How it works

From brief to credential, in six steps.

Step 01: Browse challenges aligned to your studies.
Step 02: Accept the one that fits your goals.
Step 03: Work through it with AI Copilot guidance.
Step 04: Submit for structured evaluation.
Step 05: Earn a verified credential.
Step 06: Add it to LinkedIn with one click.

Industry teams behind a decade of practitioner briefs

Hiring from this pool?

Sponsor a challenge and meet candidates through actual work.

Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.


Evaluation Challenges | Ewance