Evaluation

If you want to practice evaluation, each challenge here lets you apply it to a real industry brief.

Recommended Challenges


Research · Senior · New
Investigate Why Our Generative Model Memorizes Training Data
Pick a small open-source diffusion model (e.g., a Stable-Diffusion-class community model trained on a LAION subset). Reproduce a published membership-inference + extraction probe …
- Generative Models
- Memorization Analysis
- Differential Privacy
Advanced Deep Learning
Research · Senior · New
Embodied Visual Reasoning for a Warehouse Pick Assistant
Use an embodied simulator (Habitat 3.0 or Isaac Sim — pick one and justify) to render 300 cluttered-bin scenarios with a target item label. For each scenario, build two reasonin…
- Embodied Vision
- Vision Language Models
- Visual Reasoning
Visual Intelligence and Visual Reasoning
Research · Senior · New
Implement an Autoregressive Model for Anonymized Voice-Synthesis at a Defense Vendor
You receive a public-domain speech dataset (LibriTTS subset, around 50 speakers) and a fixed evaluation protocol (speaker-identifiability AUC, emotion-preservation MOS proxy, in…
- Autoregressive Models
- Voice Conversion
- Speech Synthesis
Deep Generative Models
Research · Senior · New
Inductive Logic Programming for a Fraud-Rule Discovery Pilot
You receive a labeled fraud dataset (around 25,000 transactions, around 4% positive class), a feature schema (28 features including device, geo, behavioral history), and a basel…
- Inductive Logic Programming
- Symbolic AI
- Rule Learning
Fuzzy Logic, Knowledge Representation, and Symbolic Reasoning
Practice your coursework on real scenarios.
Every challenge is shaped from real-world context — not generic exercises. The work mirrors what your degree prepares you for.
Analysis · Senior · New
Cost-Quality Prompt Optimization at Scale
You receive 2,000 labeled code snippets (human rater consensus score 1-5) and budget for at most 8,000 API calls across the optimization run. Run a factorial sweep of 3 prompt s…
- Prompt Optimization
- Cost Quality Tradeoff
- Experimental Design
Prompt Engineering
Code · Senior · New
Train a Manipulation Policy for Bin Picking with Imitation Learning
You receive a dataset of 500 teleop trajectories on the in-distribution part plus a held-out simulation environment with a never-seen part. Train an imitation-learning policy (D…
- Imitation Learning
- Manipulation
- Diffusion Policy
Advanced Robotics
Research · Senior · New
Multi-Tenant Vector Isolation for a B2B Knowledge Assistant
Build a small proof-of-concept in your chosen vector store (Pinecone or Qdrant — pick one and justify) that supports 10 simulated tenants with 1,000 vectors each. Implement the …
- Multi Tenant Isolation
- Vector Databases
- Threat Modeling
Vector Databases and Embeddings
Code · Senior · New
Grounded Language for a Robotics Pick-and-Place Demo
Use a tabletop simulator (PyBullet or Isaac Sim, both open) with 5 object types and 5 spatial relations (left of, right of, behind, in front of, on top of). Curate or generate a…
- Grounded Language Understanding
- Semantic Parsing
- Perception
Computational Semantics
Research · Senior · New
Open-Vocabulary Segmentation Benchmark for a Robotics R&D Lab
Use a curated 200-image household scene set (publicly-available HM3D renderings or COCO + a handful of household prompts). Benchmark 3 open-vocabulary segmentation models: SAM +…
- Open Vocabulary Segmentation
- Vision Language Models
- Benchmarking
Computer Vision
Research · Senior · New
Benchmark Conformal Prediction for a Healthcare Risk-Score
You receive a labeled dataset of about 25,000 patient encounters with the current risk-score's predictions and ground-truth 1-year outcomes. Implement and compare split conforma…
- Conformal Prediction
- Uncertainty Quantification
- Calibration
Statistical Machine Learning
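As a rough illustration of the split conformal technique named in the brief above, here is a minimal sketch. It assumes a real-valued prediction and an absolute-residual nonconformity score; the brief's actual score function, data format, and comparison methods are not shown in the teaser, so the function names and inputs below are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Rank ceil((n + 1) * (1 - alpha)) among the sorted scores (1-indexed).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(cal_preds, cal_targets, new_pred, alpha=0.1):
    """Prediction interval for new_pred from a held-out calibration split.

    Assumption: nonconformity score is |target - prediction|.
    """
    scores = [abs(y - p) for p, y in zip(cal_preds, cal_targets)]
    q = conformal_quantile(scores, alpha)
    return new_pred - q, new_pred + q
```

The calibration split must be disjoint from the data used to fit the underlying model; under exchangeability, the resulting interval covers the true value with probability at least 1 - alpha.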
Code · Senior · New
PPO Alignment Loop with a Pretrained Reward Model
You receive a small open-weights base model (around 7B), a previously trained reward model, and 5,000 prompts (no responses) for PPO rollouts. Run PPO with TRL's PPOTrainer for …
- RLHF
- PPO
- Reward Hacking
Machine Learning from Human Preferences (RLHF and Alignment)
Code · Senior · New
Train a GAN for Synthetic Defect Augmentation on a Factory Line
You receive a labeled defect dataset (12 defect types, ranging from 8 to 4,200 examples each), the production classifier, and a starter StyleGAN2-ADA codebase. Train a GAN per r…
- GANs
- StyleGAN
- Data Augmentation
Generative AI
Build a verifiable portfolio.
Submissions become evidence. Reviewers with shipping experience score against a rubric; the result becomes a credential anyone can verify.
Research · Senior · New
Trajectory Prediction Model for Urban Robotaxis
Use the Argoverse 2 motion-forecasting dataset (open access). Train an LSTM baseline + a transformer challenger (e.g., a small Wayformer or HiVT). Evaluate on minADE/minFDE (min…
- Trajectory Prediction
- Transformer Models
- Evaluation
AI for Autonomous Vehicles
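For readers unfamiliar with the minADE/minFDE metrics mentioned in the trajectory-prediction brief, the following is a minimal sketch of how they are commonly computed for one agent with K candidate forecasts. The function names and the list-of-(x, y)-tuples input format are assumptions for illustration, not part of the brief or of any dataset's API.

```python
import math


def displacement_errors(pred, truth):
    """Per-timestep Euclidean distance between a forecast and the ground truth."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def min_ade_fde(candidates, truth):
    """Return (minADE, minFDE) over K candidate trajectories.

    ADE is the mean displacement over all timesteps; FDE is the
    displacement at the final timestep. Each minimum is taken
    independently over the candidates here.
    """
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    ades, fdes = [], []
    for pred in candidates:
        if len(pred) != len(truth):
            raise ValueError("forecast and ground truth lengths differ")
        errs = displacement_errors(pred, truth)
        ades.append(sum(errs) / len(errs))
        fdes.append(errs[-1])
    return min(ades), min(fdes)
```

Some benchmark implementations instead select the single candidate with the lowest endpoint error and report that candidate's ADE, so it is worth checking the official evaluation code before comparing numbers.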

How it works

From brief to credential, in six steps.

Step 01: Browse challenges aligned to your studies.
Step 02: Accept the one that fits your goals.
Step 03: Work through it with AI Copilot guidance.
Step 04: Submit for structured evaluation.
Step 05: Earn a verified credential.
Step 06: Add it to LinkedIn with one click.

Industry teams behind a decade of practitioner briefs

Hiring from this pool?

Sponsor a challenge and meet candidates through actual work.

Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.

