Experiment Design

If you like applying Experiment Design, every challenge here gives you a chance to practice it on a real industry brief.

Recommended Challenges

Research · Expert · New
Train Cooperative Agents with Multi-Agent RL
Pick an open multi-agent environment (PettingZoo's MPE 'simple_spread', Overcooked-AI, or SMAC). Implement or wrap three methods: IPPO (independent PPO per agent), MAPPO (centra…
- Multi-Agent Reinforcement Learning
- PPO
- PyTorch
Multi-Agent Systems
Research · Expert · New
Long-Context QA Evaluation Benchmark for Legal Memoranda
You receive 25 anonymized legal memoranda (50-90 pages each) and 100 QA pairs whose answers are deliberately spread across the documents (25 in pages 1-20, 25 in pages 20-40, 25…
- Long-Context QA
- Benchmark Design
- Model Evaluation
Question Answering and Conversational Systems
Research · Expert · New
Open-Vocabulary Segmentation Benchmark for a Robotics R&D Lab
Use a curated 200-image household scene set (publicly-available HM3D renderings or COCO + a handful of household prompts). Benchmark 3 open-vocabulary segmentation models: SAM +…
- Open-Vocabulary Segmentation
- Vision-Language Models
- Benchmarking
Computer Vision
Research · Expert · New
Model-Based RL for a Robotic Arm Pick-Place Task
You receive a PyBullet pick-and-place environment (Franka Panda arm, 12 object types, randomized starting poses) and a SAC baseline that hits 85% success after about 1.5 million…
- Model-Based RL
- World Models
- Reinforcement Learning
Deep Reinforcement Learning
Practice your coursework on real scenarios.
Every challenge is shaped from real industry context — not generic exercises. The work mirrors what your degree prepares you for.
Research · Expert · New
Quantify Sim-to-Real Gap for a Warehouse Manipulation Policy
You receive a trained pick-and-place policy (PyTorch), the simulation env (Isaac Lab), and access to a real-arm rig (or recorded teleop episodes if hardware is unavailable). Def…
- Sim-to-Real
- Manipulation
- Experiment Design
Robot Perception and Autonomy
Research · Expert · New
Benchmark Reward-from-Feedback Methods on a Tabletop Pick-Place
You will use a Franka Panda arm in PyBullet on a 4-object pick-and-place task. For each of the three feedback methods, train a reward model and a downstream policy until converg…
- Reinforcement Learning
- Reward Learning
- Preference Comparison
Human-Robot Interaction
Research · Expert · New
Pre-Register and Run a Small Neural-Network Ablation Study
You will study how three architectural and regularization choices (depth: 2/4/8 hidden layers; activation: ReLU vs. GELU; weight decay: 0 / 1e-4 / 1e-3) affect a small MLP's tes…
- Neural Networks
- Regularization
- Experiment Design
Machine Learning
Research · Expert · New
Train a Small Diffusion Model for Synthetic Defect Generation
You receive 2,000 labeled defect images and 18,000 clean weld images. Train a small class-conditional latent diffusion model on the defect images (Hugging Face diffusers is fine…
- Generative Perception
- Diffusion Models
- Data Augmentation
Machine Perception
Explore role
Product Manager
Ship product that solves real user problems. Combine user research, prototyping, and stakeholder alignment to turn ambiguous briefs into measurable wins — the role at the centre of modern software teams.
Analysis · Expert · New
Cost-Quality Prompt Optimization at Scale
You receive 2,000 labeled code snippets (human rater consensus score 1-5) and budget for at most 8,000 API calls across the optimization run. Run a factorial sweep of 3 prompt s…
- Prompt Optimization
- Cost-Quality Tradeoff
- Experiment Design
Prompt Engineering
Research · Expert · New
Stress-Test Scalable Oversight on a Tool-Using Agent
Design a sandwich-oversight study: pick a task domain where non-expert oversight is plausible but not trivial (e.g., reviewing data-analysis steps, checking small bug fixes, eva…
- Scalable Oversight
- Alignment Research
- Experiment Design
AI Safety and Alignment
Research · Expert · New
Reproduce a Mechanistic Interpretability Result on a Small Transformer
Pick a published mechanistic-interpretability paper that operates on a small (under 1 billion parameter) open-source transformer (e.g., GPT-2 small, Pythia 70M). Set up the envi…
- Mechanistic Interpretability
- Transformer Internals
- PyTorch
AI Safety and Alignment
Code · Expert · New
Auto-Tune a Distributed Training Cluster's Throughput
Pick a representative fine-tune job (an open 7B model on a public instruction dataset is fine). Define the search space: NCCL_ALGO, NCCL_PROTO, num_workers, prefetch_factor, gra…
- Distributed Training
- Hyperparameter Tuning
- NCCL
Machine Learning Systems
Build a verifiable portfolio.
Submissions become evidence. Reviewers with shipping experience score against a rubric; the result becomes a credential anyone can verify.
Research · Expert · New
Plan a Parameter-Efficient Fine-Tuning Strategy for a Big-Tech AI Lab
You will produce (1) a 6-page survey of four PEFT methods (LoRA, adapters, prefix tuning, IA3) with their strengths, weaknesses, and parameter footprints, (2) a one-page decisio…
- Parameter-Efficient Fine-Tuning
- Transfer Learning
- Fine-Tuning
Meta-Learning, Transfer Learning, and Multi-Task Learning
Research · Expert · New
Compare RNN vs Transformer for Long-Sequence Modeling
Pick a public trajectory dataset (e.g., Argoverse 2, Waymo Open, or ETH-UCY). Implement three models with comparable parameter counts (around 5M each): an LSTM baseline, a vanil…
- Transformers
- RNN
- State Space Models
Neural Networks for NLP
Research · Expert · New
Pretrain a Small Vision Transformer with Self-Supervised Learning
You receive 80,000 unlabeled 224x224 histology tiles plus 4,000 labeled tiles split into train/val/test. Pretrain a ViT-Small using a self-supervised method of your choice (DINO…
- Self-Supervised Learning
- Vision Transformers
- PyTorch
Advanced Deep Learning
Research · Expert · New
Benchmark Long-Context Architectures on a Legal-Doc Retrieval Task
You receive a public legal-QA dataset (e.g., LongBench's legal split or similar) filtered to documents over 50,000 tokens. Implement or wrap 3 architectures: a sliding-window Tr…
- Long-Context Architectures
- State Space Models
- Transformers
Advanced Deep Learning
Research · Expert · New
Graph Transformer Research Probe for a Drug-Target Predictor
You receive a public drug-target interaction dataset (around 50,000 drug-target pairs with labels and molecular graphs), a strong GIN baseline, and a starter GraphGPS implementa…
- Graph Transformers
- Graph Neural Networks
- Message Passing
Machine Learning on Graphs
Research · Expert · New
Trajectory Prediction Model for Urban Robotaxis
Use the Argoverse 2 motion-forecasting dataset (open access). Train an LSTM baseline + a transformer challenger (e.g., a small Wayformer or HiVT). Evaluate on minADE/minFDE (min…
- Trajectory Prediction
- Transformer Models
- Evaluation
AI for Autonomous Vehicles
Research · Expert · New
SAT-Based Planner for Smart-Grid Demand Response
Encode the dispatch problem (which customers to curtail by how much, respecting per-customer contractual caps and grid-cell totals) as a SAT or MaxSAT instance. Solve 50 histori…
- SAT-Based Planning
- Constraint Encoding
- Benchmarking
Automated Planning

How it works

From brief to credential, in six steps.

Step 01
Browse challenges aligned to your studies.
Step 02
Accept the one that fits your goals.
Step 03
Work through it with AI Copilot guidance.
Step 04
Submit for structured evaluation.
Step 05
Earn a verified credential.
Step 06
Add it to LinkedIn with one click.

Industry teams behind a decade of practitioner briefs

Hiring from this pool?

Sponsor a challenge and meet candidates through actual work.

Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.

Explore sponsorship

Experiment Design Challenges | Ewance