Model Evaluation

If you like applying Model Evaluation, every challenge here gives you a chance to practice it on a real industry brief.

Recommended Challenges


Code · Beginner · New
Build a Fairness Evaluation Harness for a Credit-Score Model
Implement a Python module that, given model predictions, ground truth, and group identifiers, computes demographic parity difference, equal-opportunity difference, predictive-pa…
- Algorithmic Fairness
- Statistical Evaluation
- Python or JavaScript
AI Measurement and Evaluation
Code · Beginner · New
Build a Credit-Card Fraud Detector for a Singapore Neobank
You receive 9 months of anonymized authorization data (around 8 million transactions, around 0.4 percent fraud) plus current rule outcomes. Split temporally and train at least t…
- Classification Modeling
- Class Imbalance
- Model Calibration
AI and Quantitative Finance
Analysis · Beginner · New
Audit a Hiring-Screening Model for Demographic Bias
You receive: (a) inference API access to the production model (black-box), (b) a 12,000-resume audit benchmark with self-declared gender and age-band labels (consented, GDPR-com…
- Fairness Metrics
- Bias Auditing
- Model Evaluation
AI Ethics, Fairness, and Responsible AI
Analysis · Beginner · New
Analyze a Learning-Analytics Dataset for At-Risk Detection
You receive an anonymized dataset of LMS engagement features (logins, assignment submissions, forum posts, video-watch time), grade history, and a binary label for end-of-semest…
- Learning Analytics
- Classification
- Fairness Metrics
AI in Education and Learning Analytics
Practice your coursework on real scenarios.
Every challenge is shaped from real-world context — not generic exercises. The work mirrors what your degree prepares you for.
Code · Beginner · New
Ship a Lightweight ML Microservice for an EdTech Reading App
You receive 3 months of session telemetry (around 50M reading events, child-anonymized). Engineer features per session window, train a small classifier (logistic regression base…
- Feature Engineering
- Model Serving
- Containerization
Applied Machine Learning
Code · Beginner · New
Image-Quality Triage Tool for a Tele-Radiology Network
You receive 10,000 chest-X-ray images with multi-label quality flags (rotation, clipping, motion). Train a small multi-label CNN that outputs a per-flag probability and a single…
- Medical Imaging
- Classification
- Neural Networks
Machine Learning for Imaging and Medical Image Analysis
Code · Beginner · New
Churn-Prediction Model for a B2B Vertical SaaS
Use 18 months of anonymized data (provided) covering: usage events, login frequency, support tickets, NPS responses, billing health, feature adoption, practice firmographics. De…
- Supervised Learning
- Python or JavaScript
- ML Applications
Machine Learning (CS Elective)
Code · Beginner · New
Tune a Recommender for an EU Streaming Music App
Use the public Last.fm-360k or similar dataset (anonymized listening histories) as a stand-in. Implement a baseline matrix-factorization recommender, then a hybrid that adds tra…
- Recommender Systems
- Feature Engineering
- Model Evaluation
Applied Machine Learning
Explore role
Product Manager
Ship product that solves real user problems. Combine user research, prototyping, and stakeholder alignment to turn ambiguous briefs into measurable wins — the role at the centre of modern software teams.
Code · Beginner · New
Build a Robust Image Classifier for a Climate-Tech Satellite Startup
You receive a labeled dataset of about 25,000 Sentinel-2 patches (positive = illegal construction visible, negative = not). The dataset is split by region AND by season so you c…
- Data Augmentation
- Deep Learning
- PyTorch or TensorFlow
Advanced Deep Learning
Analysis · Beginner · New
Evaluate Speech-to-Text Quality for a Contact-Center Analytics Vendor
You receive 200 anonymized call-recording snippets (2-4 minutes each, ~67 per language) with reference transcripts plus a domain glossary of about 600 product terms. Run all thr…
- Speech Recognition
- Sequence Models
- Model Evaluation
Machine Perception
Analysis · Beginner · New
Customer-Segmentation Study for a DTC Subscription Box
Use 18 months of anonymized data: order history, churn events, NPS responses, box-rating data, referral activity, marketing-channel attribution. Engineer features (RFM-style + b…
- Unsupervised Learning
- Python or JavaScript
- ML Applications
Machine Learning (CS Elective)
Code · Beginner · New
Build a Face-Anonymization Tool for a Civic-Tech Newsroom
Use a pretrained face detector (RetinaFace or YOLOv8-face is fine). Build a Python tool with a Gradio or Streamlit UI that: (1) detects faces in an uploaded photo, (2) shows det…
- Object Detection
- Image Processing
- OpenCV
Computer Vision (Undergraduate)
Build a verifiable portfolio.
Submissions become evidence. Reviewers with shipping experience score against a rubric; the result becomes a credential anyone can verify.
Analysis · Beginner · New
Stress-Test a Hiring-Funnel Model for Bias
You receive a synthetic-but-realistic dataset of 25,000 past applicants with features (years of experience, education tier, prior role tags) and outcome labels (advanced past th…
- Model Evaluation
- Fairness Metrics
- Logistic Regression
Machine Learning (Undergraduate)
Analysis · Beginner · New
Cost-Model a Foundation-Model API Migration
You receive: 90 days of API logs (request volume, token distributions), the customer's golden eval set of 200 prompts, the incumbent and new pricing schedules, and quality ratin…
- Cost Modeling
- AI Workforce Strategy
- Model Evaluation
AI for Business and AI Product Management
Code · Beginner · New
Reduce Dimensionality on Sensor Streams for a Mid-Cap Robotics OEM
You receive 120 robot-hours of windowed sensor data (5s windows, 240 channels) with labels for normal vs. one of four fault classes. Implement (1) PCA, (2) kernel PCA with an RB…
- Dimensionality Reduction
- Kernel Methods
- Autoencoders
Machine Learning
Code · Beginner · New
Ship a Churn-Prediction Mini-Project End to End
You receive a 12-month anonymized dataset of subscriber events (logins, lesson completions, payment history, support tickets) for around 200,000 users. Define churn precisely (n…
- Feature Engineering
- Model Evaluation
- Gradient Boosting
AI/ML Practicum and Hands-on Lab
Code · Beginner · New
Predict Subscription Churn for an EdTech Platform
You receive a CSV with about 18,000 student-month rows: features include login frequency, session length, quiz scores, parent app opens, and plan tier. The target is whether the…
- Supervised Learning
- Logistic Regression
- Gradient Boosting
Machine Learning (Undergraduate)
Code · Beginner · New
Team Practicum: Build a Crop-Disease Classifier with a Field Partner
You receive a labeled dataset of about 8,000 phone photos plus around 1,200 unlabeled photos from a held-out county. Audit and clean the labels (expect 5-10% noise), train a Mob…
- Transfer Learning
- PyTorch or TensorFlow
- Model Evaluation
AI/ML Practicum and Hands-on Lab
Code · Beginner · New
Image-Classification Model for a Quality-Control Line at a Bottling Plant
Train an image classifier on 8,000 labeled bottle images (3 defect classes + 'ok'). Use transfer learning from a pre-trained backbone (EfficientNet-B0 or MobileNetV3) — the line…
- Deep Learning
- Supervised Learning
- ML Applications
Machine Learning (CS Elective)
Code · Beginner · New
Markov Random Field for Image Segmentation in Crop Monitoring
You receive 60 Sentinel-2 image tiles (10-meter resolution) over 12 vineyards, each tile with per-pixel disease labels from agronomist field walks. Take the consultancy's existi…
- Markov Random Fields
- Graph Cuts
- Image Segmentation
Probabilistic Graphical Models
Analysis · Beginner · New
Optimize Hyperparameters with Bayesian Optimization on a Tight Budget
You receive a B2B-SaaS churn dataset (about 12,000 customer-month rows, 38 features) and a fixed sweep budget of 40 trials per model family. Implement a Bayesian optimizer (Optu…
- Bayesian Optimization
- Hyperparameter Tuning
- Ensemble Methods
Advanced Machine Learning
Analysis · Beginner · New
Detect Fraudulent Refund Requests for a Mid-Market Marketplace
You receive a labeled dataset with buyer history, seller history, shipping carrier, refund reason text, and outcome label (legit / fraud). Train and evaluate at least two classi…
- Classification
- Model Calibration
- Imbalanced Classification
Machine Learning (Undergraduate)
Code · Beginner · New
Train a Word-Alignment Model for Low-Resource Catalan-Aranese
You receive a 35,000-sentence Catalan-Aranese parallel corpus plus a 1,200-pair manually annotated word-alignment test set. Train (1) a classic statistical alignment baseline (e…
- Alignment
- Neural MT
- Low-Resource MT
Machine Translation
Analysis · Beginner · New
Customer Churn Prediction for a 40-Person SaaS Scale-Up
You receive a dataset with 500 customers and 10 features (e.g., monthly logins, number of support tickets, contract length, industry). Your task is to perform exploratory analys…
- Logistic Regression
- Classification
- Feature Engineering
Econometrics

How it works

From brief to credential, in six steps.

Step 01
Browse challenges aligned to your studies.
Step 02
Accept the one that fits your goals.
Step 03
Work through it with AI Copilot guidance.
Step 04
Submit for structured evaluation.
Step 05
Earn a verified credential.
Step 06
Add it to LinkedIn with one click.

Industry teams behind a decade of practitioner briefs

Hiring from this pool?

Sponsor a challenge and meet candidates through actual work.

Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.
