Computer & Information Sciences

Data Science Challenges

Real data-science projects and challenges on Ewance — clean messy datasets, build and evaluate models, and turn raw data into decisions the way a working data scientist does. Solve them to build a portfolio of verified, recruiter-checkable proof you can do the work — not just describe it.

Recommended challenges

Code Intermediate New
Restore Smartphone Low-Light Photos for a Consumer AI App
You receive 200 paired low-light / well-lit phone photos plus 1,000 unpaired low-light photos. Build a pipeline that combines a learned denoiser (e.g. a small DnCNN-style model …
- Image Restoration
- Denoising
- Tone Mapping
Image Processing and Computational Imaging
Code Beginner New
Predict Catalyst Properties for a Green-Hydrogen Pharma Spinout
Use an open catalyst dataset (e.g., Open Catalyst Project subset, or a Materials Project pull) where each candidate has descriptors and a target activity property. Train a tabul…
- Tabular Modeling
- Uncertainty Quantification
- Feature Engineering
AI for Science and Engineering
Analysis Beginner New
Optimize Hyperparameters with Bayesian Optimization on a Tight Budget
You receive a B2B-SaaS churn dataset (about 12,000 customer-month rows, 38 features) and a fixed sweep budget of 40 trials per model family. Implement a Bayesian optimizer (Optu…
- Bayesian Optimization
- Hyperparameter Tuning
- Ensemble Methods
Advanced Machine Learning
Analysis Beginner New
Build a Public Open-Data Dashboard for Urban Mobility
Pull the city's open-data cyclist-collision dataset (10 years of incidents, geocoded). Define a clear before/after window around the protected-lane rollout, control for traffic-…
- Exploratory Data Analysis
- Data Wrangling
- Geospatial Analysis
Applied Data Analysis and Practical Data Science
Practice your coursework on real scenarios.
Every challenge is shaped from real-world context — not generic exercises. The work mirrors what your degree prepares you for.
Research Intermediate New
Explore the Bias-Variance Trade-off on a Tabular Healthcare Cohort
You receive a 90,000-patient de-identified tabular dataset (demographics, labs, claims-derived features) and a binary 12-month-readmission outcome. Pick three model f…
- Bias-Variance Trade-off
- Regularization
- Model Selection
Machine Learning
Code Beginner New
Build a Credit-Card Fraud Detector for a Singapore Neobank
You receive 9 months of anonymized authorization data (around 8 million transactions, around 0.4 percent fraud) plus current rule outcomes. Split temporally and train at least t…
- Classification Modeling
- Class Imbalance
- Model Calibration
AI and Quantitative Finance
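The temporal split mentioned in the fraud brief above can be sketched in a few lines. This is only an illustration under assumptions: the `temporal_split` helper, the `timestamp` field name, and the sample rows are hypothetical and do not come from the challenge data.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def temporal_split(
    rows: List[Row], cutoff: datetime, time_key: str = "timestamp"
) -> Tuple[List[Row], List[Row]]:
    """Put rows strictly before `cutoff` in train and all later rows in test.

    Splitting on time (rather than at random) keeps future transactions
    out of the training set, which avoids leaking information backwards.
    """
    train = [r for r in rows if r[time_key] < cutoff]
    test = [r for r in rows if r[time_key] >= cutoff]
    return train, test


if __name__ == "__main__":
    # Hypothetical sample rows; the field names are assumptions.
    sample = [
        {"timestamp": datetime(2024, 1, 15), "is_fraud": 0},
        {"timestamp": datetime(2024, 5, 2), "is_fraud": 1},
        {"timestamp": datetime(2024, 8, 20), "is_fraud": 0},
    ]
    train_rows, test_rows = temporal_split(sample, cutoff=datetime(2024, 7, 1))
    print(len(train_rows), "train rows;", len(test_rows), "test rows")
```

A fixed cutoff is the simplest choice; a rolling or expanding window would be a reasonable alternative when several evaluation periods are needed.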
Code Senior New
Run a Backpropagation Bug-Hunt on an Open-Source RL Implementation
You receive the offending fork (around 4,000 lines of PyTorch) and three known-failure seeds. Reproduce the NaN failure deterministically, instrument the forward and backward pa…
- Backpropagation
- PyTorch
- Debugging
Deep Learning
Code Intermediate New
Actor-Critic for Energy-Storage Dispatch
You receive 3 years of hourly day-ahead price data and a Python simulator that models state of charge, round-trip efficiency, and a 1-day price forecast with documented uncertai…
- Actor Critic
- A2C
- Deep RL
Reinforcement Learning
Code Foundational New
Edge Detection Pipeline for a Manufacturing QA Camera
Use a small provided dataset of around 200 part images under 3 lighting conditions. Build a classical pipeline using OpenCV: grayscale + adaptive thresholding + Canny edge detec…
- Image Processing
- Edge Detection
- OpenCV
Computer Vision (Undergraduate)
Code Intermediate New
Extract Structured Lease Terms for a Commercial Real-Estate Platform
You receive 500 anonymized lease PDFs and a labelled gold set of 150 leases with the 14 fields filled in. Build a pipeline that does (1) layout-aware PDF parsing (Unstructured, …
- Information Extraction
- PDF Parsing
- Named Entity Recognition
Linguistic Engineering and Language Technologies
Code Intermediate New
Map a Climate-Policy Corpus to Linked Open Data
You receive 12,000 policy PDFs and a benchmark of 200 documents with manually linked entities (places, organizations, policies). Build a pipeline that runs NER, candidate-genera…
- Entity Linking
- Linked Open Data
- Wikidata
Knowledge Graphs and Semantic Web
Code Intermediate New
Build a Canary Rollout for a Production Recommender
Pick a serving stack (Triton, Seldon Core, KServe, or BentoML). Implement two-model traffic splitting with a configurable percentage (start at 5%). Wire up online metric collect…
- Canary Deployment
- Kubernetes
- A/B Testing
ML Engineering and Production ML
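To make the traffic-splitting idea in the canary brief above concrete, here is a minimal sketch that routes a configurable percentage of requests to a candidate model. It is an illustration only: `route_request`, the `user_id` key, and the "stable"/"canary" labels are hypothetical names, and the serving stacks named in the brief each have their own routing mechanisms that would be used in practice.

```python
import hashlib


def route_request(user_id: str, canary_percent: float = 5.0) -> str:
    """Return "canary" for roughly `canary_percent` percent of users.

    Hashing the user id (instead of drawing a random number per request)
    keeps each user pinned to the same model across requests.
    """
    if not 0.0 <= canary_percent <= 100.0:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"


if __name__ == "__main__":
    n = 100_000
    hits = sum(route_request(f"user-{i}") == "canary" for i in range(n))
    print(f"observed canary share: {hits / n:.2%}")
```

Raising `canary_percent` only adds users to the canary group; users already routed to the canary stay there, which keeps online metrics comparable as the rollout widens.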
Build a verifiable portfolio.
Submissions become evidence. Reviewers with shipping experience score against a rubric; the result becomes a credential anyone can verify.
Analysis Intermediate New
Benchmark Approximate Nearest-Neighbor Indexes for a Code-Search Startup
You receive a 5 M-vector sample (768-dim, float32) and a 1,000-query labeled benchmark with ground-truth top-50 neighbors per query. Index the same sample in Chroma (HNSW), Qdra…
- ANN Indexes
- HNSW
- Benchmarking
Vector Databases and Embeddings
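A benchmark like the one above, with ground-truth top-50 neighbors per query, is commonly scored with recall@k. The sketch below shows one way to compute it; the function names and the integer vector ids are assumptions for illustration, not part of the challenge materials.

```python
from typing import List, Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int = 50) -> float:
    """Fraction of the true top-k neighbor ids found in the index's top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    hits = len(truth.intersection(retrieved[:k]))
    return hits / len(truth)


def mean_recall_at_k(
    all_retrieved: List[Sequence[int]],
    all_truth: List[Sequence[int]],
    k: int = 50,
) -> float:
    """Average recall@k over every query in the benchmark."""
    if len(all_retrieved) != len(all_truth):
        raise ValueError("need one ground-truth list per query")
    if not all_retrieved:
        raise ValueError("benchmark is empty")
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical ids for two queries, scored at k=4 for brevity.
    retrieved = [[1, 2, 3, 4], [10, 11, 12, 13]]
    truth = [[1, 2, 5, 6], [10, 11, 12, 13]]
    print(mean_recall_at_k(retrieved, truth, k=4))
```

Recall is usually reported alongside query latency and index size, since an index can trade one for the others.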
Design Intermediate New
Instrument a Model Monitoring Stack from Scratch
Pick the priority product (recommend the customer-service RAG assistant, around 40k queries/day). Define monitoring signals: input drift (Evidently/NannyML), output quality (LLM…
- Model Monitoring
- Data Drift Detection
- LLM Evaluation
ML Engineering and Production ML
Research Beginner New
Evaluate a Generative AI Image Tool with a Within-Subjects Study
You will write a study protocol, recruit 20 participants (a Discord callout is fine), counterbalance the two conditions, and run 45-minute sessions over Zoom. Collect three meas…
- Experiment Design
- User Study
- Within-Subjects Design
Human-Computer Interaction for AI Systems
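Counterbalancing two conditions across 20 participants, as the study brief above asks, can be sketched as follows. The `counterbalance` helper, the condition labels "A" and "B", and the participant ids are hypothetical; the brief does not prescribe a specific assignment procedure.

```python
import random
from typing import Dict, Sequence, Tuple


def counterbalance(
    participants: Sequence[str],
    conditions: Tuple[str, str] = ("A", "B"),
    seed: int = 0,
) -> Dict[str, Tuple[str, str]]:
    """Assign each participant an order of the two conditions.

    Participants are shuffled with a fixed seed, then alternated between
    the two possible orders so each order is used equally often
    (or differs by one when the group size is odd).
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    orders = [(conditions[0], conditions[1]), (conditions[1], conditions[0])]
    return {p: orders[i % 2] for i, p in enumerate(shuffled)}


if __name__ == "__main__":
    ids = [f"P{i:02d}" for i in range(1, 21)]
    for participant, order in sorted(counterbalance(ids).items()):
        print(participant, "->", " then ".join(order))
```

Fixing the seed makes the assignment reproducible, which helps when the protocol is reviewed or the study is rerun.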
Analysis Beginner New
Stress-Test a Hiring-Funnel Model for Bias
You receive a synthetic-but-realistic dataset of 25,000 past applicants with features (years of experience, education tier, prior role tags) and outcome labels (advanced past th…
- Model Evaluation
- Fairness Metrics
- Logistic Regression
Machine Learning (Undergraduate)
Research Beginner New
Build an Accessibility Checklist for a Voice Health Assistant
You receive 20 audio samples spanning accents and speech patterns, the assistant's published dialog state machine, and a list of current voice prompts. Audit the assistant for i…
- Accessibility
- Voice Interaction Design
- Evaluation
Human-Computer Interaction for AI Systems
Code Beginner New
Image Search for a DTC Furniture Retailer's App
Use a pretrained vision-embedding model (CLIP ViT-B/32 or DINOv2-small). Index a catalog of around 1,500 furniture images. Curate a small evaluation set of around 50 user-style …
- Image Embeddings
- Vision Transformers
- Image Search
Computer Vision (Undergraduate)
Analysis Intermediate New
Draft GDPR + AI Act Data Provisions for a Training-Data Vendor
Anchor the work on (1) GDPR Articles 28 (processor obligations) and 32 (security), (2) the EU AI Act's data-governance article for high-risk systems, and (3) the EDPB's p…
- Data Protection Law
- Contract Redlining
- Regulatory Analysis
AI Law, Policy, and Regulation
Code Intermediate New
Domain-Adapt an NLP Pipeline from News to Customer-Support Tickets
You receive 30,000 anonymized customer-support tickets (PT-BR + ES) plus the news-trained NER and intent models. Apply continued pretraining of a multilingual encoder (e.g., XLM…
- Transfer Learning
- Domain Adaptation
- Continued Pretraining
Meta-Learning, Transfer Learning, and Multi-Task Learning
Strategy Beginner New
Scope a Demand-Forecasting Model with Operations Stakeholders
You receive recorded interview transcripts (or summary notes) for the three personas, plus a sample of the historical sales data. Map each stakeholder's pain to candidate ML pro…
- Stakeholder Framing
- ML Problem Scoping
- Metric Design
Machine Learning in Practice
Research Intermediate New
Safety-Test a Customer-Service Agent for Adversarial Prompts
You receive a sandboxed instance of the agent (a tool-using LLM that can read account balances and open support tickets — both mocked). Design a red-team suite of at least 80 pr…
- LLM Agents
- Red Teaming
- Adversarial Prompts
AI Agents and LLM-Based Agents
Research Senior New
Stress-Test Scalable Oversight on a Tool-Using Agent
Design a sandwich-oversight study: pick a task domain where non-expert oversight is plausible but not trivial (e.g., reviewing data-analysis steps, checking small bug fixes, eva…
- Scalable Oversight
- Alignment Research
- Experiment Design
AI Safety and Alignment
Research Senior New
Audit a Production Model for Membership Inference Attacks
Use a black-box membership inference attack (e.g., the LiRA or shadow-model attack). You have query access to a sandboxed copy of the model + the original training data labels f…
- Membership Inference
- Privacy Attacks
- Model Evaluation
Privacy-Preserving Machine Learning

How it works

From brief to credential, in six steps.

Step 01: Browse challenges aligned to your studies.
Step 02: Accept the one that fits your goals.
Step 03: Work through it with AI Copilot guidance.
Step 04: Submit for structured evaluation.
Step 05: Earn a verified credential.
Step 06: Add it to LinkedIn with one click.

Industry teams behind a decade of practitioner briefs

Hiring from this pool?

Sponsor a challenge and meet candidates through actual work.

Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.
