Scale Feature Pipelines for a Hyperscaler Search-Ranking Team

Free · Verified credential · 2 weeks · Advanced

Overview

What this challenge is about.

Diagnose three bottlenecks in a PySpark search-ranking feature pipeline using an 80 GB sample, prototype fixes, and estimate their production impact. Earn a verifiable credential.

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Halve a 14-hour Spark feature pipeline's wall-clock time at unchanged compute spend.

Earning criteria — what you'll demonstrate

Read and interpret a Spark UI to find real bottlenecks
Apply skew-handling, partition-tuning, and UDF-elimination patterns
Reliably extrapolate sample-scale measurements to production scale
Write a pre-RFC that engineering peers can review
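Skew handling, one of the patterns listed above, is often done by salting the join key. The sketch below is a minimal pure-Python simulation, not Spark code, of why salting helps. It compares partition loads for a skewed join key with and without a salt, using synthetic data and a toy partitioner. The names `partition_of`, `NUM_PARTITIONS`, and `NUM_SALTS` are hypothetical, and the partitioner is an assumption chosen to make the effect easy to see.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count
NUM_SALTS = 8       # hypothetical salt factor; keep <= NUM_PARTITIONS in this toy


def partition_of(key: str, salt: int = 0) -> int:
    """Toy partitioner (assumption): stable CRC32 of the key, offset by the salt.

    Offsetting by the salt guarantees that each salt value of one key lands on a
    different partition; Spark would instead hash the composite (key, salt) columns.
    """
    return (zlib.crc32(key.encode("utf-8")) + salt) % NUM_PARTITIONS


def max_load(assignments):
    """Rows on the busiest partition, i.e. the straggler task's workload."""
    return max(Counter(assignments).values())


# Skewed fact side: one hot key dominates (synthetic data, for illustration only).
facts = [("hot", i) for i in range(9000)] + [(f"k{i % 100}", i) for i in range(1000)]
# Small dimension side keyed by the same join key.
dim = {key: f"label-{key}" for key, _ in facts}

# Plain hash join: every "hot" row is routed to the same partition.
plain_load = max_load(partition_of(k) for k, _ in facts)
plain_join = sorted((k, v, dim[k]) for k, v in facts)

# Salted join: tag each fact row with a salt, replicate the dim side once per salt,
# then join on (key, salt) instead of key alone.
salted_facts = [(k, i % NUM_SALTS, v) for i, (k, v) in enumerate(facts)]
salted_dim = {(k, s): label for k, label in dim.items() for s in range(NUM_SALTS)}

salted_load = max_load(partition_of(k, s) for k, s, _ in salted_facts)
salted_join = sorted((k, v, salted_dim[(k, s)]) for k, s, v in salted_facts)

print(f"busiest partition: {plain_load} rows unsalted, {salted_load} rows salted")
print("same join result:", plain_join == salted_join)
```

The join output is unchanged while the hot key's rows are split across partitions. The cost is that the small side grows by the salt factor, so the factor is usually sized to the observed skew rather than set as high as possible.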

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Machine Learning at Scale

Master's · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

Data Engineer
Data Scientist

Data Engineer

Profiling and tuning a production-grade Spark pipeline, then writing up the proposed changes as a pre-RFC, is a textbook senior data engineer task at hyperscale companies.

This challenge sharpens

spark
data-pipelines
performance-profiling

Machine Learning Engineer

Owning feature-pipeline performance is increasingly part of the MLE remit, because slow feature pipelines hold up model iteration.

This challenge sharpens

data-pipelines
performance-profiling
python

MLOps Engineer

Cost-aware infrastructure work on shared compute platforms is core MLOps territory, and this challenge gives you practice in that discipline.

This challenge sharpens

spark
cost-aware-engineering
distributed-systems

One more thing

You can put a credential on your CV by Friday.
