Approximate Inference for a Topic Model on Customer Tickets

Free · Verified credential · 2 weeks · Intermediate

Overview

What this challenge is about.

You receive 180,000 tickets (subject + body) spanning the last 18 months. Preprocess them into a bag-of-words representation with sensible stopwords and bigrams. Fit a 20-topic LDA model twice: once with stochastic variational inference (SVI) and once with collapsed Gibbs sampling. Compare the two on (a) wall-clock training time, (b) held-out per-word perplexity on a 10% test split, and (c) topic stability across two consecutive weekly snapshots, measured by best-matching topic-word Jaccard overlap. Wrap the winner in a Monday-morning refresh job and write a one-page note explaining why the topics previously drifted.
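As a rough illustration of criterion (c), the sketch below scores stability between two weekly snapshots by matching each of last week's topics to its best Jaccard partner this week. It assumes each topic is represented by its top-N words (the brief does not fix N), and the names `jaccard` and `topic_stability` are hypothetical helpers, not part of any library.

```python
from typing import List, Sequence


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |a & b| / |a | b|; defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_stability(
    prev_topics: Sequence[Sequence[str]],
    curr_topics: Sequence[Sequence[str]],
) -> List[float]:
    """For each previous-week topic, return its best Jaccard overlap with any current-week topic.

    Each topic is a sequence of its top-N words (N is an assumption of this sketch).
    Matching is greedy: two previous topics may pick the same current topic.
    """
    prev_sets = [set(t) for t in prev_topics]
    curr_sets = [set(t) for t in curr_topics]
    return [max(jaccard(p, c) for c in curr_sets) for p in prev_sets]


# Toy snapshots with made-up words, purely for illustration.
last_week = [["refund", "invoice", "billing"], ["login", "password", "reset"]]
this_week = [["password", "login", "sso"], ["invoice", "billing", "charge"]]
scores = topic_stability(last_week, this_week)
mean_stability = sum(scores) / len(scores)
```

Greedy matching is the simplest reading of "best-matching"; if several old topics collapse onto one new topic, a one-to-one assignment (for example, the Hungarian algorithm) may give a more honest score.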

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Compare variational and Gibbs inference for a weekly-refreshed LDA topic model on support tickets, and recommend one with documented trade-offs.
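To make the Gibbs side of the comparison concrete, here is a minimal pure-Python sketch of a collapsed Gibbs sampler for LDA. It is an illustration under stated assumptions, not a reference implementation for the challenge: the symmetric priors `alpha` and `beta`, the sweep count, and the function name `collapsed_gibbs_lda` are all hypothetical choices, documents are lists of integer word ids, and a loop like this would be far too slow for 180,000 tickets without vectorization or an existing library.

```python
import random


def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_sweeps=50, seed=0):
    """Run a single-chain collapsed Gibbs sampler; return (doc_topic, topic_word) counts.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    alpha, beta: symmetric Dirichlet priors (illustrative values, not tuned).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                 # n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # n_kw
    topic_total = [0] * n_topics                               # n_k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = j | all other assignments).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Toy corpus: two documents over a 4-word vocabulary.
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3]]
toy_doc_topic, toy_topic_word = collapsed_gibbs_lda(toy_docs, n_topics=2, vocab_size=4)
```

The sketch keeps only the final sweep's counts. A production comparison would normally also handle burn-in and average over several samples, which affects both the wall-clock and the stability measurements.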

Earning criteria — what you'll demonstrate

Implement and compare stochastic variational inference vs. collapsed Gibbs sampling
Measure topic-model quality with held-out perplexity and stability metrics
Diagnose and explain topic drift in production
Translate a probabilistic-inference choice into a business-readable note
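For the perplexity criterion, per-word perplexity is the exponential of the negative average log-likelihood per held-out token. The sketch below computes it from given document-topic proportions and topic-word distributions. The function name and the input layout are assumptions of this example, and it takes the held-out `theta` as given: in practice those proportions have to be inferred for unseen tickets (for example, by document completion), and inferring them from the same words being scored makes the number look better than it is.

```python
import math


def per_word_perplexity(docs, theta, phi):
    """exp(-(sum of token log-likelihoods) / (number of tokens)).

    docs:  list of dicts mapping word id -> count (bag-of-words per document).
    theta: list of per-document topic proportions, one list of K floats per document.
    phi:   list of K topic-word distributions, each a list of V probabilities.
    """
    log_lik = 0.0
    n_tokens = 0
    for doc, doc_theta in zip(docs, theta):
        for word_id, count in doc.items():
            # Mixture probability of this word under the document's topic proportions.
            p_word = sum(t * topic[word_id] for t, topic in zip(doc_theta, phi))
            log_lik += count * math.log(p_word)
            n_tokens += count
    return math.exp(-log_lik / n_tokens)


# Sanity check: uniform topics over a 4-word vocabulary give perplexity 4.
uniform_phi = [[0.25] * 4, [0.25] * 4]
ppl = per_word_perplexity([{0: 2, 1: 1}], [[0.5, 0.5]], uniform_phi)
```

A useful sanity bound: a model that assigns uniform probability over a vocabulary of V words has perplexity exactly V, so any fitted model should score well below the vocabulary size.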

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Probabilistic Graphical Models

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Machine Learning Engineer

Choosing an inference algorithm under real production constraints (weekly refresh, stability, latency) is the kind of MLE judgement call hiring managers look for.

This challenge sharpens

variational-inference
python
model-evaluation

NLP Engineer

Topic modeling on real support text plus text preprocessing at scale is core NLP-engineer territory at any product-led SaaS.

This challenge sharpens

latent-dirichlet-allocation
text-processing
model-evaluation

Data Scientist

Diagnosing why a probabilistic model drifted week-over-week and communicating the fix is exactly what data scientists do when dashboards lose trust.

This challenge sharpens

approximate-inference
model-evaluation
text-processing

One more thing

You can put a credential on your CV by Friday.
