Build a Crawler-and-Topic Pipeline for Public-Sector Web Analytics
Overview
What this challenge is about.
You will build a polite, robots.txt-respecting crawler that ingests about 30,000 new posts per week from 80 municipal forums into a normalized dataset. You will then apply a topic model (BERTopic, with bilingual support) and surface week-over-week topic deltas, and build a Streamlit dashboard that lets a policy advisor explore top topics, sample posts, and trends. Deliver the crawler, the topic pipeline, the dashboard, and a 3-page methodology doc covering your data-handling and privacy choices.
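As a starting point for the "polite, robots.txt-respecting" requirement, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The user-agent string, the sample robots.txt content, the example URLs, and the one-second default delay are all assumptions for illustration; a real crawler would download each forum's own robots.txt and sleep for the returned delay between requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent name; pick one that identifies your project.
USER_AGENT = "civic-topic-crawler"


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_delay(parser: RobotFileParser, default: float = 1.0) -> float:
    """Seconds to wait between requests: the site's Crawl-delay if it
    is larger than our own (assumed) default, otherwise the default."""
    delay = parser.crawl_delay(USER_AGENT)
    if delay is None:
        return default
    return max(default, float(delay))


# Illustrative robots.txt content, not taken from any real forum.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /members/
"""

policy = build_policy(SAMPLE_ROBOTS)
allowed = policy.can_fetch(USER_AGENT, "https://forum.example.org/threads/123")
blocked = policy.can_fetch(USER_AGENT, "https://forum.example.org/members/42")
delay = fetch_delay(policy)
```

Checking `can_fetch` before every request and honoring the delay per host keeps the crawl well within what 30,000 posts per week requires, since that volume averages only a few requests per minute across all forums.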
The Brief
What you'll do, and what you'll demonstrate.
Build a crawler, topic pipeline, and dashboard that surface civic concerns raised on municipal forums in near real time.
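One way to think about the weekly topic deltas is as a difference of per-topic post counts between two ISO weeks. The sketch below assumes each post has already been assigned a topic label by your topic model; the labels, dates, and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import date


def weekly_topic_counts(posts):
    """Group (posted_on, topic) pairs into per-ISO-week topic counts."""
    counts = {}
    for posted_on, topic in posts:
        iso = posted_on.isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        counts.setdefault(week, Counter())[topic] += 1
    return counts


def topic_deltas(counts, week, previous_week):
    """Change in post count per topic from previous_week to week."""
    current = counts.get(week, Counter())
    previous = counts.get(previous_week, Counter())
    topics = set(current) | set(previous)
    # Counter returns 0 for missing topics, so new and vanished
    # topics show up as positive and negative deltas respectively.
    return {topic: current[topic] - previous[topic] for topic in topics}


# Made-up sample assignments; real labels come from the topic model.
posts = [
    (date(2024, 1, 2), "transit"),
    (date(2024, 1, 3), "housing"),
    (date(2024, 1, 9), "transit"),
    (date(2024, 1, 10), "transit"),
    (date(2024, 1, 11), "parks"),
]

counts = weekly_topic_counts(posts)
deltas = topic_deltas(counts, week=(2024, 2), previous_week=(2024, 1))
```

Raw count differences are the simplest choice; if weekly post volume varies a lot, you may prefer deltas in each topic's share of posts instead.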
Earning criteria — what you'll demonstrate
- Build an ethical, compliant web crawler at modest scale
- Apply modern topic modeling to bilingual short-text data
- Design an analyst dashboard with delta-over-time views
- Document data-handling and privacy choices defensibly
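For the data-handling and privacy criterion, one choice you might document is replacing author handles with keyed pseudonyms during normalization. The sketch below is an assumption about how that could look, not a required design: the record fields, the `normalize_post` helper, and the 16-character truncation are hypothetical, and the secret key would need to be stored outside the dataset.

```python
import hashlib
import hmac


def pseudonymize(author_handle: str, secret_key: bytes) -> str:
    """Map a handle to a stable pseudonym using HMAC-SHA256.
    Without the key, the handle cannot be recomputed by hashing guesses."""
    digest = hmac.new(secret_key, author_handle.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def normalize_post(raw: dict, secret_key: bytes) -> dict:
    """Keep only the fields the pipeline needs; drop the raw handle."""
    return {
        "forum": raw["forum"],
        "posted_on": raw["posted_on"],
        "text": raw["text"],
        "author_id": pseudonymize(raw["author"], secret_key),
    }


# Illustrative record and key; never hard-code a real key.
key = b"example-key-for-illustration-only"
raw_post = {
    "forum": "forum.example.org",
    "posted_on": "2024-01-09",
    "text": "The bus schedule changed again.",
    "author": "resident_42",
}
record = normalize_post(raw_post, key)
```

The same handle always maps to the same `author_id` under one key, so per-author deduplication still works, while the stored dataset no longer contains the handle itself. Post text can still contain personal details, so this covers only one part of a privacy plan.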
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Data Engineer
Owning a crawler-to-dashboard pipeline with privacy discipline is entry-level work for a data engineer on a civic-tech or analytics team.
This challenge sharpens
- web-crawling
- nlp-pipeline
- dashboard-design
NLP Engineer
Applying modern topic modeling to bilingual short text is the daily reality of NLP engineers in govtech and analytics.
This challenge sharpens
- nlp-pipeline
- topic-modeling
- python
AI Product Designer
Designing a usable analyst surface for exploring topic deltas is core to an AI product designer's craft in mission-driven organizations.
This challenge sharpens
- dashboard-design
- data-storytelling
- topic-modeling