Extract Structured Lease Terms for a Commercial Real-Estate Platform

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You receive 500 anonymized lease PDFs and a labelled gold set of 150 leases with all 14 target fields filled in. Build a pipeline with four stages:

(1) layout-aware PDF parsing (Unstructured, PyMuPDF, or LayoutParser);
(2) field extraction using a hybrid of regex rules and a small extractive model;
(3) per-field confidence scoring;
(4) a Streamlit review tool for low-confidence rows.

Evaluate per-field accuracy on a held-out 50-lease test set. Deliver the pipeline, the review tool, an evaluation report, and a 3-page deployment recommendation for the platform team.
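As a rough illustration of the hybrid extraction stage, the sketch below tries a regex rule first and falls back to a model only when the rule finds nothing. The field names (`monthly_rent`, `lease_term_months`), the patterns, the fixed rule confidence of 0.9, and the `model_fallback` callable are all assumptions made for the example; the challenge does not specify them, and a real pipeline would take its text from the layout-aware parser.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Extraction:
    name: str              # field name
    value: Optional[str]   # extracted raw string, or None if nothing was found
    confidence: float      # placeholder score, to be calibrated on the gold set
    source: str            # "rule", "model", or "none"


# Hypothetical rules for two illustrative fields (not the challenge's real schema).
RULES = {
    "monthly_rent": re.compile(
        r"monthly rent of\s+\$?([\d,]+(?:\.\d{2})?)", re.IGNORECASE
    ),
    "lease_term_months": re.compile(r"term of\s+(\d+)\s+months", re.IGNORECASE),
}

# Assumed starting confidence for a rule hit; replace with a calibrated value.
RULE_CONFIDENCE = 0.9

ModelFn = Callable[[str, str], Tuple[Optional[str], float]]


def extract_field(
    name: str, text: str, model_fallback: Optional[ModelFn] = None
) -> Extraction:
    """Try the regex rule first; fall back to the model if the rule misses."""
    match = RULES[name].search(text)
    if match:
        return Extraction(name, match.group(1), RULE_CONFIDENCE, "rule")
    if model_fallback is not None:
        value, score = model_fallback(name, text)
        return Extraction(name, value, score, "model")
    return Extraction(name, None, 0.0, "none")


sample = "Tenant shall pay a monthly rent of $12,500.00 for a term of 60 months."
rent = extract_field("monthly_rent", sample)
term = extract_field("lease_term_months", sample)
```

Recording the `source` of each value keeps rule hits and model predictions separable later, which is useful when calibrating confidence per field.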

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Automate per-field lease extraction at 95% accuracy with a human-review fallback for low-confidence rows.
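One way to picture the human-review fallback is a router that sends a lease to review whenever any field's confidence falls below that field's threshold. This is a minimal sketch: the row layout, the field names, and the threshold values are assumptions for illustration, and the thresholds in particular would need to be tuned on the gold set rather than copied from here.

```python
from typing import Dict, List, Tuple

# Assumed row layout: {"lease_id": str, "fields": {field: (value, confidence)}}
Row = Dict[str, object]


def route(
    rows: List[Row],
    thresholds: Dict[str, float],
    default_threshold: float = 0.95,
) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Split leases into auto-accepted IDs and (ID, low-confidence fields) pairs."""
    auto_accept: List[str] = []
    needs_review: List[Tuple[str, List[str]]] = []
    for row in rows:
        low_fields = [
            field
            for field, (_value, conf) in row["fields"].items()
            if conf < thresholds.get(field, default_threshold)
        ]
        if low_fields:
            needs_review.append((row["lease_id"], low_fields))
        else:
            auto_accept.append(row["lease_id"])
    return auto_accept, needs_review


# Hypothetical example data.
rows = [
    {"lease_id": "L1", "fields": {"rent": ("100", 0.99), "term": ("60", 0.97)}},
    {"lease_id": "L2", "fields": {"rent": ("200", 0.99), "term": ("12", 0.60)}},
]
auto_accept, needs_review = route(rows, {"rent": 0.90, "term": 0.95})
```

Returning the list of low-confidence fields alongside each lease ID lets the review tool highlight only the cells a reviewer needs to check.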

Earning criteria — what you'll demonstrate

Build a hybrid rule + ML extraction pipeline on real PDF data
Calibrate per-field confidence to route to human review
Evaluate information-extraction (IE) accuracy per field, not just overall
Translate extraction performance into deployment recommendations
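The per-field evaluation and the confidence calibration in the criteria above could be sketched as follows. The exact-match comparison after light normalization, the dictionary layout, and the threshold-selection rule (the lowest confidence at which accepted predictions still meet a target accuracy) are assumptions for illustration, not a prescribed method; fields such as dates or amounts would likely need field-specific normalization.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def normalize(value: Optional[str]) -> str:
    """Lowercase and collapse whitespace; treat a missing value as empty."""
    return "" if value is None else " ".join(str(value).split()).lower()


def per_field_accuracy(
    predictions: Dict[str, Dict[str, str]], gold: Dict[str, Dict[str, str]]
) -> Dict[str, float]:
    """Exact-match accuracy per field over all leases in the gold set."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lease_id, gold_fields in gold.items():
        pred_fields = predictions.get(lease_id, {})
        for field, gold_value in gold_fields.items():
            total[field] += 1
            if normalize(pred_fields.get(field)) == normalize(gold_value):
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


def pick_threshold(
    scored: List[Tuple[float, bool]], target: float = 0.95
) -> Optional[float]:
    """Lowest confidence at which predictions at or above it meet the target.

    `scored` holds (confidence, is_correct) pairs for one field.
    Returns None if no cut-off reaches the target accuracy.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    correct = 0
    for i, (conf, ok) in enumerate(ranked, start=1):
        correct += int(ok)
        # Only evaluate a cut-off once all items sharing this confidence are counted.
        end_of_tie = i == len(ranked) or ranked[i][0] != conf
        if end_of_tie and correct / i >= target:
            best = conf
    return best


# Hypothetical example data.
gold = {"a": {"rent": "100", "term": "60"}, "b": {"rent": "200", "term": "12"}}
pred = {"a": {"rent": "100", "term": "36"}, "b": {"rent": " 200 ", "term": "12"}}
accuracy = per_field_accuracy(pred, gold)
threshold = pick_threshold([(0.99, True), (0.97, True), (0.90, False), (0.80, True)])
```

Choosing the threshold on the gold set and reporting accuracy on the held-out test set keeps the two measurements independent, which matters when the numbers feed a deployment recommendation.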

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Linguistic Engineering and Language Technologies

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

NLP Engineer
AI Engineering

NLP Engineer

Owning an IE pipeline on real PDFs with calibrated confidence is the day-one work of NLP engineers at any vertical-document AI startup.

This challenge sharpens

information-extraction
named-entity-recognition
pdf-parsing

AI Engineer

Wiring the human-in-the-loop fallback plus the rollout plan is core AI-engineer work at vertical-AI vendors.

This challenge sharpens

human-in-the-loop
confidence-calibration
pdf-parsing

Data Engineer

Designing the extraction pipeline and gold-set evaluation is the data-engineering backbone of any IE product.

This challenge sharpens

information-extraction
evaluation
pdf-parsing

One more thing

You can put a credential on your CV by Friday.
