Research

Curate a Domain Lexicon for a Climate-Tech NLP Stack

Free · Verified credential · 2 weeks · Intermediate

Overview

What this challenge is about.

You receive 5,000 policy documents and a benchmark of 200 documents with manually tagged domain terms. Curate a lexicon of ~1,500 terms with (1) canonical English form, (2) Swahili/French/Portuguese variants where they exist, (3) a 1-line definition, (4) a Wikidata QID where possible. Add the lexicon as a spaCy EntityRuler component layered on top of a baseline NER pipeline. Evaluate precision/recall improvement on the benchmark. Deliver: lexicon CSV, integrated pipeline, evaluation notebook, and a 4-page methodology note for the non-profit's grant report.
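One way to picture the lexicon CSV described above is the minimal sketch below. It reads a few rows and expands each canonical term and its variants into token patterns of the kind spaCy's EntityRuler accepts. The column names (`canonical`, `sw`, `fr`, `pt`, `definition`, `qid`), the `CLIMATE_TERM` label, the sample rows, and the helper `lexicon_to_patterns` are all assumptions made for illustration; the brief does not prescribe a schema. QIDs are left blank here and would be filled in during curation.

```python
import csv
import io

# Hypothetical sample rows. Column names and contents are illustrative only;
# the qid column is intentionally empty rather than guessed.
SAMPLE_CSV = """canonical,sw,fr,pt,definition,qid
drought,ukame,sécheresse,seca,Prolonged period of abnormally low rainfall.,
carbon credit,,crédit carbone,crédito de carbono,Tradable permit representing one tonne of CO2-equivalent.,
"""

def lexicon_to_patterns(csv_text, label="CLIMATE_TERM"):
    """Turn lexicon rows into EntityRuler-style pattern dicts.

    Every surface form (canonical or variant) of a term shares one id,
    so matches in any language resolve to the same lexicon entry.
    """
    patterns = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        canonical = row["canonical"].strip()
        # Prefer the Wikidata QID as the id; fall back to a slug of the canonical form.
        term_id = row["qid"].strip() or canonical.replace(" ", "_")
        for column in ("canonical", "sw", "fr", "pt"):
            surface = row[column].strip()
            if not surface:
                continue  # no variant recorded for this language
            # Whitespace splitting is a simplification: hyphenated or punctuated
            # terms may be tokenized differently by spaCy's tokenizer.
            tokens = [{"LOWER": tok} for tok in surface.lower().split()]
            patterns.append({"label": label, "pattern": tokens, "id": term_id})
    return patterns

patterns = lexicon_to_patterns(SAMPLE_CSV)
for p in patterns:
    print(p["id"], [t["LOWER"] for t in p["pattern"]])
```

Keeping one row per term and one column per language makes missing variants visible as empty cells, which is convenient when the variants exist only for some terms.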

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Lift NER on African climate-policy documents by curating a domain lexicon and integrating it into a spaCy pipeline.
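As a rough sketch of the integration step, the snippet below adds an EntityRuler to a spaCy pipeline and checks which lexicon entries fire on a sentence. It assumes spaCy 3.x. The blank English pipeline is only a stand-in for the baseline NER pipeline, and the `CLIMATE_TERM` label, the pattern ids, and the example sentence are hypothetical.

```python
import spacy

# Stand-in for the baseline: a blank pipeline has no "ner" component.
# With a trained baseline, the ruler could instead be added with after="ner"
# so that it layers on top of the statistical predictions.
nlp = spacy.blank("en")

# overwrite_ents=False keeps any entities already set by earlier components
# and only adds non-overlapping lexicon matches.
ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": False})

# Hypothetical lexicon entries; the "id" ties variants back to one term.
ruler.add_patterns([
    {"label": "CLIMATE_TERM",
     "pattern": [{"LOWER": "carbon"}, {"LOWER": "credit"}],
     "id": "carbon_credit"},
    {"label": "CLIMATE_TERM",
     "pattern": [{"LOWER": "ukame"}],
     "id": "drought"},
])

doc = nlp("Each carbon credit is audited after the ukame season.")
matches = [(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents]
print(matches)
```

Whether the ruler runs before or after the statistical NER, and whether it is allowed to overwrite existing entities, changes the precision/recall trade-off, so that choice is worth recording in the methodology note.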

Earning criteria — what you'll demonstrate

  • Curate a domain lexicon with multilingual variants
  • Integrate a lexical resource into a spaCy pipeline
  • Evaluate NER lift from a lexical-resource integration
  • Author methodology notes suitable for grant reporting
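The evaluation criterion above can be sketched as a span-level comparison of the pipeline's predictions against the manually tagged benchmark, computed once for the baseline and once with the lexicon. The snippet assumes exact-match scoring on `(doc_id, start, end, label)` tuples, which is one common convention and not something the brief specifies. The spans below are made-up toy data, not benchmark results.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match span scores; both arguments are sets of (doc_id, start, end, label)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy gold annotations (hypothetical character offsets).
gold = {
    ("doc1", 0, 13, "CLIMATE_TERM"),
    ("doc1", 40, 45, "CLIMATE_TERM"),
    ("doc2", 5, 12, "CLIMATE_TERM"),
    ("doc2", 30, 41, "CLIMATE_TERM"),
}

# Toy baseline output: one correct span and one boundary error.
baseline_pred = {
    ("doc1", 0, 13, "CLIMATE_TERM"),
    ("doc2", 5, 10, "CLIMATE_TERM"),
}

# Toy output with the lexicon layered on: two extra spans recovered.
lexicon_pred = baseline_pred | {
    ("doc1", 40, 45, "CLIMATE_TERM"),
    ("doc2", 30, 41, "CLIMATE_TERM"),
}

baseline_scores = precision_recall_f1(gold, baseline_pred)
lexicon_scores = precision_recall_f1(gold, lexicon_pred)
lift = tuple(after - before for before, after in zip(baseline_scores, lexicon_scores))
print("baseline P/R/F1:", baseline_scores)
print("lexicon  P/R/F1:", lexicon_scores)
print("lift     P/R/F1:", lift)
```

Exact-match scoring penalizes boundary errors as both a false positive and a false negative. If partial overlaps should earn credit, the matching rule would need to change, and the note should say which rule was used.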

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

NLP Engineer

Lexicon curation and spaCy pipeline integration are core NLP-engineer work in any domain-specific NLP product.

This challenge sharpens

  • lexical-resources
  • named-entity-recognition
  • spacy

Data Engineer

Curating and publishing a reusable lexical resource is the data-engineering skillset that open-data orgs hire for.

This challenge sharpens

  • lexical-resources
  • wikidata
  • multilingual-nlp

Applied AI Scientist

Methodology-driven lift on a benchmark with honest limits is the day-to-day of applied AI scientists in research-grade NLP work.

This challenge sharpens

  • named-entity-recognition
  • evaluation
  • multilingual-nlp

One more thing

You can put a credential on your CV by Friday.