Train a Word-Alignment Model for Low-Resource Catalan-Aranese
Overview
What this challenge is about.
You receive a 35,000-sentence Catalan-Aranese parallel corpus plus a 1,200-pair manually annotated word-alignment test set. Build (1) a classic statistical alignment baseline (eflomal or fast_align) trained on the corpus and (2) a neural aligner built on a multilingual encoder (e.g., SimAlign, which extracts alignments from pretrained embeddings without training, or Awesome-Align, which can be fine-tuned on the parallel corpus). Evaluate both on the test set with Alignment Error Rate (AER), precision, and recall. Discuss the trade-offs (training time, inference speed, dependency on external models) for a public-sector procurement audience. Deliver a 3-page recommendation memo.
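As a starting point for the evaluation step, here is a minimal sketch of corpus-level precision, recall, and AER using the standard definitions with sure links S and possible links P (S is a subset of P). It assumes a Pharaoh-style text format where `i-j` marks a sure link and `ipj` marks a possible link. The brief does not specify the test set's format, so the parser is an assumption, and `parse_links` and `alignment_scores` are illustrative names. If the gold annotations contain only sure links, P equals S and AER reduces to 1 minus F1.

```python
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[int, int]


def parse_links(line: str) -> Tuple[Set[Link], Set[Link]]:
    """Parse one line of links; return (sure, sure_or_possible).

    Assumed format: 'i-j' is a sure link, 'ipj' is a possible link.
    """
    sure: Set[Link] = set()
    possible: Set[Link] = set()
    for tok in line.split():
        if "-" in tok:
            i, j = tok.split("-")
            sure.add((int(i), int(j)))
        else:
            i, j = tok.split("p")
            possible.add((int(i), int(j)))
    # Every sure link also counts as a possible link.
    return sure, sure | possible


def alignment_scores(hyp_lines: Iterable[str], gold_lines: Iterable[str]) -> Dict[str, float]:
    """Corpus-level precision, recall, and AER over paired hypothesis/gold lines."""
    a_total = s_total = a_and_s = a_and_p = 0
    for hyp_line, gold_line in zip(hyp_lines, gold_lines):
        _, hyp = parse_links(hyp_line)          # all predicted links
        sure, possible = parse_links(gold_line)
        a_total += len(hyp)
        s_total += len(sure)
        a_and_s += len(hyp & sure)
        a_and_p += len(hyp & possible)
    precision = a_and_p / a_total if a_total else 0.0
    recall = a_and_s / s_total if s_total else 0.0
    denom = a_total + s_total
    aer = 1.0 - (a_and_s + a_and_p) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "aer": aer}


if __name__ == "__main__":
    # Toy example: one sentence pair, three predicted links.
    hyps = ["0-0 1-1 2-3"]
    golds = ["0-0 1-1 2-2 2p3"]
    print({k: round(v, 3) for k, v in alignment_scores(hyps, golds).items()})
```

In the toy example the predicted link 2-3 is only a possible link in the gold data, so it counts toward precision but not recall. Counts are accumulated over the whole test set before dividing, rather than averaging per-sentence scores, so that long sentences are weighted by their number of links.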
The Brief
What you'll do, and what you'll demonstrate.
Pick the best word-alignment model for the low-resource Catalan-Aranese language pair under public-procurement constraints.
Earning criteria — what you'll demonstrate
- Train and evaluate statistical and neural word alignment models
- Apply Alignment Error Rate and related metrics correctly
- Reason about low-resource language constraints and tooling availability
- Communicate trade-offs to a public-sector procurement audience
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
NLP Engineer
Word-alignment work on low-resource pairs is exactly the niche NLP engineers own at govtech and translation-tooling shops.
This challenge sharpens
- alignment
- low-resource-mt
- transformer
Applied AI Scientist
Picking a model that fits public-sector procurement constraints (open-source licensing, supportability) is core applied-AI work at consultancies serving governments.
This challenge sharpens
- model-evaluation
- low-resource-mt
- alignment
ML Researcher
Comparing statistical and neural alignment with rigorous AER reporting is the kind of focused ML-research deliverable that small NLP labs hire for.
This challenge sharpens
- neural-mt
- alignment
- model-evaluation