AI & Data
Generative AI & LLMs Challenges
Generative AI & LLMs challenges put you inside the work of building with large language models. You'll develop skills in prompt patterns, few-shot prompting, chain-of-thought, and LLM API integration, learning how these models behave before you scale them.
From there you'll handle the harder edges — RAG architectures, vector database basics, fine-tuning, and prompt versioning — putting LLM guardrails and LLM evaluation around every deployment the way AI teams actually do. Each challenge you solve earns a verified credential you can share with recruiters.
- Code, Intermediate, New
Design Prompt Versioning and Observability for a Coding Assistant
You will (1) design a prompt-registry data model (versions, owners, environments, change log) and implement it in Postgres + a small Python SDK, (2) instrument the assistant to …
- Prompt Versioning
- Observability
- PII Scrubbing
LLM Application Development
- Design, Intermediate, New
Design a Continuous Eval Pipeline for an Enterprise RAG Product
Design (and partially build) a continuous-eval pipeline for a RAG system: (1) a structured eval set with at least 50 queries grouped by query class; (2) automated scoring (LLM-a…
- Continuous Evaluation
- LLM Evaluation
- RAG Architectures
AI Measurement and Evaluation
- Analysis, Intermediate, New
Catastrophic-Forgetting Audit on a Domain Fine-Tune
You receive the fine-tuned 7B chemistry model and its base, plus a benchmark basket (MMLU subset, GSM8K, IFEval, a small instruction-following set). Run all 4 benchmarks on both…
- Catastrophic Forgetting
- LLM Evaluation
- Fine Tuning
Fine-Tuning Large Language Models
- Research, Intermediate, New
Run an Alignment Probe on a Coding Assistant
You will design 240 probe prompts across 3 classes: (1) over-refusal (innocuous coding asks the model should fulfill), (2) insecure code patterns (asks where the model should wa…
- Red Team Operations
- Alignment Evaluation
- LLM Evaluation
Large Language Models
Practice your coursework on real scenarios.
Every challenge is shaped from real-world context — not generic exercises. The work mirrors what your degree prepares you for.
Why Ewance
- Research, Intermediate, New
Audit a Public LLM Benchmark for Validity Threats
Choose one open LLM benchmark (e.g., MMLU, GPQA, BIG-Bench-Hard, MATH). Read the benchmark paper plus at least three follow-up critiques. Audit (1) data contamination risk again…
- Benchmark Evaluation
- Data Contamination Analysis
- Annotation Methodology
AI Measurement and Evaluation
- Code, Intermediate, New
Build an Evaluation Harness for an Internal LLM Assistant
You will design and implement an evaluation harness in Python that runs four test suites: (1) helpfulness (LLM-as-judge with rubric), (2) factual grounding (compare cited source…
- LLM Evaluation
- LLM-as-Judge
- Prompt Injection Testing
Large Language Models
- Code, Intermediate, New
Design a Visual Search Backend for a Boutique Luxury Marketplace
You receive a catalog of 80,000 luxury items (image + sparse metadata) and a labeled query set of 300 user photos with hand-picked target items. Choose an embedding strategy (CL…
- Visual Search
- Word Embeddings
- CLIP
Deep Learning for Computer Vision
- Code, Beginner, New
Build Semantic Search for an Internal Engineering Wiki
You receive a Confluence XML export (~12k pages, ~80 MB of text) and a hand-labeled benchmark of 50 internal queries with ground-truth doc IDs. Chunk and embed the corpus with a…
- Embedding Models
- Vector Database Basics
- pgvector
Vector Databases and Embeddings
Browse challenges
Explore role
Product Manager
Ship product that solves real user problems. Combine user research, prototyping, and stakeholder alignment to turn ambiguous briefs into measurable wins — the role at the centre of modern software teams.
- Analysis, Beginner, New
Cost-Optimize an Embedding Pipeline for a Customer Support Knowledge Base
You receive: (a) the current pipeline (full re-embed on any article change, OpenAI text-embedding-3-large, 3,072 dims) with one month of cost logs, (b) a sample of 5,000 article…
- Embedding Models
- FinOps & Cost Optimization
- Change Detection
Vector Databases and Embeddings
- Code, Intermediate, New
Fine-Tune a Diffusion Model for a Sustainable-Fashion Mood-Board Tool
You receive around 1,200 curated images of sustainable garments tagged with silhouette and material. Choose a base diffusion model (Stable Diffusion 1.5/2.1 or SDXL) and apply L…
- Diffusion Models
- Fine Tuning
- AI Image Generation
Deep Generative Models
- Code, Intermediate, New
Fine-Tune a Diffusion Model for Sustainable-Fashion Mockups
You receive 1,200 product photos with paired captions and the brand's style guide. Fine-tune a Stable-Diffusion-class base model with LoRA (Low-Rank Adaptation, a parameter-effi…
- Diffusion Models
- LoRA Fine-Tuning
- PyTorch or TensorFlow
Advanced Deep Learning
- Research, Intermediate, New
QLoRA Fine-Tune for a Customer-Support Domain Assistant
You receive 8,000 anonymized support ticket pairs (question -> agent response), the company's product documentation (around 600 pages), and a strong RAG baseline already running…
- QLoRA
- Fine Tuning
- RAG Architectures
Fine-Tuning Large Language Models
Build a verifiable portfolio.
Submissions become evidence. Reviewers with shipping experience score against a rubric; the result becomes a credential anyone can verify.
Why Ewance
- Code, Intermediate, New
Build a Multimodal Generation Pipeline for a Tourism Operator
You receive 40 sample 30-second videos shot by tour guides, the operator's brand voice doc, and SEO keyword lists for EN/PT/ES. Build a pipeline that (1) extracts a representati…
- Multimodal Generation
- Vision Language Models
- LLM Inference
Generative AI
- Code, Intermediate, New
Ship an MVP RAG Knowledge Assistant for a Climate-Tech Startup
As a 4-person team across a 6-week sprint, ship: (1) an ingestion pipeline for around 4,000 mixed PDFs and markdown files; (2) a vector store with documented chunking strategy; …
- RAG Architectures
- Software Engineering for AI
- Vector Databases
AI Software Engineering Group Project
- Strategy, Beginner, New
Plan a Self-Improving Sales-Research Agent
Build the v0 agent: given a company URL, it gathers 5 fact bullets (recent news, headcount range, tech stack hints, hiring patterns, a recent leadership change) and drafts a 4-l…
- AI Agents
- Agent Design
- A/B Testing & Experimentation
AI Agents and LLM-Based Agents
- Code, Intermediate, New
Extract Skills and Roles from Job Postings for a Recruiter Tool
You receive 30,000 anonymized job postings and a labelled 1,000-posting benchmark with (skill, role, seniority) spans. Fine-tune a small token classifier (e.g., DeBERTa-v3-base)…
- Information Extraction
- Token Classification
- ESCO Taxonomy
Linguistic Engineering and Language Technologies
- Code, Intermediate, New
Natural Language Inference for an HR-AI Compliance Tool
Use SNLI/MNLI/ANLI as starting data and curate 200 domain-specific HR examples (synthetic or anonymized) for fine-tuning. Fine-tune a small encoder (DeBERTa-v3-base or similar),…
- Natural Language Inference
- Transformer Models
- Fine Tuning
Computational Semantics
- Code, Intermediate, New
Train a Domain-Specific Reranker for a Legal-Tech Search Box
You receive 20,000 (query, document, relevance-label) triples from the firm's contract corpus. Fine-tune a small cross-encoder (e.g., ms-marco-MiniLM-L-6-v2 or BAAI/bge-reranker…
- Cross Encoder Reranker
- Fine Tuning
- IR Evaluation
Information Retrieval and Search
- Research, Intermediate, New
Design a Capability Evaluation for an Open-Weights Coding Model
Pick a recent open-weights coding model (e.g., a Qwen, DeepSeek, or Llama variant). Design an evaluation set of around 40 coding tasks across 4 buckets: standard benign coding, …
- Capability Evaluation
- Safety Evaluation
- LLM Evaluation
AI Safety and Alignment
- Code, Intermediate, New
LLM-Powered FAQ Chatbot for a 40-Person SaaS Scale-Up
You have access to TaskFlow's internal documentation, help articles, and a sample of 500 support tickets. Your task is to build a retrieval-augmented generation (RAG) pipeline: …
- Large Language Models
- RAG Architectures
- Information Retrieval
Text Analytics and Natural Language Processing
- Research, Senior, New
Plan a Parameter-Efficient Fine-Tuning Strategy for a Big-Tech AI Lab
You will produce (1) a 6-page survey of four PEFT methods (LoRA, adapters, prefix tuning, IA3) with their strengths, weaknesses, and parameter footprints, (2) a one-page decisio…
- Parameter Efficient Fine Tuning
- Transfer Learning
- Fine Tuning
Meta-Learning, Transfer Learning, and Multi-Task Learning
- Analysis, Intermediate, New
Transfer-Learning Backbone Bake-Off for Retail Product Tagging
You receive 80,000 retail product images tagged with multiple labels from a 250-tag taxonomy. Use each of the three pretrained backbones via two transfer strategies: (1) linear …
- Transfer Learning
- Fine Tuning
- Supervised Learning
Meta-Learning, Transfer Learning, and Multi-Task Learning
- Code, Intermediate, New
Build a Cross-Lingual Retrieval-Augmented QA System
Index around 5,000 internal-knowledge docs across the three languages using a multilingual embedding model (e.g., multilingual-e5 or BGE-M3). Build the retrieval-then-answer pip…
- RAG Architectures
- Cross Lingual Retrieval
- Multilingual Embeddings
Neural Networks for NLP
- Code, Beginner, New
Structured-Output Prompts for Invoice Extraction
You receive 300 real invoice transcripts (already OCR-ed) labeled with 14 target fields, plus the current production prompt and its 12 percent failure log. Design a new prompt u…
- Structured Output
- JSON Schema
- Few Shot Prompting
Prompt Engineering
How it works
From brief to credential, in six steps.
Step 01
Browse challenges aligned to your studies.
Step 02
Accept the one that fits your goals.
Step 03
Work through it with AI Copilot guidance.
Step 04
Submit for structured evaluation.
Step 05
Earn a verified credential.
Step 06
Add it to LinkedIn with one click.
Industry teams behind a decade of practitioner briefs
Hiring from this pool?
Sponsor a challenge and meet candidates through actual work.
Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.