Computer Science
Site Reliability & Observability Challenges
Site Reliability & Observability challenges put you on the hook for keeping production healthy. You'll build the fundamentals (Application Monitoring, Dashboard Reading, and Performance Analysis), instrument services with OpenTelemetry, Prometheus, and Grafana, then define what "healthy" means through Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
From there you'll handle the harder edges (Incident Command, On-Call Runbooks, Multi-Region Failover, and Chaos Engineering) the way reliability teams actually operate under pressure. Each challenge you solve earns a verified credential you can share with recruiters.
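To make "defining healthy" concrete: an availability SLO implies an error budget, the amount of failure the service may accumulate before the objective is missed. The sketch below shows that arithmetic in Python. The function names, the 99.9 percent target, and the 30-day window are illustrative assumptions, not values taken from any challenge brief.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "99.9 percent" (assumed example).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent, computed from request counts.

    good_events / total_events is the SLI; a negative result means the
    budget is overspent and the SLO has been missed for this window.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Assumed example: 99.9 percent target over 30 days.
    print(round(error_budget_minutes(0.999, 30), 1))
    # Assumed example: 500 failed requests out of 1,000,000.
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 3))
```

Under these assumed numbers the budget works out to roughly 43 minutes of downtime per 30 days, and 500 failures in a million requests would spend about half of it.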
- Code · Intermediate · New
Instrument Network Telemetry for an ISP's Backbone
Receive the backbone topology (12 routers across 4 PoPs, mix of Cisco IOS XR + Juniper Junos), the current SNMP-based monitoring stack, and 4 weeks of customer-complaint tickets…
- Network Telemetry
- gNMI
- Kafka Event Streaming
Advanced Computer Networks
- Code · Intermediate · New
Build a Canary Rollout for a Production Recommender
Pick a serving stack (Triton, Seldon Core, KServe, or BentoML). Implement two-model traffic splitting with a configurable percentage (start at 5%). Wire up online metric collect…
- Canary Deployment
- Kubernetes Orchestration
- A/B Testing
ML Engineering and Production ML
- Analysis · Intermediate · New
Debug Latency Tail With Distributed Tracing on a Logistics SaaS
Receive 7 days of anonymized trace data in Tempo, the service map (12 services), and the customer complaint log. Investigate: filter the slowest 1 percent of traces, identify th…
- Distributed Tracing
- Performance Analysis
- Tempo
Software Observability
- Design · Intermediate · New
Design Parallel I/O for a Climate-Simulation Data Pipeline
Analyze the current I/O pattern: each MPI rank writes its own file via serial HDF5 (the classic anti-pattern). Design a single shared file using parallel HDF5 + MPI-IO with coll…
- MPI-IO
- Parallel HDF5
- Lustre Filesystem
High-Performance and Scientific Computing
Practice your coursework on real scenarios.
Every challenge is shaped from real industry context — not generic exercises. The work mirrors what your degree prepares you for.
Why Ewance
- Strategy · Intermediate · New
Design a Multi-Region Blue/Green Database Deployment for an EdTech Platform
Design the migration: source PostgreSQL 14 (primary in São Paulo) to target PostgreSQL 16 (primary in Mexico City, replica in Bogotá). Use logical replication for zero-downtime …
- Blue/Green Deployment
- Logical Replication
- PostgreSQL or MySQL
DevOps and Secure Deployment
- Analysis · Intermediate · New
Design a Custom Page-Replacement Policy for a Tier-1 Cloud Provider Simulator
Use the provided simulator (Python harness wrapping a C++ page-cache model) and the team's 3 anonymized workload traces (web-cache, key-value store, batch analytics). Implement …
- Memory Management
- Page Replacement
- Benchmarking
Operating Systems
- Code · Beginner · New
Observability Injection: Distributed Tracing via Sidecars
Enable Envoy tracing with the OpenTelemetry tracer in Istio MeshConfig. Configure a Tempo backend with a Grafana frontend. Verify W3C tracecontext propagation across all 26 serv…
- Distributed Tracing
- OpenTelemetry Instrumentation
- API Gateway Patterns (Kong, Envoy)
Service Mesh and Microservices Networking
- Design · Senior · New
Multi-Region Failover for an Enterprise RAG Service
Design and prototype: (1) a primary-region deployment of the RAG service (vector DB + LLM inference + retrieval API), (2) a passive secondary region with replicated vector store…
- Multi-Region Architecture
- Disaster Recovery
- Terraform
Cloud Computing for Data and ML
Browse challenges
Explore role
Product Manager
Ship product that solves real user problems. Combine user research, prototyping, and stakeholder alignment to turn ambiguous briefs into measurable wins — the role at the centre of modern software teams.
- Analysis · Intermediate · New
Tune OpenMP Performance on a Memory-Bound Genomics Pipeline
Profile the existing pipeline at 1, 4, 8, 16, 32, 48, 64 threads using Intel VTune + Linux perf. Identify the bottlenecks (likely candidates: NUMA-unaware memory allocation, fal…
- OpenMP
- NUMA Awareness
- False Sharing
High-Performance and Scientific Computing
- Analysis · Intermediate · New
Tune Consistency Levels for a Global Ride-Hailing Platform
Receive an anonymized topology export (3 regions, RF=3 per region) plus 7 days of query traces tagged by class. Reproduce the matching anomaly in a Kubernetes-based test cluster…
- Consistency Models
- Cassandra
- Distributed Tracing
Distributed Systems
- Design · Intermediate · New
Design SLO-Driven Alerts for a Telco's Subscriber API
Receive a 90-day RED (Rate, Errors, Duration) metrics export for the subscriber API across 6 endpoints and 38 weeks of paging history. Define an SLO per endpoint (e.g., 99.9 per…
- SLO Design
- Alerting
- Prometheus & Grafana
Software Observability
- Code · Beginner · New
Implement Progressive Delivery with Flagger for an E-Commerce Backend
Install Flagger (or Argo Rollouts) into the existing Kubernetes + Istio stack. Configure canary analysis using Prometheus metrics: request-success-rate, request-duration p99, an…
- Flagger
- Argo Rollouts
- Canary Deployment
GitOps and Continuous Delivery
Build a verifiable portfolio.
Submissions become evidence. Reviewers with shipping experience score against a rubric; the result becomes a credential anyone can verify.
- Analysis · Senior · New
Diagnose Clock Skew in an HFT Order Matching System
Receive anonymized PTP grandmaster + slave logs (nanosecond resolution, 30 days, 3 cabinets) plus a synthetic order-flow generator. Identify the drift pattern (likely candidate:…
- Logical Clocks
- PTP Time Sync
- C++ Programming
Distributed Systems
- Analysis · Beginner · New
Define SLOs and Error Budgets for a Real-Time Trading API
Pull 90 days of API latency + error data per endpoint from Prometheus (anonymized exports provided). Propose Service Level Indicators (SLIs) for 3 services × 2 SLI types (availa…
- SLO / SLI Definition
- Error Budgets
- SLI Design
Site Reliability Engineering
- Code · Intermediate · New
Roll Out OpenTelemetry Tracing Across a Microservices Fintech
Receive an anonymized service map (90 services, payment-critical path of 12), a runtime mix (Node.js, Go, Java), and existing logging/metrics setup. Define: an OTel SDK adoption…
- Distributed Tracing
- OpenTelemetry Instrumentation
- Sampling Strategies
Software Observability
- Design · Intermediate · New
Observability for a Microservices Payments Platform
Design the observability architecture: OpenTelemetry traces from 38 services into Tempo, structured logs via Loki, RED (rate, errors, duration) metrics via Prometheus, SLOs defi…
- Observability
- OpenTelemetry Instrumentation
- SLO Design
DevOps and Secure Deployment
- Presentation · Beginner · New
Run an Incident-Response Tabletop for a Healthtech On-Call Team
Design 3 tabletop scenarios with realistic timeline injects (every 5-10 minutes, new info arrives). Run the tabletop hybrid (in-person + remote) with the 8 on-call engineers + 2…
- Incident Response
- Tabletop Exercises
- Incident Command
Site Reliability Engineering
- Design · Intermediate · New
Instrument a Model Monitoring Stack from Scratch
Pick the priority product (recommend the customer-service RAG assistant, around 40k queries/day). Define monitoring signals: input drift (Evidently/NannyML), output quality (LLM…
- Model Monitoring
- Data Drift Detection
- LLM Evaluation
ML Engineering and Production ML
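One of the tracing briefs above asks you to filter the slowest 1 percent of traces before looking for a root cause. As a rough sketch of that first step, the Python below picks the slowest fraction from a list of trace durations. The `(trace_id, duration_ms)` tuple format and the function name `slowest_fraction` are hypothetical; this is not Tempo's query API, only the selection logic you might apply to an exported data set.

```python
import math
from typing import List, Tuple

Trace = Tuple[str, float]  # (trace_id, duration_ms): assumed export format


def slowest_fraction(traces: List[Trace], fraction: float = 0.01) -> List[Trace]:
    """Return the slowest `fraction` of traces, slowest first.

    At least one trace is returned for non-empty input so that small
    samples still yield something to inspect.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not traces:
        return []
    keep = max(1, math.ceil(len(traces) * fraction))
    # Sort by duration, descending, and keep the top slice.
    return sorted(traces, key=lambda t: t[1], reverse=True)[:keep]


if __name__ == "__main__":
    # Synthetic durations purely for illustration.
    sample = [(f"t{i}", float(i)) for i in range(1, 201)]
    for trace_id, duration_ms in slowest_fraction(sample):
        print(trace_id, duration_ms)
```

Sorting the full list is fine for an offline export; for very large data sets a bounded heap such as `heapq.nlargest` would avoid sorting everything.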
How it works
From brief to credential, in six steps.
Step 01
Browse challenges aligned to your studies.
Step 02
Accept the one that fits your goals.
Step 03
Work through it with AI Copilot guidance.
Step 04
Submit for structured evaluation.
Step 05
Earn a verified credential.
Step 06
Add it to LinkedIn with one click.
Related skill families
Browse all skills
Industry teams behind a decade of practitioner briefs
Hiring from this pool?
Sponsor a challenge and meet candidates through actual work.
Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.