Code

Index a Genomics Dataset with a Suffix Array for Read Matching

Free | Verified credential | 4 weeks | Advanced

Overview

What this challenge is about.

Implement a suffix array over a 720 MB DNA sequence (4-character alphabet: A, C, G, T) in Rust, using either DC3 (Difference Cover modulo 3) or SA-IS for construction. Build pattern-matching utilities (locate, count) for reads of 50 to 300 bp. Benchmark construction time, query latency, and memory footprint against the provided FM-index baseline (BWA-MEM). Package the result as a Rust library with a clean API. Deliver the source, a 6-page benchmark and design writeup, and an integration sample showing how the bioinformatics team would call the library from their pipeline.
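To make the locate and count utilities concrete, here is a minimal sketch of how both can be answered with two binary searches over a suffix array. The function names (build_suffix_array, sa_range, count, locate) are illustrative assumptions, not part of the brief. Construction here is a naive comparison sort standing in for DC3 or SA-IS, so it is only suitable for small test strings.

```rust
use std::cmp::Ordering;

/// Build a suffix array by sorting all suffix start positions.
/// Naive comparison sort: a stand-in for DC3 or SA-IS, fine only for small inputs.
/// `u32` indices cover any text shorter than 2^32 bytes, so a 720 MB genome fits.
pub fn build_suffix_array(text: &[u8]) -> Vec<u32> {
    assert!(text.len() <= u32::MAX as usize, "text too long for u32 indices");
    let mut sa: Vec<u32> = (0..text.len() as u32).collect();
    sa.sort_unstable_by(|&a, &b| text[a as usize..].cmp(&text[b as usize..]));
    sa
}

/// Compare at most `pattern.len()` leading bytes of a suffix against the pattern.
/// Returns `Equal` exactly when the suffix starts with the pattern.
fn cmp_prefix(suffix: &[u8], pattern: &[u8]) -> Ordering {
    let k = suffix.len().min(pattern.len());
    suffix[..k].cmp(pattern)
}

/// Half-open range of suffix-array slots whose suffixes start with `pattern`.
pub fn sa_range(text: &[u8], sa: &[u32], pattern: &[u8]) -> (usize, usize) {
    // First slot whose suffix is not smaller than the pattern.
    let lo = sa.partition_point(|&i| cmp_prefix(&text[i as usize..], pattern) == Ordering::Less);
    // First slot whose suffix is strictly greater than every match.
    let hi = sa.partition_point(|&i| cmp_prefix(&text[i as usize..], pattern) != Ordering::Greater);
    (lo, hi)
}

/// Number of occurrences of `pattern` in `text`.
pub fn count(text: &[u8], sa: &[u32], pattern: &[u8]) -> usize {
    let (lo, hi) = sa_range(text, sa, pattern);
    hi - lo
}

/// Start offsets of every occurrence, in ascending text order.
pub fn locate(text: &[u8], sa: &[u32], pattern: &[u8]) -> Vec<u32> {
    let (lo, hi) = sa_range(text, sa, pattern);
    let mut hits = sa[lo..hi].to_vec();
    hits.sort_unstable();
    hits
}
```

Each query costs O(m log n) byte comparisons for a read of length m, and count needs no access to the matched positions themselves, only the width of the range.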

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Build an in-memory suffix-array index in Rust over a 720 MB genome and benchmark construction, query, and memory against an FM-index baseline.
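Before benchmarking, it can help to estimate what the memory numbers should look like. The sketch below assumes 720 MB means 720,000,000 bases stored one per byte, a plain suffix array of u32 entries, and optional 2-bit packing that only works if the sequence contains nothing but A, C, G, and T. The names Footprint, estimate_footprint, and timed are hypothetical helpers, not part of any provided tooling.

```rust
use std::time::{Duration, Instant};

/// Estimated bytes for an uncompressed suffix-array index over `n` bases.
pub struct Footprint {
    /// Text stored as one byte per base.
    pub text_bytes: u64,
    /// Text stored as 2 bits per base (assumes only A, C, G, T occur).
    pub packed_text_bytes: u64,
    /// Suffix array stored as one `u32` per suffix.
    pub sa_bytes: u64,
}

/// Back-of-the-envelope footprint for a text of `n` bases.
pub fn estimate_footprint(n: u64) -> Footprint {
    Footprint {
        text_bytes: n,
        packed_text_bytes: (n + 3) / 4, // four bases per byte, rounded up
        sa_bytes: 4 * n,
    }
}

/// Run `f` once and return its result together with elapsed wall-clock time.
/// A real benchmark would repeat the call and report a distribution.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Under these assumptions the suffix array alone takes 2,880,000,000 bytes (about 2.9 GB), four times the size of the unpacked text, which is the main cost to weigh against a compressed FM-index.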

Earning criteria — what you'll demonstrate

  • Implement a linear-time suffix-array construction algorithm (DC3 or SA-IS) in Rust
  • Reason about string-matching trade-offs (suffix array vs FM-index)
  • Benchmark on a realistic dataset, not toy strings
  • Design a library API that fits an existing scientific pipeline
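One way to approach the API criterion is to define the query contract as a trait, so pipeline code does not care whether a suffix array or an FM-index sits behind it. The sketch below is an assumption about what such an interface could look like. ReadIndex, NaiveScan, and mapped_reads are hypothetical names, and the brute-force scanner is included only as a correctness oracle for tests, not as a usable index.

```rust
/// Hypothetical query interface that a suffix-array or FM-index backend could implement.
pub trait ReadIndex {
    /// Number of occurrences of `read` in the reference.
    fn count(&self, read: &[u8]) -> usize;
    /// Start offsets of every occurrence, in ascending order.
    fn locate(&self, read: &[u8]) -> Vec<usize>;
}

/// Brute-force scanner: O(n * m) per query, intended only as a test oracle.
pub struct NaiveScan<'a> {
    reference: &'a [u8],
}

impl<'a> NaiveScan<'a> {
    pub fn new(reference: &'a [u8]) -> Self {
        Self { reference }
    }
}

impl ReadIndex for NaiveScan<'_> {
    fn count(&self, read: &[u8]) -> usize {
        self.locate(read).len()
    }

    fn locate(&self, read: &[u8]) -> Vec<usize> {
        // `windows(0)` panics, so this sketch treats an empty read as having no hits.
        if read.is_empty() || read.len() > self.reference.len() {
            return Vec::new();
        }
        self.reference
            .windows(read.len())
            .enumerate()
            .filter(|(_, window)| *window == read)
            .map(|(offset, _)| offset)
            .collect()
    }
}

/// Example pipeline stage: counts how many reads occur at least once.
/// It depends only on the trait, so backends can be swapped freely.
pub fn mapped_reads<I: ReadIndex>(index: &I, reads: &[&[u8]]) -> usize {
    reads.iter().filter(|read| index.count(read) > 0).count()
}
```

Keeping a slow reference implementation behind the same trait makes it straightforward to cross-check the real index on small inputs before running the full 720 MB benchmark.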

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career mappings coming soon.

One more thing

You can put a credential on your CV by Friday.

Index a Genomics Dataset with a Suffix Array for Read Matching | Ewance Challenge