Durable Main-Memory Index Using Flash (2011)

Archival Note

The original contest site is no longer accessible. There is an Archive.org mirror available, from which the text on the rest of this page was copied.

The provided code for this contest is available at transactionalblog/sigmod-contest-2011GitHub. The exact provided code is preserved as commit 11ba1c5e. The main branch contains changes make to fix build issues, improve the build system, update instructions, etc. Links to code in the copied text below have been changed to point to the GitHub repo. There was an extra tarball listed on the contest site (scdb_final.tar.bz2) which I’ve so far been unable to find a copy of.

This contest was organized by MIT CSAIL. The winner of this contest was Thomas Kissinger and Benjamin Schlegel from TU Dresden. The winning submission was also later turned into a publication Efficient in-memory indexing with generalized prefix trees.


Task overview

The task for this contest is to implement a high-throughput main-memory index that uses flash-based SSDs for durability. The index will fit entirely in the main-memory, and all updates to it must be recoverable in the face of crashes. The system should use and optimize for modern flash-based SSDs to achieve durability.

The index offers an order-preserving key-value interface. It maintains key-value pairs that are byte strings and supports both random and ordered accesses to those pairs. All single-key operations are atomic. In addition to random and ordered access, the index offers a single-key atomic compare and swap primitive for building larger atomic operations. The index should support highly concurrent accesses. The optimization metric for this contest is overall operation throughput.

Contestants must supply the source code for their entries, and agree to license their code under the BSD or MIT open source license should their system win the contest. The contestants must be full time students at a degree granting post secondary institution. Both undergraduate and graduate students are eligible. Submissions may be written in any language, but it must be capable of linking against our x86-64 "test driver" on Linux. Details of the API and implementation requirements are below.

Task Details, API and Test Implementation

In addition to the requirements above, we list additional constraints and clarifications:

API

The commands which your system must support are specified in the API, which can be downloaded from the following link:

Unit Test and Performance Test

We have written a basic unit test and a performance test that you can use to test the performance and correctness of your implementation. This code might have bugs, so please discuss any issues on the mailing list, which we will also use to announce any updates. This also includes a reference implementation based on Berkley DB. It is not a high performance implementation, but it correctly implements the API. This implementation compiles on modern Linux systems with GCC.

Benchmark

The benchmark will be a mixture of inserts, updates, deletes, compare and swap, and ordered traversal operations. The workload will begin by generating sufficient data to ensure that the entire SSD is full, and to force the implementation to perform garbage collection. We will then begin measuring the overall operation throughput. The measurement period will be long enough to ensure our measurement includes garbage collection. The overall throughput will be the metric used to compare implementations. We will also test the durability feature using fault-injection. Implementations that fail to durably record data will be disqualified.

While we will be providing a "test workload," the final workload will not be the same. Thus, your implementation should provide good performance on a wide range of workloads, and should not be tuned for one specific benchmark.