Sorting (2019)

Archival Note

The original contest site is no longer accessible. There is a mirror available, from which the text on the rest of this page was copied.

The provided code for this contest was not saved by the mirror. I’ve committed the code for gensort and valsort into transactionalblog/sigmod-contest-2019 on GitHub, and shingjan/Leyenda-SIGMOD19 can be used as a template for the test harness.[1]

[1]: If you make something nice and general, PRs to the contest repo are welcome.

This contest was organized by the Database Group at the University of Wisconsin - Madison and PITLab at IT University of Copenhagen. The mirror also did not save the page listing the finalists, so the winning submission is unknown. However, a paper was published on the 2nd-ranked submission.

Task Details

Sorting remains a key primitive for many uses of database systems. The task this year is to sort four datasets of varying size, skew, and data types. The key constraint is memory: only 30 GB of memory is allowed, while the datasets vary in size from 10 GB to 60 GB.
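Because the largest dataset exceeds the memory cap, some out-of-core strategy is required. One classic option is an external merge sort: sort memory-sized runs, spill them to disk, then k-way merge. A minimal Python sketch under that assumption (the function name, `mem_budget` parameter, and run layout are illustrative, not part of the contest harness):

```python
import heapq
import os
import tempfile

RECORD = 100  # gensort format: 10-byte key followed by 90-byte payload
KEY = 10

def external_sort(in_path, out_path, mem_budget=1 << 20):
    """Chunked sort-merge over fixed-size records.

    mem_budget caps roughly how many bytes of records are sorted in
    memory at once; sorted runs are spilled to temp files and then
    k-way merged into the output file.
    """
    runs = []
    per_run = max(1, mem_budget // RECORD)
    with open(in_path, "rb") as f:
        while True:
            chunk = f.read(per_run * RECORD)
            if not chunk:
                break
            recs = [chunk[i:i + RECORD] for i in range(0, len(chunk), RECORD)]
            recs.sort(key=lambda r: r[:KEY])  # order by the 10-byte key
            tmp = tempfile.NamedTemporaryFile(delete=False)
            tmp.write(b"".join(recs))
            tmp.close()
            runs.append(tmp.name)

    def records(path):
        with open(path, "rb") as g:
            while (r := g.read(RECORD)):
                yield r

    # heapq.merge streams the runs, so memory stays bounded by run count.
    with open(out_path, "wb") as out:
        for rec in heapq.merge(*(records(p) for p in runs),
                               key=lambda r: r[:KEY]):
            out.write(rec)
    for p in runs:
        os.remove(p)
```

A competitive entry would layer I/O overlap, parallel run sorting, and skew-aware partitioning on top of this skeleton, but the run-then-merge shape is the core idea.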

Input to your program will be provided as a path to an input file to partition or sort, and the result must be written to a specified output file.

Testing Protocol

Our test harness will pass the path of the input file to sort as your program’s first argument, and the path of the output file as its second argument. The files are in the format generated by gensort, where the first 10 bytes of each 100-byte record are the key and the remaining 90 bytes are the payload. We will give your program each of the three files one at a time, moving on once we have verified that the current file has been sorted correctly. Sorting will be validated by valsort. The data sizes are as follows: 10 GB, 20 GB, and 60 GB. The 20 GB file will be in ASCII, and the 60 GB file will have skewed keys. We will start by supporting small and medium evaluation, and expand to large over the following weeks.
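The argument contract above is small enough to show end to end. Here is a toy baseline that satisfies it, assuming only what the protocol states (input path in `argv[1]`, output path in `argv[2]`, 100-byte gensort records); it sorts entirely in memory, so it illustrates the interface rather than a strategy that fits the 30 GB cap:

```python
import sys

RECORD = 100  # gensort record: 10-byte key + 90-byte payload
KEY = 10

def sort_file(in_path, out_path):
    """Toy baseline: read everything, sort by key, write the result.

    Matches the harness contract but holds the whole file in memory,
    so it cannot handle the larger datasets under the 30 GB limit.
    """
    with open(in_path, "rb") as f:
        data = f.read()
    recs = [data[i:i + RECORD] for i in range(0, len(data), RECORD)]
    recs.sort(key=lambda r: r[:KEY])
    with open(out_path, "wb") as f:
        f.write(b"".join(recs))

if __name__ == "__main__":
    # argv[1]: file to sort; argv[2]: where to write the sorted output
    sort_file(sys.argv[1], sys.argv[2])
```

Sorting on the raw 10-byte prefix works because gensort keys compare bytewise; valsort checks exactly that nondecreasing key order.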

Your solution will be evaluated for correctness and execution time.