In-Memory Join Pipeline (2025)
Archival Note
The original contest site is still available! (As I’m very timely in archiving this, and it’s hosted by github anyway.) If it’s unavailable in the future, it can be found on an Archive.org mirror instead.
The provided code for this contest is available at SIGMOD-25-Programming-Contest/base, with a fork saved at transactionalblog/sigmod-contest-2025.
This contest was organized by the DAMS Lab at TU Berlin, the Data Engineering Systems Group at HPI, and the DBGroup at SUSTech.
Task Description
Given the joining pipeline and the pre-filtered input data, your task is to implement an efficient joining algorithm to accelerate the execution time of the joining pipeline. Specifically, you can check out our provided baseline solution, and you need to implement the following function in src/execute.cpp:
ColumnarTable execute(const Plan& plan, void* context);
Optionally, you can implement these two functions as well to prepare any global context (e.g., thread pool) to accelerate the execution.
void* build_context();
void destroy_context(void*);
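As a minimal sketch of how these two hooks might be used (the ExecContext struct and its contents here are hypothetical, not part of the provided interface):

#include <thread>

// Hypothetical context object; the contest only fixes the two function
// signatures above, so what goes inside is up to you.
struct ExecContext {
    unsigned num_threads; // e.g., used to size a thread pool
};

void* build_context() {
    // Query the hardware once, up front, rather than per query.
    return new ExecContext{std::thread::hardware_concurrency()};
}

void destroy_context(void* context) {
    delete static_cast<ExecContext*>(context);
}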
Input format
The input plan in the above function is defined by the following structs.
struct ScanNode {
    size_t base_table_id;
};
struct JoinNode {
    bool   build_left;
    size_t left;
    size_t right;
    size_t left_attr;
    size_t right_attr;
};
struct PlanNode {
    std::variant<ScanNode, JoinNode>          data;
    std::vector<std::tuple<size_t, DataType>> output_attrs; // (column index, output type)
};
struct Plan {
    std::vector<PlanNode>      nodes;
    std::vector<ColumnarTable> inputs;
    size_t                     root;
};
Scan:
- The base_table_id member refers to which input table in the inputs member of a plan is used by the Scan node.
- Each item in the output_attrs indicates which column in the base table should be output and what type it is.
Join:
- The build_left member refers to which side the hash table should be built on, where true indicates building the hash table on the left child, and false indicates the opposite.
- The left and right members are the indexes of the left and right child of the Join node in the nodes member of a plan, respectively.
- The left_attr and right_attr members are the join condition of the Join node. Supposing that there are two records, left_record and right_record, from the intermediate results of the left and right child respectively, the members indicate that the two records should be joined when left_record[left_attr] == right_record[right_attr].
- Each item in the output_attrs indicates which column in the result of the children should be output and what type it is. Supposing that the left child has \(n_l\) columns and the right child has \(n_r\) columns, the value of the index \(i \in \{0, \dots, n_l + n_r - 1\}\), where the ranges \(\{0, \dots, n_l - 1\}\) and \(\{n_l, \dots, n_l + n_r - 1\}\) indicate that the output column is from the left and right child, respectively.
Root:
- The root member of a plan indicates which node is the root node of the execution plan tree.
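To make the node indexing concrete, here is a minimal sketch of walking the plan tree, assuming the struct definitions above; print_plan is a hypothetical helper, not part of the contest interface:

#include <cstddef>
#include <cstdio>
#include <variant>

void print_plan(const Plan& plan, size_t node_id, int depth) {
    const PlanNode& node = plan.nodes[node_id];
    if (const auto* scan = std::get_if<ScanNode>(&node.data)) {
        // Leaf: reads plan.inputs[scan->base_table_id].
        std::printf("%*sScan(base_table_id=%zu)\n", depth * 2, "", scan->base_table_id);
    } else {
        const auto& join = std::get<JoinNode>(node.data);
        std::printf("%*sJoin(left[%zu] == right[%zu], build=%s)\n", depth * 2, "",
                    join.left_attr, join.right_attr,
                    join.build_left ? "left" : "right");
        print_plan(plan, join.left, depth + 1);  // children index into plan.nodes
        print_plan(plan, join.right, depth + 1);
    }
}

Calling print_plan(plan, plan.root, 0) prints the tree rooted at the root node.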
Data format
The input and output data both follow a simple columnar data format.
enum class DataType {
    INT32,   // 4-byte integer
    INT64,   // 8-byte integer
    FP64,    // 8-byte floating point
    VARCHAR, // string of arbitrary length
};
constexpr size_t PAGE_SIZE = 8192;
struct alignas(8) Page {
    std::byte data[PAGE_SIZE];
};
struct Column {
    DataType           type;
    std::vector<Page*> pages; // pages owned by the column
};
struct ColumnarTable {
    size_t              num_rows;
    std::vector<Column> columns;
};
A ColumnarTable first stores how many rows the table has in the num_rows member, then stores each column separately as a Column. Each Column has a type and stores the items of the column in several pages. Each page is 8192 bytes. In each page:
- The first 2 bytes are a uint16_t which is the number of rows \(n_r\) in the page.
- The following 2 bytes are a uint16_t which is the number of non-NULL values \(n_v\) in the page.
- The first \(n_r\) bits in the last \(\lfloor (n_r + 7) / 8 \rfloor\) bytes are a bitmap indicating whether the corresponding row has a value or is NULL.
Fixed-length attribute: There are \(n_v\) contiguous values beginning at the first aligned position. For example, in a Page of INT32, the first value is at data + 4, while in a Page of INT64 or FP64, the first value is at data + 8.
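Putting the header, bitmap, and value layout together, here is a hedged sketch of decoding an INT32 page. The bit order within each bitmap byte (row i mapping to bit i % 8 of byte i / 8) is an assumption; verify it against the provided baseline code.

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

std::vector<std::optional<int32_t>> read_int32_page(const Page& page) {
    uint16_t n_rows, n_values;
    std::memcpy(&n_rows, page.data, 2);       // number of rows in the page
    std::memcpy(&n_values, page.data + 2, 2); // number of non-NULL values
    const std::byte* bitmap = page.data + PAGE_SIZE - (n_rows + 7) / 8;
    const std::byte* values = page.data + 4;  // first 4-byte-aligned position
    std::vector<std::optional<int32_t>> out(n_rows);
    size_t value_idx = 0;
    for (uint16_t row = 0; row < n_rows; ++row) {
        // Assumed bit order: row's flag is bit (row % 8) of byte (row / 8).
        bool has_value =
            (std::to_integer<uint8_t>(bitmap[row / 8]) >> (row % 8)) & 1;
        if (has_value) {
            int32_t v;
            std::memcpy(&v, values + 4 * value_idx, 4);
            out[row] = v;
            ++value_idx;
        }
    }
    assert(value_idx == n_values); // sanity check against the page header
    return out;
}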
Variable-length attribute: There are \(n_v\) contiguous offsets (uint16_t) beginning at data + 4 in a Page, followed by the content of the varchars, which begins at char_begin = data + 4 + n_r * 2. Each offset indicates the ending offset of the corresponding VARCHAR with respect to char_begin.
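A corresponding sketch for a normal VARCHAR page, assuming the offsets are cumulative end positions, i.e., each string begins where the previous one ends; long-string special pages, described next, are not handled here.

#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

std::vector<std::string> read_varchar_page(const Page& page) {
    uint16_t n_rows, n_values;
    std::memcpy(&n_rows, page.data, 2);
    std::memcpy(&n_values, page.data + 2, 2);
    const std::byte* offsets = page.data + 4; // n_v uint16_t end offsets
    const char* char_begin =
        reinterpret_cast<const char*>(page.data) + 4 + n_rows * 2;
    std::vector<std::string> out;
    out.reserve(n_values);
    uint16_t begin = 0;
    for (uint16_t i = 0; i < n_values; ++i) {
        uint16_t end;
        std::memcpy(&end, offsets + i * 2, 2);
        out.emplace_back(char_begin + begin, char_begin + end); // [begin, end)
        begin = end; // assumed: next string starts where this one ended
    }
    return out;
}

(NULL handling via the bitmap is omitted for brevity; it works the same way as for fixed-length pages.)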
Long string: When the length of a string is longer than PAGE_SIZE - 7, it cannot fit in a normal page, so special pages are used to store such a string. If \(n_r\) == 0xffff or \(n_r\) == 0xfffe, the Page is a special page for a long string: 0xffff means the page is the first page of a long string, and 0xfffe means the page is a following page of a long string. The following 2 bytes are a uint16_t indicating the number of chars in the page, which begin at data + 4.
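A small sketch of telling the page kinds apart when scanning a VARCHAR column, per the sentinel values above:

#include <cstdint>
#include <cstring>

enum class PageKind { Normal, LongStringFirst, LongStringFollowing };

PageKind classify_page(const Page& page) {
    uint16_t n_rows;
    std::memcpy(&n_rows, page.data, 2); // sentinel values mark special pages
    if (n_rows == 0xffff) return PageKind::LongStringFirst;
    if (n_rows == 0xfffe) return PageKind::LongStringFollowing;
    return PageKind::Normal;
}

For a long-string page, the uint16_t at data + 2 gives the number of chars stored in that page, and the chars themselves begin at data + 4; a reader reassembles the string by concatenating the first page with its following pages.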
Requirement
- You can only modify the file src/execute.cpp in the project or add new source files.
- You must not use any third-party libraries. If you are using libraries for development (e.g., for logging), ensure that you remove them before the final submission.
- The joining pipeline (including order and build side) is optimized by PostgreSQL for Hash Join only. However, in the execute function, you are free to use other algorithms and to change the pipeline, as long as the result is correct.
- For any struct listed above, all of its members are public. You can manipulate them in free functions as desired, as long as the original files are not changed and the manipulated objects can be destructed properly.
- Your program will be evaluated on an unpublished benchmark sampled from the original JOB benchmark. You will not be able to access the test benchmark.
Quick start
Run all the following commands in the root directory of this project.
First, download the IMDB dataset.
./download_imdb.sh
Second, build the project.
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -Wno-dev
cmake --build build -- -j $(nproc)
Third, prepare the DuckDB database for correctness checking.
./build/build_database imdb.db
Now, you can run the tests:
./build/run plans.json
If you want to use Ninja Multi-Config as the generator, the commands will look like:
cmake -S . -B build -Wno-dev -G "Ninja Multi-Config"
cmake --build build --config Release -- -j $(nproc)
./build/Release/build_database imdb.db
./build/Release/run plans.json
Datasets
We use the IMDB dataset and JOB benchmark. The original JOB benchmark is included in the repository of the reference implementation. We change the predicates in the original JOB benchmark for each template to generate our evaluation set.
Name       | Description                | Number of queries
-----------|----------------------------|------------------
JOB        | The original JOB benchmark | 113
JOB-sample | The sampled JOB benchmark  | 983
Evaluation
For the evaluation, we run your solution on four different servers, courtesy of the HPI data center:
- Intel (Xeon E7-4880)
- AMD (EPYC 7F72)
- ARM (Ampere Altra Max)
- IBM (Power8)
For the final evaluation after the submission deadline, four additional undisclosed servers will be included. See the starter code for more information on the provided hardware. On each server, we measure the runtime of every single query execution, including the context creation. Then we sum up all execution times per server as \(R_{Intel}\), \(R_{AMD}\), \(R_{ARM}\), and \(R_{IBM}\).
Our overall metric is the geometric mean over all server runtimes, \(R = \sqrt[4]{R_{Intel} \cdot R_{AMD} \cdot R_{ARM} \cdot R_{IBM}}\).
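For example, with hypothetical per-server totals of 10 s, 20 s, 40 s, and 5 s, the metric would be \(R = \sqrt[4]{10 \cdot 20 \cdot 40 \cdot 5} = \sqrt[4]{40000} \approx 14.1\) s. Because the mean is geometric, a slowdown on any one architecture multiplies directly into the score, so a solution needs to be fast on all four servers rather than tuned to one.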