Table Heap and MVCC
The TableHeap
(storage/table_heap.rs
) is the component responsible for managing the collection of pages that belong to a single table. While the TablePage
defines the layout within a single page, the TableHeap
provides the high-level API for inserting, updating, and deleting tuples, making it central to the implementation of MVCC.
Tuple
Serialization
A logical Tuple
struct in memory must be converted into a byte array to be stored on a page. This process is handled by storage/codec/tuple.rs
.
The serialized format of a tuple consists of two parts:
- Null Bitmap: A compact bitmap at the beginning of the tuple data. Each bit corresponds to a column in the schema; if the bit is
1
, the column's value isNULL
. This avoids storing any data for null fields. - Attribute Data: The actual data for all non-null columns, serialized one after another.
TupleMeta
: The Heart of MVCC
The most important part of a tuple's on-disk metadata is the TupleMeta
struct, which is stored directly within the TupleInfo
slot in the page header. This is the heart of QuillSQL's Multi-Version Concurrency Control (MVCC) implementation.
insert_txn_id
: The ID of the transaction that created this version of the tuple.delete_txn_id
: The ID of the transaction that "deleted" this version. (A value of0
orINVALID
means it's not deleted).is_deleted
: A boolean flag indicating the tuple version is considered deleted.prev_version: Option<RecordId>
: A link (RID) to the previous version of this logical row.next_version: Option<RecordId>
: A link (RID) to the next version of this logical row.
These fields allow QuillSQL to maintain a version chain for each logical row. When a row is updated, the TableHeap
's mvcc_update
method is called, which, instead of overwriting the data, performs the following steps:
- Creates a new tuple version with the new data by calling
insert_tuple
. - Sets the
prev_version
pointer of this new version to the RID of the old version. - Updates the
next_version
pointer of the old version to point to the new version. - Sets the
delete_txn_id
on the old version to mark it as "dead".
A transaction can then traverse this chain and use the transaction IDs in the TupleMeta
to determine which version of a row is visible to it based on its own transaction ID and isolation level, thus achieving transaction isolation without long-held read locks.
For Study & Discussion
-
Version Chain Traversal: Imagine a transaction with ID
T10
needs to read a row. It finds a version of the row that was inserted byT5
(committed) but deleted byT12
(in-progress). ShouldT10
see this version? What if the deleter wasT8
(committed)? What if the deleter wasT10
itself? Walk through the visibility check logic. -
Garbage Collection: The MVCC model creates many "dead" tuple versions that are no longer visible to any active or future transaction. What is the long-term problem with leaving these dead versions in the database? This problem is typically solved by a process called vacuuming or garbage collection. When is it safe to physically remove a dead tuple version?
-
Programming Exercise (Advanced): Implement a basic
VACUUM
function for aTableHeap
. This function should scan the table and identify dead tuple versions that are no longer visible to any currently active transaction. Once identified, it should physically remove them from their pages. This is a challenging exercise that touches transaction management, storage, and concurrency.