The Volcano Execution Model
Once the Query Planner has produced an optimized PhysicalPlan
, it's the job of the Execution Engine to run it and produce results. The execution engine is the component that brings the plan to life, interacting with the transaction manager and storage layer to process data.
QuillSQL uses the classic Volcano Model, also known as the Iterator Model. This is a pull-based execution model where each physical operator in the plan tree acts as an iterator that the parent operator can "pull" rows from.
1. The VolcanoExecutor
Trait
At the heart of the execution model is the VolcanoExecutor
trait (execution/mod.rs
). Every physical operator implements this simple trait:
#![allow(unused)] fn main() { pub trait VolcanoExecutor { fn init(&self, context: &mut ExecutionContext) -> QuillSQLResult<()>; fn next(&self, context: &mut ExecutionContext) -> QuillSQLResult<Option<Tuple>>; fn output_schema(&self) -> SchemaRef; } }
init()
: This method is called once at the beginning of execution. It allows an operator to set up its initial state (e.g., aSeqScan
operator would initialize its table iterator here).next()
: This is the core method. When called, the operator produces its next output tuple. It returnsSome(tuple)
if it has a row, orNone
if it has exhausted its data source. The top-levelExecutionEngine
simply callsnext()
on the root of the plan tree in a loop until it receivesNone
.
2. The ExecutionContext
Notice that both init()
and next()
take a mutable ExecutionContext
. This object is the "context" or "world" in which the query runs. It is passed down the entire operator tree and gives each operator access to crucial services:
Catalog
: To look up tables and indexes.TransactionManager
andTransaction
: To interact with the current transaction. This is how operators perform locking and visibility checks.MvccSnapshot
: The specific MVCC snapshot for the current transaction, used to determine which tuple versions are visible.
This design cleanly separates the operator's logic from the transactional context it runs in.
3. Anatomy of Physical Operators
Data flows up the tree from the leaves (scans) to the root. Let's see how it works by examining a few key operators.
Leaf Operator: PhysicalSeqScan
A sequential scan is at the leaf of a plan tree. It's responsible for reading tuples from a table on disk.
init()
: It acquires anIntentionShared
lock on the table and creates aTableIterator
for the table heap.next()
: In a loop, it does the following:- Pulls the next raw tuple
(rid, meta, tuple)
from theTableIterator
. - Calls
context.is_visible(&meta)
to perform an MVCC visibility check using the transaction's snapshot. - If the tuple version is visible, it acquires the necessary row lock (e.g.,
Shared
lock) and returns the tuple. - If the tuple is not visible, it ignores it and loops to get the next one.
- Pulls the next raw tuple
Unary Operator: PhysicalFilter
A filter has one child operator (its input
). It implements a WHERE
clause.
next()
: Its logic is a simple, tight loop:- It calls
self.input.next()
to get a tuple from its child. - If the child returns
None
, the filter is also exhausted and returnsNone
. - If it receives a tuple, it evaluates its predicate expression (e.g.,
age > 30
) against the tuple. - If the predicate evaluates to
true
, it returns the tuple. Otherwise, it loops back to step 1.
- It calls
Binary Operator: PhysicalNestedLoopJoin
A join has two children: a left (outer) and a right (inner).
next()
: It implements the classic nested loop join algorithm:- Fetch one tuple from the outer (left) child and hold onto it.
- Enter a loop: fetch tuples one by one from the inner (right) child.
- For each inner tuple, combine it with the held outer tuple and evaluate the join condition. If it matches, return the combined tuple.
- When the inner child is exhausted, rewind it by calling
self.right_input.init()
again. - Go back to step 1 to fetch the next tuple from the outer child.
- Repeat until the outer child is also exhausted.
4. Putting It All Together
Consider the query SELECT name FROM users WHERE age > 30
. The physical plan is Projection -> Filter -> SeqScan
.
- The
ExecutionEngine
callsnext()
on theProjection
operator. - The
Projection
operator needs a tuple, so it callsnext()
on its child,Filter
. - The
Filter
operator needs a tuple, so it callsnext()
on its child,SeqScan
. - The
SeqScan
operator fetches a raw tuple from theTableHeap
, checks its MVCC visibility, and finds a visible tuple for a user withage = 25
. SeqScan
returns this tuple up toFilter
.Filter
evaluatesage > 30
on the tuple. It's false, so it loops, callingSeqScan.next()
again.SeqScan
finds another visible tuple, this time for a user withage = 40
andname = 'Alice'
.SeqScan
returns this tuple up toFilter
.Filter
evaluatesage > 30
. It's true! It returns the tuple for Alice up toProjection
.Projection
takes the full tuple for Alice, creates a new tuple containing only thename
column ('Alice'
), and returns this new tuple as the result.
This process repeats, with tuples flowing up the tree one at a time, until the SeqScan
operator runs out of pages and returns None
, which then propagates up the tree, signaling the end of execution.
For Study & Discussion
-
Push vs. Pull Models: The Volcano model is a "pull-based" model. An alternative is a "push-based" model, where operators push their results to their parents as soon as they are ready. What are the potential advantages and disadvantages of each model, particularly concerning cache efficiency and control flow?
-
Blocking vs. Non-Blocking Operators: Some operators, like
PhysicalFilter
, can produce their first output row as soon as they receive their first input row. These are non-blocking. Other operators, likePhysicalSort
, must consume their entire input before they can produce even a single row of output. These are blocking. What is the impact of blocking operators on query latency and memory usage? -
Programming Exercise: The current
PhysicalNestedLoopJoin
is simple but can be inefficient as it re-scans the entire inner table for every outer row. Implement aPhysicalBlockNestedLoopJoin
operator. This version would read a block (a small batch) of tuples from the outer table into an in-memory buffer, and then iterate through the inner table once for that entire block. This can significantly reduce the number of times the inner table needs to be scanned. -
Programming Exercise: Implement the
PhysicalLimit
operator. Itsnext()
method should: a. Keep an internal counter. b. If the counter is less than theoffset
, pull and discard tuples from its child. c. If the counter is betweenoffset
andoffset + limit
, pull a tuple from its child and return it. d. Once the limit is reached, it should stop pulling from its child and returnNone
for all subsequent calls. This is important for efficiency, as it stops the execution of the entire sub-tree below it.