The Volcano Execution Model
Once the Query Planner has produced an optimized PhysicalPlan, it’s the job of the Execution Engine to run it and produce results. The execution engine is the component that brings the plan to life, interacting with the transaction manager and storage layer to process data.
QuillSQL uses the classic Volcano Model, also known as the Iterator Model. This is a pull-based execution model where each physical operator in the plan tree acts as an iterator that the parent operator can “pull” rows from.
1. The VolcanoExecutor Trait
At the heart of the execution model is the VolcanoExecutor trait (execution/mod.rs). Every physical operator implements this simple trait:
#![allow(unused)]
fn main() {
pub trait VolcanoExecutor {
fn init(&self, context: &mut ExecutionContext) -> QuillSQLResult<()>;
fn next(&self, context: &mut ExecutionContext) -> QuillSQLResult<Option<Tuple>>;
fn output_schema(&self) -> SchemaRef;
}
}
init(): This method is called once at the beginning of execution. It allows an operator to set up its initial state (e.g., aSeqScanoperator would initialize its table iterator here).next(): This is the core method. When called, the operator produces its next output tuple. It returnsSome(tuple)if it has a row, orNoneif it has exhausted its data source. The top-levelExecutionEnginesimply callsnext()on the root of the plan tree in a loop until it receivesNone.
2. The ExecutionContext
Notice that both init() and next() take a mutable ExecutionContext. This object is the “world” in which the query runs. It is passed down the operator tree and exposes:
Catalog+StorageEngine: Operators callcontext.table(&table_ref)to obtain aTableBinding. The binding encapsulates heap/index access (scan, insert, delete, update, prepare-row-for-write) so operators never touchTableHeaporMvccHeapdirectly.TxnContext: Provides the current transaction, MVCC snapshot, and helper methods for row/table locks and visibility checks.- Expression helpers:
eval_expr/eval_predicateevaluate AST expressions against a tuple without leakingScalarValueplumbing into operators.
Thanks to the binding abstraction, the operator code only expresses “what” should happen (scan/update/delete) while the binding implements “how” (MVCC chain navigation, undo logging, index maintenance).
3. Anatomy of Physical Operators
Data flows up the tree from the leaves (scans) to the root. Let’s see how it works by examining a few key operators.
Leaf Operator: PhysicalSeqScan
A sequential scan sits at the leaf of a plan tree and streams tuples from a table binding.
init():- Acquires an
IntentionSharedlock on the table viaTxnContext. - Requests a
TableBinding(context.table(&table_ref)) and callsbinding.scan(...)to obtain aTupleStream. The binding hides the actualTableIteratorimplementation and any MVCC plumbing.
- Acquires an
next():- Pops the next
(rid, meta, tuple)from its prefetch buffer (which in turn pulls from the binding’s stream). - Calls
TxnContext::read_visible_tupleto perform the MVCC visibility check and acquire the required shared row lock. - Returns the tuple if visible; otherwise, keep looping.
- Pops the next
Unary Operator: PhysicalFilter
A filter has one child operator (its input). It implements a WHERE clause.
next(): Its logic is a simple, tight loop:- It calls
self.input.next()to get a tuple from its child. - If the child returns
None, the filter is also exhausted and returnsNone. - If it receives a tuple, it evaluates its predicate expression (e.g.,
age > 30) against the tuple. - If the predicate evaluates to
true, it returns the tuple. Otherwise, it loops back to step 1.
- It calls
Binary Operator: PhysicalNestedLoopJoin
A join has two children: a left (outer) and a right (inner).
next(): It implements the classic nested loop join algorithm:- Fetch one tuple from the outer (left) child and hold onto it.
- Enter a loop: fetch tuples one by one from the inner (right) child.
- For each inner tuple, combine it with the held outer tuple and evaluate the join condition. If it matches, return the combined tuple.
- When the inner child is exhausted, rewind it by calling
self.right_input.init()again. - Go back to step 1 to fetch the next tuple from the outer child.
- Repeat until the outer child is also exhausted.
4. Putting It All Together
Consider the query SELECT name FROM users WHERE age > 30. The physical plan is Projection -> Filter -> SeqScan.
- The
ExecutionEnginecallsnext()on theProjectionoperator. - The
Projectionoperator needs a tuple, so it callsnext()on its child,Filter. - The
Filteroperator needs a tuple, so it callsnext()on its child,SeqScan. - The
SeqScanoperator asks itsTableBindingfor the next tuple, and after the MVCC check finds a visible tuple for a user withage = 25. SeqScanreturns this tuple up toFilter.Filterevaluatesage > 30on the tuple. It’s false, so it loops, callingSeqScan.next()again.SeqScanpulls another visible tuple (this timeage = 40,name = 'Alice') through the binding.SeqScanreturns this tuple up toFilter.Filterevaluatesage > 30. It’s true! It returns the tuple for Alice up toProjection.Projectiontakes the full tuple for Alice, creates a new tuple containing only thenamecolumn ('Alice'), and returns this new tuple as the result.
This process repeats, with tuples flowing up the tree one at a time, until the SeqScan operator runs out of pages and returns None, which then propagates up the tree, signaling the end of execution.
For Study & Discussion
-
Push vs. Pull Models: The Volcano model is a “pull-based” model. An alternative is a “push-based” model, where operators push their results to their parents as soon as they are ready. What are the potential advantages and disadvantages of each model, particularly concerning cache efficiency and control flow?
-
Blocking vs. Non-Blocking Operators: Some operators, like
PhysicalFilter, can produce their first output row as soon as they receive their first input row. These are non-blocking. Other operators, likePhysicalSort, must consume their entire input before they can produce even a single row of output. These are blocking. What is the impact of blocking operators on query latency and memory usage? -
Programming Exercise: The current
PhysicalNestedLoopJoinis simple but can be inefficient as it re-scans the entire inner table for every outer row. Implement aPhysicalBlockNestedLoopJoinoperator. This version would read a block (a small batch) of tuples from the outer table into an in-memory buffer, and then iterate through the inner table once for that entire block. This can significantly reduce the number of times the inner table needs to be scanned. -
Programming Exercise: Implement the
PhysicalLimitoperator. Itsnext()method should: a. Keep an internal counter. b. If the counter is less than theoffset, pull and discard tuples from its child. c. If the counter is betweenoffsetandoffset + limit, pull a tuple from its child and return it. d. Once the limit is reached, it should stop pulling from its child and returnNonefor all subsequent calls. This is important for efficiency, as it stops the execution of the entire sub-tree below it.