The Lifecycle of a Query
When you submit a SQL query to a database, it doesn't just magically produce a result. The database undertakes a sophisticated, multi-stage process to translate the declarative SQL statement (which describes what data you want) into an imperative, efficient execution plan (which describes how to get that data). This entire process is the responsibility of the Query Planner.
In QuillSQL, this process follows a classic, compiler-like pipeline, which is a cornerstone of modern database architecture as taught in courses like CMU 15-445.
The journey from a SQL string to an executable plan involves several transformations:
SQL String -> AST (Abstract Syntax Tree) -> Logical Plan -> Optimized Logical Plan -> Physical Plan
Let's break down each stage.
Stage 1: Parsing (SQL -> AST)
The first step is purely syntactic. The raw SQL query string is fed into a parser. QuillSQL uses the excellent sqlparser
crate for this. The parser checks if the SQL conforms to valid grammar and, if so, converts it into an Abstract Syntax Tree (AST).
An AST is a direct tree representation of the SQL query's structure. For example, SELECT id FROM users WHERE age > 30
would be parsed into a tree structure with nodes representing the SELECT
clause, the table users
, the WHERE
clause, and the predicate age > 30
.
Stage 2: Logical Planning (AST -> Logical Plan)
Next, the LogicalPlanner
(plan/logical_planner.rs
) walks the AST and converts it into a Logical Plan.
A Logical Plan is a tree of relational algebra operators. It describes the query in terms of high-level data operations, completely independent of how the data is stored or which algorithms will be used. It defines what to do, not how to do it.
Key logical operators in QuillSQL (plan/logical_plan/mod.rs
) include:
TableScan(users)
: Represents reading the entireusers
table.Filter(predicate: age > 30)
: Represents filtering rows based on a condition.Projection(columns: [id])
: Represents selecting specific columns.Join
: Represents joining two data sources.Aggregate
: Represents aGROUP BY
operation.Sort
: Represents anORDER BY
operation.
For our example query, the initial logical plan might look like this:
Projection(columns=[id])
└── Filter(predicate=age > 30)
└── TableScan(users)
Stage 3: Logical Optimization
Before executing the plan, we have a crucial opportunity to make it more efficient. The LogicalOptimizer
(optimizer/logical_optimizer.rs
) takes the logical plan and applies a series of transformation rules to produce a new, equivalent logical plan that is expected to be faster.
QuillSQL uses a simple but effective rule-based optimizer. A classic example of such a rule is Predicate Pushdown. Consider this query:
SELECT name FROM (SELECT * FROM users JOIN cities ON users.city_id = cities.id) WHERE users.age > 30;
A naive logical plan would first perform a full JOIN
between users
and cities
and then filter the massive result. Predicate pushdown is a rule that would rewrite the plan to apply the age > 30
filter before the join:
Before Optimization:
Filter(users.age > 30)
└── Join(users.city_id = cities.id)
├── TableScan(users)
└── TableScan(cities)
After Optimization (Predicate Pushdown):
Join(users.city_id = cities.id)
├── Filter(users.age > 30)
│ └── TableScan(users)
└── TableScan(cities)
By filtering early, we dramatically reduce the number of rows that need to be processed by the expensive Join
operator. QuillSQL implements similar rules, such as PushDownLimit
, which pushes LIMIT
clauses down the tree to reduce the amount of data processed.
Stage 4: Physical Planning (Logical Plan -> Physical Plan)
Finally, the PhysicalPlanner
(plan/physical_planner.rs
) converts the optimized logical plan into a Physical Plan.
A Physical Plan describes exactly how the query will be executed. It maps each logical operator to a concrete algorithm or implementation.
- A
LogicalPlan::TableScan
becomes aPhysicalSeqScan
(a sequential scan of the table heap). - A
LogicalPlan::Filter
becomes aPhysicalFilter
, which implements the filtering logic. - A
LogicalPlan::Join
becomes aPhysicalNestedLoopJoin
. This is where the database commits to a specific join algorithm. A more advanced database might have multiple options (e.g.,PhysicalHashJoin
,PhysicalSortMergeJoin
) and would use a cost model to choose the best one. QuillSQL currently implements Nested Loop Join.
Each node in the physical plan tree is an executor that the Execution Engine can run. This final plan is what gets executed to produce the query result.
Conclusion
This layered approach—from syntax to a logical representation, then to an optimized logical representation, and finally to a concrete physical execution plan—is fundamental to database design. It provides a clear separation of concerns and, most importantly, creates a dedicated optimization stage, which is the key to achieving high performance on a wide variety of SQL queries.
For Study & Discussion
-
Logical vs. Physical: Why is the separation between logical and physical plans so important? What would be the disadvantages of a simpler system that converted the AST directly into a physical plan?
-
Join Algorithms: QuillSQL currently only implements
NestedLoopJoin
. What are two other common join algorithms? Describe how they work and in what scenarios they would be more performant than a nested loop join. -
Programming Exercise (Advanced): Implement a
PhysicalHashJoin
operator. This is a significant undertaking that involves: a. Creating aPhysicalHashJoin
struct that implements theVolcanoExecutor
trait. b. In theinit()
phase, it should consume the entire "build" side (typically the smaller, right-hand table) and build an in-memory hash table from its rows. c. In thenext()
phase, it should read from the "probe" side (the left-hand table) one row at a time, probe the hash table for matches, and emit the joined tuples. d. Modify thePhysicalPlanner
to choosePhysicalHashJoin
instead ofPhysicalNestedLoopJoin
for equi-joins. -
Programming Exercise: Add support for the
UNION ALL
operator. This would involve: a. Adding aUnion
variant to theLogicalPlan
enum. b. Updating theLogicalPlanner
to recognize theUNION
syntax in the AST and create aLogicalPlan::Union
node. c. Creating aPhysicalUnion
executor that pulls tuples from its first child until it's exhausted, and then pulls tuples from its second child.