Module Overview
This page gives a teaching-friendly tour of every QuillSQL subsystem. Each section
explains where the code lives (src/<module>), the most important types, and the
design decisions that make the module fit into the whole system. Read it together
with architecture.md for the big-picture data flow.
1. SQL Front-End (src/sql)
| Submodule | Responsibilities | Key Types / Functions |
|---|---|---|
lexer.rs, parser.rs | Translate raw SQL into an AST using sqlparser plus Quill-specific extensions. | parse_sql(&str) -> Vec<Statement> |
ast/mod.rs | Normalises identifiers, handles multi-part names, attaches spans for diagnostics. | NormalizedIdent, ObjectNameExt |
plan/lowering.rs | Bridges AST → planner structs for DDL extras not in upstream sqlparser. | CreateIndexSpec, ColumnDefExt |
Implementation notes:
- We intentionally keep the AST “SQL-shaped”. No premature desugaring occurs in the parser, which keeps the step-by-step teaching narrative simple: SQL text → AST → logical plan.
- Error messages bubble up with span information, so labs around parser extensions can show students exactly which token misbehaved.
- Suggested exercises: extend the parser with
ALTER TABLEor window functions, then observe how the new AST nodes flow into the planner layer.
2. Logical Planning (src/plan)
| Component | Description |
|---|---|
LogicalPlanner | Converts AST nodes into strongly typed LogicalPlan variants. Responsible for type checking, alias resolution, and scope handling. |
PlannerContext | Surface for catalog lookups while planning. |
PhysicalPlanner | Lowers optimized logical plans into physical Volcano operators (PhysicalPlan). |
Highlights:
- Every
LogicalPlanvariant stores its child plans inArc, so the tree is cheap to clone for optimizer passes or debugging prints. - Planner enforces column binding rules: scope stacks keep track of aliases, CTEs, and correlation so students can see how real compilers resolve identifiers.
- DDL nodes capture
TableReference+Schemaobjects. Later phases use them to skip repeated catalog lookups (helpful for labs about metadata caching).
3. Optimizer (src/optimizer)
| Piece | Responsibility |
|---|---|
LogicalOptimizer | Applies a pipeline of rule-based rewrites (predicate pushdown, projection pruning, constant folding). |
rules | Individual OptimizerRule implementations. |
Teaching hooks:
- Each rule implements
OptimizerRule::rewrite(&LogicalPlan) -> Option<LogicalPlan>. The return type makes it obvious whether a rewrite fired, so labs can instrument hit counts or add tracing. - Because rules are pure functions, students can safely reorder or A/B test heuristics in isolation (e.g., “what if we only push predicates below joins when the selectivity estimate exceeds X?”).
- Example lab: add constant folding or join commutation, run the sqllogictest suite, and compare plan dumps to see new shapes.
4. Execution Engine (src/execution)
| Element | Details |
|---|---|
PhysicalPlan | Enum covering all Volcano operators. Each variant implements VolcanoExecutor. |
VolcanoExecutor | init(&mut ExecutionContext) and next(&mut ExecutionContext) pair define the iterator model. |
ExecutionContext | Supplies catalog access, expression evaluation, and (most importantly) a pluggable StorageEngine. |
Design notes:
- Operators stay declarative. They describe what to scan or modify and delegate the
how to storage handles. For example,
PhysicalSeqScannow requests aTupleStreamviaExecutionContext::table_stream, so it never touchesTableHeapinternals. - Helper methods such as
eval_predicate,insert_tuple_with_indexes, orprepare_row_for_writeencapsulate MVCC/locking rules so physical plans remain short. - ExecutionContext caches
TxnContext, making it straightforward to teach isolation semantics: examineTxnContext::lock_tableandread_visible_tupleto see when locks are taken or released. - Suggested lab: implement a new physical operator (e.g., hash join) by wiring two child
PhysicalPlans without ever touching storage-specific code. This highlights the payoff of the handle abstraction.
5. Transaction Runtime (src/transaction)
| Type | Purpose |
|---|---|
TransactionManager | Creates/commits/aborts transactions, manages undo chains, and coordinates WAL durability. |
TxnContext | Per-command wrapper passed to execution. Provides MVCC snapshot (TransactionSnapshot), lock helpers, and command ids. |
LockManager | Multi-granularity 2PL with deadlock detection. |
Deeper dive:
Transactionstores a cachedTransactionSnapshotplus its undo records. Students can inspectTransaction::set_snapshotto see how Repeatable Read/Serializable keep a stable view.TxnRuntimepairs a transaction with a command id. Every SQL statement increments the command id so MVCC can distinguish between tuples created earlier in the same transaction vs. the current statement—great for explaining “recent writes are invisible during UPDATE scanning”.LockManagerexposes functions likelock_table/lock_row. Internally it keeps a wait-for graph and victim selection policy, which makes deadlock detection tangible for concurrency lectures.- Undo entries (
UndoRecord) store heap + index data. When an abort occurs the engine generates CLRs, demonstrating ARIES-style logical undo.
6. Storage Engine & Handles (src/storage)
| Component | Highlights |
|---|---|
engine.rs | Defines StorageEngine, TableHandle, IndexHandle, TupleStream, and IndexScanRequest. |
table_heap | Slotted-page heap with MVCC metadata (TupleMeta, forward/back pointers). |
index | B+Tree (B-link) implementation with iterators and codecs. |
Key ideas for teaching: Topics to emphasise:
- Handle abstraction: Execution asks the engine for a
TableHandle, then callsfull_scan()to receive aTupleStream. The default engine simply wrapsTableHeap/BPlusTreeIndex, but students can plug in their own engines for research. - TupleStream: Minimal interface returning
(RecordId, TupleMeta, Tuple)triples. Operators layer MVCC visibility on top, while the stream hides buffering details. - Scan extensions: Consider extending
full_scan()to accept projection/batch hints— a natural exercise for showing how execution and storage negotiate capabilities. table_heapdemonstrates MVCC version chains (forward/back pointers) and the slotted page layout. Encourage students to traceMvccHeap::updatealongside WAL entries to see how new versions link together.index/btree_index.rsimplements a B-link tree with separate codecs. Scanning viaTreeIndexIteratorshows how to perform lock-free range scans using sibling pointers— perfect for advanced systems lectures.
7. Buffer Manager & Disk I/O (src/buffer, src/storage/disk_*)
| Module | Description |
|---|---|
buffer::BufferManager | Page table + LRU-K/TinyLFU replacer, dirty tracking, guard types for safe borrowing. |
storage::disk_scheduler | io_uring-powered async I/O. Foreground threads enqueue read/write/fsync commands and await completions. |
storage::disk_manager | Thin wrapper for file layout, page enlargement, and durability fences. |
Extra details:
- Page guards come in three flavours: read, write, and upgradeable. Each implements
Dropsemantics that release latches automatically, reinforcing RAII patterns. - Replacement policy combines LRU-K history with TinyLFU admission. Labs can toggle the feature flag to measure hit-rate differences under sqllogictest or custom workloads.
- DiskScheduler uses lock-free queues plus dedicated worker tasks. A teaching exercise is
to run with
RUST_LOG=debugand observe how read/write/fsync commands are batched.
8. Recovery & WAL (src/recovery)
| Item | Notes |
|---|---|
WalManager | Allocates LSNs, buffers log records, drives background WAL writer, and integrates with checkpoints. |
RecoveryManager | Implements ARIES analysis/redo/undo. Uses ControlFileManager snapshots to seed restart. |
wal_record.rs | Defines logical (HeapInsert, IndexDelete, index structure/root records) and physical (PageWrite FPWs) records. |
Teaching hook:
- WAL and data share the disk scheduler. Students can trace one UPDATE from log append,
to buffer dirtying, to redo/undo via the exact same
RecordId. - Recovery exports statistics (redo count, loser transactions) so labs can check their WAL experiments automatically.
recovery/analysis.rsshows how Dirty Page Table and Active Transaction Table are reconstructed—ideal for demystifying ARIES’ first phase.- Students can implement “logical replay only” or “page image replay” modes by toggling the commit record types, then verify behaviour using the provided transaction tests.
9. Background Services (src/background)
| Worker | What it does |
|---|---|
| WAL writer | Flushes WAL buffer at configured cadence. |
| Checkpoint | Captures the Dirty Page Table + Active Transaction Table. |
| Buffer writer | Flushes dirty frames to keep checkpoints short. |
| MVCC vacuum | Reclaims tuple versions older than safe_xmin. |
More context:
- Workers share a
BackgroundWorkersregistry so the database can spawn/stop them as a group (handy for tests). The registry exposesshutdown_all()which unit tests call to ensure a clean exit. - Config structs (
IndexVacuumConfig,MvccVacuumConfig, etc.) live insrc/configand are exposed throughDatabaseOptionsfor easy fiddling in labs. - WAL writer and checkpoint worker simply wrap closures around
tokio::task::JoinHandle. This design keeps async runtime code out of the core engine, making it simpler for students to trace background effects. - Exercise idea: tweak
MvccVacuumConfig::batch_limitand observe how many tuple versions stay behind by querying hidden statistics tables.
10. Configuration & Session Layer (src/database, src/session, src/config)
| Type | Role |
|---|---|
Database | Boots disk/WAL/buffer components, runs recovery, wires background workers, and executes SQL strings. |
SessionContext | Tracks per-connection defaults (autocommit, isolation) and holds the active transaction handle. |
config::* | Central place for WAL, buffer pool, vacuum, and HTTP/CLI tuning knobs. |
Extra pointers:
- Front-ends (
bin/client,bin/server) both embed aDatabase, proving that the core library can serve multiple UIs without change. DatabaseOptionsshow how to construct dev/test setups (temporary files, alternate WAL directories) in a few lines.session::SessionContextdemonstrates auto-commit semantics: it lazily starts a transaction and usesTransactionScopeto interpretSET TRANSACTIONstatements.- Configuration structs derive
Clone + Debug, making them easy to print in labs or feed from environment variables (HTTP server usesQUILL_DB_FILE,PORT, etc.).
11. Testing & Documentation (tests/, docs/)
| Area | Purpose |
|---|---|
tests/sql_example/*.slt | sqllogictest suites for SQL coverage. |
tests/transaction_tests.rs | Unit tests for MVCC invariants, locking, and visibility. |
docs/ | This mdBook. Every module adds its own deep-dive chapter, making it straightforward for students to jump from code to guided explanations. |
Testing strategy:
- Developers can run
cargo test -qfor fast feedback. Long-running suites can be wrapped withtimeoutas suggested inAGENTS.md. - Example-driven docs (like this page) mirror the repository layout, so onboarding students can find code and tests with minimal guesswork.
- Encourage students to add sqllogictest cases alongside code changes. Because each case lives
in
tests/sql_example, git diffs double as documentation. - For modules with heavy concurrency (lock manager, WAL), pair unit tests with tracing: the CI logs become walkthroughs for tricky paths.
12. Suggested Reading Order for Learners
- Introduction ➜ Architecture – get the 10,000ft view.
- Module Overview (this page) – map names to directories.
- Execution + Storage chapters – understand TupleStreams, handles, and MVCC.
- Recovery + Transaction – see how WAL && MVCC interlock.
- Buffer, Index, Background – dive into advanced systems topics once the basics stick.
Treat this document as a living index: update it whenever a subsystem gains new entry points (e.g., asynchronous scans, new index types) so future contributors always know where to look next.