Keyboard shortcuts

Press or to navigate between chapters

Press ? to show this help

Press Esc to hide this help

Catalog Module

src/catalog/ acts as QuillSQL’s data dictionary. It tracks schema/table/index metadata, statistics, and the mapping between logical names and physical storage objects such as TableHeap and BPlusTreeIndex. Every layer—planner, execution, background workers—uses the catalog to discover structure.


Responsibilities

  • Persist definitions for schemas, tables, columns, indexes, and constraints.
  • Map logical TableReferences to physical handles (heap files, index roots, file ids).
  • Store table statistics (row counts, histograms) that drive ANALYZE and optimization.
  • Manage the DDL lifecycle: creation and deletion update the in-memory registry and the on-disk metadata pages.

Directory Layout

PathDescriptionKey Types
mod.rsPublic API surface.Catalog, TableHandleRef
schema.rsSchema objects and table references.Schema, Column, TableReference
registry/Thread-safe registry for heaps (MVCC vacuum).TableRegistry
statistics.rsANALYZE output and helpers.TableStatistics
loader.rsBoot-time metadata loader.load_catalog_data

Core Concepts

TableReference

Unified identifier (database, schema, table). Logical planner, execution, and transaction code all use it when requesting handles from the catalog.

Registries

TableRegistry maps internal IDs to Arc<TableHeap> plus logical names. It is used by the MVCC vacuum worker to iterate user tables without poking directly into catalog data.

Schema & Column

Schema stores column definitions (type, default, nullability). Execution uses it when materialising tuples; the planner uses it to check expression types. Schema::project helps physical operators build projected outputs.

TableStatistics

ANALYZE writes row counts and histograms into the catalog. Optimizer rules and planner heuristics can consult these stats when deciding whether to push filters or pick indexes. Each column tracks null/non-null counts, min/max values, and a sample-based distinct estimate, enabling DuckDB-style selectivity heuristics (1/distinct, uniform ranges).


Interactions

  • SQL / Planner – DDL planning calls Catalog::create_table / create_index; name binding relies on Schema.
  • ExecutionExecutionContext::table_handle and index_handle fetch physical handles through the catalog, so scans never hard-code heap locations.
  • Background workers – MVCC and index vacuum iterate the registries via Arc clones.
  • Recoveryload_catalog_data rebuilds the in-memory catalog from control files and metadata pages during startup.

Teaching Ideas

  • Extend the schema system with hidden or computed columns and teach the catalog to store the extra metadata.
  • Add histogram bins to TableStatistics and demonstrate how a simple cost heuristic can choose better plans.
  • Turn on RUST_LOG=catalog=debug to observe how DDL mutates the registries.