CUBRID Partitioning — Range/Hash/List Strategies, Partition Pruning, and Per-Partition Execution

A relational table is, semantically, a single set of tuples. When the set grows past the point at which a single heap and a single index fit comfortably in working memory or on a single device, physical horizontal partitioning splits the set into disjoint subsets — each a “partition” — keyed by a deterministic function of the row. The logical table is then implemented as the union of its partitions.

Database System Concepts (Silberschatz, Korth, Sudarshan, 7th ed., Ch. 22 “Parallel and Distributed Storage”) gives the canonical treatment: a partitioning function P(tuple) → {0, 1, …, N-1} is chosen by the DBA (or generated by the system) so that each tuple maps to exactly one partition. The textbook then enumerates three canonical partitioning strategies:

  1. Range partitioning. P(t) = i where t.k ∈ [low_i, high_i). The partition key column has a total order; partitions cover non-overlapping intervals of that order. Good for time-series data where most queries hit a recent slice.
  2. Hash partitioning. P(t) = h(t.k) mod N. The partition key is hashed; the modulus picks the partition. Good when queries want even distribution across partitions and there is no natural ordering to exploit (or when the natural ordering is skewed).
  3. List partitioning. Each partition declares an explicit set of key values it owns; P(t) = i if t.k ∈ values_i. Good for categorical data — e.g., country codes or tenant IDs.
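The three strategies can be sketched as pure functions of the key. A minimal illustration — the names, bounds, and the trivial modulus hash are invented for the example, not CUBRID code:

```c
#include <assert.h>

#define N_PARTS 4

/* Range: partition i owns the half-open interval [low[i], low[i+1]). */
static int
range_partition (int key, const int *low, int n)
{
  for (int i = 0; i < n; i++)
    if (key >= low[i] && key < low[i + 1])
      return i;
  return -1;			/* no partition covers the key */
}

/* Hash: P(t) = h(t.k) mod N; a trivial hash keeps the example checkable. */
static int
hash_partition (int key, int n)
{
  return ((key % n) + n) % n;
}

/* List: partition i owns an explicit value set values[i]. */
static int
list_partition (int key, const int values[][2], int n)
{
  for (int i = 0; i < n; i++)
    for (int j = 0; j < 2; j++)
      if (values[i][j] == key)
	return i;
  return -1;
}
```

Each function maps a tuple's key to exactly one partition index, which is the defining property of P(tuple) above.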

Petrov’s Database Internals (Ch. 11 “Distributed Systems”) emphasizes a fourth design dimension: partition awareness in the optimizer and executor. A partitioned table only pays for itself when the engine can:

  • Prune at plan time. Given a predicate WHERE k = 42 over a range-partitioned table, the optimizer should compute, before generating any scan, that only the partition containing 42 needs to be scanned. The other N-1 partitions are eliminated. This is partition pruning.
  • Dispatch at execute time. Even after pruning, the surviving partitions are independent heaps with independent indexes. The executor must produce a single tuple stream by iterating through each surviving partition’s SCAN_ID in turn (or in parallel).
  • Resolve per-row at DML time. For INSERT INTO t VALUES (...) the engine must compute P(row) to know which heap to write to. For UPDATE it must do the same on the post-image and, if the partition assignment changed, move the row from the old heap to the new one.

The textbook calls out two further refinements that more aggressive engines support:

  • Partition-wise joins. If T1 and T2 are both partitioned on the same key with the same partitioning function, then the join T1 ⨝ T2 ON T1.k = T2.k can be decomposed into N parallel joins of T1.partition_i ⨝ T2.partition_i. No row in T1.partition_i can match a row in T2.partition_j for i ≠ j, so the cross-partition matches are eliminated.
  • Local vs global indexes. A local index is partitioned along with the table — each partition owns its own B-tree. A global index spans all partitions; its leaf entries carry the partition identifier in addition to the row pointer. Local indexes are cheaper to maintain (DDL on a partition only touches that partition’s index) but cannot enforce uniqueness across the whole table without help. Global indexes are more expensive to maintain but enforce table-wide uniqueness directly.

Partition pruning itself reduces to a simple algebraic question: given a predicate Φ over the partition key column and a partition descriptor P_i (a range, a hash modulus position, or a value list), is there any value v such that Φ(v) ∧ v ∈ P_i is satisfiable? If no, the partition is pruned. The pruning algorithm walks the predicate tree and, for each leaf comparison, asks each partition descriptor whether it can supply a witness. AND nodes intersect the results, OR nodes union them. NOT and unfamiliar functions conservatively keep all partitions.
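The walk just described can be sketched over a toy predicate tree and a range descriptor. Everything here — the node kinds, the bitmask representation — is invented for illustration; only the intersect/union/keep-all rules come from the text:

```c
#include <assert.h>

enum node_kind { NODE_AND, NODE_OR, NODE_LEAF_EQ, NODE_OTHER };

struct pred_node
{
  enum node_kind kind;
  const struct pred_node *left, *right;	/* for AND / OR */
  int value;				/* for LEAF_EQ: key = value */
};

/* Returns a bitmask of partitions that might contain a satisfying row.
 * Range descriptor: partition i owns [low[i], low[i+1]). */
static unsigned
prune (const struct pred_node *n, const int *low, int nparts)
{
  unsigned all = (1u << nparts) - 1;
  switch (n->kind)
    {
    case NODE_AND:		/* intersect children */
      return prune (n->left, low, nparts) & prune (n->right, low, nparts);
    case NODE_OR:		/* union children */
      return prune (n->left, low, nparts) | prune (n->right, low, nparts);
    case NODE_LEAF_EQ:		/* ask each descriptor for a witness */
      for (int i = 0; i < nparts; i++)
	if (n->value >= low[i] && n->value < low[i + 1])
	  return 1u << i;
      return 0;			/* unsatisfiable: every partition pruned */
    default:			/* NOT, unfamiliar functions: keep all */
      return all;
    }
}
```

Representing "no information" as the all-ones mask makes the conservative rules fall out of plain bit operations: intersecting with all-ones is a no-op, and unioning with all-ones keeps every partition.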

The remainder of this document follows CUBRID’s realization of these concepts: how it represents partition descriptors in catalog memory, how the optimizer hooks pruning into access-spec preparation, how the executor dispatches per-partition scans, and how the locator routes DML to the correct partition heap.

Every relational engine that supports partitioning makes the same set of decisions in slightly different shapes; understanding the variants clarifies CUBRID’s choices.

PostgreSQL — declarative partitioning, native pruning, partition-wise joins


PostgreSQL added declarative partitioning in 10 (range/list, 2017); 11 (2018) added hash partitioning, partition-wise joins, and execution-time pruning; subsequent releases tightened pruning further. The model:

  • Catalog. Each partitioned table is a parent pg_class row; each partition is a child pg_class row connected by pg_inherits. The parent has no heap of its own (it is a “logical parent”). The partition descriptor lives in pg_partitioned_table and pg_class.relpartbound.
  • Pruning. Two layers. Plan-time pruning (partprune.c::prune_append_rel_partitions) eliminates partitions when the predicate is fully known at plan time. Execution-time pruning re-runs the pruning logic each time the executor opens the Append node, so parameter values bound at execute time (e.g., from a generic plan or from a parameterized nested-loop) can still drop partitions.
  • Plan shape. The query plan replaces the parent table with an Append (or MergeAppend) of per-partition scans. The optimizer treats each partition as an independent relation for cost purposes; statistics are maintained per partition.
  • Partition-wise join. Optional (enable_partitionwise_join). When both inputs share the same partition shape, the planner decomposes the join into per-partition joins under a single Append.
  • Indexes. Local. The parent table has no heap of its own, so there is no parent storage for a “global” index to point into. Each partition owns its own indexes; enforcing uniqueness across all partitions requires the partition key to be part of the unique constraint.

MySQL — partitioning syntax, type-pruning at plan time


MySQL has supported partitioning since 5.1. The model:

  • Catalog. A single .frm (now mysql.tables row) describes the partitioned table including the per-partition definition. Partitions are not separate tables; the storage engine (ha_partition historically, native InnoDB partitioning since 5.7) is responsible for fan-out.
  • Pruning. sql/sql_partition.cc walks the WHERE clause and fills a bitmap of “used partitions”. The pruning engine understands range, hash, key, and list partitioning, and composite partitioning (subpartition by hash within range).
  • Plan shape. The optimizer hands a partition bitmap to the storage engine; the storage engine iterates over the surviving partitions internally. There is no Append node visible at the optimizer level.
  • Indexes. Local only. A unique index that does not include the partition key is rejected at DDL time.

Oracle — range/hash/list/composite plus interval partitioning


Oracle has the most mature partitioning feature set in the commercial space:

  • Strategies. Range, hash, list, plus composite partitioning (range-hash, range-list, list-hash, list-range, list-list) and interval partitioning where new partitions are auto-created as data arrives.
  • Pruning. Static (plan-time) and dynamic (execute-time, for bind variables and join keys). Documented in the Oracle Database VLDB and Partitioning Guide.
  • Indexes. Both local and global indexes. Global indexes contain (key, ROWID, partition_no) so they can resolve rows across partitions; index-organized tables get special treatment.
  • Partition-wise join. Supported, both full (both sides identically partitioned) and partial (one side partitioned, serial broadcast or hash redistribution of the other).

CUBRID’s partitioning is closest in spirit to MySQL’s: range, hash, and list strategies; pruning at plan time only; local indexes; no partition-wise joins; no composite partitioning. But CUBRID’s implementation is closer to PostgreSQL’s: each partition is a real CUBRID class (so it has its own OID, HFID, attributes inherited from the parent, and its own catalog entries), and partition pruning is a server-side helper invoked by the executor when it opens an access spec rather than a parser-level rewrite.

The CUBRID-specific twist is that the master class — the “partitioned class” — is not just a logical parent like Postgres’s; it is a real class object that holds the partition rule in its SM_PARTITION member, and partition discovery walks the class-superclass chain (heap_get_class_supers) because each partition’s class has the master class as its single superclass. This reuses the OODB inheritance machinery described in cubrid-class-object.md rather than introducing a new catalog table for partitions. The cost is that a partition of a partitioned table looks, to the schema layer, like a subclass — and CUBRID consequently forbids ordinary subclassing on partitioned tables (ER_SM_NO_PARTITION_ON_HIERARCHIES).

CUBRID partitions a table by creating a master class (the partitioned table the user references in SQL) and N child classes, one per partition. The master class holds the partition rule (range bounds / hash count / list values) in its SM_PARTITION member; each child class also has a SM_PARTITION that says “I am partition i of class M with values V_i”. The DDL path in execute_schema.c materializes both. The runtime path in partition.c reads them, builds a PRUNING_CONTEXT, and uses it to (a) eliminate partitions before scan generation, (b) compute the destination partition for each inserted row, (c) compute the partition that owns a given index key.

The whole subsystem is server-side: partition_sr.h guards against inclusion in client-only builds via #if !defined (SERVER_MODE) && !defined (SA_MODE) (see partition_sr.h:26-28), and partition.c is called from query_executor.c, locator_sr.c, btree.c, and btree_load.c. The client-side DDL in execute_schema.c runs in client mode and writes the partition catalog through ordinary schema-template machinery.

The catalog model: SM_PARTITION on the master class


The persistent partition descriptor is SM_PARTITION, defined in class_object.h:

// sm_partition — class_object.h
struct sm_partition
{
  struct sm_partition *next;	/* currently not used, always NULL */
  const char *pname;		/* partition name */
  int partition_type;		/* partition type (range, list, hash) */
  DB_CLASS_PARTITION_TYPE class_partition_type;	/* partitioned class | partition class */
  DB_SEQ *values;		/* values for range and list partition types */
  const char *expr;		/* partition expression */
  const char *comment;
};

SM_CLASS carries it as SM_PARTITION *partition (class_object.h:798). For the master class, partition is a single descriptor with class_partition_type = DB_PARTITIONED_CLASS, the strategy in partition_type (one of the three values from the DB_PARTITION_TYPE enum), the user’s partition expression in expr, and the partition predicate sequence in values. The partition predicate is a serialized FUNC_PRED (a regu-variable expression) that, when evaluated against a row, yields the value used to assign the row to a partition. For a child class partition->class_partition_type is DB_PARTITION_CLASS and values carries the per-partition data — for range it is [min, max), for list it is the explicit value set, for hash it is unused.

The DB_PARTITION_TYPE and DB_CLASS_PARTITION_TYPE enums are in storage_common.h:

// DB_PARTITION_TYPE — storage_common.h
typedef enum
{
  DB_PARTITION_HASH = 0,
  DB_PARTITION_RANGE,
  DB_PARTITION_LIST
} DB_PARTITION_TYPE;

typedef enum
{
  DB_NOT_PARTITIONED_CLASS = 0,
  DB_PARTITIONED_CLASS = 1,
  DB_PARTITION_CLASS = 2
} DB_CLASS_PARTITION_TYPE;

On disk the same information is split across two structures. The per-class summary on the partitioned class lives in OR_CLASSREP::has_partition_info (a single bit), and the actual per-partition tuple array materialized at server startup time is OR_PARTITION from object_representation_sr.h:

// or_partition — object_representation_sr.h
typedef struct or_partition OR_PARTITION;
struct or_partition
{
  OID class_oid;		/* class OID */
  HFID class_hfid;		/* class HFID */
  int partition_type;		/* partition type (range, list, hash) */
  REPR_ID rep_id;		/* class representation id */
  DB_SEQ *values;		/* values for range and list partition types */
};

This is the runtime form the server hands to partition.c. Conceptually OR_PARTITION[0] is the master (the partitioned class itself) and OR_PARTITION[1..N] are the partitions. The master entry carries the serialized partition predicate in its values (see partition_load_partition_predicate); the partition entries carry the partition’s data (range bounds or list values) in values. The bridge between the SM form (in the client workspace) and the OR form (on the server) is the heap_get_class_partitions helper called from partition_load_pruning_context.

classDiagram
    class SM_CLASS {
        SM_PARTITION* partition
        DB_OBJLIST* users
    }
    class SM_PARTITION {
        const char* pname
        int partition_type
        DB_CLASS_PARTITION_TYPE class_partition_type
        DB_SEQ* values
        const char* expr
    }
    class OR_PARTITION {
        OID class_oid
        HFID class_hfid
        int partition_type
        REPR_ID rep_id
        DB_SEQ* values
    }
    class PRUNING_CONTEXT {
        OID root_oid
        OR_PARTITION* partitions
        int count
        DB_PARTITION_TYPE partition_type
        ATTR_ID attr_id
        func_pred* partition_pred
    }
    SM_CLASS *-- SM_PARTITION
    SM_PARTITION ..> OR_PARTITION : serialized to disk and reloaded
    OR_PARTITION ..> PRUNING_CONTEXT : array, [0]=master, [1..N]=partitions

Two design choices stand out. First, partitions are real CUBRID classes — each child class has its own OID, its own HFID, its own attribute representations, its own indexes — and they are linked to the master class by single inheritance (SM_CLASS::users on the master, SM_CLASS::inheritance on the partition). Pruning discovery exploits this: partition_find_root_class_oid calls heap_get_class_supers and treats the single super as the partitioned class root. Second, the partition predicate is stored as a serialized regu-variable expression rather than as a column-id plus operator. This is more flexible (you can PARTITION BY HASH (YEAR(birthdate)) and the predicate is a function call) but it means pruning has to deserialize and evaluate it through stx_map_stream_to_func_pred and fetch_peek_dbval rather than do a fast attribute compare.

The DDL path: do_create_partition and friends


When the user writes CREATE TABLE t (...) PARTITION BY HASH (id) PARTITIONS 8, the parser builds a PT_CREATE_ENTITY whose partition_info describes the strategy and (for HASH) the partition count. The schema executor calls do_create_partition (in execute_schema.c:4036+) which:

  1. Fetches the master class (au_fetch_class) so it has the attribute list and OID handy.
  2. Rejects the partitioning DDL if the master class is part of an inheritance hierarchy: if (smclass->users != NULL || smclass->inheritance != NULL) { … ER_SM_NO_PARTITION_ON_HIERARCHIES }. This is the constraint that prevents a partitioned table from being subclassed and a subclass from being partitioned.
  3. For HASH: synthesizes the partition names <class>__p__p<i> for i in [0, hashsize), and for each one calls dbt_create_class and do_create_local to materialize the partition class. PARTITIONED_SUB_CLASS_TAG is the delimiter "__p__" that lets later code grep partition names.
  4. For RANGE/LIST: walks the explicit parts list from the parse tree, validates that range bounds are monotonic and disjoint (or that list values across partitions are pairwise disjoint), and creates one class per partition entry.
  5. Each created partition class is linked to the master via single inheritance (partition_parent_atts = smclass->attributes), gets its own SM_PARTITION record with class_partition_type = DB_PARTITION_CLASS, and finally is committed through the schema-template flow.

Companion DDL helpers:

  • do_alter_partitioning_pre / do_alter_partitioning_post — pair of pre/post-pruning hooks that run before and after the actual ALTER TABLE body, e.g., for ADD PARTITION, DROP PARTITION, COALESCE PARTITION, REORGANIZE PARTITION.
  • do_remove_partition_pre / _post — drops a partition and detaches it from the master.
  • do_coalesce_partition_pre / _post — collapses N HASH partitions to N-K (rehashes existing data).
  • do_reorganize_partition_pre / _post — restructures range or list partitions, redistributing data.
  • do_create_partition_constraints and do_create_partition_constraint — replicate indexes and constraints from the master to each partition (since CUBRID uses local indexes; see below).

The DDL writes the partition info into the master class’s SM_PARTITION member and into each child class’s own member; both ride through the standard schema-template flush, so the catalog records (_db_class, _db_partition if present, and the representation chain in _db_class_repr) are updated atomically. There is no special-purpose partition catalog table — the data is all on _db_class.partition (a DB_SEQ literal embedded in the class record) plus the inheritance edges.

When an executor wants to scan, insert, or update a partitioned table, it builds a server-side PRUNING_CONTEXT. The struct lives in partition_sr.h:

// pruning_context — partition_sr.h
struct pruning_context
{
  THREAD_ENTRY *thread_p;
  OID root_oid;			/* OID of the root class */
  REPR_ID root_repr_id;
  access_spec_node *spec;	/* class to prune */
  val_descr *vd;
  DB_PARTITION_TYPE partition_type;	/* hash, range, list */
  OR_PARTITION *partitions;	/* partitions array, [0] is master */
  OR_PARTITION *selected_partition;
  SCANCACHE_LIST *scan_cache_list;
  int count;			/* number of entries (master+children) */
  xasl_unpack_info *fp_cache_context;
  func_pred *partition_pred;	/* deserialized partition expression */
  int attr_position;		/* attribute position in index key */
  ATTR_ID attr_id;		/* attribute id of partition key */
  HEAP_CACHE_ATTRINFO attr_info;
  int error_code;
  int pruning_type;		/* DB_PARTITIONED_CLASS | DB_PARTITION_CLASS */
  bool is_attr_info_inited;
  bool is_from_cache;
};

partitions[0] is the master, partitions[1..count-1] are the partitions, partition_pred is the deserialized regu-variable that computes the partition key, attr_id is the column id of the partition key (extracted from the partition predicate by walking the regu tree), and attr_info is the heap attribute cache used to evaluate the predicate against a RECDES.

The lifecycle is clear:

  1. partition_init_pruning_context zeroes the struct.
  2. partition_load_pruning_context populates it. First it tries to load from a per-server hash cache (db_Partition_Ht, keyed by root class OID, with PARTITION_CACHE_SIZE = 200). On miss it calls heap_get_class_partitions to read the OR_PARTITION array from disk, then partition_load_partition_predicate to deserialize the regu tree from partitions[0].values[2] (a serialized FUNC_PRED stream), then it caches the result.
  3. The caller does pruning (per spec, per insert, per update, per index lookup).
  4. partition_clear_pruning_context releases scan caches and the deserialized predicate.

The cache is essential for INSERT INTO t SELECT ... FROM t2 style bulk DML: without it every inserted row would re-read the catalog. The cache hash table is created by partition_cache_init at server boot and walked on schema invalidation by partition_decache_class.

Pruning at optimize time: partition_prune_spec


The principal optimizer-side entry point is partition_prune_spec in partition.c:3313. It is called from query_executor.c:8656 (regular scan preparation) and from query_executor.c:13361 (parallel-query class lookup), and from the parallel px_heap_scan task. The contract: given an access_spec_node whose pruning_type == DB_PARTITIONED_CLASS, walk spec->where_pred, spec->where_key, and spec->where_range against the partition descriptor; produce a PARTITION_SPEC_TYPE linked list in spec->parts listing the surviving partitions.

// partition_prune_spec — partition.c
int
partition_prune_spec (THREAD_ENTRY * thread_p, val_descr * vd, access_spec_node * spec)
{
  ...
  if (spec->pruning_type != DB_PARTITIONED_CLASS)
    {
      spec->pruned = true;
      return NO_ERROR;
    }
  ...
  partition_load_pruning_context (thread_p, &ACCESS_SPEC_CLS_OID (spec), spec->pruning_type, &pinfo);
  ...
  pruningset_init (&pruned, PARTITIONS_COUNT (&pinfo));
  pruningset_set_all (&pruned);
  if (pinfo.spec->where_pred != NULL)
    {
      pruningset_init (&pruned_pred, PARTITIONS_COUNT (&pinfo));
      status = partition_match_pred_expr (&pinfo, pinfo.spec->where_pred, &pruned_pred);
      if (status != MATCH_NOT_FOUND)
	pruningset_intersect (&pruned, &pruned_pred);
    }
  if (pinfo.spec->where_key != NULL)
    {
      /* same idea, intersect */
    }
  if (pinfo.spec->where_range != NULL)
    {
      /* same idea, intersect */
    }
  error = pruningset_to_spec_list (&pinfo, &pruned);
  ...
  spec->pruned = true;
}

The bitset pruned starts as “all partitions kept”. Each of the three predicate slots — where_pred (regular WHERE), where_key (key-range filter for index scans), where_range (MVCC-reevaluation predicate for UPDATE/DELETE) — contributes a bitset; the result is the intersection. If a slot returns MATCH_NOT_FOUND (predicate doesn’t reference the partition key), it is not intersected — its bit pattern is treated as “no information”, which is the safe default. pruningset_to_spec_list then walks the bitset and produces a PARTITION_SPEC_TYPE * chain in spec->parts, copying (class_oid, class_hfid, btid) for each surviving partition. For index scans, it also calls heap_get_index_with_name to find the local index with the same name on each partition.

The bitset itself (PRUNING_BITSET) is hand-rolled at partition.c:70-83, sized for MAX_ELEMENTS = 1024 partitions (matching the MAX_PARTITIONS = 1024 macro in partition.h). The operations are unsurprising — pruningset_init, _set_all, _add, _remove, _intersect, _union, _is_set, _popcount, plus an iterator. The encoding is one bit per partition; partition i corresponds to OR_PARTITION[i+1] because index 0 is reserved for the master.
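A minimal stand-in for such a fixed-size bitset, with a few of the listed operations — this is an illustrative re-implementation sized like MAX_ELEMENTS, not the partition.c source:

```c
#include <assert.h>
#include <string.h>

#define MAX_ELEMENTS 1024
#define WORDS (MAX_ELEMENTS / 32)

typedef struct
{
  unsigned w[WORDS];
} bitset_t;

static void bs_init (bitset_t *b)	{ memset (b->w, 0, sizeof b->w); }
static void bs_set_all (bitset_t *b)	{ memset (b->w, 0xff, sizeof b->w); }
static void bs_add (bitset_t *b, int i)	{ b->w[i / 32] |= 1u << (i % 32); }

static int
bs_is_set (const bitset_t *b, int i)
{
  return (b->w[i / 32] >> (i % 32)) & 1u;
}

static void
bs_intersect (bitset_t *a, const bitset_t *b)
{
  for (int i = 0; i < WORDS; i++)
    a->w[i] &= b->w[i];
}

static int
bs_popcount (const bitset_t *b)
{
  int n = 0;
  for (int i = 0; i < MAX_ELEMENTS; i++)
    n += bs_is_set (b, i);
  return n;
}
```

Starting from all-ones and intersecting with each predicate slot's result is exactly the flow of partition_prune_spec above.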

partition_match_pred_expr walks the predicate tree:

  • T_PRED with B_AND: recurse on both children, intersect the resulting bitsets. If one side is MATCH_NOT_FOUND, take the other side’s bitset alone. (Because A AND ? is at most A — anything A excludes is excluded; whatever ? would exclude is a free additional reduction.)
  • T_PRED with B_OR: recurse on both children, union the resulting bitsets. If either side is MATCH_NOT_FOUND, return MATCH_NOT_FOUND (because A OR ? is at least A and ? could be anything). This is the conservative-OR rule.
  • T_EVAL_TERM / T_COMP_EVAL_TERM: a comparison lhs op rhs. If lhs matches the partition expression, prune by rhs; if rhs matches, prune by lhs. If neither matches but the partition expression is a function (e.g., YEAR(date)) and one side of the predicate contains the partition expression as a sub-tree, fall through to partition_prune_for_function to try a tighter analysis.
  • T_EVAL_TERM / T_ALSM_EVAL_TERM: handles IN (...) / NOT IN (...) and = ANY (...) / <> ANY (...). The op is rewritten to PO_IN / PO_NOT_IN based on the eq_flag.
  • T_LIKE_EVAL_TERM, T_RLIKE_EVAL_TERM, T_NOT_TERM: returns MATCH_NOT_FOUND. CUBRID does not currently extract a range from a LIKE pattern even when one is feasible (e.g., name LIKE 'A%' over a list-partitioned table).
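The AND/OR combination rules above can be sketched as a tri-state combiner; struct result and the helper names are invented, only the MATCH_OK / MATCH_NOT_FOUND semantics come from the text:

```c
#include <assert.h>

typedef enum { MATCH_OK, MATCH_NOT_FOUND } match_t;

struct result
{
  match_t status;
  unsigned bits;		/* bitmask of surviving partitions */
};

static struct result
combine_and (struct result l, struct result r)
{
  /* A AND ? is at most A: take whichever side has information */
  if (l.status == MATCH_NOT_FOUND)
    return r;
  if (r.status == MATCH_NOT_FOUND)
    return l;
  return (struct result) { MATCH_OK, l.bits & r.bits };
}

static struct result
combine_or (struct result l, struct result r)
{
  /* conservative-OR rule: A OR ? could match anything */
  if (l.status == MATCH_NOT_FOUND || r.status == MATCH_NOT_FOUND)
    return (struct result) { MATCH_NOT_FOUND, 0 };
  return (struct result) { MATCH_OK, l.bits | r.bits };
}
```

Note the asymmetry: an unknown operand is harmless under AND (the other side's exclusions still hold) but fatal under OR (the unknown side could admit any partition).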

The leaf operation is partition_prune_db_val, which dispatches on partition_type:

// partition_prune_db_val — partition.c
static MATCH_STATUS
partition_prune_db_val (PRUNING_CONTEXT * pinfo, const DB_VALUE * val,
			const PRUNING_OP op, PRUNING_BITSET * pruned)
{
  switch (pinfo->partition_type)
    {
    case DB_PARTITION_HASH:
      return partition_prune_hash (pinfo, val, op, pruned);
    case DB_PARTITION_RANGE:
      return partition_prune_range (pinfo, val, op, pruned);
    case DB_PARTITION_LIST:
      return partition_prune_list (pinfo, val, op, pruned);
    default:
      return MATCH_NOT_FOUND;
    }
}

partition_prune_hash is the simplest:

// partition_prune_hash — partition.c (excerpted)
case PO_EQ:
  if (TP_DOMAIN_TYPE (col_domain) != DB_VALUE_TYPE (val_p))
    {
      /* coerce val_p into the partition column's domain */
      if (tp_value_cast (val_p, &val, col_domain, false) == DOMAIN_INCOMPATIBLE)
	{
	  /* impossible value; treat as no-match */
	  er_clear ();
	  status = MATCH_OK;
	}
    }
  else
    {
      pr_clone_value (val_p, &val);
    }
  idx = mht_get_hash_number (hash_size, &val);
  pruningset_add (pruned, idx);
  status = MATCH_OK;
  break;
case PO_IN:
  /* iterate through values in the IN-list, hash each, set its bit */
  ...
  break;
case PO_IS_NULL:
  pruningset_add (pruned, 0);	/* first partition holds NULLs */
  break;
default:
  status = MATCH_NOT_FOUND;

The key call is mht_get_hash_number (hash_size, &val) — the generic CUBRID hash function (the same one used for hash tables in mht.c). For =, exactly one bit is set; for IN, one bit per IN-list element; for <, >, <=, >= and other ordering operators, hashing preserves no order, so the hash strategy returns MATCH_NOT_FOUND and all partitions stay alive. NULLs are pinned to partition 0 by convention.

partition_prune_range is the most involved because it must reason about interval overlap. For each partition i, it reads min and max from partitions[i+1].values[0..1] (where min == NULL means MINVALUE — first partition — and max == NULL means MAXVALUE — last partition). It then compares val against both:

// partition_prune_range — partition.c (excerpted)
rmin = (min is null ? DB_LT : tp_value_compare (&min, val, 1, 1));
rmax = (max is null ? DB_LT : tp_value_compare (val, &max, 1, 1));
switch (op)
  {
  case PO_EQ:
    /* min <= val < max */
    if ((rmin == DB_EQ || rmin == DB_LT) && rmax == DB_LT)
      {
	pruningset_add (pruned, i);
	goto cleanup;		/* exactly one fits */
      }
    break;
  case PO_LT:
    /* val < max for any partition where min < val */
    if (rmin == DB_LT)
      pruningset_add (pruned, i);
    break;
  case PO_LE:
    ...
  case PO_GT:
    ...
  case PO_GE:
    ...
  case PO_IS_NULL:
    if (DB_IS_NULL (&min))
      {
	pruningset_add (pruned, i);
	goto cleanup;
      }
    break;
  }

The range interval is [min, max) — left-closed, right-open. For PO_GT (strict >), the code calls partition_decrement_value on max first to convert the half-open interval to a fully closed comparison. The trick is that > on a value that equals max should not match the partition [min, max), but the bound comparison wants val < max; by decrementing max to max - 1 (only valid for integer-like and date-like types) the comparison stays correct. For types where decrement fails the function returns false and the code falls through to the default rule. For collection-typed values (val IN (1, 2, 3) etc.), the function recursively prunes by each element and unions the per-element bitsets (or intersects for = since equality with a collection is strictly impossible — but the pre-existing code path handles it defensively).
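The decrement trick can be checked on a toy integer range; struct range and gt_survives are hypothetical stand-ins for the real comparison path:

```c
#include <assert.h>

struct range
{
  int min, max;			/* partition owns keys in [min, max) */
};

/* "key > v" keeps partition [min, max) iff some w in [min, max) satisfies
 * w > v. The largest member is max - 1, so decrementing max turns the
 * half-open bound into a closed one (the partition_decrement_value idea)
 * and a single strict comparison decides it: v < max - 1. */
static int
gt_survives (int v, struct range p)
{
  int closed_max = p.max - 1;	/* [min, max) becomes [min, max - 1] */
  return v < closed_max;
}
```

The check deliberately ignores min: if v is below the whole interval, every row in the partition exceeds it, and the partition still survives.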

partition_prune_list walks the per-partition value set and asks membership questions:

// partition_prune_list — partition.c (excerpted)
case PO_EQ:
  if (db_set_ismember (part_collection, (DB_VALUE *) val))
    {
      pruningset_add (pruned, i);
      return MATCH_OK;		/* values are disjoint, no need to look further */
    }
  break;
case PO_IN:
  /* intersect part_collection with val_collection;
   * if non-empty, this partition qualifies */
  for (j = 0; j < size; j++)
    {
      db_set_get (part_collection, j, &col);
      if (db_set_ismember (val_collection, &col))
	{
	  pruningset_add (pruned, i);
	  break;
	}
    }
  break;
case PO_IS_NULL:
  if (db_set_has_null (part_collection))
    {
      pruningset_add (pruned, i);
      return MATCH_OK;
    }
  break;
case PO_NE:
case PO_LT:
case PO_LE:
case PO_GT:
case PO_GE:
  return MATCH_NOT_FOUND;	/* list partitions are unordered */

The disjointness optimization in PO_EQ (“no need to look any further”) is key: list partitions, by definition, have pairwise disjoint value sets (the DDL enforces this), so the first match is also the only match. Comparison operators other than = and IN always return MATCH_NOT_FOUND because list partitioning imposes no order on the value set.
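A sketch of the two membership checks, with plain integer arrays standing in for the DB_SEQ value sets (helper names invented):

```c
#include <assert.h>

static int
contains (const int *set, int n, int v)
{
  for (int i = 0; i < n; i++)
    if (set[i] == v)
      return 1;
  return 0;
}

/* PO_EQ: value sets are pairwise disjoint, so the first hit is the only
 * hit; return the owning partition index, or -1 if no partition owns v. */
static int
list_prune_eq (const int *parts[], const int *sizes, int nparts, int v)
{
  for (int i = 0; i < nparts; i++)
    if (contains (parts[i], sizes[i], v))
      return i;			/* stop: no other partition can own v */
  return -1;
}

/* PO_IN: a partition survives if its value set intersects the IN-list. */
static int
list_prune_in (const int *part, int psize, const int *in, int isize)
{
  for (int i = 0; i < psize; i++)
    if (contains (in, isize, part[i]))
      return 1;
  return 0;
}
```

The early return in list_prune_eq is the same disjointness shortcut the excerpt above relies on; list_prune_in has no such shortcut because several partitions can each intersect the IN-list.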

flowchart TD
    Start[partition_prune_spec called] --> Init[partition_load_pruning_context<br/>cache hit ? load from cache : read OR_PARTITION array]
    Init --> Bitset[pruned = all_ones bitset]
    Bitset --> WhereP{where_pred?}
    WhereP -->|yes| MatchP[partition_match_pred_expr<br/>walks T_PRED tree]
    MatchP --> Intersect1[pruned = pruned ∩ pruned_pred]
    WhereP -->|no| WhereK
    Intersect1 --> WhereK{where_key?}
    WhereK -->|yes| MatchK[partition_match_pred_expr<br/>on key filter]
    MatchK --> Intersect2[pruned = pruned ∩ pruned_key]
    WhereK -->|no| WhereR
    Intersect2 --> WhereR{where_range?}
    WhereR -->|yes| MatchR[partition_match_pred_expr<br/>on MVCC reeval predicate]
    MatchR --> Intersect3[pruned = pruned ∩ pruned_range]
    WhereR -->|no| ToList
    Intersect3 --> ToList[pruningset_to_spec_list<br/>materialize PARTITION_SPEC_TYPE chain]
    ToList --> SpecParts[access_spec.parts = chain]
    SpecParts --> Done[spec->pruned = true]

Per-row partition resolution at INSERT/UPDATE


For DML, pruning must produce a single partition: the row’s target heap. This is partition_find_partition_for_record in partition.c:3465. It is wrapped by partition_prune_insert and partition_prune_update to provide INSERT-side and UPDATE-side semantics.

// partition_find_partition_for_record — partition.c (excerpted)
static int
partition_find_partition_for_record (PRUNING_CONTEXT * pinfo, const OID * class_oid,
				     RECDES * recdes, OID * partition_oid, HFID * partition_hfid)
{
  ...
  /* lazy-init heap attribute cache for the partition column */
  if (pinfo->is_attr_info_inited == false)
    {
      heap_attrinfo_start (pinfo->thread_p, &pinfo->root_oid, 1, &pinfo->attr_id, &pinfo->attr_info);
      partition_set_cache_info_for_expr (pinfo->partition_pred->func_regu, pinfo->attr_id, &pinfo->attr_info);
      pinfo->is_attr_info_inited = true;
    }
  /* read the partition column value out of the record */
  repr_id = or_rep_id (recdes);
  or_set_rep_id (recdes, pinfo->root_repr_id);	/* read as if for the master */
  heap_attrinfo_read_dbvalues (pinfo->thread_p, ..., recdes, &pinfo->attr_info);
  or_set_rep_id (recdes, repr_id);	/* restore */
  /* evaluate the partition expression */
  fetch_peek_dbval (pinfo->thread_p, pinfo->partition_pred->func_regu, ..., &result);
  if (db_value_is_null (result))
    op = PO_IS_NULL;
  /* prune by the resulting value */
  status = partition_prune_db_val (pinfo, result, op, &pruned);
  count = pruningset_popcount (&pruned);
  if (status != MATCH_OK || count != 1)
    return ER_PARTITION_NOT_EXIST;	/* row has no home */
  pos = pruningset_iterator_next (&it);
  COPY_OID (partition_oid, &pinfo->partitions[pos + 1].class_oid);
  HFID_COPY (partition_hfid, &pinfo->partitions[pos + 1].class_hfid);
  /* rewrite the record's REPR_ID to the partition's representation */
  if (!OID_EQ (class_oid, partition_oid))
    or_set_rep_id (recdes, pinfo->partitions[pos + 1].rep_id);
}

Three interesting moments:

  1. Read with the master REPR_ID, write with the partition REPR_ID. The record arrives with whichever REPR_ID the caller built it under. To get the partition column out, the function temporarily flips the REPR_ID to the master class’s so the heap-attribute machinery decodes the layout. After the evaluation, it flips it back. Then, before returning, if the target partition has a different REPR_ID, it overwrites the REPR_ID bits in place. The comment in the source explains why this shortcut is safe: partitions inherit their attribute layout from the master, so the disk image is identical except for the REPR_ID bits.
  2. NULL handling: PO_IS_NULL op. A NULL result of the partition expression is pruned with PO_IS_NULL, which routes to partition 0 for hash, to whichever range partition has min == NULL (the MINVALUE partition) for range, and to a list partition that explicitly accepts NULL for list. If no partition accepts NULL the function returns ER_PARTITION_NOT_EXIST.
  3. Exactly-one assertion. For INSERT, more than one partition matching means the partition descriptor is corrupt; zero matching means the row has no home. Both are ER_PARTITION_NOT_EXIST.

partition_prune_insert is a thin wrapper that loads the pruning context (cached or fresh), calls partition_find_partition_for_record, and — for the DB_PARTITION_CLASS pruning type (caller is targeting a specific partition like INSERT INTO t__p__p3 VALUES …) — verifies that the computed partition matches the targeted one, otherwise returns ER_INVALID_DATA_FOR_PARTITION.

partition_prune_update is similar but starts from the partition class OID instead of the master OID (because UPDATE always operates on a specific partition’s heap, not on the master). It calls partition_find_root_class_oid to climb to the master, loads the pruning context, computes the destination partition, and reports it back to the caller. The locator (locator_update_force / locator_move_record) then decides whether to rewrite the row in place (same partition) or move it between partitions (different partition).

flowchart TD
    Insert[INSERT INTO t VALUES ...] --> Locator[locator_insert_force<br/>pruning_type != DB_NOT_PARTITIONED_CLASS]
    Locator --> Prune[partition_prune_insert]
    Prune --> Find[partition_find_partition_for_record]
    Find --> Read[heap_attrinfo_read_dbvalues<br/>read partition column with master REPR_ID]
    Read --> Eval[fetch_peek_dbval on partition_pred]
    Eval --> Dispatch{partition_type}
    Dispatch -->|HASH| Hash[mht_get_hash_number mod N → idx]
    Dispatch -->|RANGE| Range["binary search for [min, max)"]
    Dispatch -->|LIST| List[for each partition: db_set_ismember]
    Hash --> One[exactly one bit must be set]
    Range --> One
    List --> One
    One -->|yes| Rewrite[or_set_rep_id to partition's REPR_ID]
    Rewrite --> Cache[locator_get_partition_scancache for target heap]
    Cache --> Heap[heap_insert_logical into target partition's HFID]
    One -->|no| Err[ER_PARTITION_NOT_EXIST]

Per-partition scan dispatch at execute time


Once partition_prune_spec has populated spec->parts, the executor iterates partitions one at a time. The relevant access_spec_node fields are in xasl.h:

// access_spec_node — xasl.h
struct access_spec_node
{
...
int pruning_type; /* DB_PARTITIONED_CLASS | DB_NOT_PARTITIONED_CLASS | DB_PARTITION_CLASS */
...
#if defined (SERVER_MODE) || defined (SA_MODE)
SCAN_ID s_id; /* scan identifier */
PARTITION_SPEC_TYPE *parts; /* partitions of the current spec */
PARTITION_SPEC_TYPE *curent; /* current partition */
...
bool pruned; /* true if partition pruning has been performed */
#endif
};
// partition_spec_node — xasl.h
struct partition_spec_node
{
OID oid; /* class oid */
HFID hfid; /* class hfid */
BTID btid; /* index id */
PARTITION_SPEC_TYPE *next; /* next partition */
SCAN_STATS scan_stats;
};

The scan loop in the executor (in query_executor.c and scan_manager.c) re-opens the underlying SCAN_ID once per partition, walking the linked list parts via curent, and emits each partition’s tuples in turn. For an aggregate that collapses across all partitions, the executor treats the partitions as logically one stream; for a sort/hash that preserves per-partition identity, the scan stats are tracked per partition_spec_node (scan_stats).

The lock-acquisition step in query_executor.c:8680+ is important: after pruning, the executor walks spec->parts and takes an IS_LOCK (or IX_LOCK for UPDATE/DELETE) on each surviving partition’s class OID via lock_subclass. Partitions that were pruned away are never locked, which is one of the performance wins of pruning: if 99 of a 100-partition table’s partitions are pruned, only one class lock is taken instead of 100.

Locator integration: routing DML to the partition heap


locator_sr.c is the bridge between the heap layer (per-class HFIDs) and the partition layer. The relevant integration points:

  • locator_insert_force (locator_sr.c:4937+): when called with pruning_type != DB_NOT_PARTITIONED_CLASS, it calls partition_prune_insert to compute the target (real_class_oid, real_hfid). It then takes an IX-lock on the partition class via lock_subclass, fetches a per-partition scan cache via locator_get_partition_scancache, and routes the actual heap_insert_logical call to the partition’s heap rather than the master’s. The master class has no heap of its own (in the partitioned case), so an insert that bypassed the pruning step would fail.

  • locator_get_partition_scancache (locator_sr.c:12314+): this is the helper that backs partition_get_scancache / partition_new_scancache. Given a target partition OID and HFID, it either finds an existing PRUNING_SCAN_CACHE in the pruning context’s scan_cache_list or allocates a new one and starts a HEAP_SCANCACHE on it. The cache lives for the lifetime of the pruning context — typically the whole multi-row INSERT/UPDATE statement — so each partition pays the scan-cache setup cost only once.

  • locator_move_record (locator_sr.c:5295+): when an UPDATE on a partitioned table changes the partition key value such that the post-image belongs to a different partition than the pre-image, the locator must delete-and-re-insert the row in the new partition. This function does both atomically: it fetches the destination partition’s scan cache, calls locator_insert_force on the new heap, then locator_delete_force_for_moving on the old. The pruning layer has already rewritten the record’s REPR_ID to the destination partition’s, so the insert just lays the bytes down. The outermost driver (locator_update_force) calls partition_prune_update first to discover whether the partition assignment changed, and dispatches to locator_move_record only when it did (see locator_sr.c:5915, 5953).

  • locator_area_op_to_pruning_type (locator_sr.c:12372+): maps LC_FLUSH_INSERT_PRUNEDB_PARTITIONED_CLASS, LC_FLUSH_INSERT_PRUNE_VERIFYDB_PARTITION_CLASS, etc. The client-side workspace flush marks each dirty record with one of these LC_FLUSH_* codes based on whether the user inserted via the master or a specific partition; the server uses this to decide between full pruning and pruning-as-verification.

The unique-index integrity check in locator_sr.c:4139 is another pruning consumer: when checking a unique-index constraint on insert, the server uses partition_prune_unique_btid to find the partition that owns the unique key, then checks just that partition’s local index. This works because CUBRID uses local indexes — the partition that owns the row also owns the unique-index entry — and because the unique-index column must include the partition key when pruning is to identify a single partition. (Without that, uniqueness across partitions would not be enforceable locally.)

CUBRID currently uses local indexes on partitioned tables. When do_create_partition_constraints runs, it iterates over the master class’s constraints and creates a copy of each one on every partition. The B-tree IDs (BTIDs) are different on each partition — heap_get_index_with_name is the lookup that finds a partition’s local BTID given the master’s index name. The implication of local-only:

  • Cheap DDL. Adding/dropping a partition only affects that partition’s indexes; no global rebuild.
  • Cheap maintenance. Inserts and updates only touch the local BTID in the partition where the row lives.
  • No partition-wise unique enforcement without partition key in the unique columns. If the user defines UNIQUE(x) on a table partitioned by y, two different partitions could each have a row with the same x and the local-only enforcement would not catch it. CUBRID’s DDL therefore requires that the partition key is part of any UNIQUE or PRIMARY KEY on a partitioned table.
  • partition_prune_unique_btid is the helper that, given a unique-index search key, computes which partition owns the key (via the partition key value embedded in the search key) and rewrites (class_oid, hfid, btid) to the partition-local triple.

There is a disabled function partition_is_global_index in partition.c:4250, wrapped in #if 0. It distinguishes between a primary-key index (which it would treat as global) and a non-primary-key unique index (whose globality depends on whether the index columns include the partition key). Today it is dead code. The current production behavior is “all indexes are local”.

query_planner.c shows several qo_partition / QO_PARTITION references — but these are planner partitions, not table-level partitions. In query_planner.c, QO_PARTITION is the planner’s term for a connected component of the join graph (Selinger-style partitioning of the query graph for join enumeration). Symbols like qo_search_partition, qo_combine_partitions, QO_PARTITION_NODES, QO_PARTITION_EDGES operate on join-graph partitions, not horizontal table partitions.

A grep over the optimizer for table-partition awareness comes up mostly empty. Partition pruning runs after plan generation, when the executor opens each access spec — it is not folded into the optimizer’s cost computation. Consequence: CUBRID’s optimizer plans a partitioned-table scan as a single scan against the master class, with pruning applied opaquely at runtime. There is no partition-wise join optimization: two partitioned tables with the same partition shape are joined as if they were unpartitioned, and each partition of T1 is scanned against all partitions of T2.

flowchart LR
    Parser[Parser] --> Resolve[Name resolution]
    Resolve --> SemCheck[Semantic check]
    SemCheck --> Opt[Optimizer<br/>cost-based plan]
    Opt --> XASL[XASL generation<br/>access_spec.pruning_type set]
    XASL --> Stream[xasl_to_stream → stream_to_xasl]
    Stream --> Exec[query_executor<br/>opens each spec]
    Exec --> Prune[partition_prune_spec<br/>fills spec->parts]
    Prune --> Lock[lock_subclass on each surviving partition]
    Lock --> ScanLoop[for each PARTITION_SPEC_TYPE in spec->parts:<br/>scan_open_scan, scan_next_scan, scan_close_scan]
    ScanLoop --> Output[merged tuple stream]

The diagram shows clearly that pruning is between XASL deserialization and scan execution, not folded into the plan itself. This is a deliberate trade-off: pruning is fast (a few hundred microseconds for 1024 partitions) and runs lazily, so the cost of “always plan as if all partitions are alive” is small in practice. The cost is that partition-wise join is not available — the planner does not see per-partition relations to combine.

Symbols are grouped by subsystem so that the reader can navigate by role rather than by file order.

  • SM_PARTITION (struct, class_object.h) — the in-memory partition descriptor on each SM_CLASS. Carries partition_type, class_partition_type, pname, expr, values. Per-class instance: SM_CLASS::partition.
  • OR_PARTITION (struct, object_representation_sr.h) — the server-side reloaded form: (class_oid, class_hfid, partition_type, rep_id, values). Materialized by heap_get_class_partitions (in heap_file.c).
  • OR_CLASSREP::has_partition_info — the single bit on each class representation that says “this class has partition info, load it”.
  • DB_PARTITION_TYPE (enum, storage_common.h) — DB_PARTITION_HASH=0, DB_PARTITION_RANGE=1, DB_PARTITION_LIST=2.
  • DB_CLASS_PARTITION_TYPE (enum, storage_common.h) — DB_NOT_PARTITIONED_CLASS=0, DB_PARTITIONED_CLASS=1, DB_PARTITION_CLASS=2. The first marks ordinary tables; the second marks the master class; the third marks a partition.
  • classobj_make_partition_info, classobj_free_partition_info, classobj_copy_partition_info — the SM_PARTITION life-cycle helpers (in class_object.c).
  • do_create_partition — entry for CREATE TABLE … PARTITION BY … and ALTER TABLE … ADD PARTITION and ALTER TABLE … ADD HASH PARTITION. Materializes the partition classes, links them to the master via inheritance, fills the SM_PARTITION records.
  • do_alter_partitioning_pre / do_alter_partitioning_post — pre/post hook pair that runs around the schema-template flush for partition-related ALTER variants.
  • do_remove_partition_pre / _post — drops a partition.
  • do_coalesce_partition_pre / _post — collapses N HASH partitions to fewer (rehashes rows).
  • do_reorganize_partition_pre / _post — restructures RANGE/LIST partitions (redistribution).
  • do_create_partition_constraints / do_create_partition_constraint — replicates indexes and constraints from the master to each partition.
  • SM_PARTITION_ALTER_INFO (struct) — context bundle passed through the alter-partition pipeline (root class OP, key column, promoted partition names).
  • PART_CLASS_INFO (struct) — per-partition scratch info during create (name, template, object).
  • PARTITION_VARCHAR_LEN, PARTITIONED_SUB_CLASS_TAG ("__p__") — naming-policy constants.

Per-row partition resolution (server-side, partition.c)

  • partition_prune_insert — outer entry for INSERT-side routing. Loads/uses a PRUNING_CONTEXT, calls partition_find_partition_for_record, returns (pruned_class_oid, pruned_hfid).
  • partition_prune_update — outer entry for UPDATE-side routing. Climbs to the root class first (partition_find_root_class_oid), then same as INSERT.
  • partition_find_partition_for_record (static) — the heart of per-row routing. Reads the partition column out of the record, evaluates the partition expression, prunes by the result, returns the unique target partition.
  • partition_find_root_class_oid — given any class OID (master or partition), returns the master class OID. Uses heap_get_class_supers.
  • partition_prune_unique_btid — given a unique-index search key, returns the partition that owns the key plus the partition-local BTID. Used by btree.c for unique-constraint enforcement and by query_executor.c for unique scans.
  • partition_prune_partition_index — given a search key and the master’s index BTID, returns the position (index) of the partition that owns the key. Backs partition_prune_unique_btid.
  • partition_attrinfo_get_key — extracts the partition-key DB_VALUE out of a (possibly multi-column) index search key.

Pruning at optimize time (server-side, partition.c)

  • partition_prune_spec — outer entry for scan-side pruning. Applies where_pred ∩ where_key ∩ where_range to the partition descriptor, fills spec->parts with the surviving partitions.
  • partition_match_pred_expr (static, recursive) — walks a PRED_EXPR tree (T_PRED / T_EVAL_TERM / T_COMP_EVAL_TERM / T_ALSM_EVAL_TERM / T_NOT_TERM / T_LIKE_EVAL_TERM) and returns a bitset of surviving partitions plus a MATCH_STATUS.
  • partition_match_index_key — pruning specialization for index key ranges.
  • partition_match_key_range — handles KEY_RANGE lower and upper bounds for index scans.
  • partition_prune — leaf wrapper that resolves a REGU_VARIABLE to a DB_VALUE (via partition_get_value_from_regu_var) and then calls partition_prune_db_val.
  • partition_prune_db_val — strategy dispatch: HASH → partition_prune_hash, RANGE → partition_prune_range, LIST → partition_prune_list.
  • partition_prune_hash — mht_get_hash_number (hash_size, &val) → bit position.
  • partition_prune_range — interval comparison against (min, max) per partition; uses partition_decrement_value to adjust boundary values so strict (>) comparisons can be folded into inclusive interval checks.
  • partition_prune_list — db_set_ismember per partition; short-circuits on first match because list partitions are pairwise disjoint.
  • partition_prune_for_function — pruning for predicates whose left or right side contains the partition expression as a sub-expression (e.g., YEAR(date) BETWEEN 2020 AND 2024 when partitioning by YEAR(date)).
  • partition_supports_pruning_op_for_function — lookup table that says which (op, function) combinations the function-pruning logic can handle.
  • partition_do_regu_variables_match / partition_do_regu_variables_contain — structural matchers on regu trees: “are these the same partition expression?” vs. “does this regu tree contain the partition expression?”.
  • partition_get_value_from_key, partition_get_value_from_inarith, partition_get_value_from_regu_var — three flavors of “pull a DB_VALUE out of a regu node” depending on shape.
  • partition_is_reguvar_const — whether a regu variable is a constant value (and therefore pruning-safe).
  • partition_decrement_value — type-aware integer/date decrement, used to massage range comparisons.
  • partition_get_attribute_id — extracts the underlying attribute id out of a partition-expression regu tree. Used during context load to populate PRUNING_CONTEXT::attr_id.
  • partition_set_cache_info_for_expr / partition_set_cache_dbvalp_for_attribute — wire the heap-attribute cache into the regu tree before evaluation, so fetch_peek_dbval can read the column value out of pinfo->attr_info rather than via a fresh disk fetch.
  • partition_get_position_in_key — for index pruning, finds the position of the partition column within the index key column list.
  • partition_rel_op_to_pruning_op — REL_OP → PRUNING_OP enum mapping (R_LT → PO_LT, R_EQ → PO_EQ, etc.).

Partition descriptor lifecycle (server-side, partition.c)

  • partition_init_pruning_context — zero-init.
  • partition_load_pruning_context — populate (try cache first; on miss read OR_PARTITION array from heap and cache).
  • partition_clear_pruning_context — release scan caches and predicate.
  • partition_load_partition_predicate / partition_free_partition_predicate — deserialize and free the partition expression (stx_map_stream_to_func_pred).
  • partition_set_specified_partition — when caller explicitly named a partition (DB_PARTITION_CLASS mode), point selected_partition at it.
  • PARTITION_CACHE_ENTRY (struct) — per-class cache entry.
  • partition_cache_init / partition_cache_finalize / partition_decache_class — process-wide hash table of cached pruning contexts (db_Partition_Ht, PARTITION_CACHE_SIZE = 200).
  • partition_cache_pruning_context / partition_load_context_from_cache / partition_cache_entry_to_pruning_context / partition_pruning_context_to_cache_entry — bidirectional conversion.
  • partition_free_cache_entry / partition_free_cache_entry_kv — destructors.
  • PRUNING_SCAN_CACHE (struct, partition_sr.h) — wraps a HEAP_SCANCACHE plus function-index unpack info per partition.
  • SCANCACHE_LIST (struct) — singly-linked list of PRUNING_SCAN_CACHE hanging off PRUNING_CONTEXT.
  • partition_get_scancache — find an existing scan cache by partition OID.
  • partition_new_scancache — create and prepend a new one.
  • locator_get_partition_scancache (locator_sr.c:12314+) — full-service wrapper used by locator_insert_force / locator_move_record: find or create, and start the inner HEAP_SCANCACHE if new.
  • PRUNING_BITSET (struct) — unsigned int set[] plus count; a hand-rolled bitset for up to 1024 partitions.
  • PRUNING_BITSET_ITERATOR — companion iterator.
  • pruningset_init / _set_all / _copy / _add / _remove / _intersect / _union / _is_set / _popcount — bitset arithmetic.
  • pruningset_iterator_init / pruningset_iterator_next — iteration.
  • pruningset_to_spec_list — bitset → linked list of PARTITION_SPEC_TYPE for the executor.
  • access_spec_node (struct, xasl.h) — the optimizer’s access spec; carries pruning_type, pruned, parts (linked list of surviving partitions), curent (current iterator position).
  • partition_spec_node / PARTITION_SPEC_TYPE (struct, xasl.h) — (oid, hfid, btid, next, scan_stats).
  • prepare_mvcc_reev_data (in query_executor.c) — calls partition_prune_spec and lock_subclass per partition before scan open.
  • qexec_clear_partition_expression (referenced from partition_free_partition_predicate) — releases the deserialized partition expression’s regu tree.
  • locator_insert_force — inserts a record; calls partition_prune_insert if the class is partitioned and routes the heap insert to the partition.
  • locator_update_force — updates a record; calls partition_prune_update and dispatches to locator_move_record if the partition assignment changed.
  • locator_move_record — delete-then-insert across two partitions atomically.
  • locator_get_partition_scancache — see above.
  • locator_area_op_to_pruning_type — LC_FLUSH_* → DB_*_PARTITIONED_CLASS enum mapping.
  • FOR_INSERT_OR_DELETE / FOR_MOVE — internal enum that tags a delete-and-reinsert pair as a partition move (so MVCC treats the new row as a continuation rather than a fresh insert for some bookkeeping purposes).
  • redistribute_partition_data — bulk path used by REORGANIZE PARTITION.
  • btree.c::partition_prune_unique_btid callers (line 6080, 22719) — uniqueness checks on insert.
  • btree_load.c::partition_prune_partition_index caller (line 4559) — bulk index build sees per-partition keys.
  • btree.c::partition_load_pruning_context caller (line 22707) — referencing-side foreign-key lookup needs the FK’s pruning context.
  • statistics_sr.c::partition_get_partition_oids caller (line 198) — analyze runs per partition.
  • system_catalog.c::partition_get_partition_oids caller (line 5449) — catalog accessor.
  • file_manager.c::partition_get_partition_oids caller (line 6854) — file-cleanup walks all partitions.
  • partition_load_aggregate_helper — for queries with aggregates over partitioned tables, builds an HIERARCHY_AGGREGATE_HELPER containing per-partition (HFID, BTID) arrays so the aggregate executor can iterate partitions in turn.
  • HIERARCHY_AGGREGATE_HELPER (in query_aggregate.hpp) — the recipient struct.

The following table records the symbol positions observed as of this document’s last update. Symbol names are the canonical anchor; the line numbers age with refactors.

| Symbol | File | Line |
| --- | --- | --- |
| MAX_PARTITIONS = 1024 | src/query/partition.h | 26 |
| pruning_scan_cache | src/query/partition_sr.h | 50 |
| scancache_list | src/query/partition_sr.h | 60 |
| pruning_context | src/query/partition_sr.h | 67 |
| partition_init_pruning_context (decl) | src/query/partition_sr.h | 95 |
| partition_load_pruning_context (decl) | src/query/partition_sr.h | 97 |
| partition_prune_spec (decl) | src/query/partition_sr.h | 112 |
| partition_prune_insert (decl) | src/query/partition_sr.h | 114 |
| partition_prune_update (decl) | src/query/partition_sr.h | 118 |
| partition_prune_unique_btid (decl) | src/query/partition_sr.h | 122 |
| partition_get_partition_oids (decl) | src/query/partition_sr.h | 125 |
| partition_load_aggregate_helper (decl) | src/query/partition_sr.h | 128 |
| partition_find_root_class_oid (decl) | src/query/partition_sr.h | 135 |
| partition_prune_partition_index (decl) | src/query/partition_sr.h | 137 |
| pruning_bitset | src/query/partition.c | 71 |
| partition_cache_entry | src/query/partition.c | 95 |
| partition_init_pruning_context | src/query/partition.c | 2589 |
| partition_load_pruning_context | src/query/partition.c | 2674 |
| partition_clear_pruning_context | src/query/partition.c | 2805 |
| partition_load_partition_predicate | src/query/partition.c | 2862 |
| partition_set_specified_partition | src/query/partition.c | 2929 |
| partition_get_attribute_id | src/query/partition.c | 3052 |
| partition_get_position_in_key | src/query/partition.c | 3125 |
| partition_prune_spec | src/query/partition.c | 3313 |
| partition_find_partition_for_record | src/query/partition.c | 3465 |
| partition_prune_insert | src/query/partition.c | 3603 |
| partition_prune_update | src/query/partition.c | 3710 |
| partition_get_scancache | src/query/partition.c | 3831 |
| partition_new_scancache | src/query/partition.c | 3861 |
| partition_get_partition_oids | src/query/partition.c | 3891 |
| partition_decrement_value | src/query/partition.c | 3976 |
| partition_prune_unique_btid | src/query/partition.c | 4064 |
| partition_find_inherited_btid | src/query/partition.c | 4106 |
| partition_load_aggregate_helper | src/query/partition.c | 4150 |
| partition_attrinfo_get_key | src/query/partition.c | 4337 |
| partition_prune_partition_index | src/query/partition.c | 4441 |
| partition_prune_list | src/query/partition.c | 1086 |
| partition_prune_hash | src/query/partition.c | 1221 |
| partition_prune_range | src/query/partition.c | 1340 |
| partition_prune_db_val | src/query/partition.c | 1545 |
| partition_prune | src/query/partition.c | 1574 |
| partition_match_pred_expr | src/query/partition.c | 1933 |
| partition_prune_for_function | src/query/partition.c | 2105 |
| partition_match_key_range | src/query/partition.c | 2406 |
| pruningset_to_spec_list | src/query/partition.c | 873 |
| pruningset_init, etc. | src/query/partition.c | 187-396 |
| sm_partition | src/object/class_object.h | 728 |
| SM_CLASS::partition | src/object/class_object.h | 798 |
| or_partition | src/base/object_representation_sr.h | 196 |
| DB_PARTITION_TYPE | src/storage/storage_common.h | 1211 |
| DB_CLASS_PARTITION_TYPE | src/storage/storage_common.h | 1218 |
| partition_spec_node | src/query/xasl.h | 1029 |
| access_spec_node::pruning_type | src/query/xasl.h | 1052 |
| access_spec_node::parts | src/query/xasl.h | 1057 |
| access_spec_node::pruned | src/query/xasl.h | 1061 |
| do_create_partition | src/query/execute_schema.c | 4036 |
| do_alter_partitioning_pre (decl) | src/query/execute_schema.c | 334 |
| do_alter_partitioning_post (decl) | src/query/execute_schema.c | 335 |
| do_remove_partition_pre (decl) | src/query/execute_schema.c | 337 |
| do_coalesce_partition_pre (decl) | src/query/execute_schema.c | 339 |
| do_reorganize_partition_pre (decl) | src/query/execute_schema.c | 341 |
| do_create_partition_constraints | src/query/execute_schema.c | 5728 |
| SM_PARTITION_ALTER_INFO | src/query/execute_schema.c | 195 |
| partition_prune_spec caller | src/query/query_executor.c | 8656 |
| partition_prune_spec parallel-path caller | src/query/query_executor.c | 13361 |
| locator_insert_force partition branch | src/transaction/locator_sr.c | 4969 |
| locator_move_record | src/transaction/locator_sr.c | 5295 |
| locator_update_force::partition_prune_update | src/transaction/locator_sr.c | 5915 |
| locator_get_partition_scancache | src/transaction/locator_sr.c | 12314 |
| locator_area_op_to_pruning_type | src/transaction/locator_sr.c | 12372 |
| partition_prune_unique_btid (in btree.c) | src/storage/btree.c | 6080, 22719 |
| partition_prune_partition_index (in btree_load.c) | src/storage/btree_load.c | 4559 |
| partition_get_partition_oids (statistics) | src/storage/statistics_sr.c | 198 |

This document was written after cubrid-class-object.md, cubrid-query-optimizer.md, cubrid-locator.md, and cubrid-catalog-manager.md. The points of contact and disagreement:

  • cubrid-class-object.md introduces SM_PARTITION as a member of SM_CLASS and notes that partitions are subclasses of the master class wired through the OODB inheritance machinery (SM_CLASS::users / SM_CLASS::inheritance). This document fills in the run-time half: how SM_PARTITION (an in-memory schema artifact) maps to OR_PARTITION (an on-disk reload artifact) and to PRUNING_CONTEXT (a request-scoped evaluation artifact). All three carry the partition rule, but in three different shapes and lifetimes.

  • cubrid-query-optimizer.md describes the optimizer-side job — XASL generation and cost-based plan selection — and observes that the optimizer’s notion of “partition” (as in qo_partition, QO_PARTITION) is the planner’s term for a connected component of the join graph, not a horizontal table partition. This document confirms the cross-check: CUBRID’s optimizer does not know about table partitions; pruning runs lazily, just before scan open, in partition_prune_spec. There is no partition-wise join.

  • cubrid-locator.md describes the locator’s MOP routing and per-class scan-cache infrastructure. The DML routing in this document — locator_insert_force calling partition_prune_insert, locator_move_record for partition moves, locator_get_partition_scancache reusing per-partition caches — slots cleanly into the locator narrative. The new detail here is that pruning is performed server-side with a full deserialized regu-variable expression, not client-side via the workspace. The client-side workspace code in locator_cl.c only sets mop->pruning_type = DB_PARTITIONED_CLASS on the master class’s MOP so that the flush stage emits LC_FLUSH_INSERT_PRUNE requests, which the server then routes through partition_prune_insert.

  • cubrid-catalog-manager.md describes how the schema flushes to _db_class records. SM_PARTITION data persists as a DB_SEQ literal embedded in the class object (no dedicated partition catalog table). The OR_PARTITION array is built at server side from the master class’s _db_class row and the _db_class_repr rows of the partitions; the partition predicate stream lives in the master partition’s values[2] slot. Confirmed against this document’s reading of partition_load_partition_predicate.

  • cubrid-btree.md describes the B-tree subsystem; this document adds the “B-trees on partitioned tables are local per partition” rule. partition_prune_unique_btid is the bridge that translates a search on the master’s logical BTID into a search on the right partition’s physical BTID.

  1. Subpartitioning / composite partitioning. Is a CUBRID partition itself allowed to be partitioned (Oracle-style range-hash composite, or MySQL’s HASH-within-RANGE)? The DDL forbids CREATE TABLE on a partition class (SM_NO_PARTITION_ON_HIERARCHIES), but this document did not trace whether ALTER TABLE partition__p__p3 PARTITION BY HASH(...) is reachable through some code path. Worth confirming with a small test (./csql -S and DDL).

  2. Online repartitioning. do_reorganize_partition_pre/post suggests there is a path that redistributes data across partitions when bounds change. Does it run online (without a full table lock) or offline? The redistribute_partition_data symbol in locator_sr.c:218 is the entry worth tracing: does it use MVCC to serve readers while it copies, or does it take an exclusive lock?

  3. Foreign keys across partitions. When a partitioned table is the referencing side of a foreign key, the FK has to look up the referenced table’s index. btree_load.c line 4559 shows partition_prune_partition_index being used in an FK context on the referenced side. What about when the FK references a partitioned table (the referenced side is partitioned) and the index is local? Is the cross-partition uniqueness assumption documented? Is the FK guaranteed to find the referenced row?

  4. Partition pruning for parameterized predicates. This document focused on plan-time pruning, but partition_prune_spec is called from the executor every time the access spec is re-opened. Does CUBRID re-prune when a host variable bound to the partition key changes between executions of a prepared statement? The cache partition_load_context_from_cache argues yes (the context is reusable across executions, but the bitset is rebuilt fresh each time partition_prune_spec runs). A side-by-side comparison with PostgreSQL’s “execution-time pruning” (which also re-runs pruning per execution to catch parameter changes) would clarify whether CUBRID matches.

  5. The disabled partition_is_global_index. The function sits behind #if 0 in partition.c:4250. Is global-index support a planned future feature, or is it permanently dead code from a feature that was prototyped and abandoned? Annotating with a git log -p pass on that block would answer.

  6. partition_match_pred_expr and T_NOT_TERM. Currently any NOT predicate returns MATCH_NOT_FOUND (no pruning). The textbook treatment of partition pruning under negation is straightforward: De Morgan-rewrite, then prune. Is the current behavior a deliberate choice (the rewrite is unsafe in CUBRID’s predicate language) or just unfinished?

  • src/query/partition.c — server-side partition logic; pruning, per-row resolution, partition descriptor cache.
  • src/query/partition.h — MAX_PARTITIONS = 1024.
  • src/query/partition_sr.h — PRUNING_CONTEXT, PRUNING_SCAN_CACHE, SCANCACHE_LIST, public API.
  • src/optimizer/query_planner.c — checked for table-partition awareness; the optimizer’s QO_PARTITION is the join graph’s connected components, not table partitions; no partition-wise joins.
  • src/transaction/locator_sr.c — locator_insert_force partition routing, locator_move_record, locator_get_partition_scancache, locator_area_op_to_pruning_type.
  • src/query/execute_schema.c — do_create_partition, do_alter_partitioning_*, do_remove_partition_*, do_coalesce_partition_*, do_reorganize_partition_*, do_create_partition_constraints.
  • src/object/class_object.c / src/object/class_object.h — SM_PARTITION, SM_CLASS::partition, lifecycle helpers.
  • src/base/object_representation_sr.h — OR_PARTITION (the on-disk-reload form).
  • src/storage/storage_common.h — DB_PARTITION_TYPE, DB_CLASS_PARTITION_TYPE enums.
  • src/query/xasl.h — access_spec_node::pruning_type, access_spec_node::parts, partition_spec_node.
  • src/query/query_executor.c — partition_prune_spec callers, per-partition lock and scan dispatch.
  • src/storage/btree.c and src/storage/btree_load.c — partition_prune_unique_btid and partition_prune_partition_index consumers.

Theoretical references:

  • Silberschatz, Korth, Sudarshan. Database System Concepts (7th ed.), Ch. 22 (“Parallel and Distributed Storage”), §22.2 on horizontal partitioning strategies and §22.5 on partition pruning.
  • Petrov. Database Internals, Ch. 11 on distributed systems and the section on partition placement and partition awareness in query planning.
  • PostgreSQL documentation: “Table Partitioning” chapter in the PostgreSQL manual; the partprune.c source file in the PostgreSQL source tree for the prune algorithm.
  • MySQL documentation: “Partitioning” chapter in the MySQL Reference Manual; the sql/sql_partition.cc source file for the prune algorithm.
  • Oracle Database VLDB and Partitioning Guide (Oracle Database 19c or later) for range/hash/list/composite + interval partitioning.