CUBRID Query Rewrite — Pre-Optimization Tree Transformations and the LIMIT-Clause Case Study
Contents:
- Theoretical Background
- Common DBMS Design
- CUBRID’s Approach
- Source Walkthrough
- Cross-check Notes
- Open Questions
- Sources
Theoretical Background
Section titled “Theoretical Background”Query rewriting is the layer of query processing that lives between parsing (which only enforces syntax) and cost-based optimization (which enumerates physical plans and prices them). Its job is logical: take the parse tree the user wrote and produce an equivalent — but more amenable — parse tree, before the optimizer ever scores a plan. Database Internals (Petrov, Ch. 12 “Query Planning and Optimization”) frames it as the “rule-based” stage; Database System Concepts (Silberschatz et al., Ch. 16) calls it “transformation of relational expressions”. Either way, the contract is equivalence- preserving syntactic surgery.
Goetz Graefe’s The Cascades Framework for Query Optimization
(IEEE Data Eng. Bull., 1995) makes the dichotomy explicit:
transformation rules (rewrites that always improve, applied
unconditionally) versus implementation rules (alternatives
that the optimizer must cost). Predicate pushdown is the
canonical transformation — it is never worse to filter early —
so it lives in the rewrite layer. Join-order enumeration is the
canonical implementation — it has to be costed — so it lives in
the optimizer proper. CUBRID adheres to this split: the
optimizer’s rewriter/ directory implements transformations,
and query_planner.c / plan_generation.c implement
enumeration and costing.
The classic transformation catalogue:
- Predicate pushdown — move WHERE conjuncts as close to the scan as the schema allows. Reduces tuples earlier in the pipeline.
- View inlining (mq_translate, in CUBRID) — substitute a view’s defining query for the view reference, exposing the underlying tables to subsequent rewrites.
- Subquery flattening — turn an uncorrelated
IN (SELECT ...)into a join with a derived table, exposing the inner tables to join enumeration. - Outer-join reduction — promote a left/right outer join to an inner join when a predicate above it is null-rejecting on the optional side. Database Internals §12.3 walks through why this is safe.
- CNF normalisation — convert the WHERE filter to conjunctive normal form so each conjunct is a candidate for pushdown and for index-driven evaluation.
- Constant folding / equality reduction — replace
WHERE x = 5 AND ... x ...with5everywhere x dominated by that conjunct appears, exposing more constants for further folding. - Redundant-join elimination — drop a joined table whose columns are not referenced and whose join condition is FK→PK, i.e., the join cannot remove or duplicate rows.
- LIMIT pushdown / row-counting predicate lowering — the
topic of this document. The user-visible
LIMIT nbecomes a row-numbering predicate on the scan side, so early termination is expressible inside the existing predicate machinery rather than requiring a new “stop after n” plan operator.
The last item — LIMIT pushdown — is interesting because it is
not pure pushdown. A LIMIT n over an unsorted scan is trivially
expressible as INST_NUM() <= n on the scan’s tuple counter. A
LIMIT n after a sort is not a scan-side predicate; it must
sit on the sort output, where the counter is well-defined. A
LIMIT n after a GROUP BY similarly sits on the group output.
CUBRID encodes the choice via three different numbering
operators (PT_INST_NUM, PT_ORDERBY_NUM, PT_GROUPBY_NUM)
and routes the LIMIT into the right predicate slot
(where, orderby_for, or having) based on the surrounding
clauses. The lowering is a decision tree, walked once during
semantic checking and (for some shapes) revisited during
mq_rewrite.
Common DBMS Design
Section titled “Common DBMS Design”PostgreSQL’s subquery_planner (in src/backend/optimizer/plan/ planner.c) is structured as an ordered sequence of rewrite
passes: pull_up_sublinks → inline_set_returning_functions →
pull_up_subqueries → flatten_simple_union_all →
reduce_outer_joins → remove_useless_result_rtes →
preprocess_expression (which does constant folding,
canonicalize_qual, eval_const_expressions) → preprocess_qual_ conditions. LIMIT lives as parse->limitOffset /
parse->limitCount and is consumed by the executor’s
ExecLimit node — Postgres does not lower it into a row-
counting predicate at rewrite time; it stays a top-of-plan
operator throughout.
MySQL’s optimizer entrypoint JOIN::optimize calls
optimize_cond (a CNF-and-constant-folding pass), then
prune_partitions, then make_join_statistics,
optimize_keyuse, and finally cost-based join order. LIMIT n
in MySQL is also a top-of-plan operator (limit field on
SELECT_LEX), with two known exceptions where the rewrite layer
peeks at it: index-condition pushdown and the loose-index-scan
optimization for MIN/MAX over indexes — both of which can
short-circuit a scan when LIMIT is small. There is no general-
purpose lowering of LIMIT into a counting predicate.
Oracle’s view-merging (the heuristic version, before the cost-
based variant arrived in 10g) is the closest precedent for
CUBRID’s mq_translate: an inline-eligible view is substituted
into its parent query, exposing the underlying tables to the
optimizer. Oracle’s _complex_view_merging parameter and the
MERGE / NO_MERGE hints are the operator-visible knobs.
CUBRID’s choice — to lower LIMIT into a row-counting predicate during semantic checking, then re-run a rewrite pass over the resulting expression — sits between Postgres (“LIMIT stays structural”) and the index-scan-internal LIMIT optimizations (“LIMIT influences the bottom of the plan”). The choice has two consequences:
- LIMIT participates in CNF, constant folding, and auto-
parameterization for free, because by the time those
passes run, LIMIT is already a
PT_LEexpression with aPT_VALUEright-hand side. - The optimizer must un-recognise the lowered form when it
wants to apply the multi-range index-scan LIMIT
optimization. CUBRID does this in
qo_check_iscan_for_multi_ range_opt(inplan_generation.c), which inspectsquery->info.query.orderby_for— the post-lowering slot — to decide whether the LIMIT is in the right shape to drive the multi-range optimization at scan time.
CUBRID’s Approach
Section titled “CUBRID’s Approach”The query-processing pipeline that surrounds rewrite is a six-stage path from SQL text to executable XASL. Rewrite is distributed across stages 3 and 5, not localised in one pass:
flowchart LR SQL["SQL text"] --> LX["Lexer<br/>csql_lexer.l"] LX --> PR["Parser<br/>csql_grammar.y<br/>→ PT_NODE tree"] PR --> NR["Name resolution<br/>pt_resolve_names"] NR --> SC["Semantic check<br/>pt_semantic_check<br/>type-check + LIMIT lowering<br/>(pt_eval_type_pre)"] SC --> VT["View inlining<br/>mq_translate"] VT --> RW["Rewrite pass<br/>mq_rewrite (qo_rewrite_queries)<br/>CNF, predicate pushdown,<br/>subquery flattening,<br/>residual LIMIT lowering,<br/>auto-parameterize"] RW --> OPT["Cost optimizer<br/>qo_optimize_statement"] OPT --> XG["XASL generation<br/>xasl_generation.c"] XG --> EX["Execution<br/>query_executor.c"] classDef rw fill:#fff5d6,stroke:#a07000; class SC,VT,RW rw;
The yellow stages all touch the parse tree; the rewrite layer
proper is the third one (mq_rewrite). The earlier two are
relevant because they (a) lower LIMIT into a numbering predicate
and (b) produce a tree where view references have been
substituted with their definitions. Both happen before
mq_rewrite runs, so the rewrite layer can assume LIMIT is
already a predicate in most cases and that it sees real tables.
The driver is in src/compat/db_vdb.c:
// db_compile_statement_local — src/compat/db_vdb.c (condensed)intdb_compile_statement_local (DB_SESSION * session){ // ... parser already populated session->statements ...
/* prefetch and lock classes to avoid deadlock */ (void) pt_class_pre_fetch (parser, statement);
/* semantic check + LIMIT lowering happens inside */ statement_result = pt_compile (parser, statement);
/* view substitution + post-translate rewrites */ statement_result = mq_translate (parser, statement); // ... mq_translate calls mq_rewrite internally for SELECTs ...
/* re-prefetch translated real classes */ (void) pt_class_pre_fetch (parser, statement);
// ... do_prepare_statement → XASL generation ...}pt_compile is the entrypoint to semantic checking; mq_translate
is the entrypoint to view inlining and post-translate rewrite
(it calls mq_rewrite for the resulting SELECTs). Between them,
all parse-tree shape changes complete before XASL generation.
LIMIT lowering during semantic checking
Section titled “LIMIT lowering during semantic checking”The first rewrite the LIMIT clause undergoes happens inside
pt_eval_type_pre (in type_checking.c), which is called as
the pre-walk callback of pt_semantic_type on every node in the
tree. The decision tree is the heart of the LIMIT case study:
flowchart TD
S["LIMIT clause<br/>node.info.query.limit != NULL<br/>flag.rewrite_limit == 1"]
S -->|node_type| K{node_type?}
K -->|PT_SELECT| SEL{has<br/>ORDER BY?}
K -->|PT_UNION/<br/>PT_INTERSECTION/<br/>PT_DIFFERENCE| UNI{has<br/>ORDER BY?}
K -->|PT_DELETE| DEL["INST_NUM lowering →<br/>delete_.search_cond"]
K -->|PT_UPDATE| UPD{has<br/>ORDER BY?}
SEL -->|yes,<br/>no order_siblings| OBN["ORDERBY_NUM lowering →<br/>query.orderby_for"]
SEL -->|no,<br/>has GROUP BY| GBN["GROUPBY_NUM lowering →<br/>q.select.having"]
SEL -->|no GBY,<br/>has DISTINCT| OBN2["ORDERBY_NUM lowering →<br/>query.orderby_for"]
SEL -->|none of the above| INST["INST_NUM lowering →<br/>q.select.where"]
UNI -->|yes| OBN3["ORDERBY_NUM lowering →<br/>query.orderby_for"]
UNI -->|no| RESID["leave for mq_rewrite<br/>(see qo_rewrite_queries)"]
UPD -->|yes| OBN4["ORDERBY_NUM lowering →<br/>update.orderby_for"]
UPD -->|no| INST2["INST_NUM lowering →<br/>update.search_cond"]
The branch order is critical: ORDER BY beats GROUP BY beats
DISTINCT beats “nothing”, and only the first match is taken.
A SELECT with both GROUP BY and ORDER BY uses ORDERBY_NUM,
not GROUPBY_NUM. The cheat-sheet’s enumeration of the cases
matches the source one-for-one; the pt_eval_type_pre arm for
PT_SELECT is:
// pt_eval_type_pre (PT_SELECT arm) — src/parser/type_checking.ccase PT_SELECT: /* rewrite limit clause as numbering expression and add it to the corresponding predicate */ if (node->info.query.limit && node->info.query.flag.rewrite_limit) { PT_NODE *limit, *t_node; PT_NODE **expr_pred;
if (node->info.query.order_by && !node->info.query.flag.order_siblings) { expr_pred = &node->info.query.orderby_for; limit = pt_limit_to_numbering_expr (parser, node->info.query.limit, PT_ORDERBY_NUM, false); } else if (node->info.query.q.select.group_by) { expr_pred = &node->info.query.q.select.having; limit = pt_limit_to_numbering_expr (parser, node->info.query.limit, PT_LAST_OPCODE, true); } else if (node->info.query.all_distinct == PT_DISTINCT) { expr_pred = &node->info.query.orderby_for; limit = pt_limit_to_numbering_expr (parser, node->info.query.limit, PT_ORDERBY_NUM, false); } else { expr_pred = &node->info.query.q.select.where; limit = pt_limit_to_numbering_expr (parser, node->info.query.limit, PT_INST_NUM, false); }
if (limit != NULL) { /* append `limit` to the chosen predicate slot */ // ... condensed: walk expr_pred to its tail, splice in `limit` ... node->info.query.flag.rewrite_limit = 0; } } break;Two non-obvious facts about this arm. First, order_siblings
(a CONNECT BY hierarchy modifier) intentionally vetoes the
ORDER BY branch — siblings ordering is per-level, not global,
so ORDERBY_NUM would count across siblings and overshoot the
caller’s intent. Second, the GROUP BY case uses PT_LAST_OPCODE
(a sentinel) for the operator and passes is_gby_num = true —
the helper interprets this as “build a PT_GROUPBY_NUM
function node”, not a PT_EXPR operator, because GROUPBY_NUM
is a function in the parse tree taxonomy.
The flag node->info.query.flag.rewrite_limit is set to 1 by
the grammar when LIMIT is present (csql_grammar.y writes it in
the SELECT/UPDATE/DELETE production rules) and cleared to 0
once the lowering succeeds. After clearing, the LIMIT
expression is still on node->info.query.limit for diagnostic
printing, but it no longer participates in execution — the
real predicate is the lowered numbering expression in the
chosen predicate slot.
parser_print_tree checks flag.rewrite_limit and suppresses
LIMIT output when the flag is 0 — i.e., after lowering, the
printed query has no LIMIT clause but the equivalent
WHERE inst_num() <= n (or its sibling). The cheat-sheet’s
PT_CONVERT_RANGE formatting variant exposes the lowered form
explicitly; this is what the raw analysis quotes for the
select code, name FROM athlete ... LIMIT 1 example.
How the lowered predicate is constructed
Section titled “How the lowered predicate is constructed”pt_limit_to_numbering_expr (in parser_support.c) is a small
factory. The user-facing LIMIT has up to two values: [offset,] row_count. The function builds either lhs <= row_count (when
no offset) or lhs > offset AND lhs <= offset + row_count
(when offset is present). The lhs is one of:
PT_EXPRwithop = PT_INST_NUM— counts rows after the scan.PT_EXPRwithop = PT_ORDERBY_NUM— counts rows after sort.PT_FUNCTIONwithfunction_type = PT_GROUPBY_NUM— counts groups after grouping.
// pt_limit_to_numbering_expr — src/parser/parser_support.c (condensed)PT_NODE *pt_limit_to_numbering_expr (PARSER_CONTEXT * parser, PT_NODE * limit, PT_OP_TYPE num_op, bool is_gby_num){ PT_NODE *lhs, *part1, *part2, *node;
/* Build the LHS: GROUPBY_NUM is a function node, the others are expr nodes. */ if (is_gby_num) { lhs = parser_new_node (parser, PT_FUNCTION); lhs->type_enum = PT_TYPE_INTEGER; lhs->info.function.function_type = PT_GROUPBY_NUM; } else { lhs = parser_new_node (parser, PT_EXPR); lhs->info.expr.op = num_op; /* PT_INST_NUM or PT_ORDERBY_NUM */ }
if (limit->next == NULL) { /* `LIMIT n` → lhs <= n */ node = parser_new_node (parser, PT_EXPR); node->info.expr.op = PT_LE; node->info.expr.arg1 = lhs; node->info.expr.arg2 = parser_copy_tree (parser, limit); } else { /* `LIMIT off, n` → lhs > off AND lhs <= off + n */ part1->info.expr.op = PT_GT; part1->info.expr.arg1 = lhs; part1->info.expr.arg2 = parser_copy_tree (parser, limit); /* offset */
part2->info.expr.op = PT_LE; part2->info.expr.arg1 = parser_copy_tree (parser, lhs); /* part2->arg2 = (offset + row_count); folded if both are PT_VALUE */ // ... condensed: build PT_PLUS sum, fold if both literals ...
node = parser_new_node (parser, PT_EXPR); node->info.expr.op = PT_AND; node->info.expr.arg1 = part1; node->info.expr.arg2 = part2; } return node;}The PT_VALUE && PT_VALUE constant-folding branch in the
two-arg case is a small win: LIMIT 100, 10 becomes inst_num () > 100 AND inst_num() <= 110 directly, with 110 already
computed at rewrite time. If either bound is a host variable,
the PT_PLUS expression node is kept and folded later.
The mq_rewrite stage — what happens after view inlining
Section titled “The mq_rewrite stage — what happens after view inlining”mq_translate substitutes view references with their defining
queries. After substitution, new SELECTs may appear in the
tree that did not exist before semantic checking, and these
SELECTs may carry a LIMIT that came from the view definition
or that was attached by mq_translate itself. The mq_rewrite
pass at the bottom of mq_translate therefore has to:
- Re-lower any residual LIMIT clauses that were not lowered in semantic checking (the UNION case, primarily).
- CNF-normalise predicates (which now include the lowered numbering predicates from semantic checking).
- Push predicates down where safe.
- Flatten uncorrelated subqueries to joins with derived tables.
- Reduce equality terms (constant propagation through
=). - Reduce outer joins to inner joins where safe.
- Auto-parameterize constants for the XASL plan cache.
The driver is mq_rewrite, which parser_walk_trees the
statement with qo_rewrite_queries as the pre-callback and
qo_rewrite_queries_post as the post-callback:
// mq_rewrite — src/optimizer/rewriter/query_rewrite.cPT_NODE *mq_rewrite (PARSER_CONTEXT * parser, PT_NODE * statement){ return parser_walk_tree (parser, statement, qo_rewrite_queries, NULL, qo_rewrite_queries_post, NULL);}qo_rewrite_queries is a 500-line dispatch with three phases.
Phase 1 — pre-rewrite
Section titled “Phase 1 — pre-rewrite”For each statement type, the pre-rewrite phase identifies the
predicate slots that subsequent phases will operate on
(wherep, havingp, orderby_for_p, etc.) and applies
shape-only transformations:
qo_move_on_of_explicit_join_to_where— move ON-clause predicates of explicit joins into the WHERE list, tagged with their original location. The post-callback uses the location tag to undo this for outer joins. Inner joins keep the move; outer joins get the predicate restored to the ON clause unless it has been transformed into a copy-pushed term.qo_rewrite_index_hints— drop hints that reference dropped tables (after view inlining, the hint may no longer apply).- For
PT_UNION/PT_DIFFERENCE/PT_INTERSECTION, the residual LIMIT lowering runs here. This is the case semantic checking left for later: a UNION with LIMIT but no ORDER BY:
// qo_rewrite_queries (PT_UNION arm, residual LIMIT) — query_rewrite.ccase PT_UNION:case PT_DIFFERENCE:case PT_INTERSECTION: // ... condensed: rewrite hidden columns of arg1, arg2 as derived ... if (node->info.query.limit && node->info.query.flag.rewrite_limit) { limit = pt_limit_to_numbering_expr (parser, node->info.query.limit, PT_INST_NUM, false); if (limit != NULL) { node->info.query.flag.rewrite_limit = 0;
/* Push limit into UNION arms when safe (no ORDER BY at top, no DISTINCT, no NO_PUSH_PRED hint). */ if (node->info.query.order_by == NULL && !qo_check_distinct_union (parser, node) && !qo_check_hint_union (parser, node, PT_HINT_NO_PUSH_PRED)) { node = qo_push_limit_to_union (parser, node, limit_node); } /* Wrap the UNION as a derived table and apply the lowered predicate on top: SELECT * FROM (UNION) T WHERE INST_NUM() <= n. */ derived = mq_rewrite_query_as_derived (parser, node); derived->info.query.q.select.where = limit; node = derived; } } orderby_for_p = &node->info.query.orderby_for; break;Two transformations stack here. The first is per-arm LIMIT
push: qo_push_limit_to_union walks each branch of the UNION
tree and copies the LIMIT (or LIMIT offset+row_count, summed
to a single bound) onto each arm, so each arm independently
stops after enough rows. The second is wrap-as-derived:
the UNION becomes the FROM clause of a synthetic outer SELECT
whose WHERE is the lowered INST_NUM() <= n, so the final
top-of-pipeline cap is also expressible. The combination means
UNION-with-LIMIT can short-circuit each arm independently and
also stop the merged pipeline.
Phase 2 — optimization rewrites
Section titled “Phase 2 — optimization rewrites”Phase 2 runs only when OPTIMIZATION_ENABLED(level) is true
(which it is for normal queries; the parameter optimization_ level can disable it for debugging).
// qo_rewrite_queries (phase 2 sketch) — query_rewrite.cif (OPTIMIZATION_ENABLED (level)) { if (node->node_type == PT_SELECT) { int continue_walk, idx = 0; qo_rewrite_subqueries (parser, node, &idx, &continue_walk); }
/* convert predicate lists to CNF */ if (*wherep) *wherep = pt_cnf (parser, *wherep); if (*havingp) *havingp = pt_cnf (parser, *havingp); if (*startwithp) *startwithp = pt_cnf (parser, *startwithp); if (*connectbyp) *connectbyp = pt_cnf (parser, *connectbyp); if (*aftercbfilterp)*aftercbfilterp= pt_cnf (parser, *aftercbfilterp); if (*orderby_for_p) *orderby_for_p = pt_cnf (parser, *orderby_for_p); // ... merge_upd_wherep, merge_ins_wherep, merge_del_wherep ...
/* in HAVING with GROUP BY, move non-aggregate terms to WHERE */ if (PT_IS_SELECT (node) && node->info.query.q.select.group_by && *havingp) { // ... condensed: HAVING-to-WHERE pushdown for non-aggregate terms ... }
/* replace `WHERE x = const AND ... x ...` with `const` everywhere */ if (*wherep) { if (PT_IS_SELECT (node)) parser_walk_tree (parser, node, NULL, NULL, qo_reduce_equality_terms_post, NULL); else QO_CHECK_AND_REDUCE_EQUALITY_TERMS (parser, node, wherep); } if (*havingp) QO_CHECK_AND_REDUCE_EQUALITY_TERMS (parser, node, havingp);
/* per-clause term rewrites: range merging, OR→IN promotion, etc. */ qo_rewrite_terms (parser, spec, wherep); qo_rewrite_terms (parser, spec, havingp); qo_rewrite_terms (parser, spec, startwithp); qo_rewrite_terms (parser, spec, connectbyp); qo_rewrite_terms (parser, spec, aftercbfilterp); qo_rewrite_terms (parser, spec, merge_upd_wherep); qo_rewrite_terms (parser, spec, merge_ins_wherep); qo_rewrite_terms (parser, spec, merge_del_wherep);
/* outer-join reduction, predicate copy, redundant-join elimination */ if (node->node_type == PT_SELECT) { if (!qo_rewrite_select_queries (parser, &node, wherep, &seqno)) return node; } }The HAVING-to-WHERE move on line “in HAVING with GROUP BY” deserves a closer look. A HAVING conjunct that does not reference an aggregate and does not reference a non-grouped column can be evaluated before grouping — moving it to WHERE is a textbook predicate pushdown. CUBRID’s variant adds two guards: the conjunct must not contain a pseudocolumn (CONNECT BY pseudocolumns are level-sensitive), and the GROUP BY must not have ROLLUP (ROLLUP groups depend on the pre-aggregate row count, so eliminating rows changes the result).
qo_rewrite_subqueries is the subquery-flattening pass. For
each WHERE conjunct of the form attr op (SELECT ...) where
the operator is one of =, IN, =SOME, >SOME, >=SOME,
<SOME, <=SOME, the subquery’s columns are indexable, and
the subquery is uncorrelated (correlation_level == 0), the
pass:
- Wraps the subquery as a derived table spec and appends it to the FROM list.
- Replaces the operator with
=,>,>=,<, or<=. - For the
*SOMEvariants, replaces the subquery’s select list withMIN()orMAX()of the original column —> SOMEbecomes> MIN,< SOMEbecomes< MAX.
This trades a correlated lookup (one subquery execution per outer row) for a single derived-table join, which the join enumerator can then index-drive.
qo_rewrite_select_queries (in query_rewrite_select.c) does
three things in order: rewrite outer joins to inner joins where
a null-rejecting predicate dominates the optional side
(qo_rewrite_outerjoin), reduce outer-joined tables when the
join is uniquely keyed and unreferenced
(qo_reduce_outer_joined_tbls), and remove tables joined by
FK→PK whose columns aren’t projected (qo_reduce_joined_tbls_ ref_by_fk). The latter two are the redundant-join elimination
pass.
Phase 3 — auto-parameterization
Section titled “Phase 3 — auto-parameterization”After all structural rewrites, qo_auto_parameterize walks
each predicate list and replaces literal constants with input
host variables (input markers). The motivation is the XASL
plan cache: a cached plan keyed on the lowered parse tree
is reusable across executions that differ only in literal
values, which dramatically improves cache hit rate. The pass is
gated by `prm_get_integer_value (PRM_ID_XASL_CACHE_MAX_ENTRIES)
0
andnode->flag.cannot_prepare == 0`, and it runs only when the parser is not in static SQL or skip-auto-parameterize mode.
// qo_auto_parameterize_limit_clause — query_rewrite_auto_parameterize.c (condensed)voidqo_auto_parameterize_limit_clause (PARSER_CONTEXT * parser, PT_NODE * node){ PT_NODE *limit_offsetp, *limit_row_countp;
/* Pull (offset, row_count) out of the per-statement-type slot. */ // ... condensed: switch on node_type, set *_offsetp, *_row_countp ...
/* Replace each literal with `pt_rewrite_to_auto_param` (creates an input host var). */ if (limit_offsetp && pt_is_const_not_hostvar (limit_offsetp)) new_limit_offsetp = pt_rewrite_to_auto_param (parser, limit_offsetp); if (limit_row_countp && pt_is_const_not_hostvar (limit_row_countp)) new_limit_row_countp = pt_rewrite_to_auto_param (parser, limit_row_countp);
/* Splice back into the slot. */ // ... condensed: switch on node_type again to write back ...}Note that qo_auto_parameterize_limit_clause operates on the
original node->info.query.limit slot, even though the
lowered numbering predicate has already been built. Both copies
exist after rewrite: the limit slot holds the host-var-
parameterized LIMIT ? for diagnostic and printing purposes,
while the lowered INST_NUM() <= ? (also host-var-
parameterized via qo_auto_parameterize walking the WHERE
list) is what XASL actually evaluates. This is the
“NOTE: Dummy로 가지고 있고…” comment in the raw analysis —
the question of whether keeping the dummy LIMIT is the right
call is left open there, and remains so here.
The example, end-to-end
Section titled “The example, end-to-end”The raw analysis walks the example
SELECT code, name INTO c, m FROM athlete WHERE gender = 'M' AND nation_code = 'KOR' LIMIT 1 through the three stages. With
the source verified:
Stage 1 — parse. The grammar production for SELECT ... LIMIT 1 writes node->info.query.limit = PT_VALUE(1) and
node->info.query.flag.rewrite_limit = 1. The original
gender = 'M' AND nation_code = 'KOR' predicate goes on
q.select.where.
Stage 2 — semantic check (pt_eval_type_pre, PT_SELECT
arm). The query has no ORDER BY, no GROUP BY, no DISTINCT;
the fall-through arm runs:
expr_pred = &node->info.query.q.select.where;limit = pt_limit_to_numbering_expr (parser, node->info.query.limit, PT_INST_NUM, false);pt_limit_to_numbering_expr returns
PT_LE(PT_INST_NUM(), PT_VALUE(1)), which is appended to
q.select.where. Both the existing gender and nation_code
conjuncts and the new inst_num() <= 1 conjunct are now in the
WHERE list. The rewrite_limit flag is cleared. Printing the
tree with PT_CONVERT_RANGE formatting shows:
SELECT [public.athlete].code, [public.athlete].[name]FROM [public.athlete] [public.athlete]WHERE (inst_num() <= 1) AND [public.athlete].gender = 'M' AND [public.athlete].nation_code = 'KOR'Stage 3 — mq_rewrite. CNF-normalises (already CNF), reduces equality terms (no opportunity here), then auto-parameterizes all literals:
SELECT [public.athlete].code, [public.athlete].[name]FROM [public.athlete] [public.athlete]WHERE (inst_num() <= ?:0) AND [public.athlete].gender = ?:1 AND [public.athlete].nation_code = ?:2The plan cache key is now insensitive to the specific values 1, ‘M’, ‘KOR’.
Multi-range LIMIT optimization at plan-generation time
Section titled “Multi-range LIMIT optimization at plan-generation time”The lowered LIMIT is consumed by the optimizer in two ways. The
first is the ordinary one: INST_NUM() <= n participates in
the scan’s predicate evaluation, and the executor stops calling
the scan once enough rows are collected. The second is a
special case for index scans with ORDER BY: when the
ordering matches an index prefix and the LIMIT is small,
CUBRID can drive the scan with a priority queue of index
ranges, evaluating only enough of each range to satisfy the
LIMIT. This is the “multi-range optimization” (MRO) and lives
in qo_check_iscan_for_multi_range_opt (in
plan_generation.c).
// qo_check_iscan_for_multi_range_opt — src/optimizer/plan_generation.c (condensed)boolqo_check_iscan_for_multi_range_opt (QO_PLAN * plan){ if (!qo_is_iscan (plan)) return false; if (QO_NODE_IS_CLASS_HIERARCHY (plan->plan_un.scan.node)) return false;
query = QO_ENV_PT_TREE (env); if (!PT_IS_SELECT (query)) return false; if (query->info.query.q.select.hint & PT_HINT_NO_MULTI_RANGE_OPT) return false; if (pt_has_aggregate (parser, query)) return false; /* CBRD-22696 */ if (query->info.query.order_by == NULL) return false; if (query->info.query.all_distinct == PT_DISTINCT) return false; if (query->info.query.orderby_for == NULL) return false; /* must have lowered LIMIT */
/* the order_by columns must occupy consecutive index positions, in matching * direction (or all reversed), starting at first_col_idx_pos */ // ... build orderby_nodes from select_list using sort_spec.pos_descr.pos_no ... qo_check_plan_index_for_multi_range_opt (orderby_nodes, order_by, plan, &can_optimize, &first_col_idx_pos, &reverse);
/* scan terms / key filter terms must allow MRO; subqueries must not depend on the LIMIT */ qo_check_terms_for_multiple_range_opt (plan, first_col_idx_pos, &can_optimize); can_optimize = qo_check_subqueries_for_multi_range_opt (plan, first_col_idx_pos);
/* upper bound must be < PRM_ID_MULTI_RANGE_OPT_LIMIT system parameter */ if (!pt_check_ordby_num_for_multi_range_opt (parser, query, &env->multi_range_opt_candidate, NULL)) return false;
return true;}The MRO predicate explicitly checks for the post-lowering
shape: orderby_for non-null means pt_eval_type_pre lowered
LIMIT into the ORDERBY_NUM slot, which is the only slot MRO
can act on. Equivalently, MRO is dead for the INST_NUM and
GROUPBY_NUM lowerings; the optimization only applies when the
LIMIT sits above an index-driven sort.
The pt_check_ordby_num_for_multi_range_opt predicate itself
(in parser_support.c:9743) inspects the predicate shape:
the upper bound has to be less than prm_get_integer_value (PRM_ID_MULTI_RANGE_OPT_LIMIT) (default in
system_parameter.c), the predicate has to look like
ORDERBY_NUM <= n or ORDERBY_NUM < n or n > ORDERBY_NUM or
n >= ORDERBY_NUM, AND-combined; lower bounds are not allowed
because MRO has no efficient way to skip leading rows of a
priority queue.
Why the rewrite-time lowering matters
Section titled “Why the rewrite-time lowering matters”Three downstream consequences fall out of LIMIT being a
predicate by the time mq_rewrite runs:
- CNF and equality reduction work on it for free. The
INST_NUM() <= nconjunct is a normal CNF term; if the user wroteLIMIT 1and a prior rewrite folded another conjunct into a tautology, the CNF simplifier sees the survivor asinst_num() <= 1, not as a special LIMIT clause. - Auto-parameterization works on it for free. The literal
nis recognised and replaced with a host-variable marker by the same walk that processes user-written constants. - Index-condition planning sees it as just another predicate — the optimizer’s usual machinery for identifying index-driven evaluation does not need a LIMIT- specific code path. The MRO machinery is the one place that does require LIMIT-awareness, and it is opt-in.
The price is the duplication noted earlier: the original
info.query.limit field is preserved (for printing and for
auto-parameterization of the offset/row_count host vars used
by the keylimit and other downstream code paths), even though
it is functionally a dummy after lowering. The cheat-sheet
flags this as an open question; the source does not resolve it
either.
Source Walkthrough
Section titled “Source Walkthrough”Anchor on symbol names, not line numbers. The CUBRID source moves; the position table at the end is scoped to the doc’s
updated:date.
Parse-tree slots and flags
Section titled “Parse-tree slots and flags”pt_query_info::limit(parse_tree.h) — the user-writtenLIMIT [offset,] row_countchain, list ofPT_VALUE/PT_HOST_VAR/PT_EXPRnodes. Survives lowering.pt_query_info::flag.rewrite_limit:1(parse_tree.h) — set by the grammar when LIMIT is present, cleared by the lowering arm inpt_eval_type_pre(or inqo_rewrite_queriesfor the UNION-without-ORDER-BY case).pt_delete_info::limit/pt_delete_info::rewrite_limit:1(parse_tree.h) — same idea, DELETE-side.pt_update_info::limit/pt_update_info::rewrite_limit:1(parse_tree.h) — same idea, UPDATE-side.PT_INST_NUM/PT_ORDERBY_NUM(parse_tree.h) — the two numbering operators in thePT_OP_TYPEenum.PT_GROUPBY_NUM(parse_tree.h) — the numbering function in theFUNC_TYPEenum (functions and operators are different PT_NODE shapes; GROUPBY_NUM is a function because it is evaluated post-grouping).PT_EXPR_INFO_GROUPBYNUM_LIMIT(parse_tree.h) — flag set on each conjunct of a GROUPBY_NUM-based lowered LIMIT, so the HAVING-to-WHERE move logic knows not to relocate it.
Grammar-side LIMIT handling
Section titled “Grammar-side LIMIT handling”csql_grammar.y— three productions writeflag.rewrite_ limit = 1: the SELECT/UNION limit production, the UPDATE limit production (setsupdate.rewrite_limit = 1), and the DELETE limit production (setsdelete_.rewrite_limit = 1).
Semantic-check entry points
Section titled “Semantic-check entry points”pt_compile(compile.c) — top-level entry, callspt_semantic_checkand returns. Owns the longjmp environment for parse-time errors.pt_semantic_check/pt_check_with_info(semantic_check.c) — per-node-type semantic checks, dispatches topt_resolve_names,pt_check_where,pt_semantic_type.pt_semantic_type(type_checking.c) — owner of the pre/post walk that runspt_eval_type_preandpt_eval_typeover the tree. Where LIMIT lowering lives.pt_eval_type_pre(type_checking.c) — pre-walk callback, contains the LIMIT-lowering decision tree (PT_SELECT, PT_UPDATE, PT_DELETE, PT_UNION/PT_DIFFERENCE/PT_INTERSECTION arms).pt_limit_to_numbering_expr(parser_support.c) — factory that builds thelhs <= row_countorlhs > offset AND lhs <= offset + row_countlowered expression.
View inlining and rewrite driver
Section titled “View inlining and rewrite driver”mq_translate(view_transform.c) — top-level view-inlining and post-translate rewrite entry. Callsmq_rewritefor SELECT-shaped statements.mq_rewrite(query_rewrite.c) — the parser_walk_tree driver of the rewrite layer. Wrapsqo_rewrite_queries(pre) andqo_rewrite_queries_post(post).qo_rewrite_queries(query_rewrite.c) — the three-phase rewrite dispatcher (pre-rewrite, optimization rewrites, auto-parameterize). The 500-line core.qo_rewrite_queries_post(query_rewrite.c) — moves ON-clause predicates back to the join spec for outer joins; removes COPYPUSH terms.
Subquery flattening
Section titled “Subquery flattening”qo_rewrite_subqueries(query_rewrite_subquery.c) — flatten uncorrelated subqueries in WHERE to derived-table joins.qo_rewrite_hidden_col_as_derived(query_rewrite_subquery.c) — when an ORDER BY in a subquery would lose its hidden columns on the outer-projection side, wrap as a derived table so the order is preserved.qo_add_limit_clause(query_rewrite_subquery.c) — injection used by EXISTS-rewriting: if a subquery would benefit fromLIMIT 1, attach it. Bumpsrewrite_limitback to 1 so the residual UNION-arm path picks it up.mq_rewrite_query_as_derived(view_transform.c) — wrap a query as the FROM-list derived table of a synthetic outer SELECT.mq_make_derived_spec(view_transform.c) — build the PT_SPEC node + new attribute names for a derived table.
CNF and equality reduction
Section titled “CNF and equality reduction”pt_cnf(cnf.c) — convert a predicate list to conjunctive normal form, tagging taggable terms.pt_do_cnf(cnf.c) — top-level CNF wrapper that walks all predicate-bearing nodes of a statement.qo_reduce_equality_terms/qo_reduce_equality_terms_post(query_rewrite_term.c) —WHERE x = 5 AND p(x)→WHERE x = 5 AND p(5).QO_CHECK_AND_REDUCE_EQUALITY_TERMS(query_rewrite.h) — guard macro that skips re-running reduction on the same node.qo_rewrite_terms(query_rewrite_term.c) — per-clause range-merging, OR-to-IN, redundant-disjunct removal.
Outer-join reduction and redundant joins
Section titled “Outer-join reduction and redundant joins”qo_rewrite_select_queries(query_rewrite_select.c) — driver for the SELECT-only structural rewrites.qo_rewrite_outerjoin(query_rewrite_select.c) — promote outer to inner when a null-rejecting predicate dominates.qo_rewrite_innerjoin(query_rewrite_select.c) — symmetric cleanup.qo_reduce_outer_joined_tbls(query_rewrite_select.c) — drop a uniquely-keyed outer-joined table that is not projected.qo_reduce_joined_tbls_ref_by_fk(query_rewrite_select.c) — drop FK→PK joins when the parent’s columns are unreferenced.qo_reduce_predicate_for_parent_spec(query_rewrite_select.c) — companion that adjusts surviving predicates after a join is reduced.
LIMIT-specific helpers
Section titled “LIMIT-specific helpers”qo_push_limit_to_union(query_rewrite_set.c) — copy LIMIT onto each branch of a UNION when the top-level has no ORDER BY, no DISTINCT, and no NO_PUSH_PRED hint.qo_check_distinct_union/qo_check_hint_union(query_rewrite_set.c) — the safety predicates forqo_push_limit_to_union.qo_auto_parameterize_limit_clause(query_rewrite_auto_parameterize.c) — auto-parameterize the preserved LIMIT host vars.qo_auto_parameterize_keylimit_clause(query_rewrite_auto_parameterize.c) — same for the per-index keylimit hint.qo_auto_parameterize(query_rewrite_auto_parameterize.c) — generic literal-to-host-var pass over CNF terms.pt_rewrite_to_auto_param(parse_tree_cl.c) — the individual literal-to-host-var transform.
Multi-range LIMIT optimization (plan-gen time)
Section titled “Multi-range LIMIT optimization (plan-gen time)”qo_check_iscan_for_multi_range_opt(plan_generation.c) — the “can this index scan use MRO?” predicate; gates onquery.orderby_fornon-null (i.e., post-lowering shape).qo_check_plan_index_for_multi_range_opt(plan_generation.c) — checks that ORDER BY columns occupy consecutive index positions, all in matching direction.pt_check_ordby_num_for_multi_range_opt(parser_support.c) — checks the loweredORDERBY_NUM ≤ npredicate shape and thePRM_ID_MULTI_RANGE_OPT_LIMITupper-bound system parameter.qo_plan_multi_range_opt(plan_generation.c) — boolean on a finished plan: was MRO actually applied?PRM_ID_MULTI_RANGE_OPT_LIMIT(system_parameter.h) — upper bound onnfor MRO eligibility.
Per-statement compile driver
Section titled “Per-statement compile driver”db_compile_statement_local(db_vdb.c) — top-of-pipeline driver: parser →pt_class_pre_fetch→pt_compile→mq_translate→pt_class_pre_fetchagain →do_prepare_statement. Where the rewrite layer fits into the broader compile flow.
Position hints as of 2026-04-30
Section titled “Position hints as of 2026-04-30”| Symbol | File | Line |
|---|---|---|
pt_query_info::flag.rewrite_limit:1 | src/parser/parse_tree.h | 2141 |
pt_delete_info::rewrite_limit:1 | src/parser/parse_tree.h | 2850 |
pt_update_info::rewrite_limit:1 | src/parser/parse_tree.h | 2972 |
PT_INST_NUM / PT_ORDERBY_NUM enumerators | src/parser/parse_tree.h | 1399 |
pt_compile | src/parser/compile.c | 381 |
pt_class_pre_fetch | src/parser/compile.c | 432 |
pt_eval_type_pre (PT_SELECT LIMIT arm) | src/parser/type_checking.c | 7179 |
pt_eval_type_pre (PT_UNION LIMIT arm) | src/parser/type_checking.c | 7138 |
pt_eval_type_pre (PT_DELETE LIMIT arm) | src/parser/type_checking.c | 7242 |
pt_eval_type_pre (PT_UPDATE LIMIT arm) | src/parser/type_checking.c | 7274 |
pt_limit_to_numbering_expr | src/parser/parser_support.c | 4567 |
pt_check_ordby_num_for_multi_range_opt | src/parser/parser_support.c | 9743 |
pt_cnf | src/parser/cnf.c | 941 |
pt_do_cnf | src/parser/cnf.c | 1183 |
mq_translate | src/parser/view_transform.c | (top) |
mq_rewrite | src/optimizer/rewriter/query_rewrite.c | 736 |
qo_rewrite_queries | src/optimizer/rewriter/query_rewrite.c | 44 |
qo_rewrite_queries_post | src/optimizer/rewriter/query_rewrite.c | 586 |
qo_rewrite_queries (PT_UNION residual LIMIT branch) | src/optimizer/rewriter/query_rewrite.c | 173 |
qo_rewrite_subqueries | src/optimizer/rewriter/query_rewrite_subquery.c | 40 |
qo_rewrite_hidden_col_as_derived | src/optimizer/rewriter/query_rewrite_subquery.c | 331 |
qo_add_limit_clause | src/optimizer/rewriter/query_rewrite_subquery.c | 473 |
qo_push_limit_to_union | src/optimizer/rewriter/query_rewrite_set.c | 40 |
qo_check_distinct_union | src/optimizer/rewriter/query_rewrite_set.c | 101 |
qo_check_hint_union | src/optimizer/rewriter/query_rewrite_set.c | 132 |
qo_rewrite_select_queries | src/optimizer/rewriter/query_rewrite_select.c | 62 |
qo_rewrite_index_hints | src/optimizer/rewriter/query_rewrite_select.c | 151 |
qo_move_on_of_explicit_join_to_where | src/optimizer/rewriter/query_rewrite_select.c | 516 |
qo_reduce_outer_joined_tbls | src/optimizer/rewriter/query_rewrite_select.c | 1909 |
qo_reduce_joined_tbls_ref_by_fk | src/optimizer/rewriter/query_rewrite_select.c | 2105 |
qo_rewrite_outerjoin | src/optimizer/rewriter/query_rewrite_select.c | 3382 |
qo_rewrite_nonnull_count_select_list | src/optimizer/rewriter/query_rewrite_select.c | 3777 |
qo_auto_parameterize | src/optimizer/rewriter/query_rewrite_auto_parameterize.c | 41 |
qo_auto_parameterize_limit_clause | src/optimizer/rewriter/query_rewrite_auto_parameterize.c | 129 |
qo_auto_parameterize_keylimit_clause | src/optimizer/rewriter/query_rewrite_auto_parameterize.c | 293 |
qo_check_iscan_for_multi_range_opt | src/optimizer/plan_generation.c | 4030 |
qo_check_plan_index_for_multi_range_opt | src/optimizer/plan_generation.c | 4199 |
qo_plan_multi_range_opt | src/optimizer/plan_generation.c | 3141 |
db_compile_statement_local | src/compat/db_vdb.c | 531 |
PRM_ID_MULTI_RANGE_OPT_LIMIT | src/base/system_parameter.h | 295 |
Cross-check Notes
Section titled “Cross-check Notes”- The raw analysis says
qo_optimize_queriesruns the rewrite phases. That symbol no longer exists; the actual driver ismq_rewrite(query_rewrite.c:736), which callsqo_rewrite_queries(query_rewrite.c:44) as the parser_walk_tree pre-callback. The function name carries the original “optimize” framing in the PDF; the source has since renamed it to “rewrite” (the optimizer-proper now lives inquery_planner.candplan_generation.c). - The cheat-sheet’s
qo_do_auto_parameterizeis nowqo_auto_parameterize. Both the WHERE-walker and the LIMIT-clause helper were renamed when the rewriter was modularised intosrc/optimizer/rewriter/. The behaviour is the same. - The cheat-sheet lists the parameter checks for
is_parsing_static_sql/is_skip_auto_parameterizeonly on the LIMIT-clause helper. The current source applies them consistently — the WHERE-walking variant is gated bycall_auto_parameterizeinqo_rewrite_queries, which evaluates the same conditions. - The raw analysis treats LIMIT lowering as a single stage
inside semantic checking. That is true for SELECT, UPDATE,
and DELETE. UNION/DIFFERENCE/INTERSECTION are split: the
ORDER-BY case is lowered in
pt_eval_type_pre, but the no-ORDER-BY case is left forqo_rewrite_queries(which picks it up in the PT_UNION arm and additionally pushes the limit into each arm). The cheat-sheet acknowledges UNION’s “ORDER BY가 있는 경우” branch but not the residual handling. - The deck quotes the
where_type()/eval_expr_type()walkers for SELECT inpt_eval_type. The LIMIT-lowering decision tree fired by thept_eval_type_prepre-callback runs before those walkers — i.e., LIMIT is already a numbering predicate by the time type evaluation per se starts on the surrounding WHERE. - The “Dummy LIMIT host variable” question raised in the
raw analysis — whether to keep the original LIMIT after
lowering — is still unresolved in the source. Both the
semantic-check arms and
qo_auto_parameterize_limit_clausepreserve the originalinfo.query.limit. The likely reason is parser_print_tree — operators dump the original LIMIT for diagnostics — but if no diagnostic mode reads it after auto-parameterization, the field is dead. Investigation would require auditing all readers ofinfo.query.limitaftermq_rewrite.
Open Questions
Section titled “Open Questions”- What exactly invokes
mq_rewriteinsidemq_translate? The raw cheat-sheet sketchesmq_translatecallsmq_rewrite“for SELECT”, butmq_translatehas multiple per-type variants (mq_translate_select,mq_translate_update,mq_translate_delete,mq_translate_merge). Tracing each variant’s call intomq_rewritewould clarify whether DELETE and UPDATE LIMITs re-enterqo_rewrite_queriesor only see semantic-check-time lowering. PT_LAST_OPCODEas a sentinel. The GROUP BY arm passesPT_LAST_OPCODEasnum_op. The helper ignoresnum_opwhenis_gby_numis true (it builds a function node, not an expr node), so the value is harmless. But is there a place where the sentinel could leak — e.g., into a copied expression — and confuse a downstream switch onop?- The “two LIMIT representations” trade-off. Why does
qo_auto_parameterize_limit_clauseparameterize the originalinfo.query.limitseparately from the numbering-predicate path? The lowered predicate already carries the value (auto-parameterized by the WHERE walker). Is the dual representation still needed for printing, keylimit, or some other path the raw analysis did not identify? - Interaction with CONNECT BY / hierarchical queries. The
order_siblingsveto in the ORDER BY arm prevents lowering intoORDERBY_NUM, leaving the LIMIT asinfo.query.limit. Doesqo_rewrite_queriesre-pick this up, or does it stay a non-lowered LIMIT through XASL generation, evaluated by a different mechanism? The cheat-sheet treats CONNECT BY as “분석 생략” (analysis skipped); the source path needs tracing. - MRO and
PRM_ID_MULTI_RANGE_OPT_LIMITdefaults. The default value of the system parameter governs how large a LIMIT can be while still benefiting from MRO. The default should be checked against the system_parameter.c entry (line 3197). Workloads withLIMIT 1000may or may not trigger MRO depending on the chosen default. - DBLINK and remote DML LIMIT.
db_compile_statement_localskipsmq_translatefor DBLINK DML queries (PT_IS_DBLINK_DML_QUERY); the LIMIT is not lowered in that path either, because the remote server is expected to parse and execute the query itself. Does the local rewrite layer ever see DBLINK SELECTs whose LIMITs need lowering for the local optimization side? The grammar production for DBLINK SELECTs and the call intopt_rewrite_for_dblinkwould clarify.
Sources
Section titled “Sources”Raw analyses (raw/code-analysis/cubrid/query-processing/)
Section titled “Raw analyses (raw/code-analysis/cubrid/query-processing/)”code_analysis_Query_rewrite-LIMIT_clause.pdf— focused Korean analysis of LIMIT-clause rewriting with the per-statement-type decision tree and theselect ... LIMIT 1walked example.code_analysis_Query_Processing_Cheat_Sheet.pdf— global parser-and-semantic-check map; locates LIMIT lowering insidept_eval_type_preand the auto-parameterization inside what it callsqo_optimize_queries._converted/codeanalysisqueryrewrite-limitclause.pdf.md— markitdown extract of the LIMIT PDF._converted/codeanalysisqueryprocessingcheatsheet.pdf.md— markitdown extract of the cheat-sheet PDF.
Sibling docs
Section titled “Sibling docs”knowledge/code-analysis/cubrid/cubrid-btree.md— the index scan that the loweredINST_NUM/ORDERBY_NUMpredicate guards; MRO sits in the same plan-generation neighbourhood.knowledge/code-analysis/cubrid/cubrid-mvcc.md— the visibility check the scan applies to each row that survives the lowered predicate.
Textbook chapters / papers
Section titled “Textbook chapters / papers”- Database Internals (Petrov), Ch. 12 “Query Planning and Optimization” — rule-based vs cost-based split, predicate pushdown, view inlining.
- Database System Concepts (Silberschatz, Korth, Sudarshan, 7th ed.), Ch. 16 “Query Optimization” — equivalence rules, the relational-algebra-level transformation catalogue.
- Goetz Graefe, The Cascades Framework for Query Optimization, IEEE Data Eng. Bull. 18(3), 1995 — the formal split between transformation rules (always-improving rewrites) and implementation rules (cost-priced alternatives).
- Goetz Graefe, Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25(2), 1993 — historical context for the index-driven LIMIT optimizations CUBRID’s MRO inherits from.
CUBRID source (/data/hgryoo/references/cubrid/)
Section titled “CUBRID source (/data/hgryoo/references/cubrid/)”src/parser/csql_grammar.y— grammar productions that setrewrite_limit = 1on SELECT/UNION/UPDATE/DELETE LIMIT clauses.src/parser/parse_tree.h— the LIMIT-bearing structs and thePT_INST_NUM/PT_ORDERBY_NUM/PT_GROUPBY_NUMenumerations.src/parser/compile.c—pt_compileentry; semantic check driver.src/parser/type_checking.c—pt_semantic_typeandpt_eval_type_pre; the LIMIT-lowering decision tree.src/parser/parser_support.c—pt_limit_to_numbering_expr, the lowering factory;pt_check_ordby_num_for_multi_range_opt, the MRO predicate-shape check.src/parser/cnf.c—pt_cnf,pt_do_cnf.src/parser/view_transform.c—mq_translate(view inlining) andmq_rewrite_query_as_derived/mq_make_derived_spec(derived-table wrappers used by subquery flattening and UNION-with-LIMIT).src/optimizer/rewriter/query_rewrite.c—mq_rewriteandqo_rewrite_queries; the rewrite-layer driver.src/optimizer/rewriter/query_rewrite_subquery.c—qo_rewrite_subqueries,qo_rewrite_hidden_col_as_derived,qo_add_limit_clause.src/optimizer/rewriter/query_rewrite_set.c—qo_push_limit_to_union,qo_check_distinct_union,qo_check_hint_union.src/optimizer/rewriter/query_rewrite_select.c—qo_rewrite_select_queries, the outer-join-reduction and redundant-join helpers;qo_rewrite_index_hints.src/optimizer/rewriter/query_rewrite_term.c—qo_reduce_equality_terms,qo_rewrite_terms.src/optimizer/rewriter/query_rewrite_auto_parameterize.c—qo_auto_parameterize,qo_auto_parameterize_limit_clause,qo_auto_parameterize_keylimit_clause.src/optimizer/plan_generation.c—qo_check_iscan_for_multi_range_optand the MRO machinery that consumes the lowered ORDERBY_NUM predicate.src/base/system_parameter.{c,h}—PRM_ID_MULTI_RANGE_OPT_LIMITand related parameters.src/compat/db_vdb.c—db_compile_statement_local, the per-statement compile driver that wires parser → semantic check → view inlining → rewrite → prepare.