PostgreSQL COPY — Bulk Load/Unload, Formats, and Multi-Insert

Contents:

Theoretical Background
Common DBMS Design
PostgreSQL’s Approach
Source Walkthrough
Source verification (as of 2026-06-05)
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Sources

Theoretical Background

Every database engine offers two doors into and out of its tables. The narrow door is the per-row INSERT/SELECT path: each row passes through the full SQL machinery — parse, plan, execute, log, index — and pays that machinery’s fixed per-statement overhead. The wide door is a bulk loader: a command that streams a large external dataset into a table (or out of it) while amortizing as much of that overhead as possible across many rows. PostgreSQL’s wide door is COPY.

The problem a bulk loader solves is fundamentally an impedance-matching one. On one side sits an external byte stream — a file, a pipe to a program, or a client socket — laid out in some serialization format (delimited text, CSV, a binary tuple encoding). On the other side sits the engine’s internal row representation: a TupleTableSlot of Datum values typed according to the table’s TupleDesc, destined for heap pages through the table access method. The loader’s job is to convert between these two representations as fast as correctness allows, and the unloader’s job is the exact inverse.

Database System Concepts (Silberschatz, Korth, Sudarshan), in its chapter on storage and the physical organization of data, frames the cost of loading in terms of the number of distinct I/O and bookkeeping operations per row. A naïve loader that inserts one row at a time pays, for each row: a tuple-insert call, a WAL record, a per-row index update, and a free-space-map probe. A good bulk loader collapses these: it batches tuple inserts so one page write carries many rows, it can skip the free-space-map when it knows the relation is new, and — when the engine’s transactional rules permit — it can write tuples pre-frozen so they never need a later visibility-establishing rewrite by VACUUM. Each of these is a recognized bulk-load optimization; PostgreSQL’s COPY implements all three.

The design space a bulk loader chooses within has four axes:

Format negotiation. Does the loader hard-code one wire format, or does it support several (human-readable delimited text, RFC-4180 CSV, a compact binary tuple encoding) behind a common dispatch layer? A pluggable format layer lets the same row-handling core serve very different serializations.
Parsing strategy. Text formats require a state machine that finds row and field boundaries while honoring quoting and escaping rules; binary formats require a framed reader that trusts a length-prefixed layout. The parser is where most of a text loader’s CPU goes, so it is heavily tuned.
Batching and durability. How many rows accumulate before a write, and what transactional guarantees does the load make? Batching trades memory for throughput; freezing trades MVCC strictness for the elimination of a later rewrite.
Error tolerance. Does one malformed row abort the whole load, or can the loader skip bad rows and keep going up to some limit? This is the difference between a brittle and a production-grade loader.

PostgreSQL’s COPY is a single command (DoCopy) that resolves all four axes from an option list and then runs one of two engines — a load engine (CopyFrom) or an unload engine (DoCopyTo) — sharing a common format-routine abstraction. The rest of this document follows that structure.

Common DBMS Design

Bulk loaders across systems converge on a small set of engineering conventions. Naming them makes PostgreSQL’s specific symbols read as one set of choices within a shared playbook.

A format-dispatch vtable

Rather than thread if (csv) ... else if (binary) ... through every function, mature loaders factor each supported format into a small table of function pointers: start, per-field type setup, per-row, end. The core engine calls through the vtable and never re-tests the format. PostgreSQL names these CopyToRoutine and CopyFromRoutine (copyapi.h) and selects one with CopyFromGetRoutine / CopyToGetRoutine. The same indirection is what lets out-of-core extensions register custom COPY formats.

A staged buffer pipeline for text parsing

Text/CSV parsing is split into stages with a buffer between each, so each stage has one job and the expensive ones (encoding conversion, line splitting, field de-escaping) are isolated and individually tunable. The universal shape is: read raw bytes -> transcode to server encoding -> split into lines -> split each line into de-escaped fields -> call type input functions. PostgreSQL implements exactly this: raw_buf -> input_buf -> line_buf -> attribute_buf.

Multi-row insert batching

The single most important load optimization is to insert many rows per physical operation. Loaders accumulate parsed rows in an in-memory buffer and flush the buffer as a batch to the storage layer, which writes one (or few) pages and one WAL record covering all of them. The batch size is bounded by both a row count and a byte count to keep memory predictable. PostgreSQL’s buffer is CopyMultiInsertBuffer; the batch flush is table_multi_insert.

Conditional fast paths and correctness fallbacks

Batching is only safe when nothing observes the partially-loaded table mid-load. Triggers that query the table, volatile default expressions, and certain foreign-data-wrapper limitations all force a fallback to one-row-at-a-time insertion. Every batching loader carries this escape hatch. PostgreSQL encodes it as the CopyInsertMethod enum (CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITIONAL).

Pre-frozen / minimally-logged loads

When a load targets a relation created in the same transaction (so no other session can yet see it), the engine can take liberties: skip the free-space map, and write tuples already marked as committed-and-frozen so a later VACUUM need not revisit them. PostgreSQL exposes this as COPY ... FREEZE and implements it with the TABLE_INSERT_FROZEN and TABLE_INSERT_SKIP_FSM table-AM flags.

Soft error handling

A production loader distinguishes hard errors (protocol corruption, I/O failure — always fatal) from soft errors (a single field that won’t parse as its declared type). Soft errors can be trapped, the offending row skipped, and the load continued up to a configurable limit. PostgreSQL routes type-input failures through an ErrorSaveContext and InputFunctionCallSafe, governed by ON_ERROR, REJECT_LIMIT, and LOG_VERBOSITY.

PostgreSQL’s Approach

One entry point, two engines

DoCopy (in copy.c) is the SQL-level entry point. It performs the privilege checks that make server-side file access safe, opens and locks the relation, transforms an optional WHERE clause, and then branches on direction:

// DoCopy — src/backend/commands/copy.c
if (is_from)
{
    CopyFromState cstate;
    Assert(rel);
    /* check read-only transaction and parallel mode */
    if (XactReadOnly && !rel->rd_islocaltemp)
        PreventCommandIfReadOnly("COPY FROM");
    cstate = BeginCopyFrom(pstate, rel, whereClause,
                           stmt->filename, stmt->is_program,
                           NULL, stmt->attlist, stmt->options);
    *processed = CopyFrom(cstate);  /* copy from file to database */
    EndCopyFrom(cstate);
}
else
{
    CopyToState cstate;
    cstate = BeginCopyTo(pstate, rel, query, relid,
                         stmt->filename, stmt->is_program,
                         NULL, stmt->attlist, stmt->options);
    *processed = DoCopyTo(cstate);  /* copy from database to file */
    EndCopyTo(cstate);
}

The privilege model deserves a note: anyone may COPY ... TO STDOUT or FROM STDIN (the data flows over the client connection), but reading or writing a server-side file or program requires membership in pg_read_server_files / pg_write_server_files / pg_execute_server_program. That check is the first thing DoCopy does. A second subtlety: if row-level security is enabled on the target, COPY relation TO is silently rewritten into a query-based COPY (SELECT ...) TO so the rewriter can inject the RLS quals; and COPY FROM under RLS is simply refused.

Options become a `CopyFormatOptions`

ProcessCopyOptions walks the DefElem list from the grammar and transposes it into a flat CopyFormatOptions struct, applying a long sequence of self-consistency checks. This is where FORMAT csv|binary, the delimiter/quote/escape characters, HEADER, FREEZE, ON_ERROR, LOG_VERBOSITY, and REJECT_LIMIT are all resolved, and where the defaults are filled in:

// ProcessCopyOptions — src/backend/commands/copy.c
/* Set defaults for omitted options */
if (!opts_out->delim)
    opts_out->delim = opts_out->csv_mode ? "," : "\t";
if (!opts_out->null_print)
    opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
opts_out->null_print_len = strlen(opts_out->null_print);
if (opts_out->csv_mode)
{
    if (!opts_out->quote)
        opts_out->quote = "\"";
    if (!opts_out->escape)
        opts_out->escape = opts_out->quote;
}

Note the format-coupled defaults: the field delimiter is a tab in text mode but a comma in CSV mode; the NULL marker is \N in text mode but an empty unquoted field in CSV. Many of the subsequent checks enforce that options only legal in one mode are rejected in the others — for example only ON_ERROR STOP is allowed in BINARY mode, and REJECT_LIMIT requires ON_ERROR to be set to IGNORE.

The three error-handling options are parsed by dedicated helpers that map keywords to enum values:

// defGetCopyOnErrorChoice — src/backend/commands/copy.c
if (pg_strcasecmp(sval, "stop") == 0)
    return COPY_ON_ERROR_STOP;
if (pg_strcasecmp(sval, "ignore") == 0)
    return COPY_ON_ERROR_IGNORE;
// defGetCopyLogVerbosityChoice maps silent/default/verbose to
// COPY_LOG_VERBOSITY_{SILENT,DEFAULT,VERBOSE}

Three formats behind one vtable

The format abstraction is the spine of the modern COPY code. Each built-in format is a static const routine table; CopyFromGetRoutine picks one based on the resolved options:

// CopyFromRoutineCSV / CopyFromGetRoutine — src/backend/commands/copyfrom.c
static const CopyFromRoutine CopyFromRoutineCSV = {
    .CopyFromInFunc = CopyFromTextLikeInFunc,
    .CopyFromStart = CopyFromTextLikeStart,
    .CopyFromOneRow = CopyFromCSVOneRow,
    .CopyFromEnd = CopyFromTextLikeEnd,
};

static const CopyFromRoutine *
CopyFromGetRoutine(const CopyFormatOptions *opts)
{
    if (opts->csv_mode)
        return &CopyFromRoutineCSV;
    else if (opts->binary)
        return &CopyFromRoutineBinary;
    return &CopyFromRoutineText;     /* default is text */
}

Text and CSV share the same start, infunc, and end callbacks and differ only in the per-row callback (CopyFromTextOneRow vs. CopyFromCSVOneRow), both thin wrappers around the inlined CopyFromTextLikeOneRow(..., is_csv). Binary is wholly separate: it uses the type receive functions instead of input functions, and its start callback reads and validates the PGCOPY file header. The CopyToRoutine side mirrors this exactly.

The load engine: `CopyFrom`

CopyFrom is the heart of COPY FROM. After validating that the target is a plain/foreign/partitioned table (or a view with an INSTEAD OF trigger), it decides the insert method, sets up executor state and a ResultRelInfo, installs the error-context callback, and then loops: NextCopyFrom → optional soft-error skip → WHERE filter → partition routing → triggers/constraints → insert (batched or single).

flowchart TD
    A["DoCopy: COPY FROM"] --> B["BeginCopyFrom<br/>ProcessCopyOptions, open file/pipe/socket,<br/>look up input functions, alloc buffers"]
    B --> C["CopyFrom: choose CopyInsertMethod<br/>CIM_SINGLE / CIM_MULTI / CIM_MULTI_CONDITIONAL"]
    C --> D{"for each row"}
    D --> E["NextCopyFrom<br/>parse one row into values[]/nulls[]"]
    E --> F{"soft error?<br/>(ON_ERROR IGNORE)"}
    F -->|yes| G["num_errors++, skip row,<br/>check REJECT_LIMIT"]
    G --> D
    F -->|no| H{"WHERE matches?"}
    H -->|no| D
    H -->|yes| I{"partitioned?"}
    I -->|yes| J["ExecFindPartition,<br/>map root rowtype to leaf"]
    I -->|no| K["BEFORE ROW triggers,<br/>constraints, gen cols"]
    J --> K
    K --> L{"multi-insert<br/>enabled?"}
    L -->|yes| M["buffer slot<br/>(CopyMultiInsertInfoStore)"]
    M --> N{"buffer full?"}
    N -->|yes| O["CopyMultiInsertInfoFlush<br/>(table_multi_insert)"]
    N -->|no| D
    O --> D
    L -->|no| P["table_tuple_insert,<br/>index tuples, AFTER triggers"]
    P --> D
    D -->|EOF| Q["flush remaining buffers,<br/>NOTICE skipped rows, cleanup"]

Multi-insert buffering

When batching is allowed, parsed tuples are not inserted immediately; they are materialized into slots held in a CopyMultiInsertBuffer and flushed in bulk. The buffer is bounded two ways — a tuple count and a byte count — so a load of wide rows flushes on bytes and a load of narrow rows flushes on count:

// CopyMultiInsertInfoIsFull / limits — src/backend/commands/copyfrom.c
#define MAX_BUFFERED_TUPLES 1000
#define MAX_BUFFERED_BYTES  65535
static inline bool
CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
{
    if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
        miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
        return true;
    return false;
}

For a partitioned target there is one buffer per partition, all tracked in CopyMultiInsertInfo->multiInsertBuffers. To bound memory when a load sprays across thousands of partitions, after each flush the list is trimmed back to MAX_PARTITION_BUFFERS (32), evicting the oldest-created buffers first — never the one currently in use. This is the quadratic-memory guard called out in the source comment on MAX_BUFFERED_TUPLES.

COPY FREEZE and the insert flags

COPY FROM builds up a ti_options bitmask. Two flags matter for bulk loads. TABLE_INSERT_SKIP_FSM is set whenever the relation’s storage was created in this (sub)transaction — there is no point probing the free-space map for a brand-new relfilenode. TABLE_INSERT_FROZEN is the FREEZE option: it writes tuples already marked frozen, but only after a strict eligibility check that guarantees the rows can never become visible to a transaction that should not see them.

// CopyFrom (FREEZE path) — src/backend/commands/copyfrom.c
if (cstate->opts.freeze)
{
    /* COPY FREEZE is disallowed on partitioned and foreign tables */
    ...
    InvalidateCatalogSnapshot();
    if (!ThereAreNoPriorRegisteredSnapshots() || !ThereAreNoReadyPortals())
        ereport(ERROR, ... "cannot perform COPY FREEZE because of prior transaction activity");
    if (cstate->rel->rd_createSubid != GetCurrentSubTransactionId() &&
        cstate->rel->rd_newRelfilelocatorSubid != GetCurrentSubTransactionId())
        ereport(ERROR, ... "the table was not created or truncated in the current subtransaction");
    ti_options |= TABLE_INSERT_FROZEN;
}

The subtransaction test is “crucial for correctness,” as the comment says: if this subxact aborts, the frozen rows vanish with the relfilenode, so the MVCC anomaly FREEZE introduces (other sessions can see the rows immediately) is confined to a relation that will not survive a rollback. The heap-AM side of TABLE_INSERT_FROZEN is covered in postgres-heap-am.md.

Soft error handling: ON_ERROR / REJECT_LIMIT / LOG_VERBOSITY

By default a row that fails type input aborts the whole COPY (COPY_ON_ERROR_STOP). With ON_ERROR ignore, BeginCopyFrom allocates an ErrorSaveContext, and the per-row parser calls the safe variant of the type input function, which records a soft error rather than throwing:

// CopyFromTextLikeOneRow (soft error path) — src/backend/commands/copyfromparse.c
else if (!InputFunctionCallSafe(&in_functions[m], string, typioparams[m],
                                att->atttypmod,
                                (Node *) cstate->escontext, &values[m]))
{
    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
    cstate->num_errors++;
    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
        ereport(NOTICE, errmsg("skipping row due to data type incompatibility ..."));
    return true;   /* row returned with error_occurred set */
}

CopyFrom sees escontext->error_occurred, resets it, bumps the skipped counter, and — if REJECT_LIMIT is exceeded — turns the soft error into a hard ERROR. At end of load it emits a summary NOTICE of how many rows were skipped (unless LOG_VERBOSITY silent). The whole mechanism is built on the generic soft-error infrastructure described in postgres-error-handling.md; COPY is its most prominent consumer.

The unload engine: `DoCopyTo`

COPY TO is simpler. BeginCopyTo either takes a table directly or, for COPY (query) TO, runs parse-analysis/rewrite/plan and wires a DestCopyOut receiver onto the query. DoCopyTo then either table-scans the relation, formatting each slot with CopyOneRowTo, or runs the plan and lets the copy_dest_receive callback format each emitted tuple:

// DoCopyTo (table-scan path) — src/backend/commands/copyto.c
scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
slot = table_slot_create(cstate->rel, NULL);
processed = 0;
while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
{
    CHECK_FOR_INTERRUPTS();
    slot_getallattrs(slot);            /* deconstruct the tuple */
    CopyOneRowTo(cstate, slot);        /* format and send */
    pgstat_progress_update_param(PROGRESS_COPY_TUPLES_PROCESSED, ++processed);
}

Each row is formatted in a per-row memory context that is reset every iteration, so the type output functions can leak freely without unbounded growth. The detail of running an arbitrary SELECT/INSERT ... RETURNING through a DestReceiver connects to postgres-ddl-execution.md and the executor docs.

The output side mirrors the input side’s vtable. CopyOneRowTo walks the slot’s attributes, and for text/CSV each non-null value is run through the type’s output function and then escaped by CopyAttributeOutText (or CopyAttributeOutCSV in CSV mode), which is the exact inverse of CopyReadAttributesText: control characters become \n, \t, \r, the delimiter is backslash-escaped, and NULLs are emitted as the null_print marker. The binary writer is symmetric to the binary reader — it emits the PGCOPY signature once, then a per-row int16 field count and length-prefixed bytea blobs from each type’s send function:

// CopyToBinaryOneRow — src/backend/commands/copyto.c
CopySendInt16(cstate, list_length(cstate->attnumlist));   /* field count */
foreach_int(attnum, cstate->attnumlist)
{
    Datum    value = slot->tts_values[attnum - 1];
    bool     isnull = slot->tts_isnull[attnum - 1];
    if (isnull)
        CopySendInt32(cstate, -1);                /* NULL = length of -1 */
    else
    {
        bytea *outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
        CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);     /* length */
        CopySendData(cstate, VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ);
    }
}
CopySendEndOfRow(cstate);

CopyToBinaryStart writes the fixed 19-byte header — an 11-byte signature ("PGCOPY\n\377\r\n\0"), a 4-byte flags word, and a 4-byte header-extension length — and CopyToBinaryEnd writes the -1 field-count trailer that the reader recognizes as the EOF marker. The exact same BinarySignature constant gates the reader (next section), so a binary dump produced by one server reloads on any other.

The text parse pipeline: four buffers, four stages

The single densest piece of COPY is the load-side parser in copyfromparse.c. Its file header lays out the staged pipeline explicitly: raw bytes are read into raw_buf, transcoded into input_buf, split into a single line in line_buf, and de-escaped into per-field slices of attribute_buf. The clever optimization is that when no encoding conversion is needed, raw_buf and input_buf are the same buffer, so transcoding collapses to a validation pass:

flowchart TD
    SRC["source<br/>file / PROGRAM pipe / client socket"] --> CGD["CopyGetData<br/>read bytes"]
    CGD --> RAW["raw_buf<br/>(RAW_BUF_SIZE)"]
    RAW --> CC{"need_transcoding?"}
    CC -->|yes| CONV["CopyConvertBuf<br/>encoding conversion"]
    CC -->|no| ALIAS["input_buf aliases raw_buf<br/>(validate only)"]
    CONV --> INP["input_buf<br/>(server encoding)"]
    ALIAS --> INP
    INP --> CRL["CopyReadLineText<br/>honor quote/escape, find EOL"]
    CRL --> LB["line_buf<br/>(one logical row)"]
    LB --> FMT{"format?"}
    FMT -->|text| CRAT["CopyReadAttributesText<br/>split on delim, de-escape"]
    FMT -->|csv| CRAC["CopyReadAttributesCSV<br/>quote-aware split"]
    CRAT --> AB["attribute_buf + raw_fields[]<br/>(de-escaped field slices)"]
    CRAC --> AB
    AB --> NCF["NextCopyFrom<br/>InputFunctionCall(Safe) per column"]
    NCF --> VALS["values[] / nulls[]<br/>typed Datums for the slot"]

CopyLoadRawBuf is the bottom of the stack: it slides any unprocessed bytes to the front of raw_buf, reads more via CopyGetData, and sets raw_reached_eof when the source is drained. CopyConvertBuf runs the encoding conversion routine only when need_transcoding is set; otherwise the validation comment in its header notes the buffers are aliased and it merely confirms the bytes are valid in the server encoding.

// CopyLoadRawBuf — src/backend/commands/copyfromparse.c
nbytes = RAW_BUF_BYTES(cstate);
if (nbytes > 0 && cstate->raw_buf_index > 0)
    memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index, nbytes);
cstate->raw_buf_len -= cstate->raw_buf_index;
cstate->raw_buf_index = 0;
/* Load more data */
inbytes = CopyGetData(cstate, cstate->raw_buf + cstate->raw_buf_len,
                      1, RAW_BUF_SIZE - cstate->raw_buf_len);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_len = nbytes;
if (inbytes == 0)
    cstate->raw_reached_eof = true;

Finding line boundaries: `CopyReadLineText`

CopyReadLineText is the state machine that converts a byte stream into one logical row in line_buf. The subtlety is quoting: in CSV mode a \r or \n inside a quoted field is data, not a row terminator, so the loop carries in_quote/last_was_esc flags and only treats a newline as end-of-line when not inside quotes. A second subtlety is encoding-safety — because every supported server encoding has the high bit set on all bytes of a multibyte character, the loop can scan byte-by-byte without ever mistaking a continuation byte for a delimiter:

// CopyReadLineText — src/backend/commands/copyfromparse.c
if (is_csv)
{
    quotec = cstate->opts.quote[0];
    escapec = cstate->opts.escape[0];
    /* ignore special escape processing if it's the same as quotec */
    if (quotec == escapec)
        escapec = '\0';
}
/* ... scan input_buf byte-by-byte, tracking in_quote, until an
 *     unquoted \n / \r / \r\n (or the \. end-marker) is found, copying
 *     the line — minus its terminator — into line_buf ... */

Splitting a text line into fields: `CopyReadAttributesText`

Once line_buf holds one row, CopyReadAttributesText walks it left-to-right, splitting on the (unescaped) delimiter and de-escaping backslash sequences (\t, \n, octal \013, hex \x1b) into a parallel attribute_buf, recording each field as a pointer into that buffer via raw_fields[]. A crucial ordering rule lives in its inner loop’s comment: the scanner must locate the end of the field and test it against the NULL marker before throwing any de-escaping syntax error, so that a field equal to \N is read as SQL NULL rather than an invalid escape:

// CopyReadAttributesText — src/backend/commands/copyfromparse.c
for (;;)                       /* outer loop over fields */
{
    start_ptr = cur_ptr;
    cstate->raw_fields[fieldno] = output_ptr;
    for (;;)                   /* inner loop scans + de-escapes one field */
    {
        c = *cur_ptr++;
        if (c == delimc) { found_delim = true; break; }
        if (c == '\\') { /* handle \n \t \013 \x1b ... */ }
        else *output_ptr++ = c;
    }
    /* compare raw field to null_print BEFORE committing de-escaped bytes */
}

CopyReadAttributesCSV is the RFC-4180 variant: a field that opens with the quote character is read verbatim (including embedded delimiters and newlines) until a closing quote, with a doubled quote ("") collapsing to one. Both functions return the field count, which NextCopyFrom checks against the column list.

Binary parsing: framed, length-prefixed, header-validated

The binary reader is the mirror image of the binary writer. The load opens by validating the 19-byte header in ReceiveCopyBinaryHeader: the 11-byte signature must memcmp-equal BinarySignature, the flags word must not set the old WITH-OIDS bit, and any header extension is skipped:

// ReceiveCopyBinaryHeader — src/backend/commands/copyfromparse.c
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
    memcmp(readSig, BinarySignature, 11) != 0)
    ereport(ERROR, ... "COPY file signature not recognized");
if (!CopyGetInt32(cstate, &tmp)) ... ;          /* flags */
if ((tmp & (1 << 16)) != 0)
    ereport(ERROR, ... "invalid COPY file header (WITH OIDS)");

Each row then begins with an int16 field count (-1 is the EOF marker), and each field is read by CopyReadBinaryAttribute: a 4-byte length (-1 = NULL), then that many bytes handed to the type’s receive function. Two integrity checks make the format self-describing — a negative non--1 length is rejected, and the receive function must consume the whole buffer or the row is declared malformed:

// CopyReadBinaryAttribute — src/backend/commands/copyfromparse.c
if (!CopyGetInt32(cstate, &fld_size))
    ereport(ERROR, ... "unexpected EOF in COPY data");
if (fld_size == -1) { *isnull = true; return ReceiveFunctionCall(flinfo, NULL, ...); }
if (fld_size < 0)  ereport(ERROR, ... "invalid field size");
/* load fld_size bytes into attribute_buf, then: */
result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf, typioparam, typmod);
if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
    ereport(ERROR, ... "incorrect binary data format");   /* must eat whole field */

Because binary fields carry no type names and no per-value text, the binary path has no soft-error story: ON_ERROR IGNORE is rejected for BINARY at option-parse time (only ON_ERROR STOP is legal), since a length-framed stream that desynchronizes cannot be resynchronized at a row boundary the way a text stream can at the next newline.

The insert-method decision, in full

CopyFrom resolves the CopyInsertMethod once, before the row loop, through a cascade of disqualifying conditions — each one a case where a buffered, not-yet-inserted tuple could be observed (or mis-handled) by something running mid-load:

// CopyFrom (insert-method cascade) — src/backend/commands/copyfrom.c
if (resultRelInfo->ri_TrigDesc != NULL &&
    (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
     resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
    insertMethod = CIM_SINGLE;                  /* trigger may query the table */
else if (resultRelInfo->ri_FdwRoutine != NULL && resultRelInfo->ri_BatchSize == 1)
    insertMethod = CIM_SINGLE;                  /* FDW can't batch */
else if (cstate->volatile_defexprs)
    insertMethod = CIM_SINGLE;                  /* volatile DEFAULT may query table */
else if (contain_volatile_functions(cstate->whereClause))
    insertMethod = CIM_SINGLE;                  /* volatile WHERE */
else
    insertMethod = proute ? CIM_MULTI_CONDITIONAL : CIM_MULTI;

CIM_MULTI_CONDITIONAL is the partitioned-table compromise: the parent qualifies for batching, but each leaf partition is only checked when the first row routes to it (a leaf might be a foreign table or carry its own BEFORE-ROW trigger), at which point leafpart_use_multi_insert is decided per partition.

Flushing a batch: `table_multi_insert` and the index/AFTER-trigger fan-out

When a buffer fills (CopyMultiInsertInfoIsFull) or the load ends, CopyMultiInsertBufferFlush hands the whole slot array to the table AM’s table_multi_insert in one call, then — because the AM only writes heap tuples — loops over the now-inserted slots to maintain indexes and fire AFTER-ROW triggers. The flush switches to a short-lived memory context first, because table_multi_insert is documented to leak:

// CopyMultiInsertBufferFlush — src/backend/commands/copyfrom.c
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
table_multi_insert(resultRelInfo->ri_RelationDesc,
                   slots, nused, mycid, ti_options, buffer->bistate);
MemoryContextSwitchTo(oldcontext);
for (i = 0; i < nused; i++)
{
    if (resultRelInfo->ri_NumIndices > 0)
        /* ExecInsertIndexTuples for slot i */ ;
    /* ExecARInsertTriggers for slot i */
}

For a foreign target the same function instead drives ExecForeignBatchInsert in ri_BatchSize-sized chunks — the FDW batch path covered in postgres-fdw.md. The heap side of table_multi_insert (one WAL record covering many tuples, heap_multi_insert) lives in postgres-heap-am.md.

Source Walkthrough

This section indexes the stable symbols that make up COPY, grouped by call-flow. Treat the symbol names as the durable anchors; the (file, line) pairs in the closing table are hints valid as of the updated: date.

Entry point and option parsing (`copy.c`)

DoCopy — the SQL-level entry. Resolves privileges (pg_read_server_files / pg_write_server_files / pg_execute_server_program for server-side file/program access), opens and locks the relation, applies the RLS rewrite to a query when needed, transforms the WHERE clause, and branches to the load or unload engine.
ProcessCopyOptions — folds the grammar’s DefElem list into a flat CopyFormatOptions, fills format-coupled defaults (tab vs comma delimiter, \N vs empty NULL marker, " quote/escape in CSV), and enforces cross-option legality (e.g. ON_ERROR IGNORE forbidden in BINARY; REJECT_LIMIT requires ON_ERROR IGNORE).
defGetCopyOnErrorChoice / defGetCopyLogVerbosityChoice / defGetCopyRejectLimit — keyword→enum mappers for the error-handling options.

Load engine and format vtable (`copyfrom.c`)

CopyFromGetRoutine — selects CopyFromRoutineCSV, CopyFromRoutineBinary, or CopyFromRoutineText from the resolved options.
CopyFromText / CSV / BinaryStart / OneRow / End callbacks — the three vtables. Text and CSV share start/infunc/end and differ only in the per-row callback; binary uses receive functions and validates the PGCOPY header in CopyFromBinaryStart → ReceiveCopyBinaryHeader.
BeginCopyFrom — opens the source, looks up per-column input (or receive) functions and typioparams, allocates the four parse buffers, evaluates default expressions, and — under ON_ERROR IGNORE — allocates the ErrorSaveContext.
CopyFrom — the load driver. Decides the CopyInsertMethod, sets ti_options (TABLE_INSERT_SKIP_FSM, TABLE_INSERT_FROZEN), installs CopyFromErrorCallback, and runs the per-row loop.
CopyMultiInsertInfoInit / …SetupBuffer / …Store / …IsFull / …Flush / CopyMultiInsertBufferFlush — the batching subsystem; MAX_BUFFERED_TUPLES (1000), MAX_BUFFERED_BYTES (65535), MAX_PARTITION_BUFFERS (32) bound it.
CopyFromErrorCallback — the errcontext callback that adds COPY rel, line N, column C (and the offending value, in text mode) to every error raised mid-load.

Parse pipeline (`copyfromparse.c`)

CopyGetData — lowest-level reader; abstracts file vs PROGRAM pipe vs frontend socket.
CopyLoadRawBuf — fills raw_buf, slides unprocessed bytes, sets raw_reached_eof.
CopyConvertBuf — encoding conversion raw_buf → input_buf (or a validate-only pass when the buffers are aliased).
CopyReadLine / CopyReadLineText — the quote-aware EOL state machine that fills line_buf.
CopyReadAttributesText / CopyReadAttributesCSV — split one line into de-escaped raw_fields[] slices of attribute_buf.
NextCopyFromRawFields / NextCopyFromRawFieldsInternal — public raw-field reader (also used by extensions); handles HEADER / HEADER MATCH validation.
NextCopyFrom — turns raw fields into typed values[]/nulls[] via InputFunctionCallSafe (or ExecEvalExpr for defaulted columns), routing soft errors to the ErrorSaveContext.
CopyFromTextLikeOneRow / CopyFromCSVOneRow / CopyFromBinaryOneRow — the per-row vtable callbacks.
ReceiveCopyBinaryHeader / CopyReadBinaryAttribute / CopyGetInt16 / CopyGetInt32 / CopyReadBinaryData — the binary framing layer.

Unload engine (`copyto.c`)

CopyToGetRoutine — selects the CopyToRoutine vtable.
BeginCopyTo — opens the destination; for COPY (query) TO runs parse-analysis/rewrite/plan and wires a DestCopyOut receiver.
DoCopyTo — the unload driver: table-scan + CopyOneRowTo, or run-the-plan with copy_dest_receive.
CopyOneRowTo — formats one slot; calls the format’s per-row callback.
CopyAttributeOutText / CopyAttributeOutCSV — the escaping inverse of the read-attributes functions.
CopyToBinaryStart / CopyToBinaryOneRow / CopyToBinaryEnd / CopySendInt16 / CopySendInt32 / CopySendData — the binary writer; BinarySignature header, length-prefixed fields, -1 trailer.

Position hints (as of 2026-06-05, REL_18 273fe94)

Symbol	File	Line
`DoCopy`	`src/backend/commands/copy.c`	62
`ProcessCopyOptions`	`src/backend/commands/copy.c`	536
`CopyFromGetRoutine`	`src/backend/commands/copyfrom.c`	156
`CopyFromBinaryStart`	`src/backend/commands/copyfrom.c`	221
`CopyFromErrorCallback`	`src/backend/commands/copyfrom.c`	254
`CopyMultiInsertInfoSetupBuffer`	`src/backend/commands/copyfrom.c`	380
`CopyMultiInsertInfoIsFull`	`src/backend/commands/copyfrom.c`	425
`CopyMultiInsertBufferFlush`	`src/backend/commands/copyfrom.c`	446
`CopyMultiInsertInfoFlush`	`src/backend/commands/copyfrom.c`	662
`CopyMultiInsertInfoStore`	`src/backend/commands/copyfrom.c`	756
`CopyFrom`	`src/backend/commands/copyfrom.c`	779
`BeginCopyFrom`	`src/backend/commands/copyfrom.c`	1529
`MAX_BUFFERED_TUPLES` (=1000)	`src/backend/commands/copyfrom.c`	63
`MAX_BUFFERED_BYTES` (=65535)	`src/backend/commands/copyfrom.c`	69
`MAX_PARTITION_BUFFERS` (=32)	`src/backend/commands/copyfrom.c`	75
`ti_options	= TABLE_INSERT_SKIP_FSM`	`src/backend/commands/copyfrom.c`
`ti_options	= TABLE_INSERT_FROZEN`	`src/backend/commands/copyfrom.c`
`BinarySignature`	`src/backend/commands/copyfromparse.c`	139
`ReceiveCopyBinaryHeader`	`src/backend/commands/copyfromparse.c`	190
`CopyConvertBuf`	`src/backend/commands/copyfromparse.c`	400
`CopyLoadRawBuf`	`src/backend/commands/copyfromparse.c`	590
`NextCopyFromRawFields`	`src/backend/commands/copyfromparse.c`	747
`NextCopyFrom`	`src/backend/commands/copyfromparse.c`	871
`CopyFromTextLikeOneRow`	`src/backend/commands/copyfromparse.c`	937
`CopyFromBinaryOneRow`	`src/backend/commands/copyfromparse.c`	1086
`CopyReadLineText`	`src/backend/commands/copyfromparse.c`	1234
`CopyReadAttributesText`	`src/backend/commands/copyfromparse.c`	1564
`CopyReadAttributesCSV`	`src/backend/commands/copyfromparse.c`	1818
`CopyReadBinaryAttribute`	`src/backend/commands/copyfromparse.c`	2013
`CopyToBinaryStart`	`src/backend/commands/copyto.c`	314
`CopyToBinaryOneRow`	`src/backend/commands/copyto.c`	345
`BeginCopyTo`	`src/backend/commands/copyto.c`	623
`DoCopyTo`	`src/backend/commands/copyto.c`	1026
`CopyOneRowTo`	`src/backend/commands/copyto.c`	1122
`CopyAttributeOutText`	`src/backend/commands/copyto.c`	1147
`CopyAttributeOutCSV`	`src/backend/commands/copyto.c`	1300

Source verification (as of 2026-06-05)

Verified facts

DoCopy dispatches to exactly two engines and applies the RLS rewrite/refusal. Verified in DoCopy (copy.c): is_from → BeginCopyFrom/CopyFrom/EndCopyFrom, else BeginCopyTo/DoCopyTo/EndCopyTo. Server-side file/program access is gated on the pg_read_server_files / pg_write_server_files / pg_execute_server_program roles; COPY rel TO under RLS is rewritten to COPY (SELECT …) TO and COPY FROM under RLS is refused.
Three formats are dispatched through a static const CopyFromRoutine / CopyToRoutine vtable. Verified in CopyFromGetRoutine (csv → CSV routine, binary → Binary routine, else Text) and the symmetric CopyToGetRoutine. Text and CSV share start/infunc/end; only the per-row callback differs.
The text load runs a four-buffer pipeline raw_buf → input_buf → line_buf → attribute_buf, and the first two alias when no transcoding is needed. Verified against the copyfromparse.c file header and CopyLoadRawBuf (the raw_buf == input_buf aliasing assertions) and CopyConvertBuf.
Multi-insert batching is bounded by both a tuple count and a byte count, and partition buffers are capped. Verified: MAX_BUFFERED_TUPLES = 1000, MAX_BUFFERED_BYTES = 65535, MAX_PARTITION_BUFFERS = 32, with CopyMultiInsertInfoIsFull testing the first two and CopyMultiInsertInfoFlush trimming the buffer list to the third.
The insert method is chosen once via a disqualification cascade. Verified in CopyFrom: BEFORE/INSTEAD-OF row triggers, a non-batching FDW (ri_BatchSize == 1), a partitioned statement-level insert trigger, volatile default expressions, or a volatile WHERE each force CIM_SINGLE; a partitioned target otherwise gets CIM_MULTI_CONDITIONAL, a plain target CIM_MULTI.
COPY FREEZE sets TABLE_INSERT_FROZEN only after a prior-activity check and a same-subtransaction check. Verified in CopyFrom: InvalidateCatalogSnapshot(), the ThereAreNoPriorRegisteredSnapshots() / ThereAreNoReadyPortals() guard, and the rd_createSubid / rd_newRelfilelocatorSubid == GetCurrentSubTransactionId() test, all gating ti_options |= TABLE_INSERT_FROZEN. TABLE_INSERT_SKIP_FSM is set independently when the relfilenode is new in this subxact.
Soft errors flow through InputFunctionCallSafe + an ErrorSaveContext, are counted in num_errors, and only ON_ERROR STOP is legal for BINARY. Verified in CopyFromTextLikeOneRow (the InputFunctionCallSafe(...) soft-error branch that bumps num_errors and conditionally emits the VERBOSE NOTICE) and the option checks in ProcessCopyOptions. The binary per-row callback CopyFromBinaryOneRow has no soft-error branch.
The binary format is header-validated and length-framed. Verified in ReceiveCopyBinaryHeader (the BinarySignature memcmp, the WITH-OIDS flag rejection), CopyReadBinaryAttribute (-1 length = NULL; the “incorrect binary data format” check that the receive function consumed the whole field), and the symmetric writer CopyToBinaryStart / CopyToBinaryOneRow / CopyToBinaryEnd.

Open questions

The practical ceiling on MAX_PARTITION_BUFFERS churn. A COPY that round-robins across thousands of partitions evicts and rebuilds multi-insert buffers repeatedly (32-buffer cap, oldest-first eviction). The amortized cost of buffer setup/teardown under an adversarial partition-key ordering is not obvious from the code; investigation path is to instrument CopyMultiInsertInfoFlush’s trim loop under a synthetic interleaved load.
Interaction of COPY FREEZE with parallel-safe table AMs. The freeze eligibility check is heap-AM-shaped (same-subxact relfilenode); how a future custom table AM that does not establish visibility via the heap’s t_infomask frozen bits would honor TABLE_INSERT_FROZEN is left to postgres-table-am.md.

Beyond PostgreSQL — Comparative Designs & Research Frontiers

PostgreSQL’s COPY sits at a deliberately modest point in the bulk-load design space: a single-process, row-at-a-time parser feeding a batched inserter, with no parallelism inside one COPY statement. The comparative landscape is wide.

External bulk loaders and the \copy client variant. The psql-side \copy is not a server feature at all — it runs COPY … FROM STDIN and streams a client-local file over the protocol, which is why it needs no server-file privilege. Tools like pg_bulkload (out of core, not covered here) bypass the buffer manager and WAL by writing heap pages directly, trading crash-safety for speed — the same trade other systems expose as a “direct path” load.

Direct-path and minimally-logged loads elsewhere. Oracle’s SQL*Loader direct path and SQL Server’s BULK INSERT with TABLOCK + minimal logging both formalize what COPY FREEZE only gestures at: under an exclusive table lock on a freshly created or empty object, the loader writes formatted pages straight to the data files and emits only an allocation-level redo record rather than a per-row log. PostgreSQL’s WAL design (full-tuple logging unless wal_level = minimal lets a same-transaction-created relation skip WAL entirely) reaches a similar place by a different route; the wal_level = minimal optimization is the closest analogue, and COPY FREEZE layers visibility-skipping on top.

Parallel and distributed ingest. Greenplum and Citus shard the input and run one loader per segment/worker; cloud warehouses (Snowflake, BigQuery, Redshift COPY) parse many input files concurrently across compute nodes and commit a manifest. PostgreSQL core has no intra-statement COPY parallelism — the parser is a serial state machine — though the staged-buffer design (the four-buffer pipeline) is exactly the shape one would pipeline across threads, and the format vtable is exactly the seam at which a parallel parser would plug in. The relevant theory anchor is the classic observation in the Architecture of a Database System survey (Hellerstein, Stonebraker, Hamilton; captured in dbms-papers/fntdb07-architecture.md) that admission and loading are throughput-bound on CPU for parse/transcode, which is why every serious warehouse parallelizes the parser, not the inserter.

The format vtable as an extension seam. Recent PostgreSQL versions made CopyFromRoutine / CopyToRoutine an extensible surface (copyapi.h), so an extension can register a custom COPY format — Arrow, Parquet-row, JSON-lines — that reuses the entire row-handling core (partition routing, batching, FREEZE, soft errors) and supplies only start/infunc/onerow/end. This is the same plug-point philosophy as the table AM (postgres-table-am.md) and the index AM (postgres-index-am.md): a narrow vtable that lets out-of-core code ride the in-core machinery. It is the single most consequential recent change in the COPY subsystem’s shape.

Error-tolerant ingest as a first-class feature. ON_ERROR IGNORE

REJECT_LIMIT + LOG_VERBOSITY brings PostgreSQL toward what warehouses have long offered (ON_ERROR = CONTINUE / MAXERROR / reject tables). The PostgreSQL implementation is notable for reusing the generic soft-error infrastructure (InputFunctionCallSafe + ErrorSaveContext, see postgres-error-handling.md) rather than building a COPY-specific error path — COPY is simply that machinery’s most prominent consumer. A natural frontier is a reject-rows-to-a-table sink (currently the rows are only counted and optionally logged, not captured), which the ErrorSaveContext already has enough structure to support.

Sources

PostgreSQL source (REL_18_STABLE, commit 273fe94, as of 2026-06-05):
- src/backend/commands/copy.c — DoCopy, ProcessCopyOptions, option enum mappers, privilege/RLS handling.
- src/backend/commands/copyfrom.c — CopyFrom, the CopyFromRoutine vtables, BeginCopyFrom, the CopyMultiInsert* batching subsystem, COPY FREEZE eligibility.
- src/backend/commands/copyfromparse.c — the four-buffer parse pipeline (CopyLoadRawBuf, CopyConvertBuf, CopyReadLineText, CopyReadAttributesText/CSV, NextCopyFrom), and the binary framing (ReceiveCopyBinaryHeader, CopyReadBinaryAttribute).
- src/backend/commands/copyto.c — DoCopyTo, BeginCopyTo, CopyOneRowTo, CopyAttributeOutText/CSV, the binary writer.
- src/include/commands/copyfrom_internal.h, src/include/commands/copyapi.h, src/include/commands/copy.h — the CopyFromState/CopyToState structs, the format-routine vtable types, and the public CopyFormatOptions.
Theory / textbook anchors:
- Database System Concepts (Silberschatz, Korth, Sudarshan) — storage and physical-organization cost model for per-row vs bulk loading.
- Architecture of a Database System (Hellerstein, Stonebraker, Hamilton; knowledge/research/dbms-papers/fntdb07-architecture.md) — process model and the CPU-bound nature of parse/admission.
Cross-references within this KB:
- postgres-heap-am.md — table_multi_insert / heap_multi_insert, TABLE_INSERT_FROZEN heap semantics.
- postgres-ddl-execution.md — running COPY (query) TO through a DestReceiver; partition routing setup.
- postgres-error-handling.md — ErrorSaveContext / InputFunctionCallSafe soft-error infrastructure.
- postgres-fdw.md — ExecForeignBatchInsert batch path.
- postgres-table-am.md, postgres-index-am.md — the vtable / extension-seam design pattern COPY’s format routines follow.