PostgreSQL Wire Protocol — FE/BE Framing, Startup Handshake, and the Simple/Extended Query Loops
Contents:
- Theoretical Background
- Common DBMS Design
- PostgreSQL’s Approach
- Source Walkthrough
- Source verification (as of 2026-06-05)
- Beyond PostgreSQL — Comparative Designs & Research Frontiers
- Sources
Theoretical Background
A database server speaks to its clients over a session-oriented binary protocol — a contract specifying how raw bytes on a bidirectional stream are divided into messages, what each message type means, and in what order messages may legally appear. Three properties define the design space:
- Framing. How does a receiver know where one message ends and the next begins? Options range from delimiter-based (newline, NUL), to fixed-length headers, to length-prefixed packets. The choice drives how much the receiver needs to buffer and how robust the protocol is to partial reads on a TCP stream.
- Message typing. Is each message identified by a type code, a fixed position in a handshake sequence, or both? A type-code-first design allows the receiver to skip unknown messages or impose per-type length bounds before reading the body, which matters for security.
- Query execution model: simple vs. extended. The simplest possible protocol is a one-shot request/response cycle: the client sends SQL text, and the server returns results followed by ReadyForQuery. Real engines also expose a prepared-statement protocol: Parse (compile SQL), Bind (supply parameters), Execute (run). This is the extended query model, in the JDBC/ODBC lineage. The two models differ in when the server sends its ReadyForQuery response and in how errors interact with subsequent messages in a pipeline.
Database System Concepts (Silberschatz et al.) frames the client–server interaction as “a series of request/response exchanges” and notes that network round-trips dominate latency for OLTP workloads, which makes the pipelining capability of the extended query protocol architecturally significant. Architecture of a Database System (Hellerstein et al., §“Client Communication Manager”) identifies the CCM layer as the component that translates between the application’s logical query requests and the byte stream the network delivers.
PostgreSQL’s own framing answer is: type-byte + 4-byte length prefix for every post-startup message, with a special no-type-byte packet shape for the initial startup message (which has no fixed type code in v3 — the protocol version itself identifies the message). This framing allows the server to validate the length word before allocating memory for the body, which is a deliberate defence against garbage data causing an out-of-memory condition.
Common DBMS Design
This section names the engineering patterns that almost every client-server DBMS adopts when building a wire protocol, so that PostgreSQL’s specific choices read as selections within a shared space rather than as ad-hoc inventions.
Duplex send-buffer / receive-buffer pair
Every real engine keeps two user-space buffers: a receive buffer
(fills from recv(2); the protocol parser peeks and consumes from it)
and a send buffer (the message-builder appends into it; flushed to
send(2) at controlled points). This decouples message construction from
I/O: the builder can fail partway through without sending a partial
message, because the bytes never leave the process until flush is called.
The send buffer doubles as a coalescing layer — multiple small messages
(column descriptions, data rows) are packed into one syscall.
Length-word validation before body allocation
A secure protocol reads the length word first, validates it against a
per-type maximum, and only then allocates a buffer for the body. Without
this guard, a malicious client can send a garbage length word (e.g.,
2 GB) and exhaust the server’s memory before the payload bytes arrive.
PostgreSQL’s SocketBackend is a textbook example: the type byte arrives,
the switch block selects maxmsglen based on the type, and pq_getmessage
enforces that bound before reading.
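The check described above can be sketched in a few lines of standalone C. This is an illustration only, not PostgreSQL's code: `sketch_check_header`, `sketch_max_body`, and both limit constants are hypothetical names and values chosen for the example. It shows the order of operations: pick a bound from the type byte, decode the big-endian length word, and reject the frame before any body buffer would be allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits; names and values are assumptions for this sketch. */
#define SKETCH_SMALL_LIMIT 10000u
#define SKETCH_LARGE_LIMIT (1u << 30)

typedef enum {
    FRAME_OK,
    FRAME_NEED_MORE,
    FRAME_BAD_TYPE,
    FRAME_BAD_LENGTH
} frame_status;

/* Per-type upper bound on the body length; 0 means "unknown type". */
uint32_t sketch_max_body(char type)
{
    switch (type) {
    case 'Q': case 'P': case 'B':      /* may carry SQL text or parameters */
        return SKETCH_LARGE_LIMIT;
    case 'E': case 'C': case 'D': case 'H': case 'S': case 'X':
        return SKETCH_SMALL_LIMIT;     /* short, fixed-shape messages */
    default:
        return 0;
    }
}

/*
 * Validate the 5-byte header in buf[0..n) without touching the body.
 * On FRAME_OK, *body_len is the number of payload bytes that follow.
 */
frame_status sketch_check_header(const uint8_t *buf, size_t n, uint32_t *body_len)
{
    assert(buf != NULL && body_len != NULL);

    if (n < 5)
        return FRAME_NEED_MORE;

    uint32_t max_body = sketch_max_body((char) buf[0]);
    if (max_body == 0)
        return FRAME_BAD_TYPE;

    /* Big-endian length word; it counts itself (4 bytes) but not the type byte. */
    uint32_t len = ((uint32_t) buf[1] << 24) | ((uint32_t) buf[2] << 16) |
                   ((uint32_t) buf[3] << 8) | (uint32_t) buf[4];
    if (len < 4 || len - 4 > max_body)
        return FRAME_BAD_LENGTH;

    *body_len = len - 4;
    return FRAME_OK;
}
```

A caller would allocate the body buffer only after `FRAME_OK`, so a forged 2 GB length on a small message type is rejected while only five bytes have been read.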
Two-phase startup: negotiation then authentication
Nearly every production DBMS separates the startup phase (TLS/GSSAPI
negotiation, protocol version agreement) from the authentication phase
(password, SASL, Kerberos). The negotiation phase may loop — the client
sends a TLS request, the server replies with a single byte (S or N),
and the client may then re-send a startup packet — before committing to
the regular framing. Authentication then proceeds as a typed-message
exchange with an AuthenticationOk terminal.
Simple vs. extended query modes
The simple query mode (Q / ReadyForQuery) is a stateless
request/response cycle. The extended query mode (P/B/E/S /
ReadyForQuery) is a pipelined state machine:
- Parse compiles SQL into a named or unnamed prepared statement and
  returns ParseComplete.
- Bind supplies parameter values and output format codes, creating a
  named or unnamed portal, and returns BindComplete.
- Execute runs the portal up to an optional row limit, returning data
  rows and a CommandComplete (or PortalSuspended for cursor-like partial execution).
- Sync flushes the pipeline and unconditionally sends ReadyForQuery,
  even if errors occurred mid-pipeline.
The key difference from simple mode: in extended mode, an error during
Parse or Bind does not immediately send ReadyForQuery; the server
enters skip-till-Sync mode and discards all further extended messages
until the client sends Sync. This lets clients pipeline multiple
P/B/E bursts without having to wait for a round trip after each.
Per-message CAN encoding: type byte first
A robust framing puts the type byte before the length word, so a
desynced receiver can in principle scan for known type bytes to recover synchronisation.
PostgreSQL v3 uses this layout for all backend-to-frontend and
frontend-to-backend messages except the startup packet (which has no type
byte, predating this rule). Once sync is lost, however, PostgreSQL treats the
condition as fatal (ERRCODE_PROTOCOL_VIOLATION, connection teardown).
Recovery is not attempted, because payload bytes can look like type bytes,
so scanning the stream cannot reliably find the next message boundary.
Theory ↔ PostgreSQL mapping
| Theory / convention | PostgreSQL name |
|---|---|
| Session-oriented binary protocol | PostgreSQL FE/BE protocol v3 (PG_PROTOCOL(3,0)) |
| Receive buffer | PqRecvBuffer[PQ_RECV_BUFFER_SIZE] in pqcomm.c |
| Send buffer | PqSendBuffer[PQ_SEND_BUFFER_SIZE] in pqcomm.c |
| Message type byte | first byte of every post-startup message; PqMsg_* constants in protocol.h |
| Length-word validation | maxmsglen selected in SocketBackend before pq_getmessage |
| Negotiation loop (TLS/GSSAPI) | ProcessStartupPacket in backend_startup.c |
| Authentication exchange | sendAuthRequest in auth.c (AUTH_REQ_* codes) |
| Simple query loop | PqMsg_Query → exec_simple_query → ReadyForQuery |
| Extended query machine | PqMsg_Parse/Bind/Execute/Sync → exec_parse_message etc. |
| Skip-till-Sync | ignore_till_sync flag in PostgresMain |
| ReadyForQuery with txn status | ReadyForQuery sends TransactionBlockStatusCode() in its body |
| Message-builder API | pq_beginmessage / pq_send* / pq_endmessage in pqformat.c |
| Flush chokepoint | pq_flush (via ReadyForQuery), PqMsg_Flush in extended mode |
PostgreSQL’s Approach
PostgreSQL’s wire protocol is version 3 (PG_PROTOCOL(3,0)), introduced
in PostgreSQL 7.4. Every backend that talks to a client, whether a normal
client backend or a WAL sender, runs the same
PostgresMain loop (in tcop/postgres.c). The transport layer lives in
backend/libpq/pqcomm.c; message construction and parsing live in
pqformat.c; the startup handshake lives in
tcop/backend_startup.c, with authentication in libpq/auth.c. These layers
are cleanly separated: message
builders never call send(2) directly; they append to the send buffer and
the caller invokes pq_flush (or ReadyForQuery, which flushes
implicitly).
Framing: the type-byte + length-word pair
Every post-startup message from either direction has the form:

    | 1 byte: type code | 4 bytes: length (includes these 4 bytes) | body ... |

The protocol.h header defines a PqMsg_* constant for each type. The
frontend-to-backend types used in the main loop are:

    // PqMsg_* constants: include/libpq/protocol.h
    #define PqMsg_Query         'Q'   /* simple Query */
    #define PqMsg_Parse         'P'   /* extended: Parse */
    #define PqMsg_Bind          'B'   /* extended: Bind */
    #define PqMsg_Execute       'E'   /* extended: Execute */
    #define PqMsg_Describe      'D'   /* extended: Describe */
    #define PqMsg_Close         'C'   /* extended: Close */
    #define PqMsg_Flush         'H'   /* extended: Flush */
    #define PqMsg_Sync          'S'   /* extended: Sync */
    #define PqMsg_FunctionCall  'F'   /* legacy fast-path */
    #define PqMsg_Terminate     'X'   /* disconnect */
    #define PqMsg_CopyData      'd'   /* COPY data */
    #define PqMsg_CopyDone      'c'   /* COPY done */
    #define PqMsg_CopyFail      'f'   /* COPY abort */
    /* auth responses share 'p': GSSResponse, PasswordMessage, SASLInitialResponse, SASLResponse */

Backend-to-frontend responses:

    #define PqMsg_AuthenticationRequest     'R'
    #define PqMsg_ParameterStatus           'S'
    #define PqMsg_BackendKeyData            'K'
    #define PqMsg_ReadyForQuery             'Z'
    #define PqMsg_RowDescription            'T'
    #define PqMsg_DataRow                   'D'
    #define PqMsg_CommandComplete           'C'
    #define PqMsg_ErrorResponse             'E'
    #define PqMsg_NoticeResponse            'N'
    #define PqMsg_ParseComplete             '1'
    #define PqMsg_BindComplete              '2'
    #define PqMsg_CloseComplete             '3'
    #define PqMsg_NoData                    'n'
    #define PqMsg_PortalSuspended           's'
    #define PqMsg_ParameterDescription      't'
    #define PqMsg_EmptyQueryResponse        'I'
    #define PqMsg_NegotiateProtocolVersion  'v'
    /* COPY: CopyInResponse 'G', CopyOutResponse 'H', CopyBothResponse 'W' */

The send/receive buffer pair (pqcomm.c)
pqcomm.c owns two byte buffers: a fixed-size receive array and a heap-allocated send buffer that starts at PQ_SEND_BUFFER_SIZE and can grow:

    // pqcomm.c (condensed): send/recv buffer layout
    #define PQ_SEND_BUFFER_SIZE 8192
    #define PQ_RECV_BUFFER_SIZE 8192

    static char *PqSendBuffer;     /* heap-allocated; can grow via pq_putmessage_noblock */
    static int PqSendBufferSize;
    static size_t PqSendPointer;   /* write position in PqSendBuffer */
    static size_t PqSendStart;     /* flush position in PqSendBuffer */

    static char PqRecvBuffer[PQ_RECV_BUFFER_SIZE];
    static int PqRecvPointer;      /* next byte to consume */
    static int PqRecvLength;       /* valid data end */

All socket I/O goes through the PQcommMethods vtable, whose default
implementation is PqCommSocketMethods. This indirection exists to
support alternative transports (e.g., the parallel-query shm_mq path
uses a different vtable without changing the callers):

    // PQcommMethods vtable: pqcomm.c
    static const PQcommMethods PqCommSocketMethods = {
        .comm_reset = socket_comm_reset,
        .flush = socket_flush,
        .flush_if_writable = socket_flush_if_writable,
        .is_send_pending = socket_is_send_pending,
        .putmessage = socket_putmessage,
        .putmessage_noblock = socket_putmessage_noblock
    };
    const PQcommMethods *PqCommMethods = &PqCommSocketMethods;

The message-builder API (pqformat.c) calls into PqCommMethods only at
pq_endmessage, which invokes PqCommMethods->putmessage. The builder
itself just appends into a StringInfo buffer:

    // pq_beginmessage / pq_endmessage: libpq/pqformat.c (condensed)
    void
    pq_beginmessage(StringInfo buf, char msgtype)
    {
        initStringInfo(buf);
        buf->cursor = msgtype;   /* type byte stashed in cursor field */
    }

    void
    pq_endmessage(StringInfo buf)
    {
        /* emit type byte + 4-byte length + body in one putmessage call */
        (void) pq_putmessage(buf->cursor, buf->data, buf->len);
        pfree(buf->data);
        buf->data = NULL;
    }

The 4-byte length word is prepended by socket_putmessage internally,
not by the builder — callers never serialise the length themselves.
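To make the division of labour concrete, here is a minimal standalone sketch of what a putmessage-style routine has to do with a payload and a type byte. It is not PostgreSQL's `socket_putmessage`; the function name `sketch_put_message` and its buffer-based signature are assumptions made for the example. The point it illustrates is that the length word is computed from the payload length at the last moment, so the builder never has to reserve or patch header bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Append one framed message (type byte, big-endian length, body) to out.
 * Returns the number of bytes written, or 0 if the message does not fit.
 * Hypothetical helper for illustration only.
 */
size_t sketch_put_message(uint8_t *out, size_t cap, char type,
                          const void *body, uint32_t body_len)
{
    assert(out != NULL && (body != NULL || body_len == 0));

    if (body_len > UINT32_MAX - 4)
        return 0;                     /* length word would overflow */

    size_t total = 1 + 4 + (size_t) body_len;
    if (total > cap)
        return 0;

    uint32_t len = body_len + 4;      /* counts itself, not the type byte */
    out[0] = (uint8_t) type;
    out[1] = (uint8_t) (len >> 24);
    out[2] = (uint8_t) (len >> 16);
    out[3] = (uint8_t) (len >> 8);
    out[4] = (uint8_t) len;
    if (body_len > 0)
        memcpy(out + 5, body, body_len);
    return total;
}
```

For a simple Query carrying the NUL-terminated text `SELECT 1`, the payload is 9 bytes, the length word is 13, and the whole frame is 14 bytes.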
Startup and authentication handshake (backend_startup.c)
The startup sequence runs before PostgresMain. It is owned by
BackendRun → BackendInitialize → ProcessStartupPacket:
sequenceDiagram
participant C as Client
participant S as Server (backend_startup.c)
C->>S: (optional) SSLRequest<br/>(len=8, code=80877103)
S-->>C: 'S' (SSL ok) or 'N' (no SSL)
note over C,S: TLS handshake if 'S'
C->>S: (optional) GSSENCRequest<br/>(len=8, code=80877104)
S-->>C: 'G' or 'N'
note over C,S: GSSAPI channel if 'G'
C->>S: StartupMessage<br/>(len + proto=196608 + key=val pairs + NUL)
S-->>C: AuthenticationRequest 'R' (AUTH_REQ_*)
C->>S: password / SASL / GSSAPI response ('p')
S-->>C: AuthenticationOk 'R' (code=0)
S-->>C: ParameterStatus 'S' (server_version, client_encoding, …)
S-->>C: BackendKeyData 'K' (pid + cancel key)
S-->>C: ReadyForQuery 'Z' (txn status = 'I')
Figure 1 — PostgreSQL v3 startup handshake. The SSL and GSSAPI
negotiation steps (a single-byte response to a special request code)
precede the typed-message phase. The startup packet itself has no type
byte — the protocol version word doubles as its identifier. After
AuthenticationOk, the server sends a ParameterStatus message for each
reportable (GUC_REPORT) setting before BackendKeyData and ReadyForQuery.
ProcessStartupPacket reads the first four bytes of the startup packet as
a 32-bit big-endian length, then reads len - 4 bytes into a heap buffer,
and inspects the protocol version word at the front:

    // ProcessStartupPacket: tcop/backend_startup.c (condensed)
    port->proto = proto = pg_ntoh32(*((ProtocolVersion *) buf));

    if (proto == CANCEL_REQUEST_CODE) { ProcessCancelRequestPacket(...); return STATUS_ERROR; }
    if (proto == NEGOTIATE_SSL_CODE && !ssl_done) { /* send 'S'/'N', retry */ goto retry; }
    if (proto == NEGOTIATE_GSSENC_CODE && !gss_done) { /* send 'G'/'N', retry */ goto retry; }
    /* else: normal startup, proto should be PG_PROTOCOL(3,0) */

The “retry” pattern is the loop that allows the client to layer TLS on top
of a plain TCP connection before sending the real startup packet. The
server responds with a raw single byte (not a protocol message), and
ProcessStartupPacket then runs again with ssl_done = true to read the
client's next packet.
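The dispatch on the first 32-bit word can be sketched as a small classifier. This is a standalone illustration, not the backend's code: `sketch_classify_first_packet`, the enum, and the `SKETCH_PROTOCOL` macro are hypothetical names. The request codes use the same (major << 16) | minor packing as `PG_PROTOCOL`; 1234/5679 and 1234/5680 reproduce the 80877103 and 80877104 values shown in Figure 1, and 1234/5678 is assumed here as the cancel-request code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same packing as PG_PROTOCOL(m, n): major in the high 16 bits, minor in the low. */
#define SKETCH_PROTOCOL(m, n) (((uint32_t) (m) << 16) | (uint32_t) (n))

typedef enum {
    FIRST_PACKET_TOO_SHORT,
    FIRST_PACKET_CANCEL,
    FIRST_PACKET_SSL_REQUEST,
    FIRST_PACKET_GSSENC_REQUEST,
    FIRST_PACKET_STARTUP
} first_packet_kind;

/* pkt points just past the 4-byte length word; n is the remaining byte count. */
first_packet_kind sketch_classify_first_packet(const uint8_t *pkt, size_t n)
{
    assert(pkt != NULL);

    if (n < 4)
        return FIRST_PACKET_TOO_SHORT;

    /* The code word is sent in network (big-endian) byte order. */
    uint32_t code = ((uint32_t) pkt[0] << 24) | ((uint32_t) pkt[1] << 16) |
                    ((uint32_t) pkt[2] << 8) | (uint32_t) pkt[3];

    switch (code) {
    case SKETCH_PROTOCOL(1234, 5678):
        return FIRST_PACKET_CANCEL;
    case SKETCH_PROTOCOL(1234, 5679):
        return FIRST_PACKET_SSL_REQUEST;
    case SKETCH_PROTOCOL(1234, 5680):
        return FIRST_PACKET_GSSENC_REQUEST;
    default:
        return FIRST_PACKET_STARTUP;   /* treated as a protocol version word */
    }
}
```

Because the special codes use a major number of 1234, they can never collide with a real protocol version such as 3.0 (196608).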
After ProcessStartupPacket, ClientAuthentication (in auth.c, reached via
PerformAuthentication) calls
sendAuthRequest, which wraps pq_beginmessage + pq_sendint32 + body +
pq_endmessage for each challenge-response round, and terminates with
sendAuthRequest(port, AUTH_REQ_OK, NULL, 0) on success.
Before sending ReadyForQuery, PostgresMain emits BackendKeyData:

    // PostgresMain: tcop/postgres.c (condensed, BackendKeyData send)
    pq_beginmessage(&buf, PqMsg_BackendKeyData);
    pq_sendint32(&buf, (int32) MyProcPid);
    pq_sendbytes(&buf, MyCancelKey, MyCancelKeyLength);
    pq_endmessage(&buf);
    /* Need not flush since ReadyForQuery will do it. */

The cancel key length is 32 bytes when protocol 3.2 or later was negotiated
(available from PG18) and 4 bytes for older protocol versions. ReadyForQuery is then called,
which sends 'Z' + TransactionBlockStatusCode() and flushes the send
buffer, completing the startup phase.
The main message loop: PostgresMain
PostgresMain is an infinite for(;;) loop structured as seven numbered
steps on every iteration:

    // PostgresMain: tcop/postgres.c (condensed, main loop skeleton)
    MessageContext = AllocSetContextCreate(TopMemoryContext, "MessageContext", ...);

    for (;;)
    {
        doing_extended_query_message = false;

        // (1) Send ReadyForQuery if idle
        if (send_ready_for_query)
        {
            ReportChangedGUCOptions();
            ReadyForQuery(whereToSendOutput);   /* sends 'Z' + txn status, flushes */
            send_ready_for_query = false;
        }

        // (2) Enable async signal delivery while waiting for client
        DoingCommandRead = true;

        // (3) Block here until a message arrives
        firstchar = ReadCommand(&input_message);

        // (4-5) Disable async signals, check interrupts
        // (6) Reload config if SIGHUP arrived
        // (7) Dispatch on message type
        if (ignore_till_sync && firstchar != EOF)
            continue;

        switch (firstchar) { /* ... */ }
    }

ReadCommand dispatches to SocketBackend (for remote connections) or
InteractiveBackend (for --single mode). SocketBackend reads the type
byte via pq_getbyte, selects a per-type maxmsglen, and reads the body
via pq_getmessage:

    // SocketBackend: tcop/postgres.c (condensed)
    pq_startmsgread();
    qtype = pq_getbyte();   /* blocks until 1 byte arrives */

    switch (qtype)
    {
        case PqMsg_Query:
            maxmsglen = PQ_LARGE_MESSAGE_LIMIT;
            break;
        case PqMsg_Parse:
        case PqMsg_Bind:
            maxmsglen = PQ_LARGE_MESSAGE_LIMIT;
            doing_extended_query_message = true;
            break;
        case PqMsg_Execute:
        case PqMsg_Close:
        case PqMsg_Describe:
        case PqMsg_Flush:
            maxmsglen = PQ_SMALL_MESSAGE_LIMIT;
            doing_extended_query_message = true;
            break;
        case PqMsg_Sync:
            maxmsglen = PQ_SMALL_MESSAGE_LIMIT;
            ignore_till_sync = false;
            break;
        case PqMsg_Terminate:
            maxmsglen = PQ_SMALL_MESSAGE_LIMIT;
            break;
        default:
            ereport(FATAL, (errcode(ERRCODE_PROTOCOL_VIOLATION), ...));
    }
    pq_getmessage(inBuf, maxmsglen);   /* reads 4-byte length then body */
    RESUME_CANCEL_INTERRUPTS();
    return qtype;

MessageContext is reset at the top of each iteration, releasing all
per-message memory from the previous cycle.
flowchart TD
A["PostgresMain loop top<br/>reset MessageContext"] --> B{"send_ready_for_query?"}
B -- "yes" --> C["ReadyForQuery<br/>(sends 'Z' + txn status, flushes)"]
C --> D["DoingCommandRead = true"]
B -- "no" --> D
D --> E["ReadCommand<br/>(blocks on socket)"]
E --> F{"ignore_till_sync<br/>and not EOF?"}
F -- "yes" --> A
F -- "no" --> G["switch firstchar"]
G -- "'Q'" --> H["exec_simple_query<br/>send_ready_for_query = true"]
G -- "'P'" --> I["exec_parse_message<br/>(no RFQ yet)"]
G -- "'B'" --> J["exec_bind_message<br/>(no RFQ yet)"]
G -- "'E'" --> K["exec_execute_message<br/>(no RFQ yet)"]
G -- "'S' Sync" --> L["finish_xact_command<br/>send_ready_for_query = true"]
G -- "'X'/EOF" --> M["proc_exit(0)"]
H --> A
I --> A
J --> A
K --> A
L --> A
Figure 2 — PostgresMain main loop. The loop resets MessageContext,
sends ReadyForQuery when idle, blocks on ReadCommand, and dispatches on
the message type. Simple-Query (‘Q’) sets send_ready_for_query directly;
extended-query messages accumulate without sending ReadyForQuery until
Sync (‘S’) arrives, even if one of them fails.
Simple query mode
exec_simple_query implements the full simple-Query cycle:
1. Call start_xact_command(), which opens an implicit transaction if none is active.
2. Call pg_parse_query(query_string) in MessageContext, which produces a list of RawStmt nodes.
3. For each parse tree: CreateCommandTag → BeginCommand (sends no bytes; sets up the destination) → analyze/plan/execute via PortalRun → EndCommand (sends CommandComplete 'C').
4. After the last statement: the caller’s loop sets send_ready_for_query = true, so ReadyForQuery goes out at the top of the next iteration.

An empty query (empty string or only whitespace) skips step 3 and sends
EmptyQueryResponse 'I' via NullCommand(dest).
Extended query mode
The extended protocol is a four-message state machine. Each message arrives
and is dispatched independently. The server accumulates results without
sending ReadyForQuery until the client sends Sync:
Parse (exec_parse_message): compiles the query string and creates a
CachedPlanSource (unnamed or named prepared statement). Responds with
ParseComplete '1'.
Bind (exec_bind_message): takes parameter values and result format
codes, creates a Portal via CreatePortal. Responds with BindComplete '2'.
Execute (exec_execute_message): runs the portal via PortalRun up
to max_rows. Returns DataRow 'D' rows, then CommandComplete 'C' (or
PortalSuspended 's' if max_rows was reached). Does not send
ReadyForQuery.
Sync (PqMsg_Sync): calls EndImplicitTransactionBlock + finish_xact_command, sets send_ready_for_query = true. ReadyForQuery goes
out at the top of the next iteration.
Describe (exec_describe_statement_message / exec_describe_portal_message): returns ParameterDescription 't' (for statements) and/or
RowDescription 'T' (for portals with output), or NoData 'n' when there is no result set.
Flush (PqMsg_Flush): calls pq_flush() without sending
ReadyForQuery. Intended for interactive use where the client wants
buffered output without committing the pipeline.
sequenceDiagram
participant C as Client
participant S as PostgresMain
C->>S: Parse 'P' (stmt_name, sql, param_types)
S-->>C: ParseComplete '1'
C->>S: Bind 'B' (portal_name, stmt_name, params, formats)
S-->>C: BindComplete '2'
C->>S: Describe 'D' (portal)
S-->>C: RowDescription 'T'
C->>S: Execute 'E' (portal_name, max_rows=0)
S-->>C: DataRow 'D' (×N)
S-->>C: CommandComplete 'C'
C->>S: Sync 'S'
S-->>C: ReadyForQuery 'Z'
Figure 3 — Extended query protocol: one Parse/Bind/Execute/Sync cycle
with a Describe step. The server holds ReadyForQuery until Sync. If an
error occurs anywhere before Sync, the server sends ErrorResponse
immediately, enters ignore_till_sync mode, and discards all further
extended-query messages until Sync arrives; it then sends ReadyForQuery.
Error recovery and ignore_till_sync
The ignore_till_sync flag is the protocol’s error-recovery mechanism in
extended mode. When doing_extended_query_message is true and an error
occurs (the sigsetjmp recovery path fires), the error handler sets:

    // PostgresMain error recovery path: tcop/postgres.c (condensed)
    if (doing_extended_query_message)
        ignore_till_sync = true;

The main loop then skips every incoming message until PqMsg_Sync clears
the flag (ignore_till_sync = false). This keeps the client and server in
sync even when the client has already sent a burst of P/B/E messages
that follow the failing one. The PqMsg_Sync handler resets the flag in
SocketBackend (before message body is read) so the server never tries to
interpret a Sync body under the skip.
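The skip-till-Sync rule is easy to check with a toy model. The function below is a simulation under stated assumptions, not backend code: `sketch_count_dispatched` is a hypothetical name, message handling is reduced to counting, and a failure is injected at a chosen index. It mirrors the two facts described above: the flag is cleared as soon as a Sync type byte is read, and an error arms the flag only when the failing message belongs to the extended protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Walk a pipelined sequence of frontend type bytes and count how many are
 * actually dispatched. fail_at is the index of the message whose handler
 * raises an error (pass a value >= n for "no failure").
 */
size_t sketch_count_dispatched(const char *types, size_t n, size_t fail_at)
{
    bool ignore_till_sync = false;
    size_t dispatched = 0;

    assert(types != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        char t = types[i];
        bool extended = (t == 'P' || t == 'B' || t == 'E' ||
                         t == 'D' || t == 'C' || t == 'H');

        if (t == 'S')
            ignore_till_sync = false;   /* cleared while the message is read */

        if (ignore_till_sync)
            continue;                   /* discarded without dispatch */

        dispatched++;
        if (i == fail_at && extended)
            ignore_till_sync = true;    /* error in extended mode arms the skip */
    }
    return dispatched;
}
```

With the pipeline `P B E P B E S` and a failure in the first Bind, only three messages are dispatched: the Parse, the failing Bind, and the Sync that ends the skip.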
If sync is lost at the framing level (for example, because pq_is_reading_msg()
returns true during error recovery, meaning a partial
message body was in flight), the backend escalates to FATAL:

    if (pq_is_reading_msg())
        ereport(FATAL,
                (errcode(ERRCODE_PROTOCOL_VIOLATION),
                 errmsg("terminating connection because protocol synchronization was lost")));

ReadyForQuery and the transaction status byte
ReadyForQuery (in tcop/dest.c) sends a one-byte transaction status
indicator drawn from TransactionBlockStatusCode():

    // ReadyForQuery: tcop/dest.c (condensed)
    pq_beginmessage(&buf, PqMsg_ReadyForQuery);
    pq_sendbyte(&buf, TransactionBlockStatusCode());
    pq_endmessage(&buf);
    pq_flush();

The status byte carries one of three values: 'I' (idle, not in a
transaction), 'T' (in a transaction block), or 'E' (in a failed
transaction block, where further commands are rejected until the block is rolled back). Clients use this
byte to present the correct prompt (psql's default prompt shows =>, =*>, or =!> respectively).
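On the client side, decoding this message is a fixed-shape check. The sketch below is illustrative only; `sketch_parse_ready_for_query` and the `txn_state` enum are hypothetical names, not part of libpq. It assumes the caller already holds one complete message: the type byte 'Z', a length word of 5 (the four length bytes plus one status byte), and the status byte itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TXN_IDLE, TXN_IN_BLOCK, TXN_FAILED, TXN_UNKNOWN } txn_state;

/*
 * Decode a complete ReadyForQuery message of exactly 6 bytes.
 * Returns TXN_UNKNOWN for anything that does not match that shape.
 */
txn_state sketch_parse_ready_for_query(const uint8_t *msg, size_t n)
{
    assert(msg != NULL);

    if (n != 6 || msg[0] != 'Z')
        return TXN_UNKNOWN;
    if (msg[1] != 0 || msg[2] != 0 || msg[3] != 0 || msg[4] != 5)
        return TXN_UNKNOWN;             /* length word must be exactly 5 */

    switch (msg[5]) {
    case 'I': return TXN_IDLE;          /* not in a transaction */
    case 'T': return TXN_IN_BLOCK;      /* inside a transaction block */
    case 'E': return TXN_FAILED;        /* failed transaction block */
    default:  return TXN_UNKNOWN;
    }
}
```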
pq_flush() is the only place in the normal request/response cycle where
the send buffer is unconditionally flushed. The message builder never
flushes on its own; individual message sends (pq_endmessage) only write into the
send buffer, which drains early only when it fills up. This means the data rows and completion messages for a
query accumulate in the send buffer and are delivered in as few
send(2) calls as the buffer size allows, with ReadyForQuery triggering the
final flush.
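The coalescing effect can be estimated with a small counting model. This is a simplified simulation, not the pqcomm.c implementation: the struct and both functions are hypothetical, one "flush" stands for one successful send(2) that drains the whole buffer, and the buffer size is assumed to stay at the 8192-byte value quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SEND_BUFFER_SIZE 8192

typedef struct {
    size_t used;      /* bytes currently buffered */
    size_t flushes;   /* number of simulated send(2) calls */
} sketch_sendbuf;

/* Simulate draining the buffer to the socket; a no-op when it is empty. */
void sketch_flush(sketch_sendbuf *sb)
{
    assert(sb != NULL);
    if (sb->used > 0) {
        sb->flushes++;
        sb->used = 0;
    }
}

/* Buffer len bytes, draining only when the buffer is completely full. */
void sketch_put(sketch_sendbuf *sb, size_t len)
{
    assert(sb != NULL);
    while (len > 0) {
        if (sb->used == SKETCH_SEND_BUFFER_SIZE)
            sketch_flush(sb);
        size_t room = SKETCH_SEND_BUFFER_SIZE - sb->used;
        size_t chunk = len < room ? len : room;
        sb->used += chunk;
        len -= chunk;
    }
}
```

In this model, 100 rows of 50 bytes followed by a 6-byte ReadyForQuery and an explicit flush cost one simulated send, while 1000 such rows cost seven: six forced by a full buffer and one final flush.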
Source Walkthrough
Anchor on symbol names, not line numbers. Use
git grep -n '<symbol>' src/backend/tcop/ src/backend/libpq/ to relocate; line numbers in the table below are hints scoped to commit 273fe94.
Protocol constants (include/libpq/protocol.h)
- PqMsg_*: all frontend-to-backend and backend-to-frontend type bytes.
- PG_PROTOCOL(m, n): constructs the protocol version word ((m<<16)|n); defined in include/libpq/pqcomm.h.
- CANCEL_REQUEST_CODE, NEGOTIATE_SSL_CODE, NEGOTIATE_GSSENC_CODE: special codes that precede the typed startup message; also defined in pqcomm.h.
- PQ_SMALL_MESSAGE_LIMIT, PQ_LARGE_MESSAGE_LIMIT: per-type message size bounds enforced in SocketBackend; defined in include/libpq/libpq.h.
Transport layer (libpq/pqcomm.c)
- pq_init: allocate PqSendBuffer, set up FeBeWaitSet, register socket_close on-exit callback.
- pq_getbyte: read one byte from the receive buffer (refills from socket if empty).
- pq_getmessage: read a length-prefixed message body into a StringInfo; enforces maxmsglen.
- pq_startmsgread / pq_endmsgread: bracket a message read; sets PqCommReadingMsg so error recovery can detect partial reads.
- PqCommMethods / PqCommSocketMethods: vtable; socket_putmessage writes type byte + length + body into PqSendBuffer.
- socket_flush (exposed as pq_flush): calls internal_flush to drain PqSendBuffer to the socket.
- PqSendBuffer / PqRecvBuffer: the fixed-size I/O buffers.
- ProcessClientReadInterrupt / ProcessClientWriteInterrupt: inject interrupt checks around blocking socket calls.
Message builder (libpq/pqformat.c)
- pq_beginmessage: initStringInfo + stash type byte in buf->cursor.
- pq_sendbyte / pq_sendint / pq_sendint32 / pq_sendbytes / pq_sendstring: append typed values to the StringInfo buffer.
- pq_endmessage: call pq_putmessage(buf->cursor, buf->data, buf->len); free the buffer.
- pq_getmsgbyte / pq_getmsgint / pq_getmsgstring / pq_getmsgend: parse an inbound StringInfo body.
Startup handshake (tcop/backend_startup.c)
- ProcessStartupPacket: read startup packet; branch on CANCEL_REQUEST_CODE / NEGOTIATE_SSL_CODE / NEGOTIATE_GSSENC_CODE / normal startup; parse key=value GUC pairs.
- SendNegotiateProtocolVersion: send PqMsg_NegotiateProtocolVersion listing unrecognised options.
Authentication (libpq/auth.c)
- ClientAuthentication: top-level auth dispatcher, invoked from PerformAuthentication (utils/init/postinit.c); calls hba_getauthmethod then the appropriate method.
- sendAuthRequest: pq_beginmessage(PqMsg_AuthenticationRequest) + pq_sendint32(areq) + optional extra data + pq_endmessage; called for each challenge and for the terminal AUTH_REQ_OK.
Main loop and dispatch (tcop/postgres.c)
- PostgresMain: setup + infinite dispatch loop with sigsetjmp recovery; creates MessageContext.
- ReadCommand: calls SocketBackend or InteractiveBackend.
- SocketBackend: read type byte, validate + select maxmsglen, read body; set doing_extended_query_message.
- exec_simple_query: simple Query cycle: parse → foreach-tree → plan/execute → EndCommand; caller sets send_ready_for_query.
- exec_parse_message: extended Parse: compile → CachedPlanSource; send ParseComplete.
- exec_bind_message: extended Bind: parameters → Portal; send BindComplete.
- exec_execute_message: extended Execute: PortalRun; send rows + CommandComplete / PortalSuspended.
- exec_describe_statement_message / exec_describe_portal_message: Describe: send ParameterDescription and/or RowDescription.
- ignore_till_sync: skip-till-Sync flag; set on error in extended mode.
- doing_extended_query_message: true while an extended-query message is being processed; controls whether errors arm the skip.
ReadyForQuery (tcop/dest.c)
- ReadyForQuery: pq_beginmessage(PqMsg_ReadyForQuery) + pq_sendbyte(TransactionBlockStatusCode()) + pq_endmessage + pq_flush.
Position hints (as of 2026-06-05, REL_18 273fe94)
| Symbol | File | Line |
|---|---|---|
| PqMsg_Query (first constant) | include/libpq/protocol.h | 19 |
| pq_init | libpq/pqcomm.c | 174 |
| pq_getbyte | libpq/pqcomm.c | 964 |
| pq_getmessage | libpq/pqcomm.c | 1204 |
| pq_beginmessage | libpq/pqformat.c | 88 |
| pq_endmessage | libpq/pqformat.c | 296 |
| ProcessStartupPacket | tcop/backend_startup.c | 492 |
| SendNegotiateProtocolVersion | tcop/backend_startup.c | 936 |
| sendAuthRequest | libpq/auth.c | 677 |
| ReadyForQuery | tcop/dest.c | 256 |
| SocketBackend | tcop/postgres.c | 353 |
| ReadCommand | tcop/postgres.c | 481 |
| PostgresMain | tcop/postgres.c | 4188 |
| exec_simple_query | tcop/postgres.c | 1012 |
| exec_parse_message | tcop/postgres.c | 1390 |
| exec_bind_message | tcop/postgres.c | 1625 |
| exec_execute_message | tcop/postgres.c | 2108 |
| exec_describe_statement_message | tcop/postgres.c | 2642 |
| exec_describe_portal_message | tcop/postgres.c | 2735 |
Source verification (as of 2026-06-05)
Facts about the source at commit
273fe94, readable without external materials. Open questions follow.
Verified facts
- Every post-startup frontend message is read via SocketBackend, which selects a per-type maxmsglen that pq_getmessage then enforces against the length word. Verified in SocketBackend (tcop/postgres.c). The two bounds are PQ_LARGE_MESSAGE_LIMIT (for Query, Parse, Bind, CopyData, FunctionCall) and PQ_SMALL_MESSAGE_LIMIT (for Execute, Close, Describe, Flush, Sync, Terminate, CopyDone, CopyFail). This guards against malformed length words exhausting server memory.
- PqMsg_Sync clears ignore_till_sync inside SocketBackend (before the body is read), not in the dispatch switch. Verified in SocketBackend. The implication: a Sync message received while in skip mode resets the flag immediately on reading the type byte, so the server will read the Sync body cleanly and the dispatch switch can set send_ready_for_query = true.
- ReadyForQuery is the only call site that unconditionally flushes the send buffer in the normal request cycle. Verified in ReadyForQuery (dest.c) and PostgresMain. pq_endmessage calls pq_putmessage, which appends to PqSendBuffer but does not flush on its own. Apart from a full buffer, the only other flush in the extended protocol is the one a client requests explicitly with PqMsg_Flush.
- The cancel key length is 32 bytes for protocol ≥ 3.2 and 4 bytes for older. Verified in PostgresMain (tcop/postgres.c): len = (MyProcPort->proto >= PG_PROTOCOL(3, 2)) ? MAX_CANCEL_KEY_LENGTH : 4. MAX_CANCEL_KEY_LENGTH is 32. Protocol 3.2 is available from PG18.
- pq_beginmessage stashes the type byte in buf->cursor, not as the first byte of buf->data. Verified in pqformat.c. The type byte is serialised by socket_putmessage when pq_endmessage calls pq_putmessage. This means the StringInfo body is purely the message payload; callers can compute buf->len as the payload length without accounting for the type byte.
- The PQcommMethods vtable allows non-socket transports without changing message-builder callers. Verified: PqCommMethods is a const PQcommMethods * global; pq_putmessage calls PqCommMethods->putmessage. Parallel query workers install a different vtable (shm_mq-backed) during worker startup.
- MessageContext is reset once per main loop iteration, at the top of the loop. Verified in PostgresMain. This bounds per-message memory use and ensures that StringInfo buffers allocated for the current message body are freed at the next iteration boundary, not at transaction commit.
Open questions
- Pipeline mode (PG17+) and ignore_till_sync interaction. The EndImplicitTransactionBlock call in the PqMsg_Sync handler suggests awareness of pipeline-mode implicit transaction blocks. How pipeline mode changes the error-recovery semantics when multiple transactions are in flight within one sync cycle is not fully traced here. Investigation path: read the pipelining section of the protocol documentation and trace IsInPipelineMode / EndImplicitTransactionBlock in tcop/postgres.c.
- WAL sender reuse of PostgresMain. The same PostgresMain loop handles WAL-sender processes (am_walsender == true), which route PqMsg_Query to exec_replication_command first. How forbidden_in_wal_sender guards the extended-query messages, and what replication commands are allowed, are partially traced; see replication/walsender.c and a future postgres-wal-sender-receiver.md.
- The PqMsg_Progress 'P' constant. protocol.h defines PqMsg_Progress 'P' alongside PqMsg_Parse 'P'. How this constant is used (whether it is a distinct message type or a documentation alias) is not clear from a reading of pqcomm.c and postgres.c alone. Investigation path: git grep PqMsg_Progress.
Beyond PostgreSQL — Comparative Designs & Research Frontiers
Section titled “Beyond PostgreSQL — Comparative Designs & Research Frontiers”Pointers, not analysis. Each bullet is a starting handle for a follow-up document.
-
MySQL wire protocol (MySQL Client/Server Protocol). MySQL uses a different framing (a 3-byte payload length plus a 1-byte sequence number, ahead of the command byte) and splits commands into COM_QUERY and the COM_STMT_* family. The extended-query equivalent (COM_STMT_PREPARE / COM_STMT_EXECUTE) arrived in MySQL 4.1, in roughly the same period as PostgreSQL's v3 protocol (PostgreSQL 7.4). A side-by-side of error recovery under pipelining would compare ignore_till_sync with MySQL's multi-statement buffering behaviour.
-
The pgwire project / generic PG-protocol servers. Multiple databases (ClickHouse, CockroachDB, YugabyteDB, DuckDB via extension) have implemented a PostgreSQL-compatible wire protocol layer, allowing PostgreSQL clients to connect without modification. The appeal is JDBC/ODBC driver reuse. The cost is fidelity to edge cases (cancel key semantics, NegotiateProtocolVersion, ParameterStatus expectations). These implementations are practical evidence that the published protocol specification is sufficient to build an interoperable server.
-
JDBC pipelining vs. libpq pipelining. libpq gained explicit pipelining support in PG14 (PQenterPipelineMode, PQpipelineStatus), allowing clients to send multiple P/B/E batches without waiting for each ReadyForQuery. The JDBC driver's own batching predates this and operates differently (buffering at the driver layer, not at the protocol layer). A comparison of the two pipeline models against ignore_till_sync semantics would clarify where error recovery is the driver's responsibility vs. the protocol's.
-
Protocol authentication evolution. The sendAuthRequest machinery already supports SASL/SCRAM-SHA-256 and GSSAPI. SCRAM-SHA-256-PLUS (channel binding) arrived in PG11. The authentication module is the natural companion to this document; see postgres-authentication.md (planned, client-protocol subcategory).
-
Cancel request design. The cancel request is sent out-of-band on a new TCP connection (not on the session connection), because the session connection is occupied by the in-flight query. PG18 extends the cancel key to 32 bytes (see verified facts above). The design rationale and the security properties (the pid and key act as a bearer token: any process that learns them can cancel a query) deserve a deeper look alongside postgres-postmaster.md.
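The framing difference named in the MySQL bullet can be made concrete with two header parsers. This is a minimal sketch: PgHeader, MyHeader, pg_parse_header and my_parse_header are hypothetical names, and only the regular (typed) PostgreSQL v3 message header and the basic MySQL packet header are covered. Untyped startup packets and MySQL's handling of payloads larger than 2^24 - 1 bytes are out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    char     type;         /* message type byte, e.g. 'Q' */
    uint32_t body_len;     /* payload bytes that follow the 5-byte header */
} PgHeader;

typedef struct {
    uint32_t payload_len;  /* 24-bit payload length */
    uint8_t  seq;          /* per-command sequence number */
} MyHeader;

/*
 * PostgreSQL v3 regular message: Byte1 type, then Int32 length in network
 * byte order. The length counts itself (4 bytes) but not the type byte.
 * Returns 1 on success, 0 if more bytes are needed, -1 on a bad length.
 */
int pg_parse_header(const uint8_t *p, size_t avail, PgHeader *h)
{
    uint32_t len;

    if (avail < 5)
        return 0;
    len = ((uint32_t) p[1] << 24) | ((uint32_t) p[2] << 16) |
          ((uint32_t) p[3] << 8)  |  (uint32_t) p[4];
    if (len < 4)
        return -1;
    h->type = (char) p[0];
    h->body_len = len - 4;
    return 1;
}

/*
 * MySQL packet: 3-byte little-endian payload length, then a 1-byte sequence
 * number. The command byte (e.g. COM_QUERY) is the first payload byte, so it
 * is not part of the header.
 * Returns 1 on success, 0 if more bytes are needed.
 */
int my_parse_header(const uint8_t *p, size_t avail, MyHeader *h)
{
    if (avail < 4)
        return 0;
    h->payload_len = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                     ((uint32_t) p[2] << 16);
    h->seq = p[3];
    return 1;
}
```

For the text "SELECT 1", a PostgreSQL Query message starts with 'Q' followed by the length 13 (4 for the length word plus 9 for the NUL-terminated string), while a MySQL COM_QUERY packet starts with the length 9 (1 command byte plus 8 characters) and sequence number 0. One consequence is that a PostgreSQL receiver knows the message type before it reads the length, whereas a MySQL receiver knows the length before it sees the command.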
Sources
Protocol specification
- PostgreSQL documentation, “Frontend/Backend Protocol” chapter: canonical message format reference, startup sequence, extended query protocol, error recovery.
PostgreSQL source (under /data/hgryoo/references/postgres, REL_18 273fe94)
- src/backend/tcop/postgres.c: PostgresMain, SocketBackend, ReadCommand, exec_simple_query, exec_parse_message, exec_bind_message, exec_execute_message, exec_describe_statement_message, exec_describe_portal_message.
- src/backend/tcop/backend_startup.c: ProcessStartupPacket, SendNegotiateProtocolVersion.
- src/backend/tcop/dest.c: ReadyForQuery.
- src/backend/libpq/pqcomm.c: send/receive buffers, pq_init, pq_getbyte, pq_getmessage, PqCommMethods.
- src/backend/libpq/pqformat.c: pq_beginmessage, pq_endmessage, pq_send*, pq_getmsg*.
- src/backend/libpq/auth.c: ClientAuthentication, sendAuthRequest (the caller, PerformAuthentication, is in src/backend/utils/init/postinit.c).
- src/include/libpq/protocol.h: PqMsg_* constants, PG_PROTOCOL, special request codes.
Textbook chapters (under knowledge/research/dbms-general/)
- Architecture of a Database System (Hellerstein et al.), §“Client Communication Manager”: CCM layer framing, request/response model.
- Database System Concepts (Silberschatz et al.): client-server interaction model, network round-trip cost.
Cross-references (sibling module docs)
- postgres-backend-lifecycle.md: how PostgresMain is reached from postmaster → BackendRun → InitPostgres.
- postgres-authentication.md: (planned) deep dive on auth.c.
- postgres-xact.md: start_xact_command / finish_xact_command that wrap each message cycle.
- postgres-tls-gssapi.md: (planned) be-secure-openssl.c / be-secure-gssapi.c that sit under pqcomm.c.
- postgres-wal-sender-receiver.md: (planned) WAL sender reuse of PostgresMain.