I chose Zig, love it, and still eat my own dogfood

It started innocently. Someone asked: can pgrest handle subrequests? The module serves PostgreSQL results over HTTP. SSI includes and the mirror module both issue subrequests. Subrequests have been a first-class feature in nginz since day one — the echoz module had been doing the exact same r->main->count += 1 dance for over a year, across thousands of test runs. This wasn’t untested territory. It was supposed to be a five-minute wiring job.

The worker crashed instead.

That crash kicked off a debugging descent that consumed three AI models — Sonnet, Opus, and DeepSeek v4 — across five rounds of isolation, and more wall-clock hours than any of us want to admit. At the bottom, we found something that now seems too simple to have taken so long: a six-byte gap between where our Zig binding thought a bitfield lived and where the C compiler actually put it. The binding had been fine for years — across nginx 1.26, 1.28, and early 1.30 builds. Then at some point, something shifted. A configure flag we tweaked. A struct reordering in a point release. I honestly don’t know which change broke it, only that port ended up at offset 1200 with count:16 packed tight at 1202, and our packed struct(u64) — with its 8-byte alignment — now sat six bytes late at 1208. Same binding, new layout, phantom padding. We’d been lucky until we weren’t.

This post is the postmortem: what happened, why it was so hard to find, how we fixed it, and what we built to make sure it never happens again.

The symptom: segfault, inconsistent, silent

Let’s be specific about what “crash” meant. The pgrest module pools PostgreSQL connections. When a request arrives, it increments r->main->count to prevent nginx from freeing the main request while the pgrest connection is active. When the response comes back, it decrements the count. Standard nginx module bookkeeping.

When pgrest ran as the primary request handler — a normal location /pgrest { pgrest_pass; } — everything worked. The echoz module had used the same r->main->count pattern for over a year without issue. Subrequests were not new to us. The count increment and decrement were battle-tested.

When pgrest ran as a subrequest — triggered by an SSI include or a mirror directive — the worker would segfault, but not deterministically. Sometimes the first request worked. Sometimes the third. Sometimes a keepalive connection survived five requests before dying on the sixth. The crashes moved around.

The crash signature itself was misleading. The backtrace pointed to nginx’s subrequest cleanup machinery — ngx_http_finalize_request → ngx_http_postpone_filter → freed-memory access. That suggested a use-after-free on the request struct itself, which we assumed was a pgrest pool lifecycle bug. We were wrong.

The debugging descent: five iterations

The first iteration was the obvious one. Sonnet led on this — inspect the pool lifecycle, suspect a double-release. We added pool-level tracing, tagged every connection state transition with a sequence counter, and dumped a trace of the last 32 events before every crash. The traces showed clean lifecycle management — every acquire had a matching release. No double-frees. No use of freed memory. Sonnet was right to start here; the hypothesis was wrong, but the process confirmed what wasn’t broken.

The second iteration suspected the NJS post-subrequest callback path. DeepSeek v4 pointed out that when NJS scripts call r.subrequest() and then r.return() in the callback, the timing is tight: nginx decrements r->main->count after the callback returns, but if the callback already set a final response code, the decrement can happen on a request that’s already been finalized. DeepSeek designed the isolation: null out ctx.request before releasing the pool connection, preserving the count hold for nginx’s cleanup path rather than the callback’s. We restructured the pgrest finalization path accordingly. This fixed a real ordering issue — but it wasn’t the root cause. The crashes got rarer, but didn’t stop. DeepSeek’s structural insight was correct and stayed in the code; the problem was it only fixed one of two overlapping bugs.

The third iteration went deeper into the pool connection dispatch. Sonnet and DeepSeek traded off on isolation paths — each narrowing the search space by eliminating hypotheses and adding assertions. We added full-lifecycle trace logs with 128-byte buffers per event, enough to capture the exact state of every pgrest context at the moment of crash. We ran the SSI test suite in a loop — 200 iterations, stopping on crash. The logs showed that ctx.count_held was consistent, that main_count_inc and main_count_dec were paired, and that the pool never handed out the same connection twice. Something else was corrupting the request struct — but after three rounds, we’d eliminated every theory that pointed at pgrest.

Then Opus said the thing that broke it open: “I don’t believe this shit. Let me log the actual byte offset of r->main->count and assert it against what the binding thinks.”

Ten minutes later we had the answer. The binding said offset 1208. The C runtime said 1202. Six bytes of phantom padding. Opus didn’t out-think the problem — it refused to think at all and demanded the machine tell us the truth. That’s when we stopped looking at pgrest and started looking at the binding.

The insight: `packed struct(u64)` is not a C bitfield

When I wrote the Zig binding for ngx_http_request_t, the nginx source had a long run of consecutive bitfields — 55 bits in total across count:16, subrequests:8, blocked:8, and two dozen single-bit flags. I packed them all into a single packed struct(u64). Greedy, convenient, one struct to hold them all. Translate-c doesn’t even attempt bitfields; it skips them entirely. So I wrote this by hand, and I wrote it wrong — not because a packed struct(u64) can’t hold 55 bits, but because I never asked where the struct itself would land.

Here’s the problem. Zig aligns a packed struct(u64) to an 8-byte boundary. C aligns a bitfield run to the alignment of its declared storage unit — which is unsigned int, a 4-byte type. And crucially, C doesn’t require the bitfield to start at an aligned offset at all: if there’s room in the current storage unit, the bitfield starts right after the previous field, regardless of alignment.

The count:16 bitfield in nginx’s ngx_http_request_t sits immediately after port, which is an in_port_t — a 2-byte field at offset 1200. The C compiler sees a 2-byte field at 1200, notes that there are 2 unused bytes in the current 4-byte unsigned int unit, and packs count:16 into bytes 1202–1203. No padding. No alignment. The bitfield starts at byte 1202.

Zig sees port at offset 1200, then sees packed struct(u64) as the next field. A packed struct(u64) requires 8-byte alignment. The next 8-byte-aligned offset after 1200 is 1208. So Zig inserts 6 bytes of padding and places flags0 at 1208.

C compiler (gcc / clang)

1202

byte offset of r→count:16

Zig packed struct(u64)

1208

byte offset of flags0

Every time our pgrest module did r.*.main.*.flags0.count += 1, it was writing to offset 1208 — six bytes past the real count. That byte lands in the tail end of flags0 itself, corrupting whatever bitfields sit there in the packed struct. Meanwhile, nginx’s own code would later read r->count at byte 1202, see it unchanged (because our increment landed elsewhere), and free the request structure while we were still holding a reference to it. Use-after-free. Segfault.

The fix in the binding was a single keyword: align(2). By explicitly telling Zig to align flags0 on a 2-byte boundary (matching the in_port_t that precedes it), the packed struct slides back to byte 1202 — exactly where C put it. This isn’t Zig’s fault — packed struct alignment rules are documented and reasonable. I just assumed the layout would match C without verifying, because bitfields seemed trivial.

// I wrote this, assuming it would land at the same byte as the C bitfield.
// It doesn't. packed struct(u64) is 8-byte aligned → offset 1208.
flags0: struct_ngx_http_request_flag0_s,

// Fixed: force 2-byte alignment → offset 1202, matches C
flags0: struct_ngx_http_request_flag0_s align(2),

Six bytes. One keyword. My mistake. Five debugging iterations.

Why we didn’t find it sooner — and why it used to work

The honest answer is that the binding wasn’t always wrong. We’ve been through nginx 1.26, 1.28, and multiple 1.30 builds — echoz incremented count across all of them without a hiccup. At some point the layout shifted. Maybe it was the recent nginx configure changes. Maybe a struct field got reordered in a point release. We don’t know, and right now it doesn’t matter. What matters is that we had no checkpoint to tell us when the drift happened.

There’s a deeper reason this class of bug is so rare: almost every nginx struct field is 8-byte aligned. Pointers, ngx_uint_t, off_t, time_t — all naturally land on 8-byte boundaries on 64-bit. A packed struct(u64) after any of those lands at the same byte C expects. The mismatch only bites when a bitfield run follows one of the narrow fields: in_port_t (2 bytes), u_char (1 byte), c_int (4 bytes). These are the exceptions in an otherwise pointer-heavy struct design, and they’re exactly where our binding drifted.

Subrequests made the drift fatal. Under subrequest dispatch, nginx checks r->main->count after issuing the subrequest and again after its callback returns. Our increment landed at offset 1208 instead of 1202 — nginx never saw the count change. The main request got freed while the subrequest was still active. Use-after-free. The crash moved around because the timing of the free depended on allocator state, which varied with every run.

This is the signature of a layout mismatch bug: crashes that move, symptoms that look like use-after-free, and invariants that hold in all your tracing but fail in production. The code is right. The addresses it’s touching are wrong.

🔬

The rule: if the preceding field is narrower than the backing integer, Zig inserts padding that C does not.

This holds for packed struct(u32) after a u8, packed struct(u64) after a u16, and any combination where the backing integer's alignment exceeds the preceding field's natural alignment. C doesn't care — it packs bitfields into the current storage unit, alignment be damned.

The audit: 58 packed structs, four at risk

Finding one bug raises the obvious question: how many more are there?

We audited every packed struct in the nginz Zig bindings. Fifty-eight of them, spanning ngx.zig (nginx core structs) and ngx_http.zig (HTTP structs). For each one, we checked the preceding field’s type against the backing integer’s alignment.

Most were fine — preceded by a pointer or ngx_uint_t, both 8-byte aligned on 64-bit, matching packed struct(u64) alignment. But four were at risk:

struct	backing	preceding field	risk
`ngx_http_request_t.flags0`	u64	port: in_port_t (2B)	bug — 6-byte misalignment
`ngx_http_request_t.flags1`	u64	flags0 (8B, but cascade)	bug — cascades off flags0
`ngx_ssl_connection_t.flags`	u32	early_buf: u_char (1B)	likely — 3-byte padding hole
`ngx_slab_pool_t.flags`	u32	zero: u_char (1B)	likely — 3-byte padding hole

The flags1 case is particularly insidious. Its preceding field is flags0, which after the fix is 8 bytes at offset 1202 — so flags1 starts at 1210. But before the fix, flags1 started at 1216 because flags0 sat at 1208. Fixing flags0 cascaded flags1 back to the correct offset. If we had only fixed one, we’d have swapped one misalignment for another.

The workaround: raw pointer arithmetic as a temporary bridge

Before we understood the full scope, we needed pgrest subrequests to work. The immediate fix was to bypass the Zig binding entirely — read and write r->main->count at the hardcoded byte offset that C uses:

const NGX_REQUEST_COUNT_OFFSET: usize = 1202;

fn main_count_ptr(r_main: [*c]ngx_http_request_t) *u16 {
    const base: [*]u8 = @ptrCast(@cast(r_main));
    return @ptrCast(@alignCast(base + NGX_REQUEST_COUNT_OFFSET));
}

This is ugly. It hardcodes a byte offset that could change across nginx versions. It requires manual pointer casting that the type system can’t verify. But it works — and it confirmed the diagnosis. When the raw-pointer version replaced the binding-field version, the crashes stopped. Every one of them. The SSI test suite passed 200 consecutive iterations, which had never happened before.

The workaround stayed in place through the audit and the fix, and was finally removed in the hardening pass — replaced with the native field access that had always been the intent.

The fix recipes: three patterns

The root cause is systematic — any Zig packed struct(uN) following a sub-aligned field will drift. So we developed three fix recipes, in decreasing order of preference:

Recipe A

Pull out byte-aligned wide bitfields as plain integers

When a bitfield is exactly a byte, halfword, or word width (count:16, subrequests:8, buffered:8), extract it from the packed struct and declare it as a regular field at the correct offset. This gives native field semantics — r.count = 1 instead of r.flags0.count = 1 — and eliminates the packed struct entirely for the extracted fields. Keep the remaining sub-byte bits in a reduced packed struct(u32).

We applied this to ngx_http_request_t: count, subrequests, and blocked became top-level fields. The remaining 40+ single-bit flags stayed in the packed struct, which now has a smaller backing integer and clean alignment.

Recipe B

Add `align(N)` override on the packed struct

When all bitfields are sub-byte and the only problem is that the backing integer is over-aligned for the preceding field, force the packed struct to a smaller alignment that matches the C layout. packed struct(u32) align(1) after a u_char; packed struct(u64) align(2) after an in_port_t. The internal bit layout remains unchanged — only the struct's position shifts.

We applied this to ngx_ssl_connection_t.flags and ngx_slab_pool_t.flags, both of which follow a single-byte field and had 3 bytes of phantom padding.

Recipe C

Split `packed struct(u64)` into multiple `packed struct(u32)`

The packed struct(u64) in the bindings was my attempt to hold all the bits in one backing integer — but C never uses a single 64-bit storage unit for bitfields. It uses separate 32-bit unsigned int units. Splitting the packed struct along C storage unit boundaries, each with the correct align(N), matches the real layout. This is what Recipe A produces when you pull out the wide fields — the remaining sub-byte bits fall into smaller packed structs naturally.

The choice between them depends on the struct. Wide bitfields that get accessed frequently (like count) benefit from Recipe A because the access syntax is cleaner and @bitOffsetOf lookups are avoided at comptime. Structs where every field is a single-bit flag benefit from Recipe B because it’s a one-line change with full backward compatibility.

The hardening: make the C compiler tell us the truth

Fixing the bindings is half the battle. The other half is making sure they stay fixed — through nginx version upgrades, through compiler changes, through the next engineer who adds a packed struct without reading this post.

We built two layers of defense:

Layer 1: Runtime cross-check in tools/check_layout.c. A C probe program, compiled against the exact same nginx binary our modules run against — same ./configure flags, same compile-time options. Struct layout in nginx depends heavily on configure-time choices (--with-stream, --with-http_v2_module, etc.), so just having the headers isn’t enough. We hard-wire check_layout.c to the same nginx build configuration as production, so the struct shapes it measures are the struct shapes our modules actually encounter.

offsetof() is undefined behavior on bitfields, so we use a memset-and-scan technique: zero two instances, set the field to 1 in one of them, and scan for the first byte that differs. This gives us the actual byte offset as determined by the C compiler — no guessing, no assumptions.

The probe was extended from one field (count) to over 60 fields, covering every packed struct in the bindings. Not just the first bitfield in each run — every field at every storage-unit boundary. A padding shift inside a packed struct would be caught by the @bitOffsetOf guards; a shift of the whole struct would be caught by the probe. Both failures now fail CI.

Layer 2: Comptime guards in the Zig bindings. A comptime flag_byte_offset() helper computes the byte offset of any bitfield from the Zig binding itself, and compares it against the known-C value at compile time:

fn flag_byte_offset(comptime Parent: type, comptime flag_field: []const u8, comptime sub: []const u8) usize {
    const FlagType = @TypeOf(@field(@as(Parent, undefined), flag_field));
    return @offsetOf(Parent, flag_field) + @bitOffsetOf(FlagType, sub) / 8;
}

// If any of these fail at comptime, the binding is wrong:
try expectEqual(flag_byte_offset(ngx_http_request_t, "flags0", "count"), 1202);
try expectEqual(flag_byte_offset(ngx_http_request_t, "flags0", "aio"), 1206);
try expectEqual(flag_byte_offset(ngx_http_request_t, "flags1", "background"), 1216);
// ... 60+ more guards

These guards run as part of zig build test. They don’t need the C compiler — they derive their answers from the binding itself. If a future nginx upgrade shifts a bitfield offset, the build fails before any module gets to corrupt memory.

The concrete payoffs

After the hardening pass, every bitfield in every binding has a comptime guard. The pgrest module’s raw-pointer workaround is gone. r.*.main.*.flags0.count += 1 now writes to the right byte. The check-layout step in CI catches mismatches before they merge.

And pgrest as an SSI subrequest? It works. Not just in the test suite — in the composability that was always the point:

location /ssi/users {
    ssi on;
    return 200 '<!DOCTYPE html>
<html>
<body>
  <h1>Users</h1>
  <ul>
    <!--# include virtual="/pgrest/users?select=id,name" -->
  </ul>
</body>
</html>';
}

location /pgrest/ {
    pgrest_pass postgresql://db/userdb;
}

A single pgrest location, serving subrequests from SSI, mirror, and NJS — all sharing the same connection pool, all maintaining the same r->main->count protection. The feature that crashed the worker now works because the six bytes we were writing to are the right six bytes.

Why it’s silly in retrospect

Afterthoughts are always easier than the work. The fix was six characters: align(2). The root cause was documented in Zig’s language reference — packed struct alignment matches its backing integer. The mismatch was deterministic and reproducible once we knew what to look for.

But the gap between “should have known” and “actually knows” is where production bugs live. Zig doesn’t warn about alignment mismatches in C interop — why would it? I was the one claiming the struct matched the C layout. The struct sizes matched C. The offsets of non-bitfield members matched C. Every check I had — @sizeOf, @offsetOf for normal fields — passed, so I assumed the bitfields were also correct. They weren’t.

The lesson isn’t “read the Zig reference more carefully.” It’s that layout correctness between two languages requires measuring every field at the byte level, and bitfields need the most scrutiny because they’re the only fields where Zig’s model and C’s model diverge structurally. If you’re writing Zig bindings to C structs with bitfields, measure every bitfield byte offset at runtime from the C side and compare at comptime from the Zig side. Don’t assume your packed struct matches C just because the sizeof checks out. Trust the compiler that owns the struct layout.

We now have a 61-line checklist in BITFIELDS.md, a 181-line C probe program, and 66 comptime guards. None of that existed four days ago. The build catches what we missed before — and if nginx 1.32 changes a bitfield offset, we’ll know before the first test runs.

The agentic coding insight: log the machine, not your theory

There’s a meta-lesson here about debugging with AI, and it’s not about which model is smarter. It’s about what kind of instruction produces breakthroughs.

🧠

Sonnet

hypothesis generator

Led the first iteration — pool lifecycle tracing. Came up with most of the isolation paths. Sonnet thinks in branching scenarios: "if it's not X, it's either Y or Z." Great at narrowing the search space, not afraid to be wrong. The wrong hypotheses were still valuable because each one got instrumented and eliminated.

🔍

DeepSeek v4

structural analyst

Caught the NJS post-subrequest callback ordering issue — a real bug hiding alongside the layout bug. Designed the null-ctx.request-before-pool-release pattern that stayed in the final code. DeepSeek thinks in structural invariants: "what must be true before this function returns?" Excellent at finding the second-order consequences of a design.

💥

Opus

breakthrough trigger

The one who refused to believe. After three rounds of narrowing, Opus said: log the actual byte offset and assert it. Not "let's think about what else could be wrong" — let's make the machine tell us. Ten minutes to write the probe, thirty seconds to see 1202 vs 1208. Opus didn't find the root cause by reasoning. It found it by refusing to reason further without data.

Each model was indispensable, but Opus’s contribution was the one that taught us something about the process itself. Three rounds of hypothesis-driven debugging — “it might be the pool,” “it might be the callback ordering,” “it might be dispatch timing” — had eliminated every sensible theory. The correct answer wasn’t sensible. It wasn’t reachable by staring at the code, tracing control flow, or inspecting invariants. It was only reachable by asking the machine: where did you actually put this field?

This is the pattern. When you’re deep in a debugging session with an AI and you’ve eliminated three or four coherent hypotheses, the remaining possibilities are not coherent. They’re layout bugs, compiler bugs, ABI mismatches, cosmic rays. You cannot think your way to them. You have to instrument the assumptions you’re taking for granted.

Don’t overthink. Put assertions in the code. Log the values you assume are correct. Make the machine contradict you.

Sonnet and DeepSeek v4 are outstanding at designing the isolation experiments and narrowing the problem space. But Opus won this round because it was the first to say: stop reasoning about the code, start interrogating the addresses. That’s the difference between an inference engine and an action engine — and it’s why the three of them make an excellent team.

📋

Write this into every CLAUDE.md and AGENTS.md.

When a hypothesis survives three rounds of elimination but the bug persists, stop reasoning. Instrument your assumptions. Log the values you trust implicitly — field offsets, struct sizes, pointer addresses. Act and assert your guess. Do not assume things lightly. It is faster to measure than to think, and the machine will tell you the truth in seconds that reasoning would take hours to circle.

What to take away

📐

C bitfields and Zig packed structs have different alignment rules. Zig aligns a packed struct(uN) to its backing integer's natural alignment. C packs bitfields into the current storage unit regardless. When the preceding field is narrower than the backing integer, the two layouts diverge.

🔬

Measure bitfield offsets from the C side at runtime. offsetof() is UB on bitfields. Use memset + set-field-to-1 + byte-scan. The C compiler is the only source of truth for C struct layout. Don't infer it from Zig — measure it.

🛡️

Guard every bitfield at comptime. A try expectEqual(flag_byte_offset(...), NNNN) costs one line per field and catches layout shifts before runtime. Pair it with the C-side probe for double coverage. CI fails on mismatch — from either direction.

→

Coming post: the AI gateway modules — how we're building an LLM proxy inside nginx that shares the same request lifecycle, the same connection pool, and the same bitfield layout guards. The hard part isn't the AI. It's the infrastructure.