How I cut KV reads by 12× the day before launch

title: How I cut KV reads by 12x the day before launch
tags: launch, technical, performance, cloudflare, behind-the-scenes
created: 2026-05-15
updated: 2026-05-15
author: Maheen
id: 01KRKXST61P4PNMHBBEHTFVFA4

I'm publishing this at the worst possible time, the day before I launch BrainShare publicly. The launch slice is live, the letter is written, the Show HN draft is queued. And I just shipped a perf rewrite that touches the hottest path in the worker.

Bad timing or perfect timing. I'll tell you in 48 hours.

The numbers

| Scenario | Before | After | Change |
|---|---|---|---|
| Warm render (cached) | 3.4s | 0.7s | ~5× faster |
| Cold render (cache miss, wrapindex present) | 3.4s | 0.9-1.0s | ~3.5× faster |
| Cold render (truly fresh, fallback path) | 3.4s | 1.95s | ~1.7× faster + builds index |
| KV reads per cold render | ~2N + M + 2 (≈38) | 3 | ~12× fewer |

That last row is the one I care about. KV is BrainShare's entire data layer. Reads are the operation that dominates cost AND latency. Cutting them by 12× isn't a micro-optimization; it changes what the free tier can handle.

What the old path looked like

Rendering a single note inside a wrap meant the worker did, roughly:

  1. wrap:{wrapId}: 1 read for the wrap descriptor (title, ULID list, gating config)
  2. For every ULID in the wrap, note:{ulid} + meta:{ulid}. That's 2 × N reads, in parallel.
  3. For every canvas ULID, canvasmeta:{ulid}. That's M reads.
  4. Plus 1-2 incidental reads for asset metadata or gate-related state.

For a wrap with ~17 notes and a handful of canvases, that lands around 38 reads per cold render. Every. Single. Page. Load.
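
For reference, the old pattern looked roughly like the sketch below. It is a minimal illustration, not BrainShare's actual code: `KV` is a hypothetical stand-in for the `get` method of Cloudflare's `KVNamespace`, the descriptor fields (`notes`, `canvases`) are assumed names, and the one or two incidental reads are left out, so it issues 1 + 2N + M reads.

```typescript
// Hypothetical stand-in for the one KVNamespace method this sketch needs.
interface KV {
  get(key: string): Promise<string | null>;
}

// Assumed shape of the wrap:{wrapId} descriptor (field names are illustrative).
interface WrapDescriptor {
  title: string;
  notes: string[];    // note ULIDs
  canvases: string[]; // canvas ULIDs
}

// Old path: 1 descriptor read, then 2 reads per note and 1 read per canvas.
async function loadWrapOldPath(kv: KV, wrapId: string) {
  const raw = await kv.get(`wrap:${wrapId}`);
  if (raw === null) throw new Error(`unknown wrap ${wrapId}`);
  const wrap = JSON.parse(raw) as WrapDescriptor;

  // Promise.all overlaps the round trips, but every get() is still a billed read.
  const [bodies, metas, canvasMetas] = await Promise.all([
    Promise.all(wrap.notes.map((ulid) => kv.get(`note:${ulid}`))),
    Promise.all(wrap.notes.map((ulid) => kv.get(`meta:${ulid}`))),
    Promise.all(wrap.canvases.map((ulid) => kv.get(`canvasmeta:${ulid}`))),
  ]);
  return { wrap, bodies, metas, canvasMetas };
}
```

With 17 notes and 2 canvases this comes to 1 + 34 + 2 = 37 reads; an incidental asset or gate read brings it to the ~38 in the table.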

On Cloudflare's free tier (100,000 KV reads/day), that's a ceiling of about 2,600 page views per day before you hit the wall. For a launch you're hoping to push to the HN front page, that ceiling becomes a problem within hours.
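
The ceiling is just the daily read quota divided by reads per page view. A quick check, assuming every view is a cold render and nothing else draws on the 100,000-read budget:

```typescript
// Free-tier KV read quota, per the figure above.
const FREE_TIER_READS_PER_DAY = 100000;

// Page views per day before the quota runs out, if each view costs `readsPerView` reads.
function dailyViewCeiling(readsPerView: number): number {
  return Math.floor(FREE_TIER_READS_PER_DAY / readsPerView);
}

console.log(dailyViewCeiling(38)); // 2631 (old path)
console.log(dailyViewCeiling(3));  // 33333 (wrapindex path)
```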

What the new path does

Instead of the 2N + M per-note and per-canvas reads, the worker now does one read against a pre-aggregated key:

wrapindex:{wrapId}

This key holds everything needed to render any note inside the wrap, in a single JSON blob: every ULID's metadata, every basename → ULID mapping for wikilink resolution, every canvas reference, the share-set, the folder tree structure, and a small content fingerprint.
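
As a rough illustration, the blob could be typed and built like this. Every field name here is hypothetical (the post lists what the index holds, not its schema), and the fingerprint uses 32-bit FNV-1a purely as a stand-in for whatever fingerprint the real index uses.

```typescript
// Hypothetical per-note metadata kept in the index (no markdown bodies).
interface NoteMeta {
  ulid: string;
  basename: string; // wikilink target, e.g. "My Note" for [[My Note]]
  folder: string;   // vault folder, used to build the sidebar tree
}

// Hypothetical schema for the wrapindex:{wrapId} JSON blob.
interface WrapIndex {
  notes: Record<string, NoteMeta>;        // ULID -> metadata
  basenameToUlid: Record<string, string>; // basename -> ULID, for wikilinks
  canvases: string[];                     // canvas ULIDs referenced by the wrap
  shareSet: string[];                     // ULIDs readers are allowed to open
  tree: Record<string, string[]>;         // folder -> ULIDs in that folder
  fingerprint: string;                    // hash of the inputs (collisions possible)
}

// 32-bit FNV-1a over UTF-16 code units: cheap change detection, not security.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function buildIndex(metas: NoteMeta[], canvases: string[], shareSet: string[]): WrapIndex {
  const index: WrapIndex = {
    notes: {}, basenameToUlid: {}, canvases, shareSet, tree: {}, fingerprint: "",
  };
  for (const meta of metas) {
    index.notes[meta.ulid] = meta;
    // On duplicate basenames the last note wins; a real resolver needs a tie-break rule.
    index.basenameToUlid[meta.basename] = meta.ulid;
    const siblings = index.tree[meta.folder] ?? [];
    siblings.push(meta.ulid);
    index.tree[meta.folder] = siblings;
  }
  index.fingerprint = fnv1a(JSON.stringify([metas, canvases, shareSet]));
  return index;
}
```

Only metadata goes in; the markdown bodies stay under note:{ulid}, which is what keeps the blob small.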

When a note page is requested, the worker:

  1. Reads wrap:{wrapId} (for the gate check)
  2. Reads wrapindex:{wrapId} (for share-set, sidebar, backlinks)
  3. Reads note:{ulid} (for the actual markdown of the current note)

Three reads total. Down from 38.

The trick is that the wrapindex contains the metadata for every note in the wrap, not the bodies. The current note's body is the only markdown we need at render time; the others are referenced for wikilinks and the sidebar tree, where only their basename and ULID matter.
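
Put together, the fast path might look like the sketch below. `KV` is again a hypothetical stand-in for `KVNamespace.get`, the index is reduced to the one field this sketch checks, and issuing the three reads concurrently is an assumption of the sketch rather than something stated above. The gate check itself is left to the caller.

```typescript
interface KV {
  get(key: string): Promise<string | null>;
}

// Only the part of the (hypothetical) index schema this sketch touches.
interface WrapIndexLite {
  shareSet: string[];
}

type NotePage =
  | { kind: "ok"; wrap: unknown; index: WrapIndexLite; markdown: string }
  | { kind: "not-found" }
  | { kind: "needs-fallback" };

// Fast path: three KV reads regardless of how many notes the wrap holds.
async function loadNotePage(kv: KV, wrapId: string, ulid: string): Promise<NotePage> {
  const [wrapRaw, indexRaw, markdown] = await Promise.all([
    kv.get(`wrap:${wrapId}`),      // descriptor, for the gate check
    kv.get(`wrapindex:${wrapId}`), // share-set, sidebar, backlinks
    kv.get(`note:${ulid}`),        // body of the requested note only
  ]);
  if (wrapRaw === null || markdown === null) return { kind: "not-found" };
  if (indexRaw === null) return { kind: "needs-fallback" };
  const index = JSON.parse(indexRaw) as WrapIndexLite;
  // Never serve a note that isn't in this wrap's share-set.
  if (!index.shareSet.includes(ulid)) return { kind: "not-found" };
  return { kind: "ok", wrap: JSON.parse(wrapRaw), index, markdown };
}
```

Firing all three reads at once saves round trips but fetches the note body before the gate check has run. Reading wrap:{wrapId} first and the other two only after the gate passes trades a little latency for never touching gated content.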

Cache invalidation: the actual hard part

The wrapindex is great, but a stale wrapindex is worse than no wrapindex. The whole architecture hinges on knowing when to rebuild it.

Three trigger points:

  1. When a wrap is PUT. The worker computes the new index synchronously before responding. Readers see fresh data on the next request.
  2. When a note is PUT. The worker looks up noterefs:{ulid} (reverse index: which wraps contain this note?) and queues a background rebuild via ctx.waitUntil(buildWrapIndex(...)). The async tradeoff is acceptable because the fallback path (next section) handles index staleness gracefully.
  3. When a wrap is DELETE'd. The worker uses the same noterefs reverse index, but to clean up rather than to rebuild.
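
Trigger 2 is the interesting one, because it fans out through the reverse index. A minimal sketch, assuming noterefs:{ulid} holds a JSON array of wrap IDs; `KV` and `Ctx` are hypothetical stand-ins for the parts of `KVNamespace` and the Workers `ExecutionContext` used here, and `buildWrapIndex` is passed in because its body is out of scope:

```typescript
interface KV {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Mirrors ExecutionContext.waitUntil: keeps the Worker alive after the response.
interface Ctx {
  waitUntil(promise: Promise<unknown>): void;
}

type BuildWrapIndex = (kv: KV, wrapId: string) => Promise<void>;

async function onNotePut(
  kv: KV,
  ctx: Ctx,
  ulid: string,
  markdown: string,
  buildWrapIndex: BuildWrapIndex,
): Promise<void> {
  await kv.put(`note:${ulid}`, markdown);

  // Reverse index: which wraps contain this note? (assumed: JSON array of wrap IDs)
  const refs = await kv.get(`noterefs:${ulid}`);
  const wrapIds: string[] = refs === null ? [] : JSON.parse(refs);

  // Don't make the writer wait; rebuild every affected wrap in the background.
  // A failed rebuild is logged, not thrown, and leaves the previous index in place.
  ctx.waitUntil(
    Promise.all(
      wrapIds.map((wrapId) =>
        buildWrapIndex(kv, wrapId).catch((err) =>
          console.error(`wrapindex rebuild failed for ${wrapId}`, err),
        ),
      ),
    ),
  );
}
```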

Each invalidation also bumps a wrapver:{wrapId} counter, which keys the Cloudflare Cache API entry for the rendered HTML. Increment the counter → next read misses the HTML cache → re-renders with fresh wrapindex → repopulates.
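
A minimal sketch of the version trick, under assumptions: wrapver:{wrapId} holds a decimal string, the version is folded into the cache key as a `v` query parameter (a hypothetical choice; Cache API entries are keyed by URL), and `KV` is the same kind of stand-in as above.

```typescript
interface KV {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// KV has no atomic increment, so this is read-modify-write: two concurrent
// bumps can both read N and both write N + 1. Good enough for a sketch.
async function bumpWrapVersion(kv: KV, wrapId: string): Promise<number> {
  const current = Number((await kv.get(`wrapver:${wrapId}`)) ?? "0");
  const next = current + 1;
  await kv.put(`wrapver:${wrapId}`, String(next));
  return next;
}

// Folding the version into the URL means old entries are never matched again;
// they are left to expire instead of being purged.
function htmlCacheKey(pageUrl: string, version: number): string {
  const url = new URL(pageUrl);
  url.searchParams.set("v", String(version));
  return url.toString();
}
```

Inside the Worker, the resulting key would go to `caches.default.match()` on the read side and `caches.default.put()` after a re-render.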

The fallback path

What happens when a request comes in and wrapindex:{wrapId} doesn't exist yet? Either the wrap is brand new, or the async rebuild from the last note PUT hasn't completed.

The worker falls back to the old N+M read path, renders the page (slower, 1.95s instead of 1.0s), and kicks off an async index rebuild as a side effect via ctx.waitUntil. By the next request, the index is there.
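
The control flow is small. In this sketch, `renderFromIndex`, `renderDirect` (the old 2N + M path) and `buildWrapIndex` are hypothetical helpers passed in as dependencies, and `KV`/`Ctx` are stand-ins for `KVNamespace` and the Workers `ExecutionContext`:

```typescript
interface KV {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface Ctx {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical helpers; their signatures are assumptions for this sketch.
interface RenderDeps {
  renderFromIndex(indexJson: string, wrapId: string, ulid: string): Promise<string>;
  renderDirect(kv: KV, wrapId: string, ulid: string): Promise<string>;
  buildWrapIndex(kv: KV, wrapId: string): Promise<void>;
}

async function renderNote(
  kv: KV,
  ctx: Ctx,
  deps: RenderDeps,
  wrapId: string,
  ulid: string,
): Promise<string> {
  const indexJson = await kv.get(`wrapindex:${wrapId}`);
  if (indexJson !== null) {
    return deps.renderFromIndex(indexJson, wrapId, ulid); // fast path
  }
  // Index missing: start the rebuild in the background so it overlaps with
  // this render, but don't make the current reader wait for it.
  ctx.waitUntil(deps.buildWrapIndex(kv, wrapId));
  return deps.renderDirect(kv, wrapId, ulid); // slow path, same reads as before
}
```

In this shape, readers who miss at the same moment each start their own rebuild. That costs some extra writes but stays correct, since the rebuilds produce the same index from the same data; collapsing them would need extra coordination.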

This is the line I'm proudest of. No cache stampede. No 5xx errors during index rebuilds. No exposed inconsistency to the user. Just a graceful "slow first hit, fast everything else" experience.

Why not block on index rebuild?

The first reader doesn't deserve to wait for an async background task that mostly benefits subsequent readers. Falling back to the direct-read path keeps cold-cold latency under 2s while letting the index build in the background. Cloudflare Workers' ctx.waitUntil was made for exactly this pattern.

What this means for the launch

A few concrete things:

  • Free tier can serve real traffic. Pre-rewrite: ~2,600 page views/day before hitting KV quota. Post-rewrite: ~33,000 page views/day. An HN front-page hit doesn't break the budget anymore.
  • Mobile users don't bounce. Cold renders land at about 1 second (0.9-1.0s) on a fresh CDN edge. Mobile Core Web Vitals move from "needs improvement" to "good," and Core Web Vitals are one of the signals Google uses in search ranking.
  • The worker stays simple. No Redis, no Postgres, no new infra. Just one extra KV key per wrap, computed at write time, read once at read time. Same primitives as before, better access pattern.

What I learned

The thing I keep relearning about Cloudflare KV is that the access pattern dominates the performance story. KV isn't slow. Doing 38 KV reads per render from a Worker handler is slow. Issuing them in parallel via Promise.all helps a lot, but it doesn't change the per-read cost or the read count. The win came from rethinking which information needs to live where, not from a faster store.

That's the bigger lesson: on a key-value system, the data-model question is "what's the right granularity for each key?", not "how do I make reads faster?". Pre-aggregation is the lever. Use it.

Honest caveats

  • The wrapindex blob is bounded by KV's 25MB per-value limit. In practice it's tiny (a few KB even for a 1000-note wrap). But if you push it (think: a 5,000-note vault with deep folder structures), you'd want to think about chunking.
  • The fallback path still does ~38 reads, just less often. If a malicious actor pounded a wrap whose index keeps getting evicted, you'd burn reads faster, plus the writes from the rebuild each fallback kicks off. Mitigated by rate limits, but worth knowing.
  • The async index rebuild is best-effort. If the Worker is OOM-killed mid-rebuild, the next read goes through the fallback path. Self-healing but slightly more KV reads in the failure case.
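
On the first caveat, a cheap guard is to measure the serialized index before writing it. A sketch, assuming the index is stored as UTF-8 JSON; `fitsInOneKvValue` is a hypothetical helper, and 25 MiB is Cloudflare's documented per-value limit for KV:

```typescript
// Cloudflare KV's maximum value size: 25 MiB.
const KV_MAX_VALUE_BYTES = 25 * 1024 * 1024;

// Size of the index as it would be stored: UTF-8 bytes of its JSON encoding.
function serializedBytes(index: unknown): number {
  return new TextEncoder().encode(JSON.stringify(index)).length;
}

// If this ever returns false, the index needs a chunked layout
// (for example, one key per folder) instead of a single blob.
function fitsInOneKvValue(index: unknown): boolean {
  return serializedBytes(index) <= KV_MAX_VALUE_BYTES;
}
```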

None of these are launch-blockers. All of them are documented in the code.


For the wider context of how BrainShare's rendering works end-to-end, see How BrainShare Works (Behind the Scenes). That one was written before this rewrite, so a few specifics are out of date (it still describes the N+M read pattern). I'll update it after the launch dust settles.

The launch letter that prompted all this: A letter for my fellow viewers.

Maheen 15 May 2026