The_Two_Families | Pagination AI Job 41

What Does Your API Actually Promise?

Offset pagination and cursor pagination are not two ways of doing the same thing. They are two different *promises* about what rows the next page contains. The promise you choose determines the cost model, the consistency model, and the failure mode of every page that ever gets fetched.

Key Takeaways

Offset says: *give me rows 21 through 40 of the current ordered set*. Cursor says: *give me what comes after this marker*. The two are not equivalent.
Offset's hidden cost is linear in the offset value, not in the page size. A query for page 50,000 reads the first million rows and discards a million minus twenty of them.
Cursor's hidden cost is the cost of a stable, indexed ordering. When that ordering doesn't exist, the cursor degenerates into a full scan with extra steps.
Neither is "better." The right default depends on what the caller does with the page: jump to a known index, or walk forward through an unbounded list.

---

Open a SQL console. Paste this in:

SELECT id, created_at, title
FROM   articles
ORDER  BY created_at DESC, id DESC
LIMIT  20 OFFSET 400000;

Now run EXPLAIN ANALYZE. On a table with a few hundred million rows and a supporting index on (created_at, id), you will see something that looks roughly like this:

Limit  (cost=18542.31..18543.12 rows=20 width=64)
  ->  Index Scan Backward using articles_created_at_id_idx
        on articles
        (cost=0.57..925431.20 rows=9998000 width=64)

Read the row count. The plan is willing to read 9,998,000 rows to give you back 20. The database is not broken. The plan is correct. The plan is honoring the contract the query asked for: *the 400,021st through 400,040th rows of the current ordered set*. Honoring that contract requires looking at the rows that come before, in order, until it has counted past the first 400,000. There is no shortcut. The cost is in the contract.

This is the whole story in one query plan. The two families of pagination — offset and cursor — are not two ways of asking for the same thing. They are two different promises, and the storage layer pays a different price for honoring each one.

---

Promise one: positional ("give me rows N through M")

Offset pagination is the positional contract. The query above is the canonical example. The endpoint typically exposes it as ?page=4&size=25, which the client reads as "the 76th through 100th row," and the database reads as "skip 75 rows, return 25." The translation is faithful, and the cost is hidden inside the skip.

The cost is linear in the offset. A LIMIT 20 OFFSET 0 is cheap — the database starts at the top, reads 20 rows, stops. A LIMIT 20 OFFSET 400000 is 20,001 times more expensive, because the database has to read 400,020 rows to return 20. The page size is fixed at 20. The work is not. The cost is in the offset, not the limit, and most engineers I have watched debug this for the first time have not internalized that.

The promise is unstable under writes. Between page 1 and page 2, a new row can be inserted at the top. The user has now seen the row at position 21 on page 1 *and* the row at position 21 on page 2 — because the row that was at position 21 yesterday is at position 22 today. Or, worse, the row at position 21 yesterday was deleted, and the new row 21 is one the user has never seen. The page is correct. The page is not what the user expected. The contract did not promise stability. The user assumed stability anyway.

The promise is computable in advance. This is the offset contract's one genuine virtue. If the caller wants to know "how many pages are there?", the answer is a single COUNT(*) away, and the answer is meaningful: *as of this moment, 4,837,221 rows match the filter, which is 241,861 pages of 20*. The total count is a real number. It is expensive to compute, but it is computable, and a non-trivial set of UIs — admin tables, audit logs, anything where the user jumps to page 47 by typing it into a box — needs that real number.

If the caller is going to type a page number into a box, offset is the right contract. If the caller is going to scroll forever, offset is a disaster in slow motion, and the disaster accelerates as the user scrolls. Both facts can be true at the same time, and they are.

Promise two: sequential ("give me what comes after this marker")

Cursor pagination is the sequential contract. The endpoint exposes it as ?after=<opaque_token>, and the database reads it as a keyset predicate. Translated, the query is:

SELECT id, created_at, title
FROM   articles
WHERE  (created_at, id) < ($cursor_created_at, $cursor_id)
ORDER  BY created_at DESC, id DESC
LIMIT  20;

The plan is the same index scan, but it is no longer willing to read 9,998,000 rows. It walks the index from the cursor's position and stops after 20. The work is bounded by the page size, not the offset. The cost is constant in the caller's history of requests. A user who has been scrolling for an hour pays the same per-page cost as a user on page 1.

The promise is stable under writes. The cursor encodes the last row the client saw. The next request asks for rows that sort strictly after that row. Inserts and deletes above the cursor are invisible to the client, which is exactly what a user scrolling forward wants. If you remember the librarian from the introduction: this is the second librarian's answer.

The promise is not computable in advance. A cursor cannot tell you the total count, because "what comes after" depends on where you started. The user on page 1 and the user on page 500 are not asking the same question in the same frame of reference, and there is no number that meaningfully answers "how many pages are left?" for a sequential walk. The admin who wants to type "page 47" cannot, because the page numbers are not stable enough to type. The trade-off is not negotiable: you can know where you are, or you can know where you are going, but you cannot know both from a single contract.

The promise depends on a stable ordering. This is cursor's hidden landmine, and it is the one that bites teams who adopt cursor pagination without thinking about what comes next. The cursor is a tuple of sort-key values. If the sort order is non-unique, the cursor needs a tie-breaker, and the tie-breaker must be in the index. If the sort order is computed at query time, the cursor cannot be expressed as a keyset predicate, and the database falls back to "filter the rows that come after this cursor in the result set," which is offset in disguise and pays the same linear cost.

If the caller is going to scroll forward through an unbounded list, cursor is the right contract. If the caller is going to type a page number, cursor is the wrong shape for the UI, and forcing the UI to fit the contract is a usability bug that engineering will be asked to fix.

The shape of the caller's behavior decides the contract

The decision is not about which scheme is "modern" or which library is fashionable. It is about what the caller does with the page, and there are exactly three common shapes:

1. Walk forward, never look back. User-facing infinite feed, social timeline, log viewer, search results. The caller does not have a page number in mind. The caller has "what I have already seen" in mind. *Cursor is the right contract.* 2. Jump to a known page. Admin tables, audit log viewers, paginated reports, anything where the URL embeds the page. The caller has a specific row index in mind. The caller will return to it. *Offset is the right contract.* 3. Walk forward, but count what was missed. "You have 1,247 unread items" with a "mark all" button. The caller needs both the walk and the count. *Hybrid: cursor for the walk, cached count for the total, with a footnote in the contract that the count is best-effort and lags.*

Each of these is a different *user intent*, and the intent dictates the contract. Reversing the mapping — putting cursor behind the admin table, or putting offset behind the infinite feed — produces the failure mode the other contract was designed to prevent. The offset infinite feed eventually hits a page that takes 11 seconds. The cursor admin table makes the URL bar useless, and the support tickets start.

flowchart TD
  A[Caller behavior?] --> B{Walks forward,<br/>never jumps?}
  A --> C{Jumps to a<br/>page number?}
  A --> D{Walks + needs<br/>total count?}

  B -->|yes| E[Cursor<br/>keyset predicate]
  C -->|yes| F[Offset<br/>positional slice]
  D -->|yes| G[Hybrid:<br/>cursor + cached count]

  E -->|caveat| E1[Requires stable,<br/>indexed ordering]
  F -->|caveat| F1[Cost is linear<br/>in the offset]
  G -->|caveat| G1[Count is eventually<br/>consistent]

The diagram is not a recommendation. It is a translation: caller behavior → contract shape → hidden cost. Every branch ends in a "caveat" box because every contract has a failure mode the other contract was designed to prevent.

---

The mistake I have made and the mistake I have watched

I have shipped both contracts in the wrong place. The first time, I put offset behind a social feed. The first page was 4 ms. Page 100 was 800 ms. Page 1,000 was 8 seconds. The user did not scroll that far, but the crawler did, and the crawler took the database with it. The fix was not "add an index." The fix was to change the contract, because the index was already there — the index was the thing being scanned. I had to admit that the cost model and the user intent were incompatible, and the cost model wins, because the cost model is what the storage layer is *physically capable of doing*.

I have also watched a team put cursor behind an admin table. The URL was unguessable, the "jump to page" feature was removed, the support team got tickets about "where did my bookmark go," and the engineering team added a parallel offset endpoint to "fix" the problem. They now run two paginated endpoints on the same table, each one correct, each one broken in the other's use case, and the API documentation is a lie. The fix was not "use a better cursor library." The fix was to admit that admin tables are a different user intent than infinite feeds, and the two intents need two contracts.

I started this chapter assuming cursor was strictly better. Cursor is strictly better *for sequential walks*, and worse for everything else, and pretending otherwise is how teams end up running two endpoints on the same table.

The decision is upstream of every library, every framework, every "best practice" article. It is a question about your user, and the right answer is going to be unflattering, because the right answer is usually "we have two users, and they need two contracts."

The next chapter takes that conclusion into the part of the system that nobody likes to think about: the part where the contract meets the storage engine, and the storage engine charges interest on promises the contract made but the caller did not notice.