Subject Line A/B: The 15-Hypothesis Spreadsheet I Build Before Writing a Single Variant
My old subject-line testing process had three candidates and a coin flip. The new one starts with a 15-row spreadsheet, no copy written, and a clear answer to "what are we even testing?"
The coin-flip version is still the default in most in-house teams I work with. Someone drafts two subject lines — "Special 50% off today" and "🔥 Limited time 🔥" — they test, one wins by a few points, and the team moves on. They learned nothing. Those two lines changed length, emoji, urgency, and tone all at once, so the winner is a stew of variables you can't isolate. The next test starts from scratch, and the team is back to guessing.
The fix is simple: write down your hypotheses before you write a single subject line. A 15-row spreadsheet takes 25 minutes and saves you from running a year of tests that all blur together.
The 15-hypothesis grid
The 15 rows cover the levers that most often move opens. Each row has a control and one or two variant cells that show the direction of the test. Here are 6 of the 15 rows I use:
| # | Hypothesis | Control | Cell A | Cell B |
|---|---|---|---|---|
| 1 | Shorter subject lines (<50 chars) lift opens on mobile | "Our new collection is here — shop the 12 best pieces" | "New collection: 12 picks you'll wear all week" (45) | "New collection is live" (22) |
| 2 | One emoji at the start lifts opens | "Sale ends tonight" | "🎁 Sale ends tonight" | "Sale ends tonight 🎁" |
| 3 | First-name tokens lift opens | "Your weekly recap" | "Sarah, your weekly recap" | — |
| 4 | Specific numbers beat round numbers | "Save 50% this week" | "Save 47% this week" | "Save 53% this week" |
| 5 | Bracketed urgency tags survive truncation | "Last 24 hrs: 50% off everything" | "[Last 24 hrs] 50% off everything" | — |
| 6 | Lowercase, no-punctuation reads as personal | "We just shipped a new feature" | "we just shipped a new feature" | "We just shipped a new feature!" |
If you can't fill the "Cell A" column for a row, you don't actually have a hypothesis — you have a vibe. Skip it.
The other 9 rows in my template cover: question vs statement, benefit-first vs curiosity-first, sender name (brand vs person), preheader alignment, time-of-day, day-of-week, segment-specific copy, reactivation framing, and discount-on vs value-on.
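If the grid lives somewhere scriptable, the "no Cell A, no hypothesis" rule is easy to check automatically. The following is a minimal sketch, not part of my actual template: the `Hypothesis` class, `is_testable`, and `char_counts` are illustrative names, and the row numbered 7 is a hypothetical unfinished row added only to show the filter at work.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One row of the grid: a claim, a control line, and zero or more cells."""
    number: int
    claim: str
    control: str
    cells: list[str] = field(default_factory=list)

    def is_testable(self) -> bool:
        # A row without a control and at least one variant cell is a vibe.
        return bool(self.control.strip()) and any(c.strip() for c in self.cells)


def char_counts(row: Hypothesis) -> dict[str, int]:
    """Character count of the control and each cell (handy for length rows)."""
    counts = {"control": len(row.control)}
    for label, text in zip("ABCD", row.cells):
        counts[f"cell_{label}"] = len(text)
    return counts


grid = [
    Hypothesis(
        1, "Shorter subject lines (<50 chars) lift opens on mobile",
        # \u2014 is the dash character used in the control line of row 1
        "Our new collection is here \u2014 shop the 12 best pieces",
        ["New collection: 12 picks you'll wear all week", "New collection is live"],
    ),
    Hypothesis(3, "First-name tokens lift opens",
               "Your weekly recap", ["Sarah, your weekly recap"]),
    # Hypothetical row: a claim with no copy behind it yet, so it gets skipped.
    Hypothesis(7, "Questions beat statements", "", []),
]

testable = [row for row in grid if row.is_testable()]
for row in testable:
    print(row.number, char_counts(row))
```

Counting characters in code rather than by eye also keeps the length labels in row 1 honest when someone edits the copy later.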
One hypothesis per send
The most common mistake I see is "we'll test length AND emoji AND personalization in this campaign" — which is a multivariate stew disguised as A/B. Pick one row per send. Run it. Move to the next row.
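In Klaviyo, set the test to 20% of the list (10% control, 10% variant), wait for 90% confidence, then send the winner to the remaining 80%. A 40,000-subscriber list gives you ~4,000 per arm. Assuming a baseline open rate around 20%, that is enough for a 1.5-point lift to clear the 90% bar about half the time, and a lift of a little over 2 points is what you can count on catching. Below ~1,000 per arm, you're guessing.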
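How much lift a given arm size can detect depends on the baseline open rate and on how often you want a real lift to show up as significant (statistical power). Here is a rough sanity check using only the Python standard library. It assumes a 20% baseline open rate, which is my placeholder and not a number from any particular account, and it uses the normal approximation with the baseline variance in both arms, so treat the output as a ballpark. The function name `min_detectable_lift` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_lift(n_per_arm: int, baseline: float,
                        confidence: float = 0.90, power: float = 0.80) -> float:
    """Approximate smallest absolute open-rate lift a two-arm test can detect.

    Two-sided two-proportion test, normal approximation, with both arms
    assumed to have the baseline variance (a simplification).
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    std_err = sqrt(2 * baseline * (1 - baseline) / n_per_arm)
    return (z_alpha + z_beta) * std_err


list_size = 40_000
test_share = 0.20                                # 20% of the list is in the test
n_per_arm = round(list_size * test_share / 2)    # split evenly: control and variant

for n in (n_per_arm, 1_000):
    for power in (0.50, 0.80):
        lift = min_detectable_lift(n, baseline=0.20, power=power)  # assumed baseline
        print(f"{n:>5} per arm, {power:.0%} power: ~{lift * 100:.1f}-point lift")
```

Swap in your own baseline before trusting the numbers. Lower baseline open rates shrink the variance and therefore the detectable lift, while demanding more power raises it.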
Why the spreadsheet comes first
Three reasons it's worth 25 minutes of pre-work:
- You write less, test more. Each row has one job. The copy for row 4 takes 60 seconds, not 20 minutes. You ship 5-6 testable hypotheses per quarter instead of one.
- You learn cumulatively. Once row 4 ("specific numbers beat round numbers") is confirmed, the next test can skip it and move on to the next unknown. After 3 quarters the spreadsheet is a learning history, not a planning document.
- You stop testing vibes. "I think emoji works for our brand" stops being a debate. It's row 2, with a control and a cell, and the data decides.
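"The data decides" and "a learning history" can both be made concrete with a small results log. The sketch below is one way to do it, under stated assumptions: it uses a pooled two-proportion z-test from the standard library, a 90% threshold to match the send rule above, and open counts that are made up purely for illustration. The names `confidence_of_difference` and `record_result` are mine, not from any email platform.

```python
from math import sqrt
from statistics import NormalDist


def confidence_of_difference(control: tuple[int, int],
                             variant: tuple[int, int]) -> float:
    """Two-sided confidence (0 to 1) that two open rates differ.

    Each argument is (opens, sends). Pooled two-proportion z-test,
    normal approximation.
    """
    c_opens, c_sends = control
    v_opens, v_sends = variant
    pooled = (c_opens + v_opens) / (c_sends + v_sends)
    std_err = sqrt(pooled * (1 - pooled) * (1 / c_sends + 1 / v_sends))
    if std_err == 0:
        return 0.0  # both arms at 0% or 100%: nothing to distinguish
    z = abs(v_opens / v_sends - c_opens / c_sends) / std_err
    return 2 * NormalDist().cdf(z) - 1


def record_result(log: dict[int, str], row: int,
                  control: tuple[int, int], variant: tuple[int, int],
                  threshold: float = 0.90) -> str:
    """Store a verdict for one grid row and return it."""
    if confidence_of_difference(control, variant) < threshold:
        status = "inconclusive"
    elif variant[0] / variant[1] > control[0] / control[1]:
        status = "confirmed"   # the variant beat the control
    else:
        status = "rejected"    # the control beat the variant
    log[row] = status
    return status


log: dict[int, str] = {}
# Made-up counts, (opens, sends), for illustration only.
record_result(log, 2, control=(800, 4000), variant=(900, 4000))
record_result(log, 4, control=(800, 4000), variant=(820, 4000))

# Rows still worth a send: never tested, or tested without a clear answer.
next_rows = [n for n in range(1, 16) if log.get(n) not in ("confirmed", "rejected")]
print(log, next_rows)
```

Keeping inconclusive rows in the queue is a design choice. An underpowered test says little either way, so the row stays open until a larger send settles it.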
The 15-hypothesis grid doesn't replace creative judgment. It just forces the creative judgment to land on a testable claim, not a guessable one. Build the grid first, write a single subject line second, and "best subject line" stops being a coin flip.