A/B Testing Roofing Mailers: What to Test First

Q: What should I A/B test first on a roofing mailer?

Test the list — who receives the mail — before anything else. Audience targeting moves response far more than any wording change: filtering to owner-occupied homes, aging roofs, or a recent storm footprint can double or triple your response rate, while a clever headline might move it a few percent. Lock in your best audience, then test the offer, then the headline, then format, then the smaller design elements.

Q: How many pieces do I need to mail to A/B test a roofing campaign?

Enough that the winning version collects at least 25–30 responses, which keeps the result above the noise. At typical roofing response rates under 1.5%, that means thousands of pieces per version, not hundreds. As a guide: at a 1% response rate you'd want roughly 2,500 pieces per cell (5,000 total); at 0.5% closer to 5,000 per cell. Smaller tests can still be run, but treat their results as weak hints to confirm, not proven facts.

Q: Why do I have to change only one thing in an A/B test?

Because if you change two or more things and one version wins, you can't tell which change caused the win — the headline, the photo, the offer, or some combination. That makes the result impossible to repeat or build on. Changing exactly one variable isolates cause and effect, so each test teaches you one durable, reusable fact about your market.

Q: How do I track which mailer version got a response?

Put unique tracking on each version: a distinct call-tracking phone number, a distinct QR code, and/or a distinct landing-page URL for A versus B. When a call or scan comes in, the number or code tells you which version drove it. Without separate tracking, both versions blend together and the test is unreadable. Set this up before you mail, never after.

Q: How long should I wait before deciding the winner of a mailer test?

Wait for the full response window to close — typically 3 to 6 weeks for roofing direct mail, and longer in slow seasons. Mail responses trickle in over weeks, and a version that looks like it's losing on day five can win by day thirty. Decide your win metric and your wait period before you mail so you're not tempted to call it early or move the goalposts.

Q: Should I test the offer or the headline first?

The offer, generally. The offer is the reason someone calls now and changes their actual incentive to act, so it carries more weight than the headline, which mainly decides whether the card gets read at all. The standard order after the list is offer, then headline. Both are worth testing — just sequence the higher-leverage offer test ahead of the headline test.

Q: What's a good metric to judge a roofing mailer test by?

Booked jobs per thousand pieces, or cost per booked job, is the best metric. Qualified leads per thousand is an acceptable second. Raw call volume is the weakest measure because a scary or confusing card can generate lots of calls that never become jobs. Decide your single win metric before you mail so the result is unambiguous.

Q: Is it worth A/B testing if I only mail a few thousand pieces a month?

Yes, but be realistic about what you'll learn. A few thousand pieces split into two cells can read a big difference (like owner-occupied versus a random ZIP) but won't reliably detect small ones (like two similar headlines). Start with high-leverage list and offer tests where the gap is likely to be large, and treat close results as inconclusive rather than overreading them.

Q: Can I test more than two versions at once?

You can, but it needs much more volume. Testing several variables at once across many cells is called multivariate testing, and it often takes 20,000+ pieces to read cleanly, which most roofing companies don't mail in one drop. For nearly all roofers, simple two-version A/B testing is the right tool. Save multivariate testing for when you're mailing tens of thousands a month and already have a stable control.

Q: How do I know if my test result is real or just luck?

Check two things: did the winning version collect at least 25–30 responses, and is the gap between versions large (roughly 30–50%+ at small scale)? If the winner got 40 responses to the loser's 24, that's a real signal. If it got 7 to 5, that's noise — re-run it bigger. Small samples with small gaps are coin flips dressed up as data.

Sarah Jenkins, Senior Roofing Consultant · Jun 20, 2026 · 32 min read · Direct Mail Marketing

Short Answer

A/B testing roofing direct mail means splitting one campaign into two near-identical versions, changing exactly one thing between them, and measuring which version produces more calls, forms, and booked jobs per thousand pieces mailed. For most roofing companies, the order of what to test is the same: test the list first (who you mail), then the offer (the reason to call now), then the headline (the first line they read), then the format/size, and only after that the smaller stuff like colors, photos, and call-to-action wording. You test the list first because targeting a different audience moves response more than any wording change ever will — a 1% lift from a clever headline is rounding error next to mailing aging roofs in a hail swath instead of a random ZIP.

The mechanics are simple. Take a clean mailing list, randomly split it into two equal halves, send Version A to one half and Version B to the other in the same week, put a unique tracking number or QR code on each version so you can tell responses apart, and wait until the response window closes (usually 3–6 weeks for roofing mail) before you call a winner. Whatever wins becomes your new control. Then you test the next variable against that control. This is a loop, not a one-time event.

Two cautions kill more roofing A/B tests than anything else: too-small sample sizes and measuring the wrong outcome. If you split 600 pieces into two cells of 300 and one gets 4 calls and the other gets 6, you have learned nothing — that gap is noise. You generally want enough volume that the winner gets at least 25–30 responses before you trust the result, which for typical roofing response rates means cells of a few thousand pieces each, not a few hundred. And track booked jobs or qualified leads, not raw calls. A headline that triples "is this a scam?" phone calls is not a winning headline.

Direct mail remains a workhorse channel for roofers precisely because it is measurable and improvable. The U.S. Postal Service runs formal mail-targeting and Every Door Direct Mail programs built for exactly this kind of local saturation, and the USPS business-mail guidance covers the formats and postage classes you will choose between when you test size. Your targeting data — roof age, owner-occupancy, storm exposure — can be reasoned about using public datasets like the Census American Community Survey for housing-stock age and NOAA's Storm Events Database for hail and wind history. Keep every offer honest and compliant: the FTC's advertising and marketing basics govern what you can claim on a postcard, and consumers are primed to spot home-improvement scams, so a credible mailer beats a hypey one. Sources checked: June 20, 2026.

Why A/B Testing Beats Guessing (and Beats Opinions)

Walk into ten roofing offices and you will hear ten confident theories about what makes a postcard work. One owner swears by storm-damage imagery. Another insists on a big discount. A third refuses to put a price anywhere on the card. They cannot all be right, and most of them have no data — they have a memory of one good month and a story they have told themselves since.

A/B testing replaces the story with a number. When you mail two versions of a card to two random halves of the same list in the same week, any difference in response beyond random chance comes from the one thing you changed. Weather is the same. The list is the same. The season is the same. Your reputation in the market is the same. The only variable is the variable. That is the entire point: you isolate cause and effect so the market tells you what works instead of your loudest team member.

This matters financially because direct mail is one of the few marketing channels where small percentage improvements compound into real money. If you mail 4,000 pieces a month and your response rate climbs from 0.8% to 1.1%, that is 12 extra responses a month, roughly 144 a year. At a 25% close rate and a $12,000 average job, that single sustained improvement is worth several hundred thousand dollars in top-line revenue over a year — from the same postage, the same list, the same effort. Testing is how you find and bank that improvement instead of leaving it on the table.
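The arithmetic in that example can be checked with a few lines of Python. This is a minimal sketch: the function name `annual_value_of_lift` is made up for illustration, and it assumes, as the paragraph does, that the close rate applies to every extra response.

```python
def annual_value_of_lift(pieces_per_month, base_rate, new_rate,
                         close_rate, avg_job_value):
    """Return (extra responses per month, extra jobs per year, extra revenue per year)."""
    extra_responses_month = pieces_per_month * (new_rate - base_rate)
    extra_jobs_year = extra_responses_month * 12 * close_rate
    extra_revenue_year = extra_jobs_year * avg_job_value
    return extra_responses_month, extra_jobs_year, extra_revenue_year


# Figures from the paragraph above: 4,000 pieces/month, 0.8% -> 1.1%,
# 25% close rate, $12,000 average job.
responses, jobs, revenue = annual_value_of_lift(4000, 0.008, 0.011, 0.25, 12_000)
print(round(responses), round(jobs), round(revenue))  # 12 36 432000
```

Swap in your own volume, close rate, and job size to see what a sustained lift would be worth in your market.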

The discipline also protects you from the opposite mistake: scaling a loser. Without testing, the natural move after a slow month is to mail more of the same card. If that card is the problem, you just spent more money being wrong. Testing in small, controlled batches before you commit the big spend is the cheapest insurance in marketing. The broader principle — measure, learn, then invest behind what works — is exactly what the SBA's marketing and sales guidance tells every small business to do, and it applies to a roofing postcard as cleanly as to anything else.

The One Rule That Makes a Test Valid: Change One Thing

The single most important rule in A/B testing is that A and B differ by exactly one variable. If Version A has a new headline and a new photo and a new offer, and it wins, you have no idea why. Was it the headline? The photo? The offer? All three? Some combination that cancels out? You cannot tell, so you cannot repeat the win or build on it.

This feels slow, and it is. Owners who are used to redesigning the whole card every quarter find single-variable testing frustrating. But the slowness is the price of knowledge. Each clean test teaches you one durable fact about your market — "owner-occupied homes respond 40% better than the full list," "the inspection offer beats the discount offer," "the storm headline beats the price headline in March" — and those facts stack into a playbook that a competitor running redesign-roulette will never have.

There is one legitimate exception, called multivariate testing, where you test several variables at once across many cells (Headline 1/2 × Offer A/B × Photo X/Y = eight versions). Multivariate testing is powerful but it needs a lot of volume — easily 20,000+ pieces to read cleanly — and most roofing companies don't mail enough in one drop to support it. For nearly everyone, simple one-variable A/B testing is the right tool. Save multivariate for when you are mailing tens of thousands a month and have a stable control.
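To see why the volume requirement climbs so quickly, it can help to enumerate the cells. The sketch below assumes each multivariate cell needs roughly the same volume as a simple A/B cell (about 2,500 pieces, the figure this guide uses for a 1% response rate). That per-cell figure is an assumption for illustration, not a statistical requirement.

```python
from itertools import product

headlines = ["Headline 1", "Headline 2"]
offers = ["Offer A", "Offer B"]
photos = ["Photo X", "Photo Y"]

# Every combination of headline, offer, and photo is its own cell.
cells = list(product(headlines, offers, photos))

PIECES_PER_CELL = 2_500  # assumption: ~25 responses per cell at a 1% response rate
total_pieces = len(cells) * PIECES_PER_CELL

print(len(cells), total_pieces)  # 8 20000
```

Adding a third headline would push this to twelve cells, which is why the cell count, not the card design, is usually the limiting factor.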

What to Test First: The Priority Ladder

Not all variables matter equally. Testing the font color before you have tested the list is like rearranging deck chairs. Here is the order that returns the most learning per dollar, from highest-impact to lowest.

Priority	Variable	Typical impact on response	Why it ranks here
1	The list / audience	Very high (can 2–5x response)	Who you mail dwarfs what you say. Right house, wrong card still beats wrong house, perfect card.
2	The offer	High	The reason to call now. A strong, credible offer changes the economics of the whole card.
3	The headline / hook	Medium-high	First thing read; decides whether the card gets flipped or trashed.
4	Format & size	Medium	Postcard vs. letter, 4x6 vs. 6x11, affects open/notice rate and postage cost.
5	The primary image	Medium	Storm damage vs. finished roof vs. crew vs. map changes who sees themselves in it.
6	Call to action & response mechanism	Low-medium	Phone vs. QR vs. URL; "call today" vs. "book your free check."
7	Color, layout, small copy	Low	Real but small; only worth testing once the big levers are set.

Work top-down. Lock in a winner at each rung before you spend a test cycle on the rung below it. The reason is leverage: a list change can double your response; a button-color change might move it 3%. You want to capture the big wins first, because every later test then runs against a stronger control and the absolute dollar value of each subsequent percentage improvement is higher.

Testing the List First (the Highest-Leverage Test)

If you only ever run one kind of test, make it a list test. The list is who receives the mail, and it is the dominant driver of response because it determines how many recipients have an actual roofing need.

Here are list variables worth testing, roughly in order of how much they move the needle for roofers:

Owner-occupied vs. all addresses. Renters almost never buy a roof. Filtering to owner-occupied homes typically lifts response meaningfully because you stop wasting pieces on people who can't say yes. This is usually the first list test to run.
Roof age / home age bands. Homes with roofs in the 12–22 year window are far likelier to need work than newer ones. You can approximate this with home-age data; the Census American Community Survey publishes year-built distributions by area that help you reason about which neighborhoods skew old.
Storm-exposed vs. non-exposed areas. Mailing the footprint of a recent hail or wind event — verifiable through NOAA's Storm Events Database — against a comparable non-storm area is one of the most revealing tests a roofer can run.
Home-value or income bands. Higher-value homes can support premium jobs and tend to be owner-occupied. Worth testing if you sell upgrades, not only insurance replacements.
Saturation vs. targeted. Mailing every door in a tight cluster (cheaper per piece via EDDM) vs. a hand-picked targeted list (more expensive per piece, higher hit rate). This is the classic cost-vs.-precision trade and only a test settles it for your market.

A clean list test looks like this: take two audiences you genuinely want to compare (say, owner-occupied-only vs. the full ZIP), mail the exact same card to a sample of each in the same week, and compare response per thousand and, more importantly, cost per booked job. The card is the constant; the list is the variable. Run it and you will often find one list converts at double the rate of the other — a bigger swing than any headline test will ever produce.

Testing the Offer (the Second-Highest Lever)

The offer is the answer to "why should I call you this week instead of ignoring this?" It is the second-biggest lever after the list because it changes the recipient's incentive to act, not only their perception of you.

Common roofing offers worth testing against each other:

Offer type	Example wording	Pulls well when	Watch-out
Free inspection / roof check	"Free 20-point roof check — no obligation"	Storm markets, aging neighborhoods	Can attract tire-kickers; qualify on the call
Storm / insurance angle	"Recent hail in your area? Find out if your roof qualifies for a claim"	After a verified storm event	Must stay honest — you don't decide claims; see compliance below
Dollar or percent discount	"$750 off a full roof replacement"	Price-sensitive retail markets	Can cheapen brand; protect margin
Financing / payment	"New roof from $X/month"	Retail, higher-ticket replacements	Disclose terms accurately
Maintenance / tune-up	"Roof tune-up before storm season — $199"	Off-season, relationship building	Lower ticket; good for list-building
Urgency / seasonal	"Beat the spring rush — book your check by March 15"	Capacity-driven seasons	Don't fake scarcity

The honest, low-pressure offer often beats the aggressive discount in roofing, because homeowners are wary — they have heard about home-improvement scams and a too-good offer triggers suspicion rather than trust. But you don't have to guess: test the free-inspection card against the discount card on the same list and let booked jobs decide. Just make sure whatever you promise is something you actually deliver, and that the claim is defensible under the FTC's advertising rules. A "free inspection" must be genuinely free with no bait-and-switch.

Testing the Headline

The headline is the first line the recipient reads, often the only line, because mail is sorted over a trash can. It earns the next three seconds of attention or it doesn't. Once your list and offer are dialed in, the headline is the next thing to test.

Headline angles to pit against each other:

Location-specific: "A message for [Neighborhood] homeowners" — local relevance lifts notice.
Storm-triggered: "The hail that hit [City] in [Month] may have damaged your roof."
Age/condition prompt: "Is your roof past 15? Here's what to check before winter."
Question hook: "When did you last have your roof looked at?"
Benefit-forward: "Catch a small roof problem before it becomes a $15,000 one."
Social proof: "Why 300+ [City] homeowners trusted us with their roof."

Test exactly two headlines at a time, everything else identical. A common, high-value test for roofers is storm/condition headline vs. offer-led headline — does leading with the problem (your roof may be damaged) beat leading with the deal (free inspection)? The answer varies by market and season, which is exactly why you test it rather than assume.

Keep headlines truthful. A headline like "Your roof IS damaged" asserts something the company has no way of knowing, which makes it both dishonest and a compliance risk. "Your roof may have been affected by the [Month] hail. Here's a free way to find out" is both more credible and safer.

Testing Format and Size

Format is a structural variable: a 4x6 postcard, a 6x11 jumbo postcard, a letter in an envelope, or a folded self-mailer. It interacts with postage class and cost, so a format test is really a test of response per dollar, not only raw response.

Format	Rough relative cost/piece	Notice/open behavior	Best for
4x6 postcard	Lowest	Seen instantly, no opening; easy to ignore	High-volume saturation, repeat touches
6x9 / 6x11 jumbo postcard	Low-medium	Hard to miss in the stack; more room	Most roofing campaigns; good default
Letter in #10 envelope	Medium	Must be opened; higher intent if opened	Targeted, higher-ticket, personalized
Folded self-mailer	Medium	More content; can feel like an ad	Education-heavy offers, multi-step

Bigger formats generally cost more per piece to print and mail, so a jumbo card has to earn its premium with enough extra response to pay for itself. The USPS business-mail guidance lays out the format and postage-class options, and EDDM has its own size and weight rules that affect what you can test in a saturation drop. Run a format test by mailing the same message and offer in two sizes to matched samples, then compare cost per booked job. The jumbo often wins on response but loses on cost, or vice versa, and only the math tells you which.
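One way to frame the format trade-off before you mail is a break-even calculation: how much response does the pricier format need just to match the cheaper one on cost per response? The sketch below uses hypothetical per-piece costs ($0.55 and $0.85 are placeholders, not quoted prices) and assumes responses from both formats convert to booked jobs at the same rate.

```python
def break_even_response_rate(base_rate, base_cost_per_piece, new_cost_per_piece):
    """Response rate the pricier format needs to match the cheaper format's cost per response."""
    return base_rate * new_cost_per_piece / base_cost_per_piece


# Hypothetical inputs: a standard postcard pulling 0.8% at $0.55 per piece,
# compared with a jumbo postcard at $0.85 per piece.
needed = break_even_response_rate(0.008, 0.55, 0.85)
print(f"Jumbo must pull at least {needed:.2%} to break even")
```

If the jumbo's tested response rate lands above that figure, it is earning its premium. If it lands below, the cheaper card is the better buy even though it gets fewer calls per thousand.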

Testing the Image, CTA, and the Small Stuff

Once list, offer, headline, and format are settled, you move to the lower rungs. These tests produce smaller lifts but are still worth running on high-volume campaigns where a 5% improvement is real money.

Primary image. Test storm-damage close-up vs. a beautiful finished roof vs. your crew/truck vs. a local map of the service area. Damage imagery can raise urgency but can also feel alarmist; a clean finished roof sells aspiration; a crew photo sells trust and "real local company." There's no universal winner — test it.

Call to action and response mechanism. Test phone-first vs. QR-first vs. URL-first. Test "Call today" vs. "Book your free roof check." Many roofers find that offering multiple easy paths (call, scan, or visit a simple page) and tracking each separately beats forcing one channel. Put a distinct tracking number and a distinct QR destination on each version so you can attribute responses precisely.

Color, layout, and microcopy. Brand color blocks, the position of the offer, whether a price appears, the size of the phone number, trust badges (licensed, insured, warranty). These are the last things to test, not the first. They are real but small, and they only matter once the big levers are locked.

A useful mental model: list and offer decide whether someone wants what you're selling; headline and format decide whether they notice; image, CTA, and design decide how easily they respond. You optimize in that order because you can't polish your way out of mailing the wrong people.

How to Set Up a Valid Test: Step by Step

Here is the full procedure for a clean roofing A/B test, start to finish.

Pick one variable from the priority ladder. Just one. Write down your hypothesis: "Owner-occupied-only will book more jobs per thousand than the full ZIP."
Build one clean list and split it randomly into two equal halves. "Randomly" matters: don't put all of one neighborhood in cell A. Sort by a random key, or alternate every other address, so that neither cell is skewed toward one street, neighborhood, or home type and the two are statistically comparable.
Create two versions identical except for the test variable. If you're testing the list, the card is identical and the list filter is the variable.
Add unique tracking to each version: a distinct call-tracking phone number, a distinct QR code, and/or a distinct landing-page URL. This is non-negotiable — without it you cannot tell A's responses from B's.
Mail both versions in the same week, ideally the same day. Different weeks mean different weather and different competitor activity, which contaminate the test.
Define your win metric before you mail. Booked jobs per thousand pieces is best; qualified leads per thousand is acceptable; raw calls is weak. Decide now so you can't move the goalposts later.
Wait for the full response window to close — typically 3–6 weeks for roofing mail, longer in slow seasons. Don't call it on day 5.
Count responses by version, compute response and cost per booked job for each, and apply a simple significance check (next section) before declaring a winner.
Promote the winner to control. The winning version becomes your new baseline. Document what you learned.
Test the next variable against the new control. Repeat forever.
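The random split and per-version tracking from the steps above can be sketched in a few lines of Python. Everything named here is illustrative: `split_list`, the seed, the phone numbers, and the URLs are placeholders, and a real list would come from your mail house or CRM export rather than a generated range.

```python
import random


def split_list(addresses, seed=2026):
    """Shuffle a copy of the list and cut it into two equal-sized cells.

    A fixed seed makes the split reproducible. If the list has an odd
    number of records, the one leftover record is left out of both cells.
    """
    shuffled = list(addresses)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]


# Hypothetical tracking details: one distinct number and URL per version.
TRACKING = {
    "A": {"phone": "555-0101", "url": "example.com/roof-a"},
    "B": {"phone": "555-0102", "url": "example.com/roof-b"},
}

addresses = [f"{n} Example St" for n in range(1, 5001)]  # placeholder list
cell_a, cell_b = split_list(addresses)
print(len(cell_a), len(cell_b))  # 2500 2500
```

Shuffling before cutting is what keeps one neighborhood from landing entirely in a single cell, which is the failure the "randomly" step warns about.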

Sample Size and Significance: The Math That Keeps You Honest

This is where most roofing A/B tests go wrong. The owner mails 500 pieces of A and 500 of B, A gets 7 calls and B gets 4, and the owner declares A the winner and reprints 10,000 of it. That conclusion is worthless. With numbers that small, the 7-vs-4 gap is well within the range of pure luck: flip a coin 11 times and you'll get 7 or more heads about 27% of the time.
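The coin-flip comparison can be made concrete. If the two cards are truly equal, each of the 11 calls is as likely to come from A as from B, so the question becomes how often a fair coin gives 7 or more heads in 11 flips. A minimal sketch using only the standard library (the helper name `prob_at_least` is made up for illustration):

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


p_luck = prob_at_least(7, 11)
print(f"{p_luck:.3f}")  # 0.274
```

So with two identical cards, A would "win" by 7 to 4 or better roughly one time in four. Count the cases where B wins by the same margin and a split at least that lopsided shows up more than half the time.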

The fix is sample size. You need enough volume that the winner accumulates a real pile of responses, not a handful. A practical rule of thumb for roofing mail:

You generally want the winning cell to collect at least 25–30 responses before you trust the result, and a relative gap of at least ~30–50% between cells to call it with small samples.

Because roofing response rates are low (often well under 1.5%), getting 25–30 responses means mailing thousands per cell, not hundreds. Use this table to size a test, assuming you want roughly 25+ responses in the better cell:

Expected response rate	Pieces per cell to get ~25 responses	Total mail for the A/B test (both cells)
0.5%	~5,000	~10,000
0.75%	~3,300	~6,600
1.0%	~2,500	~5,000
1.5%	~1,700	~3,400
2.0%	~1,250	~2,500

These are minimums to get a readable signal, not statistical certainty. If you can only afford a small test, you can still run it — just treat the result as a weak hint, not gospel, and confirm it on the next drop rather than betting the whole budget on it. The cardinal sin is treating a small, noisy result as a proven fact.
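The sizing table follows from one division: target responses divided by expected response rate. Here is a small sketch that reproduces it for any rate; the function name is illustrative, and the results are rounded up to whole pieces, so they differ slightly from the rounded figures in the table.

```python
from math import ceil


def pieces_per_cell(expected_rate, target_responses=25):
    """Pieces needed in one cell to expect roughly `target_responses` responses."""
    # round() guards against floating-point noise before rounding up.
    return ceil(round(target_responses / expected_rate, 6))


for rate in (0.005, 0.0075, 0.01, 0.015, 0.02):
    n = pieces_per_cell(rate)
    print(f"{rate:.2%}: {n:,} per cell, {2 * n:,} total")
```

Pass `target_responses=30` to size for the upper end of the 25–30 rule of thumb.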

A quick gut-check for whether a difference is meaningful at small scale: take the better version's response count, find its square root, and double it. If the gap between the two versions is at least about that big, it is probably real. (For 25 responses, √25 = 5, so you'd want a margin of about 10: 25 vs. 15 is interesting, 25 vs. 22 is noise.) It's rough, but it keeps you from over-reading coin flips. When in doubt, mail bigger or re-run; never round a hunch up to a conclusion.
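For anyone who wants to run the gut-check on real counts, here is a minimal sketch. `gut_check` implements the rule of thumb as described above. `two_proportion_p_value` is a standard pooled two-proportion z-test added for comparison; it is not part of this guide's method, and its normal approximation is itself rough when response counts are this small.

```python
from math import erf, sqrt


def gut_check(better, worse):
    """Rule of thumb: the gap should be at least about 2 * sqrt(better count)."""
    return (better - worse) >= 2 * sqrt(better)


def two_proportion_p_value(resp_a, n_a, resp_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    p_pool = (resp_a + resp_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (resp_b / n_b - resp_a / n_a) / se
    return 1 - erf(abs(z) / sqrt(2))


print(gut_check(25, 15), gut_check(25, 22))   # True False
print(gut_check(40, 24), gut_check(7, 5))     # True False
print(round(two_proportion_p_value(24, 4000, 40, 4000), 3))
print(round(two_proportion_p_value(5, 600, 7, 600), 3))
```

A smaller p-value means the gap is less likely to be luck. The 40-vs-24 split on 4,000 pieces per cell comes out just under the conventional 0.05 line, while 7 vs. 5 on 600 pieces per cell is nowhere near it, which matches what the gut-check says.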

A Copy-Paste A/B Test Planning Worksheet

Use this before every test. Fill in every line; if you can't, you're not ready to mail.

ROOFING MAILER A/B TEST PLAN

Test name: ____________________________
Drop date (both versions, same week): ____________

HYPOTHESIS (one sentence):
"We believe [Version B] will beat [Version A]
 on [metric] because [reason]."

VARIABLE BEING TESTED (exactly ONE):
[ ] List/audience   [ ] Offer   [ ] Headline
[ ] Format/size     [ ] Image   [ ] CTA/response   [ ] Design

WHAT IS IDENTICAL between A and B (list everything):
- ____________________________________________
- ____________________________________________

VERSION A (control): _________________________
VERSION B (challenger): ______________________

LIST:
Total clean records: __________
Split method (random/alternate): ______________
Cell A size: ________  Cell B size: ________
Are cells equal & randomly assigned? [Y/N]

TRACKING:
Version A phone #: __________  QR/URL: __________
Version B phone #: __________  QR/URL: __________

WIN METRIC (pick ONE, decide NOW):
[ ] Booked jobs / 1,000   [ ] Qualified leads / 1,000
[ ] Cost per booked job   [ ] (weak) Calls / 1,000

RESPONSE WINDOW CLOSES (3-6 wks out): __________
MINIMUM RESPONSES needed in winning cell: ______ (aim 25+)

DO NOT call the winner before the window closes.
DO NOT change two things at once.

A Copy-Paste Results Scorecard

After the response window closes, fill this in for each version. The math is deliberately simple so anyone in the office can run it.

ROOFING MAILER A/B TEST — RESULTS

                          | VERSION A | VERSION B
--------------------------|-----------|----------
Pieces mailed             |           |
Responses (calls+scans)   |           |
Qualified leads           |           |
Appointments set          |           |
Jobs booked               |           |
Revenue booked ($)        |           |
--------------------------|-----------|----------
Response rate (%)         |           |
Booked jobs / 1,000       |           |
Cost of mail ($)          |           |
Cost per booked job ($)   |           |
Revenue per piece ($)     |           |

KEY RATIOS:
Response rate = Responses / Pieces mailed
Booked / 1,000 = (Jobs booked / Pieces) x 1,000
Cost per booked job = Cost of mail / Jobs booked
Revenue per piece = Revenue booked / Pieces mailed

WINNER (by the metric you chose up front): __________
Did winning cell get 25+ responses? [Y/N]
Is the gap > ~30-50%? [Y/N]
CONFIDENCE: [ ] Strong  [ ] Weak/confirm next drop

DECISION:
[ ] Promote B to control
[ ] Keep A as control
[ ] Inconclusive — re-run bigger

NEXT VARIABLE TO TEST: __________

A Copy-Paste 6-Month Testing Roadmap

Don't test randomly. Sequence your tests so each builds on the last. Here's a sensible default roadmap for a company mailing a few thousand pieces a month.

MONTHS 1-2: LIST TEST
  Owner-occupied-only  vs.  full ZIP
  -> Winner becomes your default audience.

MONTHS 2-3: SECOND LIST TEST
  Aged-roof / storm-exposed cut  vs.  prior winner
  -> Lock in your best audience definition.

MONTHS 3-4: OFFER TEST
  Free roof check  vs.  $ discount  (best list)
  -> Winner becomes your default offer.

MONTHS 4-5: HEADLINE TEST
  Storm/condition hook  vs.  offer-led hook
  -> Winner becomes your default headline.

MONTH 5: FORMAT TEST
  Standard postcard  vs.  jumbo postcard
  -> Compare COST per booked job, not only response.

MONTH 6: IMAGE / CTA TEST
  Finished-roof photo  vs.  crew photo (or QR vs. phone-first)
  -> Final polish on the proven card.

THEN: re-run your list test. Markets drift; what won
in spring may not win in fall. Testing never ends.

Common A/B Testing Mistakes Roofers Make

Testing too small. Covered above and worth repeating because it's the number-one error. A cell of 300 pieces teaches you nothing. Mail thousands per cell or treat the result as a hint.

Changing more than one thing. A "winner" with a new headline, photo, and offer tells you nothing actionable. One variable per test.

Mailing the versions in different weeks. Weather, storms, and competitor drops all change week to week. Same week, ideally same day, or the test is contaminated.

No tracking, or shared tracking. If both versions point to the same phone number, you can't tell them apart. Distinct number and QR per version, every time.

Measuring calls instead of jobs. A scary headline can triple calls while booking fewer jobs because half the callers are confused or hostile. Track booked jobs or qualified leads.

Calling the winner too early. Roofing mail trickles in for weeks. A version that's "behind" at day 5 can win by day 30. Wait for the window to close.

Re-testing the same thing forever. Once you have a clear winner on the offer, stop re-running offer tests every month. Move down the ladder. Re-visit only when the season or market shifts.

Ignoring the loser's lesson. A losing version isn't a waste — it's information. Write down why you think it lost. Patterns across tests become your playbook.

Letting opinion override data. The hardest one. Sometimes the ugly card or the boring headline wins. If the data says so and the test was valid, the data wins. That's the whole point.

Seasonal and Regional Variation in What Wins

A/B test results are not permanent truths — they're snapshots of what worked in a particular market at a particular time. The same test can flip depending on season and region, which is why the roadmap above ends with "re-run your list test."

Season. In storm-heavy spring and early summer, storm- and condition-led headlines and inspection offers tend to pull hardest because the need is top-of-mind. In the off-season (deep winter in cold climates, or the dog days when there's been no severe weather), maintenance and "beat the spring rush" angles often do better than storm fear, because there's no fresh damage to reference. An offer that wins in May may lose in January, and vice versa. Don't assume your spring control is your January control.

Region. Hail-alley markets (parts of Texas, Oklahoma, Colorado, the Plains) are heavily insurance-driven, so storm/claim messaging — kept honest — tends to win. Coastal and high-wind regions skew toward wind-damage and resilience messaging; the IBHS FORTIFIED roof program gives you a credible, third-party hook for resilience-minded homeowners. In milder retail markets with little severe weather, age- and value-based targeting with replacement/upgrade offers usually beats storm messaging, because there's no storm to point to. Older housing stock — which you can scope with the Census ACS — supports age-based targeting; brand-new subdivisions don't.

Local competition. If three competitors are blanketing a ZIP with discount cards, a discount test there may underperform simply because the market is numb to discounts; a trust/credibility angle can stand out. Your test results reflect your competitive context, not only your card.

The practical takeaway: treat your winning card as a current champion, not a permanent law. Re-test the big levers at least once or twice a year, and always re-test after a major storm changes the market or a new competitor floods your area.

How to Read Results: A Worked Example

Numbers below are illustrative to show the math, not real campaign data.

Imagine a contractor runs a list test. Version A (full ZIP) and Version B (owner-occupied-only) use the identical jumbo postcard with a free-inspection offer. Both drop the same Tuesday.

Version A — full ZIP: 4,000 pieces, 24 responses (0.60%), 9 qualified leads, 3 jobs booked, $36,000 revenue.
Version B — owner-occupied: 4,000 pieces, 40 responses (1.00%), 22 qualified leads, 7 jobs booked, $84,000 revenue.

Run the scorecard: B's response rate is 67% higher, its booked-jobs-per-thousand is 1.75 vs. 0.75 (more than double), and at, say, $0.85/piece all-in, both cells cost ~$3,400 to mail. Cost per booked job is $1,133 for A and $486 for B. The winning cell (B) collected 40 responses — above the 25–30 threshold — and the gap is far more than 30–50%. This is a strong result. Promote owner-occupied to control.
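The scorecard ratios for this illustrative example can be reproduced with a short function. This is a sketch, with `scorecard` and its field names invented here to mirror the results scorecard earlier in the guide; the inputs are the illustrative figures above, not real campaign data.

```python
def scorecard(pieces, responses, jobs, revenue, cost_per_piece):
    """Compute the key ratios from the results scorecard for one version."""
    cost = pieces * cost_per_piece
    return {
        "response_rate_pct": 100 * responses / pieces,
        "booked_per_1000": 1000 * jobs / pieces,
        "cost_of_mail": cost,
        # No booked jobs means cost per booked job is undefined.
        "cost_per_booked_job": cost / jobs if jobs else None,
        "revenue_per_piece": revenue / pieces,
    }


a = scorecard(4000, 24, 3, 36_000, 0.85)   # Version A: full ZIP
b = scorecard(4000, 40, 7, 84_000, 0.85)   # Version B: owner-occupied

for name, row in (("A", a), ("B", b)):
    print(name, row["booked_per_1000"], round(row["cost_per_booked_job"]))
# A 0.75 1133
# B 1.75 486
```

Running the same function on every test keeps the comparison consistent from one drop to the next, whoever in the office fills in the numbers.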

Now imagine a weaker test. Same setup, but each cell is only 600 pieces. A gets 5 responses, B gets 7. B "wins" on paper, but 7 responses is well below the threshold and the gap is 2 responses, which is pure noise. The correct call is "inconclusive": re-run with bigger cells. Reprinting 10,000 of B off this data would be gambling, not testing. Same logic, opposite confidence, entirely because of sample size.

The discipline is the same every time: compute response rate and cost per booked job, check that the winning cell cleared the response threshold, check that the gap is bigger than noise, and only then act. When the numbers are thin, the honest answer is "we don't know yet," and that answer will save you more money than any clever headline.
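As a rough sketch, the two checks can be written as a small function. It encodes only this guide's rules of thumb (a 25-response floor for the winner and a 30% relative gap), not a formal significance test. The function name `read_test` and the constant names are hypothetical.

```python
MIN_WINNER_RESPONSES = 25  # low end of this guide's 25-30 response threshold
MIN_RELATIVE_GAP = 0.30    # low end of this guide's 30-50% gap rule of thumb


def read_test(responses_a: int, pieces_a: int, responses_b: int, pieces_b: int) -> str:
    """Apply the guide's two rule-of-thumb checks and return a verdict string."""
    rate_a = responses_a / pieces_a
    rate_b = responses_b / pieces_b
    if rate_a == rate_b:
        return "inconclusive: no gap"

    winner = "A" if rate_a > rate_b else "B"
    winner_responses = responses_a if winner == "A" else responses_b
    high, low = max(rate_a, rate_b), min(rate_a, rate_b)
    # Relative gap of the winner over the loser; infinite if the loser got no responses.
    gap = (high - low) / low if low > 0 else float("inf")

    # Check the sample-size floor first: a thin winner is inconclusive at any gap.
    if winner_responses < MIN_WINNER_RESPONSES:
        return f"inconclusive: {winner} leads but has only {winner_responses} responses"
    if gap < MIN_RELATIVE_GAP:
        return f"inconclusive: {winner} leads by only {gap:.0%}"
    return f"promote {winner}: {winner_responses} responses, {gap:.0%} gap"


print(read_test(24, 4000, 40, 4000))  # the strong example above
print(read_test(5, 600, 7, 600))      # the weak example above
```

The response floor is checked before the gap on purpose. In the weak example the relative gap is 40%, which looks large, but with only 7 responses in the winning cell the verdict stays inconclusive.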

Decision Framework: When to Test vs. When to Just Mail

Testing has overhead. You don't need to A/B every single drop. Use this framework to decide.

Situation	Test or just mail?	Why
You've never tested anything	Test (start with list)	Biggest unknowns, biggest upside
You have a proven control + stable market	Mostly just mail the control; test occasionally	Don't fix what works; sample-test for drift
A major storm just changed the market	Test offer/headline	Market conditions shifted; old winners may not hold
You're entering a new ZIP/region	Test the list	New market = unknown audience
Tiny budget, < ~2,000 pieces total	Just mail your best guess; test later	Too small to read a split cleanly
New competitor flooding your area	Test differentiation angle	You need to stand out, not blend in
Seasonal turnover (spring↔winter)	Test offer	Seasonal winners differ
You're scaling spend 3–5x	Test first, then scale the winner	Never scale an unproven card

The meta-rule: test when the stakes or the uncertainty are high, mail your control when things are stable, and always test before you pour money behind a big scale-up. A small test in front of a large spend is the cheapest risk reduction in your entire marketing budget.

Where RoofPredict Fits

Everything above — splitting a list cleanly, filtering to owner-occupied or aged-roof or storm-exposed homes, dropping two versions in the same week with distinct tracking, and reading cost per booked job — is doable by hand for one small test. The friction shows up when you try to do it repeatedly, across a whole territory, every month. That's the gap RoofPredict is built to close.

RoofPredict scores the properties in your service area by how likely they are to need roof work, using property age and characteristics, storm/hail/wind exposure history, and roof-imagery signals, and turns that scored demand into targeted direct-mail campaigns. That's the list-testing lever, the highest one on the ladder, handled as a repeatable system: you can define an audience (owner-occupied, aged-roof, recent-storm footprint), build the mail list from scored properties, and run that as your challenger against a broader control. It also generates branded, shareable roof reports reps can send, and helps your team organize the records and photos behind a job.

On cost: the subscription/credits cover the roof reports (one per home, no matter how many mail touches it gets); the mailers themselves are billed in real dollars per piece, around $0.68 each, with volume discounts by send size (1,000+ saves 7%, 2,500+ saves 12%, 5,000+ saves 18%). Nothing is charged until you approve the proof and the mailers go to print. That's useful for testing because you can size A/B cells to hit the response thresholds in this guide and see the honest dollar total before you commit.
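To make the sizing step concrete, here is a minimal sketch that estimates pieces per cell and a rough mail cost. It rests on two assumptions that are not confirmed by the text above: that $0.68 is the pre-discount price, and that a single discount tier applies to the whole send. Treat the output as a planning estimate, not as actual billing. The function names are hypothetical.

```python
import math

BASE_PRICE = 0.68  # approximate per-piece price quoted above (ASSUMED to be pre-discount)
# (minimum pieces, discount) tiers quoted above, largest first.
DISCOUNT_TIERS = [(5000, 0.18), (2500, 0.12), (1000, 0.07)]


def pieces_per_cell(expected_response_rate: float, target_responses: int = 25) -> int:
    """Pieces needed per cell so the cell expects at least `target_responses` responses."""
    # round() guards against tiny floating-point error before taking the ceiling.
    return math.ceil(round(target_responses / expected_response_rate, 6))


def estimated_mail_cost(total_pieces: int) -> float:
    """Rough mail cost, ASSUMING one discount tier applies to the whole send."""
    discount = next((d for minimum, d in DISCOUNT_TIERS if total_pieces >= minimum), 0.0)
    return round(total_pieces * BASE_PRICE * (1 - discount), 2)


per_cell = pieces_per_cell(0.01)  # 1% expected response rate
total = per_cell * 2              # two cells: control and challenger
print(per_cell, total, estimated_mail_cost(total))
```

At a 1% expected response rate this gives 2,500 pieces per cell, matching the sizing guide earlier in this article. Lower the rate to 0.5% and the per-cell requirement doubles.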

Guardrail: RoofPredict's score is a prioritization and targeting signal — a way to decide which homes to mail and call first. It is not a verdict on any individual roof. It does not inspect or climb a roof, does not prove roof age or storm causation on its own, and does not decide, approve, or guarantee an insurance claim. Whether a specific roof is actually damaged, how old it really is, and whether a claim is valid are determinations only a licensed roofer, a qualified adjuster, or the building department can make. RoofPredict tells you where to point your mail and your reps; the field and the paperwork confirm the rest.

Key Takeaways

Test the list first. Who you mail moves response 2–5x; wording changes move it a few percent. Owner-occupied, aged-roof, and storm-exposed cuts are the highest-leverage tests.
Change exactly one variable per test. A multi-change "winner" teaches you nothing you can repeat.
Follow the ladder: list → offer → headline → format → image → CTA → design. Lock a winner at each rung before moving down.
Size for significance. Aim for 25–30+ responses in the winning cell, which means thousands of pieces per cell at typical roofing response rates — not hundreds.
Same week, distinct tracking. Mail both versions in the same week with unique phone numbers and QR codes, or the test is contaminated.
Measure booked jobs, not calls. Track cost per booked job and revenue per piece, decided before you mail.
Promote the winner to control and keep going. A/B testing is a loop. Re-test the big levers each season and after market shifts.
Don't scale an unproven card. A small test in front of a big spend is the cheapest insurance in marketing.
Keep every claim honest under FTC advertising rules. A credible mailer beats a hypey one, especially with scam-wary homeowners.

FAQ

Should I test the offer or the headline first?

The offer, generally. The offer is the reason someone calls now and changes their actual incentive to act, so it carries more weight than the headline, which mainly decides whether the card gets read at all. The standard order after the list is offer, then headline. Both are worth testing — just sequence the higher-leverage offer test ahead of the headline test.

What's a good metric to judge a roofing mailer test by?

Booked jobs per thousand pieces, or cost per booked job, is the best metric. Qualified leads per thousand is an acceptable second. Raw call volume is the weakest measure because a scary or confusing card can generate lots of calls that never become jobs. Decide your single win metric before you mail so the result is unambiguous.

Is it worth A/B testing if I only mail a few thousand pieces a month?

Yes, but be realistic about what you'll learn. A few thousand pieces split into two cells can read a big difference (like owner-occupied versus a random ZIP) but won't reliably detect small ones (like two similar headlines). Start with high-leverage list and offer tests where the gap is likely to be large, and treat close results as inconclusive rather than overreading them.

Can I test more than two versions at once?

You can, but it needs much more volume. Splitting one element into three or more versions is an A/B/n test; testing combinations of several elements at once is multivariate testing. Either design often needs 20,000+ pieces to read cleanly, which most roofing companies don't mail in one drop. For nearly all roofers, simple two-version A/B testing is the right tool. Save the bigger designs for when you're mailing tens of thousands a month and already have a stable control.

How do I know if my test result is real or just luck?

Check two things: did the winning version collect at least 25–30 responses, and is the gap between versions large (roughly 30–50%+ at small scale)? If the winner got 40 responses to the loser's 24, that's a real signal. If it got 7 to 5, that's noise — re-run it bigger. Small samples with small gaps are coin flips dressed up as data.

Do A/B test results change with the season?

Yes. Storm and condition messaging tends to win in storm-heavy spring and summer when damage is top-of-mind, while maintenance and "beat the spring rush" angles often do better in the off-season. An offer that wins in May can lose in January. Treat each winning card as a current champion, not a permanent rule, and re-test your big levers at least once a year, ideally twice.

Should I A/B test every single mail drop?

No. Test when uncertainty or stakes are high — you've never tested, you're entering a new market, a storm just shifted conditions, or you're about to scale spend 3–5x. When you have a proven control in a stable market, mostly just mail the winner and run an occasional check for drift. Always test before a big scale-up, though; never pour money behind an unproven card.

What's the most common A/B testing mistake roofers make?

Testing too small and then scaling the "winner." A typical case: split 600 pieces into two cells, see 7 calls versus 4, and reprint 10,000 of the version with 7. That gap is pure luck, and you've just bet real money on noise. The fix is sample size: mail enough that the winner clears 25–30 responses, and when you can't, treat the result as a hint to confirm rather than a conclusion to act on.

Is a discount the best offer to test on a roofing mailer?

Not necessarily. In roofing, an honest free-inspection or roof-check offer often beats an aggressive discount, because homeowners wary of scams can read a too-good discount as a red flag rather than a deal. But you don't have to guess — test the inspection offer against the discount offer on the same list and let booked jobs decide. Whatever you promise, make sure it's genuinely delivered and compliant with FTC advertising rules.

Sources

USPS — Every Door Direct Mail (EDDM) — usps.com
USPS — Advertise With Mail (business mail) — usps.com
Census — American Community Survey — census.gov
NOAA — Storm Events Database — ncdc.noaa.gov
FTC — Advertising and Marketing Basics — ftc.gov
FTC Consumer — How to Avoid a Home Improvement Scam — consumer.ftc.gov
SBA — Marketing and Sales Guidance — sba.gov
IBHS — FORTIFIED Roof — ibhs.org
