How to Clean Up a Messy Roofing CRM (The Weekend Playbook That Turns a Junk Drawer Into a Work Queue)
Most roofing CRMs are not a database. They are a junk drawer with a login. Every owner who has been in business more than three years has the same mess: 8,000 contacts, half of them with no phone number, a thousand duplicates, three different people spelled four different ways, deals that closed in 2021 still sitting in the "Estimate Sent" stage, and a tag system invented by a sales manager who quit two summers ago and took the logic with him.
You already know it is bad. What you do not have is a plan to fix it without burning a week you do not have, paying a "data person" $4,000 to clean a list you can clean yourself, or losing the handful of records that are actually worth money.
So let's build that plan. Below is the exact order of operations a roofing company should follow to take a messy CRM and turn it back into something that drives revenue: how to audit what you have, how to de-duplicate without nuking your history, how to standardize fields so reports stop lying to you, how to archive the dead weight, how to fix the stages so your pipeline number means something again, and how to layer roof-age and storm signal on top so the cleaned list actually tells you who to call first. Worked numbers, copy-paste field rules, and the mistakes that cost people the most are all in here.
This is written for residential and storm-restoration contractors running HubSpot, JobNimbus, AccuLynx, Salesforce, Pipedrive, or a homegrown spreadsheet that grew teeth. The principles are identical no matter the platform. The screenshots differ; the discipline does not.
Why your roofing CRM got messy (and why "just be more disciplined" won't fix it)
A messy CRM is almost never a discipline problem. It is a system problem wearing a discipline costume. Roofing has four structural forces that turn any database into mush, and if you do not name them, you will clean the data once and watch it rot again in six months.
Force 1: The trade is address-centric, but most CRMs are person-centric. Your business runs on roofs. A roof is a physical address. But HubSpot, Salesforce, and Pipedrive were built to track people and companies. So your reps end up storing "the job" inconsistently — sometimes as a contact, sometimes as a deal, sometimes as a note glued to whoever answered the door. Same house, three records, no link between them. This single mismatch causes more duplication than anything else.
Force 2: Multiple intake doors, none of them talking. Leads come in from the website form, the call tracking number, the door-knocking app, the yard-sign QR code, the referral text to the owner's personal phone, the storm-canvass spreadsheet, and the third-party lead vendor's CSV. Each door creates a record its own way. Nobody normalizes on the way in. By the time you notice, the same homeowner exists as "Bob," "Robert," and "Rob & Linda."
Force 3: Storm season floods the system with low-intent volume. After a hail event you might dump 1,200 canvass records into the CRM in three weeks. Most never go anywhere. They sit there forever, inflating your contact count, dragging down every "conversion rate" report, and burying the 40 records that were actually warm.
Force 4: Reps are paid to sell, not to type. A canvasser closing $14k jobs is not going to lovingly fill in a "Roof Material" dropdown. They will type the homeowner's name, maybe a phone, and move on. That is the correct economic behavior for them. Which means your cleanup plan has to assume sparse, sloppy field entry as the permanent condition, not a temporary lapse to be scolded out of people.
Hold those four forces in your head, because the cleanup steps below are designed specifically to survive them. A cleanup that ignores them is a cleanup you redo every spring.
The cost of leaving it messy, in real dollars
Owners tolerate a messy CRM because the cost is invisible. Let's make it visible with conservative arithmetic for a mid-size residential shop. The figures below are illustrative estimates, not measured data.
| Hidden cost | How it shows up | Rough annual drag |
|---|---|---|
| Wasted outbound | You mail or call records with bad addresses/dead numbers | 15-25% of every campaign's spend |
| Double-touched homeowners | Two reps work the same duplicate; one looks foolish | Lost trust, occasional lost job |
| Buried warm records | Old "we'll call you after winter" deals never resurface | 5-15 jobs/year left on the table |
| Lying reports | Pipeline and close-rate numbers are garbage, so you plan blind | Bad hiring and ad decisions |
| Onboarding drag | New sales hires can't trust the data, so they ignore it | Slower ramp, more leaning on the owner |
If your average job is $11,000 and even five buried warm records a year would have closed, that is $55,000 of revenue sitting in a database you already paid for. The cleanup is not housekeeping. It is recovering inventory you forgot you owned.
Before you touch anything: back up and freeze a snapshot
Rule zero. Do not start cleaning a live CRM without a full export sitting safely outside the system. Cleanup involves merges and bulk deletes, and merges are frequently irreversible. The single most expensive mistake in any CRM cleanup is a bulk operation you cannot undo.
Do this first, every time:
- Full export of every object. Contacts, companies, deals/jobs, and (critically) the notes/activities and deal history. In HubSpot that is a full export of each object plus engagements; in JobNimbus/AccuLynx use the account export or ask support for a data export; in Salesforce use the Data Export / Weekly Export Service. Save the CSVs to a folder named with today's date.
- Export the field definitions too. A list of every custom property/field, its type, and its dropdown options. You will need this map in the standardize step, and it documents what "normal" looked like before you touched it.
- Snapshot your current headline numbers. Write down, today: total contacts, total open deals, total pipeline dollar value, and your reported close rate. After cleanup these numbers will move — sometimes dramatically — and you want to be able to explain why instead of panicking.
- Tell your team there's a freeze. Pick a 24-48 hour window (a weekend is ideal) where nobody edits records while you work. Concurrent edits during a merge spree create exactly the kind of corruption you are trying to remove.
If your CRM offers a sandbox or a restore point, use it. If it does not, the dated CSV folder is your restore point. Treat it like one.
Step 1 — Audit: find out how bad it actually is (the 60-minute health check)
You cannot fix what you have not measured. Before any cleaning, run a health check so you know the size of each problem and can prove progress later. Open your full contact export in a spreadsheet and build a simple scorecard. Each of these is a filter or a formula, not a guess.
Completeness checks (count the blanks):
- Contacts with no phone number AND no email — these are nearly worthless and inflate your count.
- Contacts with no street address — fatal for a roofing business, since the address is the asset.
- Contacts with no lead source — you cannot judge what you cannot attribute.
- Deals/jobs with no dollar value and no stage.
Duplication checks:
- Sort by phone number; count adjacent matches. Sort by normalized address; count matches. Sort by last name + zip. Each pass surfaces a different duplicate cluster.
Staleness checks:
- Records with no activity (no call, email, note, or stage change) in 12+ months.
- Open deals last touched 90+ days ago — your pipeline's walking dead.
Hygiene checks:
- Phone numbers that are not 10 digits.
- Email addresses with no "@".
- Addresses with obvious junk ("n/a", "same", "ask Bob", a single zip with no street).
Put the results in a table so the scope is undeniable. A typical first audit on a neglected 8,000-record CRM looks something like this:
| Problem | Records affected | % of file |
|---|---|---|
| No phone and no email | 1,150 | 14% |
| No street address | 640 | 8% |
| No lead source | 3,900 | 49% |
| Likely duplicates (phone or address) | 1,300 | 16% |
| No activity in 12+ months | 4,400 | 55% |
| Malformed phone | 720 | 9% |
| Open deals stale 90+ days | 310 | n/a (share of open deals, not of the contact file) |
Two things usually shock owners here. First, how much of the file is genuinely dead (that 55% no-activity number is normal, not alarming). Second, how few records carry a lead source, which means every attribution report they have ever shown an investor or based a budget decision on was built on half-blank data.
Keep this scorecard. After cleanup you will re-run the same filters and the before/after table becomes your proof the weekend was worth it.
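If your export is a CSV, the completeness and hygiene counts can be scripted instead of filtered by hand. The sketch below is one minimal way to do it, assuming each contact has been loaded as a dictionary; the field names (`phone`, `email`, `address`, `lead_source`) are hypothetical and will differ by platform.

```python
import re

def digits_only(phone):
    """Strip a phone value down to its digits."""
    return re.sub(r"\D", "", phone or "")

def audit(contacts):
    """Count records failing each health check. Field names are hypothetical."""
    def blank(rec, key):
        return not (rec.get(key) or "").strip()

    score = {"no_phone_and_no_email": 0, "no_street_address": 0,
             "no_lead_source": 0, "malformed_phone": 0, "malformed_email": 0}
    for rec in contacts:
        if blank(rec, "phone") and blank(rec, "email"):
            score["no_phone_and_no_email"] += 1
        if blank(rec, "address"):
            score["no_street_address"] += 1
        if blank(rec, "lead_source"):
            score["no_lead_source"] += 1
        # Hygiene checks only apply when a value is present.
        if not blank(rec, "phone") and len(digits_only(rec["phone"])) != 10:
            score["malformed_phone"] += 1
        if not blank(rec, "email") and "@" not in rec["email"]:
            score["malformed_email"] += 1
    return score

sample = [
    {"phone": "(512) 555-0148", "email": "", "address": "123 N Main St", "lead_source": "Door Knock"},
    {"phone": "", "email": "", "address": "", "lead_source": ""},
    {"phone": "555-0199", "email": "bob.example.com", "address": "9 Oak Dr", "lead_source": ""},
]
scorecard = audit(sample)
for problem, count in scorecard.items():
    print(f"{problem}: {count} ({count / len(sample):.0%})")
```

Run the same function on the post-cleanup export and you have the "after" column of the scorecard for free. Note that the strict 10-digit check also flags numbers stored with a leading 1, which are fixable rather than dead.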
Decide your "keep" definition now, in writing
Before deleting or archiving anything, define what makes a record worth keeping. Write it down so the decision is consistent across 8,000 records instead of re-litigated 8,000 times. A sensible default for most roofers:
Keep any record that has (a) a usable contact method — valid phone or email — AND a street address, OR (b) any closed-won job in its history, OR (c) a real sales activity (quote, inspection, or stage change) in the last 24 months. Everything else is a candidate for archive, not the active database.
Note the word archive, not delete. We will get to why that distinction protects you.
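Writing the keep rule as a small function is one way to make sure it is applied identically to every record. This is a sketch under assumptions: the field names are hypothetical, "valid" phone and email are taken as already checked, and 24 months is approximated as 730 days.

```python
from datetime import date

def should_keep(rec, today):
    """Apply the (a) / (b) / (c) keep rule. Field names are hypothetical."""
    has_contact = bool(rec.get("valid_phone") or rec.get("valid_email"))
    has_address = bool(rec.get("street_address"))
    if has_contact and has_address:            # (a) reachable and tied to a roof
        return True
    if rec.get("closed_won_jobs", 0) > 0:      # (b) any past job
        return True
    last = rec.get("last_sales_activity")      # (c) a date, or None if never
    # 730 days is an approximation of "the last 24 months".
    return last is not None and (today - last).days <= 730

today = date(2025, 6, 1)
reachable = {"valid_phone": "5125550148", "street_address": "123 N MAIN ST"}
past_customer = {"closed_won_jobs": 1}
recent_quote = {"last_sales_activity": date(2024, 1, 15)}
phone_only = {"valid_phone": "5125550148"}

for name, rec in [("reachable", reachable), ("past_customer", past_customer),
                  ("recent_quote", recent_quote), ("phone_only", phone_only)]:
    print(name, "keep" if should_keep(rec, today) else "archive candidate")
```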
Step 2 — De-duplicate without destroying your history
Duplicates are the worst offender because they actively cause embarrassment: two reps calling the same homeowner, a "new lead" alert for someone you sold in 2022, a mailer to a house you re-roofed last spring. But de-duping is also where people do the most damage, because a careless merge silently throws away notes, the older "first contact" date, or the lead source on the record you kept.
Find the duplicates the right way
Exact-match tools miss most real duplicates because roofing data is messy in predictable ways. Match on normalized values, not raw ones:
- Phone: strip everything but digits, drop a leading 1, compare the last 10. "(512) 555-0199", "512-555-0199", and "15125550199" are the same person.
- Address: this is the big one for roofers. Normalize before comparing — uppercase, expand or standardize "ST/STREET", "DR/DRIVE", "N/NORTH", drop unit punctuation, and compare street + zip. "123 N Main St" and "123 North Main Street" are one roof.
- Name + zip: catches the records that lost their phone and address but still belong together.
Run all three passes. Each catches duplicates the others miss. If your platform has a native dedupe tool (HubSpot's duplicate management, Salesforce's matching/duplicate rules), use it as the first pass, then run an export-and-sort pass yourself to catch the address duplicates the native tools usually fumble.
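For the export-and-sort pass, the phone and address normalization above can be sketched in a few lines. Treat this as a rough illustration, not real USPS standardization: the suffix and direction tables are a tiny hypothetical subset, the record fields are made up, and edge cases (a street literally named "North Street", unit numbers) will need hand review.

```python
import re
from collections import defaultdict

# Small illustrative lookup tables, not the full USPS abbreviation list.
SUFFIXES = {"STREET": "ST", "DRIVE": "DR", "AVENUE": "AVE", "ROAD": "RD", "LANE": "LN"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}

def normalize_phone(raw):
    """Digits only, drop a leading 1, return '' unless 10 digits remain."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else ""

def normalize_address(street, zip_code):
    """Uppercase, strip punctuation, abbreviate suffixes/directions, add 5-digit zip."""
    cleaned = re.sub(r"[^\w\s]", " ", (street or "").upper())
    words = [SUFFIXES.get(w, DIRECTIONS.get(w, w)) for w in cleaned.split()]
    if not words:
        return ""  # blank street: no usable key
    return " ".join(words) + "|" + (zip_code or "").strip()[:5]

def duplicate_clusters(records, key_fn):
    """Group record ids by a normalized key; keep groups with 2+ members."""
    groups = defaultdict(list)
    for rec in records:
        key = key_fn(rec)
        if key:
            groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

records = [
    {"id": 1, "phone": "", "street": "123 N Main St", "zip": "78701"},
    {"id": 2, "phone": "(512) 555-0199", "street": "123 North Main Street", "zip": "78701"},
    {"id": 3, "phone": "15125550199", "street": "88 Oak Dr.", "zip": "78702"},
]
by_phone = duplicate_clusters(records, lambda r: normalize_phone(r["phone"]))
by_address = duplicate_clusters(records, lambda r: normalize_address(r["street"], r["zip"]))
print("phone clusters:", by_phone)
print("address clusters:", by_address)
```

Notice that each pass finds a different cluster in the same three records, which is why you run all of them. The output is a review list, not a merge command.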
The golden-record merge rules
When you merge two records into one "golden record," decide in advance which field wins, so you are not making a judgment call every time. These rules survive contact with reality:
| Field | Which value wins on merge |
|---|---|
| First-contact / create date | The oldest date (protects attribution and tenure) |
| Lead source | The original source, not the most recent re-entry |
| Phone / email | The most recently verified working value |
| Address | The standardized USPS-valid version |
| Job history & dollars | Keep all — never let a merge drop a closed-won job |
| Notes & activities | Append all — concatenate, never overwrite |
| Owner / assigned rep | The rep with the most recent real activity |
| Lifecycle stage | The furthest-along stage (a past customer outranks a cold lead) |
The two that bite people: keep the oldest create date (so a homeowner you have known for four years doesn't reset to "new lead"), and never let a merge discard notes or a past job. Before you bulk-merge, do five merges by hand and confirm with your own eyes that notes and job history survived in your specific platform. Tools handle this differently and some quietly drop data. Five test merges cost ten minutes and save you from a silent catastrophe across 1,300 records.
Worked example
You have these two records for the same roof:
- Record A: "Bob Sanders," created 03/2021, source "Door Knock," phone blank, address "123 n main", note: "Wanted to wait until after the holidays."
- Record B: "Robert & Linda Sanders," created 09/2024, source "Website Form," phone "(512) 555-0148" (confirmed good last month), address "123 North Main Street," no notes.
The correct golden record: name "Robert & Linda Sanders" (more complete), create date 03/2021, source "Door Knock," phone "(512) 555-0148," address standardized to "123 N MAIN ST," and the 2021 note preserved. That note — "wait until after the holidays" — is now four years old and the holidays are long over, which makes this a record you should call this week. That is the entire point of not throwing history away during a merge.
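Here is the same merge expressed as code, as a sketch of how the golden-record rules could be applied to exported rows. The field names are hypothetical, the exact dates are illustrative, `address_std` is assumed to already hold a standardized address, and "more complete name" is approximated by picking the longer string, which is a shortcut rather than a rule from the table.

```python
from datetime import date

def pick_phone(a, b):
    """Most recently verified non-blank phone wins."""
    candidates = [r for r in (a, b) if r["phone"]]
    if not candidates:
        return ""
    return max(candidates, key=lambda r: r["phone_verified"] or date.min)["phone"]

def merge_golden(a, b):
    """Merge two duplicate records using the golden-record rules."""
    older, newer = sorted((a, b), key=lambda r: r["created"])
    return {
        "name": max((a["name"], b["name"]), key=len),  # assumption: longer = more complete
        "created": older["created"],                   # oldest create date wins
        "lead_source": older["lead_source"] or newer["lead_source"],  # original source
        "phone": pick_phone(a, b),
        "address": older["address_std"] or newer["address_std"],
        "notes": older["notes"] + newer["notes"],      # append, never overwrite
        "jobs": older["jobs"] + newer["jobs"],         # keep all job history
    }

record_a = {"name": "Bob Sanders", "created": date(2021, 3, 1), "lead_source": "Door Knock",
            "phone": "", "phone_verified": None, "address_std": "123 N MAIN ST",
            "notes": ["Wanted to wait until after the holidays."], "jobs": []}
record_b = {"name": "Robert & Linda Sanders", "created": date(2024, 9, 1),
            "lead_source": "Website Form", "phone": "(512) 555-0148",
            "phone_verified": date(2025, 5, 1), "address_std": "123 N MAIN ST",
            "notes": [], "jobs": []}

golden = merge_golden(record_a, record_b)
for field, value in golden.items():
    print(f"{field}: {value}")
```

Whatever tool does your real merges, the check is the same: compare its result against what these rules say should survive.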
Handle the merges your rules can't decide
Even with golden-record rules, a slice of duplicates will be genuinely ambiguous, and how you handle those decides whether you trust the result. Three patterns come up constantly in roofing data:
- Two different roofs, one homeowner. A landlord or a repeat customer owns two houses. These are not duplicates — they share a person but not an address, and the address is your key. Link them (a parent company/contact relationship if your CRM supports it), never merge them, or you'll lose one roof's entire job history.
- One roof, two genuinely different people. A house sold; the 2019 record is the old owner, the 2024 record is the new one. Do not merge a real ownership change into one record — keep both, mark the old owner's record as a past customer, and treat the new owner as a fresh prospect on a roof you may already know the age of.
- Conflicting phone numbers, both plausibly current. When you can't tell which number is live, keep both (a primary and a secondary phone field), flag the record for verification on the next call, and let the rep confirm rather than guessing during a bulk merge.
The discipline here: when in doubt, link or keep, don't merge. A merge you regret is hard to reverse; a duplicate you left for next month's hygiene pass is cheap to catch later.
Step 3 — Standardize the fields so your reports stop lying
Once duplicates are gone, the file is smaller but still inconsistent. Standardization is what makes filtering, mailing, and reporting actually work. A "Lead Source" field with 31 spellings of seven real sources cannot produce an honest report no matter how good your dashboard is.
Build a field dictionary (one page, the whole company follows it)
Write a single reference doc that defines every field that matters, its allowed values, and its format. This becomes law. New hires get it on day one. A starter dictionary for a roofing CRM:
| Field | Type | Allowed values / format |
|---|---|---|
| Lead Source | Dropdown | Door Knock, Website, Referral, Call-In, Yard Sign, Repeat Customer, Storm Canvass, Purchased Lead, Unknown |
| Job Type | Dropdown | Full Replacement, Repair, Inspection Only, Maintenance, Gutters, Other |
| Roof Material | Dropdown | Asphalt Shingle, Metal, Tile, Flat/TPO, Wood Shake, Unknown |
| Stage | Dropdown | (your pipeline stages — see Step 5) |
| Phone | Format | 10 digits, stored as 5125550148 or (512) 555-0148 — pick ONE |
| Address | Format | USPS-standardized, uppercase street suffix abbreviations |
| Roof Age Range | Dropdown | 0-5, 6-10, 11-15, 16-20, 21-25, 26+, Unknown |
| Last Storm Exposure | Date/text | Date or season of last significant hail/wind event at address |
Two rules make this stick. Use dropdowns, not free text, for anything you ever want to filter or report on. Free text guarantees drift. And always include an "Unknown" option — because the alternative to "Unknown" is reps inventing fake values to escape a required field, which is worse than an honest blank.
Clean the existing values in bulk
For each standardized field, export the current distinct values and map the mess to the clean list. A real "Lead Source" cleanup map looks like this:
| Messy values found | Maps to |
|---|---|
| "door", "DK", "knock", "canvass", "door-knock" | Door Knock |
| "web", "site", "form", "online", "FB lead" | Website |
| "ref", "referral", "friend", "neighbor said" | Referral |
| "phone", "call", "called in", "inbound" | Call-In |
| blank, "?", "n/a", "unknown" | Unknown |
Do this find-and-replace in the spreadsheet, then re-import, or use your CRM's bulk-edit by filter. Repeat for every dropdown field. This is tedious and it is also the step that turns a junk drawer back into a database. After it, "show me every Door Knock lead from last fall" returns a true answer for the first time in years.
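If find-and-replace across thousands of rows sounds error-prone, the same cleanup map can be applied with a lookup table. A minimal sketch, assuming the messy values come out of your export as plain strings; anything not in the map is returned as `None` so a person can decide, instead of being guessed at.

```python
# Clean values from the field dictionary, plus Unknown.
ALLOWED = ["Door Knock", "Website", "Referral", "Call-In", "Yard Sign",
           "Repeat Customer", "Storm Canvass", "Purchased Lead", "Unknown"]

# Messy value (lowercased) -> clean value, taken from the cleanup map above.
LEAD_SOURCE_MAP = {
    "door": "Door Knock", "dk": "Door Knock", "knock": "Door Knock",
    "canvass": "Door Knock", "door-knock": "Door Knock",
    "web": "Website", "site": "Website", "form": "Website",
    "online": "Website", "fb lead": "Website",
    "ref": "Referral", "referral": "Referral", "friend": "Referral",
    "neighbor said": "Referral",
    "phone": "Call-In", "call": "Call-In", "called in": "Call-In", "inbound": "Call-In",
    "": "Unknown", "?": "Unknown", "n/a": "Unknown", "unknown": "Unknown",
}
CANONICAL = {value.lower(): value for value in ALLOWED}

def clean_lead_source(raw):
    """Return the clean dropdown value, or None if the value needs human review."""
    key = (raw or "").strip().lower()
    if key in CANONICAL:          # already a clean value, maybe in the wrong case
        return CANONICAL[key]
    return LEAD_SOURCE_MAP.get(key)

for raw in [" DK ", "FB lead", None, "yard sign", "billboard"]:
    print(repr(raw), "->", clean_lead_source(raw))
```

Collect the `None` results into a short list, decide each one once, add it to the map, and re-run.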
Standardize the address field properly — it's the roofer's most important field
For most businesses the address is a nice-to-have. For you it is the primary key, because every roof is an address and your whole strategy depends on knowing which address has which roof. Treat address standardization as non-optional.
The professional standard is USPS-standardized addresses produced by CASS-certified software: a consistent format, correct ZIP+4, and a flag for addresses the USPS cannot validate (which are often typos or non-existent). Many CRMs and most mailing/data vendors can run a CASS pass for you. The payoff is threefold: your duplicate detection gets far more accurate, your direct mail stops getting returned, and (the part most roofers miss) clean standardized addresses are what let you append outside data like roof age and storm history later. Garbage addresses match to nothing.
Step 4 — Archive the dead weight (don't delete it)
Now the cleaned, de-duped, standardized file still has thousands of records that fail your "keep" definition: no contact method, no address, no activity in years, no history. The instinct is to delete them. Resist it. Archive instead.
Why archive beats delete
- Compliance and disputes. If a homeowner ever calls about work you did, or you face a warranty or lien question, having the record — even a thin one — matters. Some records you are legally smarter to retain. Talk to your accountant about retention windows for anything tied to a completed job.
- Suppression. A deleted bad address can walk right back in through your next vendor CSV or website form. An archived record can sit on a suppression list so the junk never re-enters the active database.
- Re-activation optionality. A record with no activity in 18 months is not dead; it is dormant. Storms reshuffle the deck. A neighborhood you wrote off can become your best market overnight, and you will want those old addresses back in play instantly.
How to archive cleanly
Most platforms do not have a literal "archive" button, so you build one with status:
- Create a field/value like Record Status = Active / Archived / Suppressed.
- Bulk-set everything failing your keep rule to Archived.
- Make your everyday views, dashboards, and outbound lists default to Status = Active only. Now your working database is the clean core, but nothing is destroyed.
- Move confirmed bad data (undeliverable addresses, dead numbers, opt-outs, do-not-contact) to Suppressed, and check new imports against it.
After this step your "active" contact count drops — often by half. That is not loss. That is finally being able to see. A 4,000-record active database you trust beats an 8,000-record one you don't.
A word on do-not-contact and consent
While you are sweeping the file, flag and honor every opt-out, do-not-call, and do-not-mail request you find, and make sure those records are suppressed rather than merely archived. Calling and texting are governed by federal and state rules (the FTC's Telemarketing Sales Rule and the National Do Not Call Registry among them), and "the CRM was messy" is not a defense anyone wants to test. Building suppression into your cleanup is both cleaner data and basic risk management.
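The status scheme and the import check from this step can be sketched as two small functions. This is an illustration under assumptions: the flags (`opted_out`, `undeliverable`) and the `match_key` field are hypothetical names, and the keep-rule result is passed in rather than recomputed.

```python
def assign_status(rec, passes_keep_rule):
    """Suppressed beats everything; otherwise Active or Archived by the keep rule."""
    if rec.get("opted_out") or rec.get("undeliverable"):
        return "Suppressed"
    return "Active" if passes_keep_rule else "Archived"

def screen_import(rows, suppressed_keys):
    """Split an incoming batch into rows to import and rows blocked by suppression."""
    accepted, blocked = [], []
    for row in rows:
        target = blocked if row["match_key"] in suppressed_keys else accepted
        target.append(row)
    return accepted, blocked

# match_key is a hypothetical normalized phone or address key.
suppressed_keys = {"5125550111", "77 ELM ST|78703"}
vendor_batch = [
    {"name": "New Prospect", "match_key": "5125550148"},
    {"name": "Opted Out Last Year", "match_key": "5125550111"},
]
accepted, blocked = screen_import(vendor_batch, suppressed_keys)
print("import:", [r["name"] for r in accepted])
print("blocked:", [r["name"] for r in blocked])
print(assign_status({"opted_out": True}, passes_keep_rule=True))
```

Note the ordering in `assign_status`: an opt-out stays Suppressed even when the record would otherwise pass the keep rule.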
Step 5 — Fix the pipeline so the dollar number means something
A cleaned contact list is half the job. The other half is the pipeline — your deals/jobs and the stages they move through. In a messy CRM the pipeline is usually the biggest liar in the building: it shows $900,000 "open," but half of it is deals that died in 2023 and nobody closed-lost.
Define stages that map to reality
Most roofing pipelines have too many stages, vague stages, or stages that describe your internal steps instead of the homeowner's decision. Good stages are few, mutually exclusive, and each represents a clear change in the homeowner's commitment. A clean default:
| Stage | What's true when a deal is here | Exit trigger |
|---|---|---|
| New Lead | Contact made, not yet inspected | Inspection scheduled |
| Inspection Set | Appointment on the calendar | Inspection done |
| Inspected / Documented | Roof assessed, photos and measurements captured | Estimate built |
| Estimate Delivered | Homeowner has the written estimate in hand | Verbal or written go-ahead |
| Won — Scheduled | Signed, in the build queue | Build complete |
| Lost | Declined, unresponsive 60+ days, or chose another contractor | (terminal) |
The key discipline: a deal is never allowed to sit in a stage past its exit trigger. If an "Estimate Delivered" deal has not moved in 60 days and the homeowner won't respond, it is Lost, not "open." Marking it Lost is not admitting failure; it is making your pipeline number honest so you can plan against it.
Clean the existing deals
Run a stale-deal sweep. Start from the Step 1 audit, but tighten the window from 90 days to the 60-day limit in the Lost definition above:
- Pull every open deal last touched 60+ days ago.
- For each, one of three things is true: it's genuinely still alive (give it a next step and a date), it's dead (mark Lost with a reason), or it's a future-maybe (a "call us after the season" — mark it Lost but tag it for re-activation so it resurfaces at the right time).
- The reason-for-loss matters. "Price," "went with competitor," "no response," "timing — wants to wait" each suggest a different follow-up later. Capture it.
When you finish, your open pipeline number drops, often hard. The owner who saw $900k open and now sees $340k is not poorer than yesterday. He was always at $340k; he just finally knows it. That number is the one you staff and forecast against.
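The sweep itself is a date filter plus a before/after sum. A rough sketch, assuming deals are exported with a stage, an amount, and a last-touched date (hypothetical field names), and treating the four pre-signature stages as "open". The alive/dead/future-maybe decision is still made by a person; the code only builds the review list and records the outcome.

```python
from datetime import date, timedelta

# Assumption: only pre-signature stages count toward the open pipeline.
OPEN_STAGES = {"New Lead", "Inspection Set", "Inspected / Documented", "Estimate Delivered"}
STALE_AFTER = timedelta(days=60)

def stale_open_deals(deals, today):
    """Open deals with no touch in 60+ days."""
    return [d for d in deals
            if d["stage"] in OPEN_STAGES and today - d["last_touched"] >= STALE_AFTER]

def open_pipeline(deals):
    """Total dollar value of deals still in an open stage."""
    return sum(d["amount"] for d in deals if d["stage"] in OPEN_STAGES)

def mark_lost(deal, reason, reactivate=False):
    """Close a deal as Lost, always with a reason; optionally tag for re-activation."""
    deal["stage"] = "Lost"
    deal["loss_reason"] = reason
    deal["reactivate"] = reactivate

today = date(2025, 6, 1)
deals = [
    {"id": 1, "stage": "Estimate Delivered", "amount": 14000, "last_touched": date(2025, 1, 10)},
    {"id": 2, "stage": "Inspection Set", "amount": 11000, "last_touched": date(2025, 5, 20)},
    {"id": 3, "stage": "New Lead", "amount": 9000, "last_touched": date(2024, 11, 2)},
]

before = open_pipeline(deals)
to_review = stale_open_deals(deals, today)
# Decisions a person made after reviewing each stale deal:
mark_lost(deals[0], "timing: wants to wait", reactivate=True)
mark_lost(deals[2], "no response")
after = open_pipeline(deals)
reactivation_queue = [d["id"] for d in deals if d.get("reactivate")]
print(f"open pipeline: ${before:,} -> ${after:,}; re-activate: {reactivation_queue}")
```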
Build the re-activation queue
The deals you marked "Lost — timing/wait" are gold, and most CRMs bury them forever. Tag them so they form a standing list: "Estimate delivered, said wait, last touched 6+ months ago." That list is the warmest outbound you own — people who already let you on their roof and saw your number. We will sharpen exactly who to call first in the next step.
Step 6 — Layer roof age and storm signal so the clean list tells you who to call first
Here is where a roofing CRM cleanup goes beyond generic data hygiene. A clean, de-duped, standardized list is necessary but not sufficient. It tells you who you know. It does not tell you whose roof is actually due. Two homeowners can have identical clean records — same complete fields, same "Lost — wait" status — and one has a 7-year-old roof that needs nothing while the other has an 18-year-old roof that took two hail seasons. You should call those two people very differently, and your CRM as-is has no idea which is which.
That ranking signal is roof age and storm exposure per address. Once your addresses are standardized (Step 3), you can append two columns that change how the whole database behaves:
- Roof age as a range — not an exact install date, which usually doesn't exist for a house you've never worked on, but a defensible band like 16-20 years, estimated from aerial imagery. A range is honest and it is enough: "16-20 years" already separates a roof near end-of-life from one with a decade left.
- Storm exposure per roof — which significant hail and wind events that specific address has actually seen, and a modeled sense of how hard each roof was hit. A regional storm map tells you a county got hail; what you want is which roofs in it were likely worn out, because hail and wind don't hit every roof on a street equally.
This is the gap RoofPredict fills. It takes your cleaned, standardized address list and appends a roof-age range per address plus storm history and a per-roof impact model, then ranks the list so the homes most likely due float to the top. It is not a lead service and it does not hand you strangers — it enriches the database you just spent a weekend cleaning, so your own customers and your own territory get prioritized by who is actually due. Honest about its limits: roof age comes back as a range, not a precise date, and storm modeling is odds, not proof a given roof is damaged — you still send a human up the ladder to confirm. But as a way to sort 4,000 clean addresses from "call this month" to "leave alone," age-plus-storm beats alphabetical or gut feel every time.
What the enriched list lets you do
| Without enrichment | With roof-age + storm signal appended |
|---|---|
| Call old customers in random order | Call the 18-22 year roofs first |
| Mail the whole zip | Mail the streets where roofs are aging out |
| Treat every "Lost — wait" the same | Surface the wait-list homes whose roofs are now due |
| Re-knock a neighborhood blindly | Re-knock the blocks a recent storm likely wore out |
| Skip new roofs only if a rep remembers | Filter out the 0-8 year roofs automatically |
The last row is quietly the biggest win. A huge amount of roofing outbound is wasted on roofs that are too new to need anything. Tagging roof-age ranges lets you suppress the new roofs from outbound entirely — the same way you suppress opt-outs — so every dollar of mail and every hour of knocking goes to homes where a sale is physically possible.
A practical caution: append this signal as its own fields (Roof Age Range, Last Storm Exposure, Due Score) and treat it as a prioritization layer, not gospel. It tells your reps where to spend their first hours. The ladder still decides the job.
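To make "prioritization layer" concrete, here is a toy ranking over the appended fields. Everything about the scoring is an assumption made for illustration: the band midpoints, the three-point bump per storm event, and the `storm_events` count are invented here and are not how RoofPredict or any vendor computes a score. The suppressed bands are a parameter because the dictionary's bands (0-5, 6-10) do not line up exactly with a 0-8 year cutoff.

```python
# Midpoint of each Roof Age Range band, in years (illustrative values).
AGE_BAND_MIDPOINT = {"0-5": 2.5, "6-10": 8, "11-15": 13,
                     "16-20": 18, "21-25": 23, "26+": 28}

def due_score(rec):
    """Toy score: band midpoint plus a made-up bump per recorded storm event.
    Unknown age counts as 0, so those roofs sort on storm exposure alone."""
    age = AGE_BAND_MIDPOINT.get(rec.get("roof_age_range"), 0)
    return age + 3 * rec.get("storm_events", 0)

def rank_for_outbound(records, suppress_bands=("0-5",)):
    """Drop roofs in suppressed bands, then sort the rest most-due first."""
    eligible = [r for r in records if r.get("roof_age_range") not in suppress_bands]
    return sorted(eligible, key=due_score, reverse=True)

records = [
    {"id": "A", "roof_age_range": "6-10", "storm_events": 0},
    {"id": "B", "roof_age_range": "16-20", "storm_events": 2},
    {"id": "C", "roof_age_range": "0-5", "storm_events": 1},
    {"id": "D", "roof_age_range": "Unknown", "storm_events": 1},
]
call_order = [r["id"] for r in rank_for_outbound(records)]
print(call_order)
```

Records A and B are the two homeowners from the start of this step: identical on paper until the age and storm columns separate them. The score only orders the call list; the inspection still decides whether there is a job.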
Step 7 — Make the cleanup permanent: rules that keep it clean
A cleanup you do once is a tax you pay forever. The goal is a CRM that stays clean because the system won't let it rot — remember the four forces from the top; this step is built to beat them. Five habits do almost all the work.
1. Required fields at intake, kept minimal. Make address, contact method, and lead source required to create a record — but only those three. Require too much and reps fake it; require too little and you're back to junk. Three fields is the sweet spot for roofing.
2. Dedupe rules on creation. Turn on your platform's duplicate detection so the system warns a rep before they create a second record for an address that already exists. Catching duplicates at the door is a hundred times cheaper than merging them later.
3. A single intake funnel. Every lead source (website, call tracking, canvass app, lead vendor CSV, the owner's referral texts) should flow through one normalizing step (a form, an integration, or one person who imports) that applies the field dictionary on the way in. Force 2 from the top is exactly this problem, and the other three forces do their worst damage at un-normalized intake doors. Close them into one funnel and most of the mess never forms.
4. A monthly 30-minute hygiene pass. Re-run the Step 1 audit filters once a month. New duplicates? Merge them. New stale deals? Sweep them. Records missing the three required fields (sneaking in through a CSV import)? Fix or archive. Thirty minutes a month prevents the next all-weekend rescue.
5. A quarterly re-enrichment. Roofs age and storms keep happening, so refresh roof-age ranges and storm exposure on your active list each quarter. A home that was "11-15 years, call later" becomes "16-20 years, call now," and a storm that rolled through in spring reshuffles your whole re-activation queue. Clean data decays; an enrichment cadence keeps the prioritization current, not only the contact info.
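Habits 1 through 3 amount to a gate that every new lead passes through. If you import by script or build your own intake form, the gate could look roughly like this sketch. The field names are hypothetical, the return values are plain strings for readability, and the address key is deliberately simpler than a real standardization pass; in practice it should reuse whatever normalization your dedupe pass uses.

```python
import re

def address_key(street, zip_code):
    """Uppercase, strip punctuation, collapse spaces, add the 5-digit zip."""
    cleaned = re.sub(r"[^\w\s]", " ", street.upper())
    return " ".join(cleaned.split()) + "|" + zip_code.strip()[:5]

def intake(lead, existing_keys):
    """Enforce the three required fields, then warn on an existing address."""
    missing = [f for f in ("address", "lead_source") if not (lead.get(f) or "").strip()]
    if not ((lead.get("phone") or "").strip() or (lead.get("email") or "").strip()):
        missing.append("contact method")
    if missing:
        return "rejected: missing " + ", ".join(missing)
    key = address_key(lead["address"], lead.get("zip") or "")
    if key in existing_keys:
        return "duplicate warning: " + key
    existing_keys.add(key)
    return "created"

existing = set()
first = intake({"address": "123 N Main St.", "zip": "78701",
                "phone": "5125550148", "lead_source": "Website"}, existing)
second = intake({"address": "123 n main st", "zip": "78701",
                 "email": "linda@example.com", "lead_source": "Referral"}, existing)
third = intake({"address": "9 Oak Dr", "zip": "78702", "phone": "5125550199"}, existing)
print(first, second, third, sep="\n")
```

The duplicate case returns a warning, not an automatic merge, in line with the "when in doubt, link or keep" rule from Step 2.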
Assign one owner
The single biggest reason cleanups don't stick: nobody owns the database after the consultant or the motivated owner moves on. Name one person — office manager, ops lead, even a sharp part-timer — as the CRM owner. They run the monthly pass, they hold the field dictionary, they approve bulk imports. Without a named owner, entropy wins by default. With one, a 30-minute monthly habit holds the line you fought all weekend to draw.
The full cleanup, as a checklist
Print this. Tape it next to the monitor. It is the whole playbook in one pass.
Prep
- Full export of contacts, companies, deals, notes/activities, and field definitions to a dated folder
- Snapshot headline numbers (total contacts, open deals, pipeline $, close rate)
- Announce a 24-48h edit freeze
Audit
- Run completeness, duplication, staleness, and hygiene filters; fill in the scorecard
- Write your "keep" definition in one sentence
De-dupe
- Match on normalized phone, normalized address, and name+zip
- Test five merges by hand; confirm notes and job history survive
- Apply golden-record rules (oldest create date, original source, keep all history)
Standardize
- Publish the one-page field dictionary
- Convert filter/report fields to dropdowns with an "Unknown" option
- Bulk-map messy values to clean values, field by field
- CASS/USPS-standardize every address
Archive
- Add Record Status = Active / Archived / Suppressed
- Archive everything failing the keep rule
- Suppress opt-outs, dead numbers, undeliverable addresses
- Default all working views to Active only
Pipeline
- Define few, clear, exit-triggered stages
- Sweep deals stale 60+ days to Lost with a reason
- Build the "Lost — wait" re-activation queue
Enrich & rank
- Append roof-age range and storm exposure per address
- Suppress new roofs (0-8 yrs) from outbound
- Sort active + re-activation lists by who's actually due
Keep it clean
- Three required intake fields; dedupe-on-create on
- One normalized intake funnel
- 30-minute monthly hygiene pass
- Quarterly re-enrichment
- One named database owner
What pros get wrong (and the edge cases nobody warns you about)
A few hard-won lessons that separate a cleanup that lasts from one you redo next year.
Deleting instead of archiving. Covered above, but it's the number-one regret. The record you delete today is the warranty dispute or the re-activation goldmine you needed next year. Archive and suppress; almost never delete.
Merging on raw values. Teams run a dedupe on exact phone or exact address strings, get a low duplicate count, and declare victory — while 900 address-variant duplicates sail through untouched because "123 N Main St" ≠ "123 North Main Street" to a naive matcher. Always normalize before you compare.
Over-cleaning the fields. A field dictionary with 14 required fields and 40 dropdown options is a dictionary reps will sabotage. They'll pick the first option to escape the form, and now your clean-looking data is precisely, confidently wrong. Keep required fields to three and dropdowns short.
Cleaning the data but not the pipeline. Plenty of shops de-dupe contacts and never touch their zombie deals, then keep forecasting against a fake pipeline number. The deal/job object is where the dollar decisions live. Clean it with the same rigor.
Treating roof age as a date. A real estate site's "year built" is not roof age — re-roofs are invisible to it, so a 1985 house can have a 2-year-old roof. If you append age, store it as a range from current roof data, label it as an estimate, and never let a rep treat "16-20 years" as proof a roof is shot. It's a prioritization signal; the inspection is the truth.
The storm-claims temptation. When you re-activate storm-exposed homes, keep your message and your CRM notes strictly on your side of the line. You can document the roof thoroughly, capture photos, and prepare an accurate written repair estimate aligned to standard estimating tools, then hand that estimate to the homeowner. What you cannot do — and should never tag, promise, or write into a CRM note as a sales angle — is negotiate or "handle" the claim for the homeowner, interpret their policy or coverage, promise a specific approval or payout, tell anyone their deductible will be waived or absorbed, or advertise a "free roof." The homeowner files; the insurer decides coverage; you document and estimate your own scope. Building that discipline into your stages and note templates keeps a clean database from quietly becoming a compliance problem. Roughly: your CRM should help you find and document the roofs likely due — never make claims about claims.
Expecting the platform to save you. No CRM cleans itself, and "AI" toggles don't fix a junk-drawer intake process. The platform enforces the rules; you still have to write the rules, name an owner, and hold the monthly habit. Tools are leverage on discipline, not a substitute for it.
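To make the first mistake in this list concrete, here is a minimal sketch of normalize-before-compare. The abbreviation map, the field names (`id`, `address`), and the helper names are all illustrative assumptions, not any CRM's API; a real cleanup would lean on USPS/CASS standardization rather than a hand-rolled map.

```python
import re

# Hypothetical abbreviation map, for illustration only. It is far from complete.
STREET_ABBREVIATIONS = {
    "north": "n", "south": "s", "east": "e", "west": "w",
    "street": "st", "avenue": "ave", "road": "rd", "drive": "dr",
    "lane": "ln", "court": "ct", "boulevard": "blvd",
}


def normalize_address(raw: str) -> str:
    """Lowercase, strip punctuation, and collapse common street-word variants."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.lower())
    tokens = cleaned.split()
    return " ".join(STREET_ABBREVIATIONS.get(token, token) for token in tokens)


def normalize_phone(raw: str) -> str:
    """Keep digits only and drop a leading US country code if present."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def find_duplicate_groups(records):
    """Group record ids by normalized address; return groups with 2+ members."""
    groups = {}
    for record in records:
        key = normalize_address(record["address"])
        groups.setdefault(key, []).append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"id": 1, "address": "123 N Main St"},
    {"id": 2, "address": "123 North Main Street"},
    {"id": 3, "address": "48 Oak Ave"},
]
duplicate_groups = find_duplicate_groups(records)
```

An exact-string match on the raw addresses would find no duplicates in this sample; comparing the normalized keys groups records 1 and 2 as the same roof.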
A realistic timeline
For a single-pipeline shop with ~8,000 records and one focused person, here's how the work actually distributes. It is a weekend plus a few short follow-ups, not a month.
| Phase | Time | When |
|---|---|---|
| Backup + audit + keep rule | 2-3 hours | Friday evening |
| De-dupe (find, test, merge) | 3-5 hours | Saturday |
| Standardize fields + addresses | 3-4 hours | Saturday/Sunday |
| Archive + suppression | 1-2 hours | Sunday |
| Pipeline stage cleanup | 2-3 hours | Sunday |
| Append roof-age + storm, rank | 1-2 hours setup | Following week |
| Stand up the keep-it-clean rules | 1 hour | Following week |
The single weekend gets you a database you can trust. The following-week enrichment is what turns a trustworthy database into a ranked work queue — a list that doesn't just hold your customers but tells you, this morning, which of them to call first.
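If you want to sanity-check the schedule against your own hours, the table's ranges can be totaled with a few lines. This is only a sketch of the arithmetic: the low and high hours are copied from the table, and the "weekend" versus "following week" grouping is the table's own.

```python
# (phase, low hours, high hours, when) copied from the timeline table above.
PHASES = [
    ("Backup + audit + keep rule", 2, 3, "weekend"),
    ("De-dupe (find, test, merge)", 3, 5, "weekend"),
    ("Standardize fields + addresses", 3, 4, "weekend"),
    ("Archive + suppression", 1, 2, "weekend"),
    ("Pipeline stage cleanup", 2, 3, "weekend"),
    ("Append roof-age + storm, rank", 1, 2, "following week"),
    ("Stand up the keep-it-clean rules", 1, 1, "following week"),
]


def total_hours(phases, when):
    """Return (low, high) total hours for all phases scheduled in `when`."""
    low = sum(lo for _, lo, _, slot in phases if slot == when)
    high = sum(hi for _, _, hi, slot in phases if slot == when)
    return low, high


weekend = total_hours(PHASES, "weekend")
follow_up = total_hours(PHASES, "following week")
print(f"Weekend: {weekend[0]}-{weekend[1]} hours")
print(f"Following week: {follow_up[0]}-{follow_up[1]} hours")
```

Swap in your own estimates per phase; a bigger file or more intake doors stretches the de-dupe and standardize rows first.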
Bottom line
A messy roofing CRM is not a character flaw and it is not a discipline problem you can scold your way out of. It is the predictable result of an address-centric trade running on person-centric software with too many un-normalized intake doors and a seasonal flood of low-intent volume. Fix it in order: back up, audit, de-dupe on normalized values, standardize the fields and especially the addresses, archive the dead weight instead of deleting it, make the pipeline honest, then layer roof-age and storm signal so the clean list ranks itself by who's actually due. Lock it in with three required fields, one intake funnel, a 30-minute monthly pass, and one named owner.
Do that and the junk drawer becomes inventory again — and the records you recover, especially the warm "we'll call you later" homes whose roofs have quietly aged into the replacement window, are very likely the cheapest jobs you'll book all year. They're already in a database you've already paid for. They were just buried.
If you want the cleaned list to do more than sit there — if you want it sorted by which roofs are actually due, with the new roofs filtered out and the storm-worn homes floated to the top — that's exactly the layer RoofPredict adds on top of a clean database. Get your house in order first with the steps above; then let the data tell you which doors to knock. Book a demo and bring a few addresses you already know the answer on — you decide if it nailed them.
FAQ
How long does it take to clean up a messy roofing CRM?
For a typical single-pipeline shop with around 8,000 records and one focused person, the core cleanup — backup, audit, de-dupe, standardize, archive, and pipeline fix — is realistically a single weekend (roughly 12-16 hours). Appending roof-age and storm signal and standing up the keep-it-clean rules adds a few short sessions the following week. The exact time scales with file size and how many intake doors you have feeding the database.
Should I delete old or bad records, or keep them?
Archive, don't delete. Deleting is nearly always a mistake because a thin old record can matter for a warranty or lien question later, a deleted bad address walks right back in through your next import, and a dormant record may re-activate after a storm. Create a Record Status field (Active / Archived / Suppressed), set dead weight to Archived, default your working views to Active only, and move confirmed bad data and opt-outs to Suppressed.
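As a rough sketch of that rule, assuming hypothetical boolean flags (`opted_out`, `bad_data`, `dead_weight`) that your own keep rule would set, the status assignment and the Active-only default view might look like this:

```python
RECORD_STATUSES = ("Active", "Archived", "Suppressed")


def assign_status(record):
    """Pick a Record Status. The flag names are illustrative assumptions."""
    if record.get("opted_out") or record.get("bad_data"):
        return "Suppressed"  # confirmed bad data and opt-outs
    if record.get("dead_weight"):
        return "Archived"    # kept for warranty, lien, or storm re-activation
    return "Active"


def working_view(records):
    """Default view: only Active records. Nothing is deleted."""
    return [r for r in records if r["record_status"] == "Active"]


records = [
    {"id": 1, "dead_weight": False},
    {"id": 2, "dead_weight": True},
    {"id": 3, "opted_out": True},
    {"id": 4, "bad_data": True, "dead_weight": True},
]
for record in records:
    record["record_status"] = assign_status(record)

active_ids = [r["id"] for r in working_view(records)]
```

Suppression is checked before archiving on purpose: an opt-out that also looks like dead weight should land in Suppressed so it can never slip back into outbound.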
What's the biggest mistake people make when de-duplicating a roofing CRM?
Two things. First, matching on raw values instead of normalized ones — '123 N Main St' and '123 North Main Street' are the same roof, but a naive match treats them as different, so hundreds of address duplicates survive. Always normalize phone, address, and name before comparing. Second, letting a merge silently discard notes, the older create date, or a past job. Test five merges by hand to confirm your platform preserves history before you bulk-merge.
How do I de-duplicate without losing my notes and job history?
Set golden-record rules in advance: keep the oldest create date, keep the original lead source, keep the most recently verified phone/email, keep the standardized address, and crucially keep all notes (append, never overwrite) and all closed-won job history. Then test the rules on five manual merges in your specific CRM, because tools handle merges differently and some quietly drop data. Only bulk-merge after you've confirmed history survives.
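Those golden-record rules can be written down as a small merge function before you trust any platform's bulk-merge button. This is a minimal sketch, not how any particular CRM merges: the record shape (`created`, `lead_source`, `verified_on`, `notes`, `jobs`) is assumed for illustration, and it presumes the duplicates were matched on an already-standardized address.

```python
from datetime import date


def merge_records(duplicates):
    """Merge duplicate records into one golden record without dropping history."""
    by_age = sorted(duplicates, key=lambda r: r["created"])
    oldest = by_age[0]
    verified = [r for r in duplicates if r.get("verified_on")]
    # Fall back to the oldest record if nothing was ever verified.
    freshest = max(verified, key=lambda r: r["verified_on"]) if verified else oldest
    return {
        "created": oldest["created"],          # keep the oldest create date
        "lead_source": oldest["lead_source"],  # keep the original lead source
        "phone": freshest["phone"],            # most recently verified contact info
        "email": freshest["email"],
        "address": oldest["address"],          # assumed identical after standardizing
        "notes": [n for r in by_age for n in r["notes"]],  # append, never overwrite
        "jobs": [j for r in by_age for j in r["jobs"]],    # keep all job history
    }


older = {
    "created": date(2019, 5, 1), "lead_source": "Referral",
    "phone": "5550100001", "email": "old@example.com",
    "verified_on": date(2020, 1, 1), "address": "123 n main st",
    "notes": ["2019 inspection"], "jobs": ["2019 repair"],
}
newer = {
    "created": date(2023, 3, 1), "lead_source": "Web form",
    "phone": "5550100002", "email": "new@example.com",
    "verified_on": date(2024, 6, 1), "address": "123 n main st",
    "notes": ["Call back in spring"], "jobs": [],
}
golden = merge_records([newer, older])
```

Comparing a result like `golden` against what your CRM actually produces on five hand merges is a quick way to spot which of the rules the platform silently breaks.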
Why are my CRM reports and pipeline numbers wrong?
Usually two causes. Lead Source and other fields are free text with dozens of spellings of the same value, so no report can group them honestly — fix this by converting report fields to dropdowns and bulk-mapping the messy values to a clean list. And your pipeline is full of zombie deals that died years ago but were never marked Lost, inflating your open dollar figure. Sweep every deal stale 60+ days to Lost with a reason, and your pipeline number becomes one you can actually forecast against.
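The zombie sweep is simple enough to sketch against an export. The deal fields (`stage`, `last_activity`, `amount`) and the choice to treat Won-Scheduled and Lost as closed are assumptions made for illustration; adjust them to however your platform names things.

```python
from datetime import date, timedelta

# Assumption: these two stages count as closed and are never swept.
CLOSED_STAGES = {"Won-Scheduled", "Lost"}


def open_pipeline_total(deals):
    """Sum the dollar amount of every deal still in an open stage."""
    return sum(d["amount"] for d in deals if d["stage"] not in CLOSED_STAGES)


def sweep_stale_deals(deals, today, stale_days=60):
    """Mark open deals with no activity for stale_days or more as Lost, with a reason."""
    cutoff = today - timedelta(days=stale_days)
    swept_ids = []
    for deal in deals:
        if deal["stage"] in CLOSED_STAGES:
            continue
        if deal["last_activity"] <= cutoff:
            deal["stage"] = "Lost"
            deal["lost_reason"] = f"Stale {stale_days}+ days (cleanup sweep)"
            swept_ids.append(deal["id"])
    return swept_ids


deals = [
    {"id": 1, "stage": "Estimate Delivered", "last_activity": date(2021, 6, 1), "amount": 14000},
    {"id": 2, "stage": "Inspection Set", "last_activity": date(2025, 5, 20), "amount": 9000},
    {"id": 3, "stage": "Won-Scheduled", "last_activity": date(2021, 1, 1), "amount": 12000},
]
before = open_pipeline_total(deals)
swept = sweep_stale_deals(deals, today=date(2025, 6, 1))
after = open_pipeline_total(deals)
```

In this sample the 2021 estimate is the only zombie. Sweeping it takes the open pipeline from 23,000 to 9,000 dollars, which is the number you can actually forecast against.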
How many stages should my roofing pipeline have?
Few, clear, and mutually exclusive — usually five or six. A clean default is New Lead, Inspection Set, Inspected/Documented, Estimate Delivered, Won-Scheduled, and Lost. Each stage should represent a real change in the homeowner's commitment and have a defined exit trigger, and no deal should sit past its exit trigger. Too many vague stages are a top reason pipelines become unreliable.
How do I keep my CRM from getting messy again after I clean it?
Build the rules into the system so it can't easily rot: require only three fields at intake (address, contact method, lead source), turn on dedupe-on-create so reps are warned before making a duplicate, funnel every lead source through one normalizing step, run a 30-minute hygiene pass monthly, and name one person as the database owner. Without a named owner the cleanup never sticks.
Can I add roof age to my customer addresses, and how accurate is it?
Yes — once your addresses are standardized (USPS/CASS format), you can append a roof-age estimate per address from aerial imagery, plus which storms that specific roof has seen. The honest answer on accuracy is that roof age comes back as a range, like 16-20 years, not a precise install date, and storm modeling gives odds, not proof a given roof is damaged. That's still plenty to rank a clean list from 'call this month' to 'leave alone' and to suppress new roofs from outbound — but a human still confirms on the ladder before you sell the job.
Is roof age the same as the 'year built' on a real estate site?
No, and confusing the two leads to bad outbound. Year built is the age of the house; the roof has almost certainly been replaced one or more times since, and those re-roofs are invisible to real estate sites. A 1985 house can have a 2-year-old roof. To prioritize who's actually due, use a roof-age range estimated from current roof data, store it as an estimate, and never treat it as proof a roof is shot — the inspection is the truth.
When I re-activate storm-affected customers, what can I legally say about insurance?
Stay strictly on the documentation and estimate side. You may inspect, photograph, and prepare an accurate written repair estimate for your own scope of work and hand it to the homeowner. You may not, for a fee, negotiate or 'handle' the claim, interpret the homeowner's policy or coverage, promise a specific approval or payout, tell them the deductible will be waived or absorbed, or advertise a 'free roof'; doing so crosses into unlicensed public adjusting. The homeowner files the claim and the insurer decides coverage; your role is thorough documentation and an honest estimate. Hold that line in your CRM note templates and sales scripts, not only in your verbal pitch.