
Why your cold email keeps going to spam — and the leading indicators you're ignoring

Cold-email senders find out their domains are dead two weeks after they died. Here are the leading indicators that surface decay before deliverability collapses — and how to monitor them.

Robin Criel · 8 min read

Most cold-email teams find out their domain is dead two weeks after it died. Reply rate falls off a cliff, meeting bookings collapse, the founder asks why pipeline went quiet — and only then does the team start running Mail-Tester, pulling MXToolbox reports, checking Postmaster Tools. By that point the placement collapse is fourteen days old. The campaign has been talking to the spam folder for a fortnight, and the corrective action is already too late: the only honest path forward is replacing the domain.

This pattern is not a sign of incompetence. It is a sign of measuring the wrong things. Almost every cold-email team monitors lagging indicators — bounce rate, reply rate, meeting bookings — and almost none of them monitor the leading indicators that move first. This post is the playbook for the five leading indicators we watch at Lanello, the agency that built Mailnurse for exactly this problem.

The lagging-indicator trap

Reply rate, bounce rate, and meeting bookings are outputs. By the time they move, the underlying placement collapse is already ten to fourteen days old, because the reputation engines at Gmail, Workspace, and Microsoft 365 don’t change their minds on a domain in real time. They accumulate evidence — engagement, complaints, authentication consistency — over rolling windows of one to two weeks, and only then degrade the domain’s placement bucket. When you watch reply rate drop from 8% to 3%, you are reading a corpse. The domain has been declining for a fortnight; the metric is just catching up.

The fix is to instrument the inputs — the signals that move first, that the reputation engines themselves are reacting to. None of these inputs are exotic. They are all measurable today with off-the-shelf tooling. The reason most teams don’t measure them is operational: doing it across a single account is reasonable, doing it across fifty accounts is unbearable, and doing it across two hundred is impossible without a tool. We’ll get to the tool problem at the end. For now, the indicators themselves.

Leading indicator 1: Inbox placement percentile

The single best leading indicator is inbox placement percentile, measured continuously. You run a synthetic recipient pool of seed addresses spread across Gmail consumer, Google Workspace, Outlook consumer, Microsoft 365, and the other major mailbox providers (Yahoo Mail, Apple iCloud, Zoho, Fastmail), and sample inbox-vs-promotions-vs-spam placement at regular intervals. The pool needs to be at least twenty seeds wide to be statistically useful; ten is too few, and fifty is overkill for a single account.

What you measure is the percentage of test sends that land in the inbox bucket, broken out per provider so a Gmail-specific collapse is distinguishable from a Microsoft-specific one. The rolling 24-hour average is the key surface: a healthy account holds 95% or above. Any account where the 24h average drops below 85% deserves an investigation the same day. Below 70%, pause production sends from that account immediately and treat it as a likely-burned domain.
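As a rough illustration, the per-provider rolling average and the three thresholds could be computed like this. This is a minimal sketch that assumes each seed test is recorded as a provider label, a placement bucket, and a send timestamp. `SeedResult`, `inbox_rate_by_provider`, `triage`, and the provider labels are hypothetical names. The text does not say what to do between 85% and 95%, so the "watch" band below is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SeedResult:
    provider: str       # hypothetical labels, e.g. "gmail", "workspace", "m365"
    placement: str      # "inbox", "promotions" or "spam"
    sent_at: datetime


def inbox_rate_by_provider(results, now, window=timedelta(hours=24)):
    """Percentage of seed sends that landed in the inbox, per provider, within the window."""
    totals = defaultdict(int)
    inbox = defaultdict(int)
    for r in results:
        if timedelta(0) <= now - r.sent_at <= window:
            totals[r.provider] += 1
            if r.placement == "inbox":
                inbox[r.provider] += 1
    return {p: 100.0 * inbox[p] / totals[p] for p in totals}


def triage(rate):
    """Map a 24h inbox rate (percent) onto the thresholds described above."""
    if rate < 70:
        return "pause"        # pause production sends; treat as likely burned
    if rate < 85:
        return "investigate"  # same-day investigation
    if rate < 95:
        return "watch"        # assumption: below the healthy band, not yet an incident
    return "healthy"
```

Because the rate is broken out per provider, a Gmail-only collapse shows up as one provider in "pause" while the others stay "healthy".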

The leading-indicator behaviour here is unmistakable. Inbox-placement percentile typically begins drifting downward seven to ten days before reply-rate moves. The reason is that the reputation engines have already decided the domain is suspect; they’re just not enforcing it on real recipients yet. Synthetic placement testing reveals the decision before it propagates to production.

Leading indicator 2: Blacklist drift

The blacklist landscape is more nuanced than most cold-email tools admit. There are dozens of DNSBLs in active use, but only about eleven carry weight at the major receivers. The list we probe nightly is: Spamhaus ZEN (the big one), SORBS DUHL, Barracuda RBL, SpamCop, UCEPROTECT levels 1 and 2, the Composite Blocking List (CBL), Invaluement IVMURI, Mailspike Z, Hostkarma Black, and the Passive Spam Block List (PSBL). A clean account is listed on zero of these. A drifting account picks up its first listing — usually on UCEPROTECT-2 or PSBL, which are the most sensitive — between three and seven days before placement collapse.

The first-detection timestamp matters as much as the listing itself. A freshly listed domain is a much stronger leading signal than an old listing that hasn’t moved; the rate of accumulation tells you whether the underlying behaviour has changed or whether you’re looking at a long-standing condition. We alert on any new listing within twelve hours of detection, and we alert with elevated severity on Spamhaus ZEN (any sublist) because Spamhaus is the listing most receivers actually consult.
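A nightly probe plus first-detection tracking could look roughly like the sketch below. It covers only IPv4-based lists, which are queried by reversing the sending IP's octets and appending the list's zone. Domain-based lists such as Invaluement IVMURI are queried with the domain prepended instead. The zone names are the public ones for four of the lists above. `query_name`, `is_listed`, `new_listing_alerts`, `ZONES`, and `ELEVATED` are hypothetical names, not Mailnurse's implementation.

```python
import ipaddress
import socket
from datetime import datetime

# A few of the IPv4 zones named above (extend with the rest of your list).
ZONES = {
    "Spamhaus ZEN": "zen.spamhaus.org",
    "SpamCop": "bl.spamcop.net",
    "UCEPROTECT-2": "dnsbl-2.uceprotect.net",
    "PSBL": "psbl.surriel.com",
}
ELEVATED = {"Spamhaus ZEN"}  # listings that get elevated alert severity


def query_name(ip: str, zone: str) -> str:
    """DNSBL convention for IPv4: reverse the octets, then append the zone."""
    octets = ipaddress.IPv4Address(ip).exploded.split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """An A record means listed; NXDOMAIN (or any lookup failure) counts as not listed."""
    try:
        answer = socket.gethostbyname(query_name(ip, zone))
    except socket.gaierror:
        return False
    if answer.startswith("127.255.255."):
        # Spamhaus uses this range to signal a query error (e.g. a public resolver), not a listing.
        raise RuntimeError(f"{zone} returned an error code ({answer})")
    return True


def new_listing_alerts(first_seen: dict, current: set, now: datetime) -> list:
    """Alert once per newly detected listing and remember when it was first seen.

    first_seen maps list name -> first-detection time and is updated in place;
    names that drop off are forgotten, so a later relisting alerts again.
    """
    alerts = []
    for name in sorted(current - set(first_seen)):
        first_seen[name] = now
        severity = "elevated" if name in ELEVATED else "normal"
        alerts.append((name, severity, now))
    for name in set(first_seen) - current:
        del first_seen[name]
    return alerts
```

A nightly run would build `current = {name for name, zone in ZONES.items() if is_listed(ip, zone)}` and pass it to `new_listing_alerts` with the persisted `first_seen` map.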

A note on remediation: the removal-request workflow is different on every DNSBL, and most teams underestimate how much time the bureaucracy consumes. Spamhaus requires evidence that the underlying issue is fixed before they’ll delist; submitting a delisting form before you’ve actually paused the offending sends will just result in a polite rejection. SORBS DUHL auto-expires twenty-eight days after the last reported send, so for a domain you’ve already replaced, the cheapest path is to wait. UCEPROTECT-2 expires seven days after the last spam evidence. Invaluement requires a manual review by a human. Plan delisting effort accordingly.
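If the expiry windows quoted above hold, planning the waiting game is simple date arithmetic. Lists without a stated auto-expiry, such as Spamhaus and Invaluement, fall through to a manual request. This is a sketch only: `AUTO_EXPIRY_DAYS` and `delisting_plan` are hypothetical names, and the windows come straight from the paragraph above rather than from the lists' own documentation.

```python
from datetime import date, timedelta

# Auto-expiry windows as described above, counted from the last send or spam evidence.
AUTO_EXPIRY_DAYS = {"SORBS DUHL": 28, "UCEPROTECT-2": 7}


def delisting_plan(list_name: str, last_evidence: date) -> str:
    """Say whether to wait out a listing or file a manual removal request."""
    days = AUTO_EXPIRY_DAYS.get(list_name)
    if days is None:
        return f"{list_name}: pause the offending sends, then request removal manually"
    return f"{list_name}: expected to expire on {last_evidence + timedelta(days=days)}"
```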

Leading indicator 3: Authentication pass-rate

SPF, DKIM, and DMARC failures over the last 24 hours are one of the earliest signals a sending account can give you. A regression here typically appears seven to ten days before placement collapses, because the reputation engines accumulate authentication evidence over rolling windows: a single SPF fail is noise, but a 24-hour SPF fail-rate that climbs from 0.2% to 2% is a degradation the engines will react to in roughly a week.

A healthy sending account holds 99% or higher on all three checks. Once the 24-hour SPF fail rate climbs above 1–2%, dig in immediately. The most common causes are (1) a Workspace mailbox sending via an unauthorised relay, (2) a forwarding rule routing the mail through an intermediate hop that isn’t in your SPF record, or (3) an Instantly account whose sending IP changed without the SPF record being updated. DKIM problems are usually simpler: a key rotation that wasn’t propagated to DNS, or a signing domain that doesn’t match the From: domain, in which case the DKIM signature itself can pass while DMARC alignment fails.
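A minimal sketch of the 24-hour check follows. It assumes you already have a per-message pass/fail verdict for each of the three checks, for example from receiver headers or DMARC reports. `AuthResult` and `auth_health` are hypothetical names. The 1% SPF alert default is one point in the 1–2% range above, so tune it.

```python
from dataclasses import dataclass


@dataclass
class AuthResult:
    spf: bool    # did SPF pass for this message?
    dkim: bool
    dmarc: bool


def auth_health(results, healthy_pct=99.0, spf_fail_alert_pct=1.0):
    """Pass rates over a 24h batch, the checks below the healthy bar, and an SPF alert flag."""
    if not results:
        return {"rates": {}, "below_healthy": [], "spf_alert": False}
    n = len(results)
    rates = {
        check: 100.0 * sum(getattr(r, check) for r in results) / n
        for check in ("spf", "dkim", "dmarc")
    }
    return {
        "rates": rates,
        "below_healthy": [c for c, rate in rates.items() if rate < healthy_pct],
        "spf_alert": 100.0 - rates["spf"] > spf_fail_alert_pct,
    }
```

With 4 SPF failures in 200 sends, the SPF pass rate is 98%. That trips both the 99% health bar and the 1% fail-rate alert, even though DMARC can still pass on DKIM alone.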

DMARC is the consolidator. If you don’t have DMARC aggregate reports configured (the rua=mailto:... tag), set them up today; the aggregate reports are the single richest source of authentication-signal data in cold email. Parsing them is fiddly, but the data is uniquely valuable because it tells you what the receivers are seeing, not just what you think you’re sending.
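Aggregate reports arrive as XML, usually zipped or gzipped. The sketch below assumes you have already decompressed one report into a string, and that it uses the un-namespaced element layout from the DMARC specification (`record/row/source_ip`, `count`, `policy_evaluated/dkim`, `policy_evaluated/spf`). `summarise_aggregate` is a hypothetical name. The function tallies the receiver's aligned verdicts, weighted by each row's message count, and lists the source IPs behind DMARC failures.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarise_aggregate(xml_text: str) -> dict:
    """Tally aligned SPF/DKIM results from one DMARC aggregate (rua) report.

    Reads each record's policy_evaluated element (the receiver's aligned
    verdicts) and weights every row by its <count> of messages.
    """
    root = ET.fromstring(xml_text)
    totals = Counter()
    failing_sources = Counter()
    for record in root.iter("record"):
        row = record.find("row")
        count = int(row.findtext("count", default="0"))
        spf = row.findtext("policy_evaluated/spf")
        dkim = row.findtext("policy_evaluated/dkim")
        totals["messages"] += count
        totals["spf_pass"] += count if spf == "pass" else 0
        totals["dkim_pass"] += count if dkim == "pass" else 0
        if spf != "pass" and dkim != "pass":
            # DMARC fails only when neither aligned check passes.
            totals["dmarc_fail"] += count
            failing_sources[row.findtext("source_ip")] += count
    return {"totals": dict(totals), "failing_sources": dict(failing_sources)}
```

The `failing_sources` map is what answers "what the receivers are seeing". An unfamiliar IP there is often the unauthorised relay or forwarding hop behind an SPF regression.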

Leading indicator 4: Warmup slope

For domains under thirty days old, the warmup engagement curve is the leading indicator. You’re not yet sending to real production lists; you’re feeding the domain through a warmup tool (Instantly’s built-in, Mailwarm, Warmup Inbox) that produces synthetic engagement on a defined schedule. Every warmup tool publishes a target slope — the rate at which open-rate and reply-rate are expected to climb over the warmup period — and the question is whether your account is matching that slope.

The threshold heuristic: if your account’s actual engagement curve falls below 70% of target slope for three consecutive days, the domain is not gaining reputation, and promoting it to production sends will result in immediate placement collapse. The cause is almost always upstream — usually DNS misconfiguration (missing MX, missing DKIM, A-record pointing to the wrong host) or a domain that was previously burned and re-registered with residual reputation history. A surprisingly high fraction of “fresh” domains are actually recycled, and the warmup slope is the cheapest way to detect it.

Once the warmup slope holds at 100% of target for seven consecutive days, the domain is production-ready. Promoting earlier is a discipline failure; the cost of premature promotion is the entire warmup investment, because a domain that goes to production prematurely typically burns within thirty days and has to be replaced from scratch.
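Both warmup rules reduce to counting consecutive days at the end of a series. The sketch below interprets "percent of target slope" as the daily ratio of your actual engagement to the tool's target curve. That interpretation is an assumption, since warmup tools publish their targets in different shapes. `trailing_run` and `warmup_status` are hypothetical names.

```python
def trailing_run(values, predicate):
    """Number of consecutive days, counting back from the latest, that satisfy predicate."""
    run = 0
    for v in reversed(values):
        if not predicate(v):
            break
        run += 1
    return run


def warmup_status(actual, target):
    """Compare daily warmup engagement with the tool's target curve (same length, target > 0)."""
    ratios = [a / t for a, t in zip(actual, target)]
    if trailing_run(ratios, lambda r: r < 0.70) >= 3:
        return "stalled"            # check DNS (MX, DKIM, A record) and domain history
    if trailing_run(ratios, lambda r: r >= 1.0) >= 7:
        return "production-ready"
    return "warming"
```

Anything that is neither stalled nor seven days on target stays in "warming", which encodes the rule that promotion waits for the full seven-day run.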

Cold Email Weekly

Subscribe to Cold Email Weekly

Long-form writing on deliverability, infrastructure, and agency operations. One post a week, no fluff.

Leading indicator 5: Bounce velocity, not bounce rate

This is the one most teams get wrong. Almost every cold-email tool alerts on a bounce-rate threshold, typically a fixed cutoff like “alert if bounce rate > 5%”. This catches the worst failures, but it misses the earlier signal: bounce velocity, defined as the current 24-hour bounce rate relative to its 7-day rolling baseline.

A jump from 0.8% to 2.5% bounce rate is alarming even though both numbers are “under threshold”. The velocity signal is what catches it. We alert on any 24-hour bounce rate that exceeds 2× the 7-day baseline, regardless of absolute value. In practice this catches a domain heading for a wall about four to six days before the absolute bounce rate would trip a 5% threshold.

A further refinement: hard-bounce velocity matters more than soft-bounce velocity. Hard bounces (5xx codes — mailbox doesn’t exist, domain doesn’t exist, recipient rejected) indicate list-quality issues or recipient-side blocks that compound; soft bounces (4xx codes — temporary failure, mailbox full) are noisier and often resolve on their own. We weight hard-bounce velocity at 3× the alert sensitivity of soft-bounce velocity, which keeps the signal focused on the failures that actually predict placement collapse.
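The velocity rule and the hard/soft weighting can be sketched as follows. The text does not spell out how "3× the alert sensitivity" is applied. This sketch assumes it means soft bounces need a threshold three times higher (6× baseline) than hard bounces (2× baseline) before they alert. `velocity`, `bounce_alerts`, and the multiplier constants are hypothetical names.

```python
from statistics import mean

HARD_MULTIPLIER = 2.0                  # alert when the 24h rate exceeds 2x the 7-day baseline
SOFT_MULTIPLIER = 3 * HARD_MULTIPLIER  # assumption: soft bounces are 3x less sensitive


def velocity(rate_24h, daily_rates_7d):
    """How many times the 7-day average today's bounce rate is."""
    baseline = mean(daily_rates_7d)
    if baseline == 0:
        return 0.0 if rate_24h == 0 else float("inf")
    return rate_24h / baseline


def bounce_alerts(hard_24h, hard_7d, soft_24h, soft_7d):
    """Return the velocity alerts that fire, independent of absolute bounce rate."""
    alerts = []
    if velocity(hard_24h, hard_7d) > HARD_MULTIPLIER:
        alerts.append("hard-bounce velocity")
    if velocity(soft_24h, soft_7d) > SOFT_MULTIPLIER:
        alerts.append("soft-bounce velocity")
    return alerts
```

The 0.8% to 2.5% jump from the example above is roughly 3.1× baseline. For hard bounces, that fires an alert even though both values sit under a 5% cutoff. The same jump in soft bounces stays quiet under this weighting.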

How to actually monitor all five

The honest answer is that doing this manually takes roughly an hour per account per day, which is why almost nobody does it. The DIY stack (Mail-Tester for synthetic placement, MXToolbox for DNSBL probing, a custom Python pipeline for parsing DMARC aggregate reports, and your sending tool’s API for bounce velocity) works for about five accounts. Past twenty, the operational load becomes the whole job; past fifty, you stop doing it.

The product play is to consolidate the five indicators into a continuously-running pipeline that fuses them into a composite risk score per account and surfaces leading-edge alerts before campaigns crater. That’s what Mailnurse is. We run synthetic placement testing every four hours across a forty-seed pool, probe the eleven DNSBLs nightly, parse DMARC aggregate reports continuously, compare warmup slopes to targets daily, and compute bounce velocity in real time. The risk score is the fusion; the alerts are the surfaced leading edge. You can read more about how the monitoring stack works on the monitoring product page, or jump straight to pricing if you already know what you need.
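To make "fusion" concrete, here is one way a composite score could be built. This is not Mailnurse's model. The sketch maps each indicator onto a 0–1 risk using the thresholds quoted in this post, then takes a weighted sum scaled to 0–100. The mapping is as follows: placement risk is 0 at 95% and 1 at 70%; any Spamhaus ZEN listing is 1; SPF risk is 1 at a 2% fail rate; warmup risk is 1 at 70% of target; bounce risk is 1 at 2× baseline. The weights, the scaling, and every name below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    inbox_rate: float       # 24h inbox placement, percent
    new_listings: int       # DNSBL listings first seen in the last 24h
    zen_listed: bool        # any Spamhaus ZEN listing
    spf_fail_rate: float    # 24h SPF fail rate, percent
    warmup_ratio: float     # actual / target warmup engagement (1.0 once warmed up)
    bounce_velocity: float  # 24h bounce rate / 7-day baseline


# Hypothetical weights; they sum to 1.0.
WEIGHTS = {"placement": 0.35, "blacklist": 0.2, "auth": 0.15, "warmup": 0.1, "bounce": 0.2}


def clamp(x):
    return max(0.0, min(1.0, x))


def risk_score(i: Indicators) -> float:
    """Weighted 0-100 risk score; each part is scaled so 1.0 means 'at the alert threshold'."""
    parts = {
        "placement": clamp((95 - i.inbox_rate) / 25),
        "blacklist": 1.0 if i.zen_listed else clamp(i.new_listings / 2),
        "auth": clamp(i.spf_fail_rate / 2),
        "warmup": clamp((1.0 - i.warmup_ratio) / 0.3),
        "bounce": clamp(i.bounce_velocity - 1),
    }
    return round(100 * sum(WEIGHTS[k] * v for k, v in parts.items()), 1)
```

A weighted sum keeps the score explainable, because each part can be shown next to the total. The trade-off is that a single severe indicator, such as a ZEN listing, can be diluted. For that reason, the per-indicator alerts should still fire on their own.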

A short note on what NOT to monitor

Two anti-patterns worth naming, because both are common and both waste effort.

Open rates as a primary deliverability signal. Open rates depend on recipient-side behaviour: image blocking, Apple Mail Privacy Protection (which prefetches images and so registers an open for every Apple Mail recipient who has it enabled), and corporate mail-scanning gateways that prefetch links all corrupt the signal. Open rate is useful for measuring warmup trajectories on synthetic pools where the recipient behaviour is controlled, and almost useless for production sends. If your team is alerting on production open-rate drops, you are reading noise.

Single-snapshot Mail-Tester runs. Running Mail-Tester once a week and grading on the result is monitoring theatre. Placement varies from hour to hour: a 9 a.m. test and a 3 p.m. test on the same domain can differ by 20 percentage points purely because of the receiver’s current load and the time-of-day reputation weighting. The slope is the signal, not the snapshot. If you want to use Mail-Tester, automate it on at least a four-hour cadence and track the rolling average; one-off results are noise dressed as data.

Closing

You can’t fix what you can’t see. Cold-email infrastructure tells the truth backwards — in casualties, not in warnings — unless you measure the leading indicators. Mailnurse exists because at Lanello we couldn’t tolerate the lag any longer: the gap between a domain dying and us noticing was the most expensive operational cost in the agency, and the cost was getting worse as the fleet grew. We built the watchful instrument we needed. If your team is running more than twenty sending accounts, the math will say the same thing it said for us.

Mailnurse runs all five indicators on every sending account in your fleet, around the clock. Start a 14-day free trial or book a demo.


Care, expressed as precision.

Cold-email infrastructure that watches itself — so you can focus on the campaign, not the chassis.

14-day free trial · No credit card · Instant setup