How to Handle False Positives in Website Monitoring
False positives are the fastest way to make alerts useless. If you get too many "down" alerts that are not real, you stop trusting them. Then the real outage gets missed.
A false positive happens when your monitoring probe cannot reach the site, but real visitors can. The site might be fine. The network path might not be. Or your probe might be blocked.
You can cut false positives with one rule.
Do not page on a single failed check. Verify first with a repeatable workflow and clear evidence. Multi-location confirmation is a simple baseline for this.
A Practical Verification Workflow
Recheck From Another Region
Run the same check from at least one other location. If only one region fails, you likely have a routing issue, a regional block, or a CDN edge problem. A good default is to require failures from two or three locations before you notify.
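Here is a minimal sketch of that confirmation rule. The CheckResult type, the location labels, and the min_failures default are assumptions for illustration, not part of any specific monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    location: str  # probe location label (hypothetical names below)
    ok: bool       # True if the check passed from this location


def should_notify(results: list[CheckResult], min_failures: int = 2) -> bool:
    """Notify only when at least `min_failures` locations report a failure."""
    failed = [r for r in results if not r.ok]
    return len(failed) >= min_failures


results = [
    CheckResult("us-east", ok=True),
    CheckResult("eu-west", ok=False),
    CheckResult("ap-south", ok=True),
]

# Only one region failed, so the alert is held back.
print(should_notify(results))
```

With a threshold of two, a single failing region is recorded but does not page anyone.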
Test in a Real Browser
Open the URL in an incognito window. Then try from mobile data. Your probe might be blocked while browsers pass. Or a browser might fail because it loads more assets than your probe.
Split the Failure Into Layers
Log what broke. Do not just log "timeout."
- Check DNS resolution
- Check TCP connect and TLS handshake
- Check HTTP status code
- If you got HTML, scan for error text
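One way to record which layer broke is to run the steps in order and remember the last one you started. This is a rough sketch using only the Python standard library. The function name check_layers and the shape of the result dict are assumptions, and it reads just the status line, not the body.

```python
import socket
import ssl


def check_layers(host: str, path: str = "/", port: int = 443, timeout: float = 5.0) -> dict:
    """Run DNS, TCP, TLS, and HTTP steps in order and report the first layer that fails."""
    result = {"failed_layer": None, "detail": None, "ip": None, "status": None}
    layer = "dns"
    sock = None
    try:
        # DNS: resolve the hostname and keep the first address.
        result["ip"] = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]

        layer = "tcp"
        sock = socket.create_connection((result["ip"], port), timeout=timeout)

        layer = "tls"
        sock = ssl.create_default_context().wrap_socket(sock, server_hostname=host)

        layer = "http"
        request = f"HEAD {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        status_line = sock.recv(1024).split(b"\r\n", 1)[0]
        # A status line looks like b"HTTP/1.1 200 OK"; the code is the second token.
        result["status"] = int(status_line.split()[1])
    except (OSError, ValueError, IndexError) as exc:
        # Covers DNS errors, refused connections, timeouts, TLS errors, and bad replies.
        result["failed_layer"] = layer
        result["detail"] = f"{type(exc).__name__}: {exc}"
    finally:
        if sock is not None:
            sock.close()
    return result
```

Calling check_layers("example.com") returns the resolved IP and status code on success. On failure, failed_layer names the step, so the log says "tls" or "dns" instead of only "timeout." A non-2xx status is not treated as a failed layer here. The caller decides what the status means.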
Confirm Content, Not Only Status
A page can return 200 and still be broken. This shows up a lot in WordPress. Database errors and PHP fatal errors can render inside a 200 response. Scanning the response body for known error text catches this.
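A content check can be as small as a substring scan. The sketch below is illustrative. The marker list is an assumption that you should tune to the errors your own stack prints, and required_text is a hypothetical option for asserting that expected content is present.

```python
from typing import Optional

# Example error strings (assumed list; adjust for your site).
ERROR_MARKERS = (
    "error establishing a database connection",
    "fatal error",
    "there has been a critical error on this website",
)


def find_error_markers(body: str, markers=ERROR_MARKERS) -> list[str]:
    """Return every marker that appears in the body, case-insensitively."""
    lowered = body.lower()
    return [m for m in markers if m in lowered]


def content_ok(status: int, body: str, required_text: Optional[str] = None) -> bool:
    """Pass only if the status is 200, no error marker matches, and required text is present."""
    if status != 200:
        return False
    if find_error_markers(body):
        return False
    if required_text is not None and required_text.lower() not in body.lower():
        return False
    return True
```

Keeping the matched markers, not just a pass or fail flag, gives you evidence to put in the alert.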
Decide: Down, Degraded, Blocked, or Regional
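- Down means most locations fail and you see hard errors, such as timeouts, connection failures, or 5xx responses
- Degraded means the site responds but is slow, rate-limited, or unstable
- Blocked means your probe got denied, for example with a 403 or a challenge page, while real users still get through
- Regional means only specific locations fail while the rest pass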
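These labels can be turned into a small decision function. The sketch below is one possible mapping, not a standard. The status groupings, the "more than half" reading of "most," and the probe dict shape are all assumptions.

```python
from collections import Counter

BLOCKED_STATUSES = {403, 406}  # assumed signs that the probe was denied


def probe_state(status, slow=False):
    """Map one probe result to a coarse state. A status of None means no HTTP response."""
    if status is None or status >= 500:
        return "hard_fail"
    if status in BLOCKED_STATUSES:
        return "blocked"
    if status == 429 or slow:
        return "degraded"
    return "ok"


def classify_incident(probes):
    """Label a set of per-location probe results as down, regional, blocked, degraded, or up."""
    states = Counter(probe_state(p.get("status"), p.get("slow", False)) for p in probes)
    if states["hard_fail"] * 2 > len(probes):
        return "down"      # most locations see hard errors
    if states["hard_fail"]:
        return "regional"  # only some locations see hard errors
    if states["blocked"]:
        return "blocked"
    if states["degraded"]:
        return "degraded"
    return "up"


probes = [
    {"location": "us-east", "status": 200},
    {"location": "eu-west", "status": None},  # timed out
    {"location": "ap-south", "status": 200},
]
print(classify_incident(probes))
```

The order of the checks is a design choice. Hard failures win over blocks, and blocks win over slowness, so the label reflects the most severe signal.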
False Positive Examples
Example 1: WAF Blocks Your Probes
Your monitor checks every minute from a cloud IP range. A WAF flags it and returns 403, 406, or a challenge page. Real users still load the site.
What to do:
- Store the status code and a short response snippet
- Retry with different headers and a different user agent to confirm the block targets the probe, not all traffic
- Offer allowlist guidance
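Storing the evidence can be very simple. This sketch keeps the status code and a whitespace-collapsed snippet of the body. The function name and the 200-character limit are assumptions.

```python
def response_evidence(status: int, body: str, max_chars: int = 200) -> dict:
    """Keep the status code and a short, single-line snippet of the response body."""
    snippet = " ".join(body.split())[:max_chars]
    return {"status": status, "snippet": snippet}


print(response_evidence(403, "<html>\n  <title>Access denied</title>\n</html>"))
```

A snippet like this is usually enough to tell a WAF challenge page apart from a real application error.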
Example 2: Rate Limiting Looks Like Downtime
An API returns 429 during peak traffic. Your monitor retries twice and still gets 429, so it alerts "down."
What to do:
- Treat 429 as degraded, not down
- Alert on error rate over a window, like 3 failures out of the last 5 checks
- Monitor a lightweight health endpoint
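The windowed rule can be sketched with a fixed-size queue. The class name FailureWindow is made up for this example, and the defaults mirror the "3 out of 5" figure above.

```python
from collections import deque


class FailureWindow:
    """Track the last `size` check results and flag when failures reach `threshold`."""

    def __init__(self, size: int = 5, threshold: int = 3):
        self.results = deque(maxlen=size)  # oldest result drops off automatically
        self.threshold = threshold

    def record(self, ok: bool) -> bool:
        """Store one result and return True if the window now warrants an alert."""
        self.results.append(ok)
        return self.results.count(False) >= self.threshold


window = FailureWindow()
for ok in [False, True, False, True, False]:
    alert = window.record(ok)

# Three of the last five checks failed, so the final call returns True.
print(alert)
```

Two 429 responses in a row no longer trigger anything on their own. A sustained pattern does.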
Example 3: CDN Edge Issues in One Region
A single CDN edge starts timing out. Your probe in Europe fails. Your probe in the US succeeds.
What to do:
- Mark it as regional impact
- Show which locations failed and which passed
- Add a fallback check that hits the origin, if possible
Example 4: DNS Inconsistency During Changes
A DNS update propagates unevenly. One resolver returns an old IP. Your probe fails, but other probes and users hit the new IP.
What to do:
- Query multiple resolvers and compare answers
- Log the IP you got per location
- Delay alerting until results align
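Once you have the answers from each resolver, comparing them is straightforward. The sketch below assumes you already collected a set of IPs per resolver, for example with a DNS library. The resolver labels are hypothetical and the addresses come from documentation ranges.

```python
from collections import defaultdict


def group_by_answer(answers: dict[str, set[str]]) -> dict[frozenset, list[str]]:
    """Group resolver names by the exact set of IPs each one returned."""
    groups = defaultdict(list)
    for resolver, ips in answers.items():
        groups[frozenset(ips)].append(resolver)
    return dict(groups)


def answers_align(answers: dict[str, set[str]]) -> bool:
    """True when every resolver returned the same set of IPs."""
    return len(group_by_answer(answers)) <= 1


observed = {
    "resolver-a": {"203.0.113.10"},   # new IP
    "resolver-b": {"203.0.113.10"},
    "resolver-c": {"198.51.100.7"},   # still serving the old IP
}

print(answers_align(observed))
for ips, resolvers in group_by_answer(observed).items():
    print(sorted(ips), "from", resolvers)
```

While answers_align returns False, a failure from one location is more likely a propagation gap than an outage, so holding the alert is reasonable.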
Example 5: Cache Hides the Real Problem
Your WordPress cache serves a clean homepage to monitors while the uncached site throws a database error. A cache-bypass check exposes this by adding a query parameter like ?site_check=1, so the request skips the cached copy and reaches the uncached site.
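Building the bypass URL is easy to get subtly wrong when the URL already has a query string. Here is a small sketch using the standard library. The function name is an assumption, and whether the parameter actually skips the cache depends on how your cache plugin or CDN is configured.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_bypass_url(url: str, param: str = "site_check", value: str = "1") -> str:
    """Append a query parameter while preserving any existing query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(cache_bypass_url("https://example.com/"))
print(cache_bypass_url("https://example.com/shop?page=2"))
```

Some caches ignore unknown parameters. If yours does, passing a changing value such as a timestamp may be needed to force a fresh response.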
Build This Into Your Monitoring Strategy
- Add multi-location confirmation before notifications
- Show evidence in the alert: location, DNS result, status code, response time, and matched keywords
- Use clear incident labels: down, degraded, blocked, regional
- Add a quick automatic recheck to filter one-off network hiccups
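The automatic recheck from the last point can be sketched as a small wrapper around whatever check you already run. The function name, the defaults, and the injected sleep parameter are assumptions made for this example.

```python
import time
from typing import Callable


def confirmed_failure(
    check: Callable[[], bool],
    rechecks: int = 1,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if the check fails on the first try and on every recheck."""
    if check():
        return False
    for _ in range(rechecks):
        sleep(delay_seconds)  # give a transient network problem time to clear
        if check():
            return False
    return True


# Simulated check: fails once, then passes (a one-off hiccup).
outcomes = iter([False, True])
print(confirmed_failure(lambda: next(outcomes), delay_seconds=0))
```

Passing sleep in as a parameter keeps the function easy to test without real delays.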