Pattern Chaser: Audit Analytics for Internal Audit

PATTERN CHASER

Audit Analytics for Internal Audit in Financial Services

This Fraud Test Is a Comfort Blanket

It’s built into most audit analytics tools. Taught in countless fraud courses. Elegant math. Poor fraud detection.

Hey Reader 👋

There’s a mathematical formula discovered by accident in a library 145 years ago.

It’s built into most major audit analytics tools. It’s taught in countless fraud examination courses. It looks elegant and scientific. And the academic evidence is damning.

It almost never catches fraud.

Worse: when researchers tested clean, real-world datasets that clearly follow Benford’s distribution, the standard statistical test rejected them anyway. The method meant to catch anomalies creates false alarms on legitimate data.

In today’s issue:

↳ The formula, its strange origin, and why it fails at real fraud rates
↳ Three audit scenarios that guarantee false positives
↳ What actually works instead

Examples in this newsletter are composites drawn from industry experience, peer conversations and published research. Not attributable to any single organisation.

DEEP DIVE

The Mathematical Comfort Blanket

In 1881, astronomer Simon Newcomb noticed something odd in his library. The early pages of logarithm books were worn and dirty. The later pages were clean. People were looking up numbers starting with 1 far more often than numbers starting with 9.

Nobody cared. His observation was ignored for 57 years until physicist Frank Benford rediscovered it in 1938, applied it to 20,000+ observations across 20 different datasets, and published the finding. The formula now bears his name, not Newcomb’s. Stigler’s Law strikes again: no scientific discovery is named after its original discoverer.

The pattern is real. In naturally occurring datasets that span several orders of magnitude, the digit 1 appears as the first digit about 30.1% of the time. Digit 9 appears only 4.6% of the time. It works for river lengths, population sizes, stock prices, and street addresses.
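For anyone who wants to see where those percentages come from, here is a minimal sketch. It uses the standard first-digit form of Benford's Law, P(d) = log10(1 + 1/d); the function name is just an illustrative choice.

```python
import math

def benford_first_digit_probs():
    """Expected share of each leading digit d (1-9): log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print the expected share for each leading digit.
for digit, p in benford_first_digit_probs().items():
    print(f"{digit}: {p:.1%}")
```

The first and last rows come out at 30.1% and 4.6%, matching the figures above.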

And auditors weaponised it. The theory: if someone fabricates numbers, they won’t follow Benford’s distribution. Deviations from the expected pattern signal potential fraud. It’s elegant. It’s mathematical. And it has three fatal problems.

Problem 1

Detection fails below 10-12% fraud rate

In 2008, researcher Mark Linville ran controlled experiments. Graduate students seeded fraudulent invoices into clean datasets.

The result? Digit analysis failed to detect fraud in every single trial when fraudulent data comprised less than 12.5% of the population.

Think about what that means. If a fraudster is careful enough to keep their fake invoices under 10% of total volume, Benford won’t see them. And most real fraud operates well below that threshold. The test is structurally underpowered for actual audit environments.
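A bit of arithmetic shows why dilution hurts. The sketch below is a toy model, not Linville's experimental design: it assumes clean records follow Benford exactly and fabricated records have evenly spread first digits (a deliberate simplification), then measures how far the blended distribution moves from Benford as the fraud share changes.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
# Assumption for illustration only: fabricated amounts have uniform first digits.
UNIFORM = [1 / 9] * 9

def blended_mad(fraud_share):
    """Mean absolute deviation from Benford when fraud_share of records are fabricated."""
    blended = [(1 - fraud_share) * b + fraud_share * u
               for b, u in zip(BENFORD, UNIFORM)]
    return sum(abs(x - b) for x, b in zip(blended, BENFORD)) / 9

# The deviation shrinks in direct proportion to the fraud share.
for share in (0.01, 0.05, 0.10, 0.25):
    print(f"fraud share {share:.0%}: deviation = {blended_mad(share):.4f}")
```

Under these assumptions the deviation is simply the fraud share multiplied by a constant, so a 1% fraud rate produces one tenth of the signal that a 10% rate does. The clean 99% drowns it out.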

Problem 2

Chi-squared rejects clean data at scale

The standard statistical test for Benford conformity is the chi-squared test. Think of it as the pass/fail gate: it compares your observed digit distribution against the expected Benford distribution and tells you whether the difference is statistically significant.

The problem? Chi-squared is sensitive to sample size, not fraud. On any sufficiently large dataset, it will reject Benford conformity even when the data is completely clean.

Kossovsky demonstrated this in 2021 by testing six real-world datasets that clearly conform to Benford. The chi-squared test rejected all six. Not because they deviate meaningfully from the expected distribution, but because at high transaction volumes even trivial deviations become statistically significant. The test stops measuring fraud risk and starts measuring how much data you have.
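You can see the sample-size effect with a few lines of code. This sketch does not reproduce Kossovsky's datasets. It takes a hypothetical population whose digit shares sit a hair off Benford (0.3 percentage points moved from digit 1 to digit 9, an arbitrary choice) and computes the chi-squared statistic for the same shares at different record counts.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
# Chi-squared critical value for 8 degrees of freedom at the 5% level.
CRITICAL_5PCT_DF8 = 15.507

# Hypothetical observed shares: Benford with a tiny shift from digit 1 to digit 9.
SHIFT = 0.003
observed_shares = BENFORD.copy()
observed_shares[0] -= SHIFT
observed_shares[8] += SHIFT

def chi_squared(n_records):
    """Chi-squared statistic if these same shares were observed across n_records."""
    return sum(
        (n_records * o - n_records * e) ** 2 / (n_records * e)
        for o, e in zip(observed_shares, BENFORD)
    )

for n in (1_000, 100_000, 10_000_000):
    stat = chi_squared(n)
    verdict = "reject" if stat > CRITICAL_5PCT_DF8 else "do not reject"
    print(f"n={n:>10,}: chi-squared={stat:10.2f} -> {verdict}")
```

The digit shares never change, yet the statistic grows in step with the record count. The same population passes at a thousand records and fails at a hundred thousand.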

Problem 3

Policy-driven patterns break the assumption

Benford’s Law assumes numbers arise from natural, multiplicative processes. Most audit populations don’t. Common patterns that violate Benford’s assumptions:

↳ Approval thresholds: a £10,000 approval limit pushes expenses to cluster just below it, inflating the count of leading 9s
↳ Psychological pricing: retail transactions ending in .99 or .95
↳ Round numbers: budgets, salaries, and contract values set at £50,000 or £100,000
↳ Narrow ranges: invoice populations that don’t span multiple orders of magnitude

These aren’t edge cases. They’re the norm. And every one of them will produce Benford anomalies that have nothing to do with fraud.
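The narrow-range case is the easiest to demonstrate. The sketch below builds a made-up population with one invoice at every whole amount from 500 to 1,499, which is perfectly legitimate and contains no fraud by construction, then counts the leading digits.

```python
from collections import Counter

def first_digit(amount):
    """Leading non-zero digit of a non-zero amount."""
    return int(str(abs(amount)).lstrip("0.")[0])

# Hypothetical population: one invoice at every whole amount from 500 to 1,499.
amounts = range(500, 1500)
counts = Counter(first_digit(a) for a in amounts)
shares = {d: counts.get(d, 0) / len(amounts) for d in range(1, 10)}

for d, share in shares.items():
    print(f"digit {d}: {share:.0%}")
```

Digit 1 takes half the population and digits 2 to 4 never appear. Any Benford test would flag this, and the only thing it has detected is the price range.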

The $1,153.35 Invoice Spike

A fraud examiner ran Benford analysis on an accounts payable population. The test flagged a spike at first-two-digits “11” as anomalous. Investigation revealed 2,263 invoices at exactly $1,153.35.

The cause? A bulk purchasing price for items bought six times a day from two legitimate vendors. Completely normal business activity. Days of investigation time consumed by a mathematical false positive.

The ACFE’s Report to the Nations consistently finds that around 43% of occupational fraud is detected by tips. Three times more than any other method.

Proactive data monitoring and analysis is associated with significantly lower fraud losses.

Benford’s Law is not the analytics. It’s the warm-up act that keeps the real work from happening.

WHAT’S POSSIBLE

The 3-Step Alternative

Instead of leading with Benford’s Law and hoping it catches something, flip the sequence.

Step 1: Profile your population

Before running any test, understand what normal looks like. Calculate the average invoice value by vendor over the past 12 months. Flag any vendor whose current invoices run more than 2 standard deviations above their historical mean. Do the same by employee, by cost centre, by transaction type. This surfaces anomalies that matter, not anomalies that exist because someone set a policy threshold at £10,000.
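Here is one way the vendor profile could look in practice. Treat it as a sketch under assumptions: the vendor names and amounts are invented, the `flag_outliers` helper is hypothetical, and it reads the rule as "flag a current invoice that sits more than 2 standard deviations above that vendor's historical mean".

```python
from collections import defaultdict
from statistics import mean, stdev

def flag_outliers(history, current, z_threshold=2.0):
    """Return current invoices above the vendor's historical mean + z_threshold SDs."""
    by_vendor = defaultdict(list)
    for vendor, amount in history:
        by_vendor[vendor].append(amount)

    flagged = []
    for vendor, amount in current:
        past = by_vendor.get(vendor, [])
        if len(past) < 2:
            continue  # a standard deviation needs at least two historical invoices
        limit = mean(past) + z_threshold * stdev(past)
        if amount > limit:
            flagged.append((vendor, amount))
    return flagged

# Hypothetical data: (vendor, invoice amount).
history = [("Acme", 1000), ("Acme", 1100), ("Acme", 900), ("Acme", 1000),
           ("Birch", 200), ("Birch", 220), ("Birch", 180)]
current = [("Acme", 1050), ("Acme", 2400), ("Birch", 210)]

print(flag_outliers(history, current))
```

Only the 2,400 Acme invoice is flagged. The same function works for employees or cost centres if you swap the grouping key. Vendors with too little history are skipped here, which in a real review you would want to list separately rather than ignore.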

Step 2: Run targeted tests

↳ Duplicate detection: same amount, same vendor, same date, different invoice numbers
↳ Gap analysis: missing sequence numbers in controlled document series
↳ Threshold analysis: transactions clustering just below approval limits
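The three tests above can each be sketched in a few lines. Everything in this example is illustrative: the invoice records are invented, the function names are hypothetical, and the 5% band below the approval limit is an assumption you would tune to your own policy.

```python
from collections import defaultdict

# Hypothetical invoice records. Real work would use Decimal or pence for amounts.
invoices = [
    {"number": 1001, "vendor": "Acme", "date": "2024-03-01", "amount": 4200.00},
    {"number": 1002, "vendor": "Acme", "date": "2024-03-01", "amount": 4200.00},
    {"number": 1004, "vendor": "Birch", "date": "2024-03-02", "amount": 9950.00},
    {"number": 1005, "vendor": "Birch", "date": "2024-03-05", "amount": 9875.00},
    {"number": 1006, "vendor": "Cedar", "date": "2024-03-06", "amount": 310.00},
]

def find_duplicates(records):
    """Same vendor, date and amount appearing under different invoice numbers."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["vendor"], r["date"], r["amount"])].append(r["number"])
    return {key: nums for key, nums in groups.items() if len(set(nums)) > 1}

def find_gaps(records):
    """Invoice numbers missing between the lowest and highest number seen."""
    numbers = {r["number"] for r in records}
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]

def just_below_limit(records, limit, band=0.05):
    """Invoice numbers with amounts inside the band just under the approval limit."""
    return [r["number"] for r in records
            if limit * (1 - band) <= r["amount"] < limit]

print("duplicates:", find_duplicates(invoices))
print("gaps:", find_gaps(invoices))
print("just below £10,000:", just_below_limit(invoices, 10_000))
```

Each function answers one specific question, so every hit maps to a scheme you can name: a double payment, a suppressed document, or a split to dodge approval.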

Step 3: If you still want Benford, use MAD

If your population genuinely qualifies for Benford analysis (spans several orders of magnitude, no policy constraints, at least 1,000 records), use the Mean Absolute Deviation (MAD) test instead of chi-squared. MAD is less sensitive to sample size and produces fewer false positives. But even then, treat the results as a conversation starter, not a finding.
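If you do go down this route, the MAD calculation itself is simple. The sketch below is a first-digit version only and the function names are illustrative. It deliberately leaves out pass/fail cut-offs: take conformity bands from the published literature rather than from a newsletter snippet.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(amount):
    """Leading non-zero digit of a non-zero amount."""
    return int(str(abs(amount)).lstrip("0.")[0])

def benford_mad(amounts):
    """Mean absolute deviation between observed and expected first-digit shares."""
    usable = [a for a in amounts if a != 0]
    if not usable:
        raise ValueError("no non-zero amounts to analyse")
    counts = Counter(first_digit(a) for a in usable)
    n = len(usable)
    return sum(abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10)) / 9

# Two synthetic populations for comparison (neither is real audit data).
powers_of_two = [2 ** k for k in range(1000)]  # spans many orders of magnitude
three_digit_amounts = list(range(100, 1000))   # narrow range, evenly spread digits

print(f"powers of two:       {benford_mad(powers_of_two):.4f}")
print(f"three-digit amounts: {benford_mad(three_digit_amounts):.4f}")
```

Because MAD averages the gaps between observed and expected shares, it does not grow just because you added more records. The population spanning many orders of magnitude scores far lower than the narrow one, which is the qualification check from this step expressed as a number.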

The profession has better tools. Duplicate detection, gap analysis, threshold clustering. These target specific fraud schemes. They produce actionable results. And they don’t waste three days chasing a legitimate bulk purchase order.

BEST LINKS

Worth your time.

🔍 THE PATTERN

Benford’s Law and Fraud Detection: Facts and Legends. A rigorous look at what the research actually says about digit analysis. Spoiler: the legends don’t hold up. The false negative problem alone should give every audit function pause.

🧠 THE MIND

Confirmation Bias: Why We Keep Using What Doesn’t Work. Farnam Street on the mental trap: we favour evidence that supports what we already believe. Darwin’s rule: write it down immediately when you find evidence against your position. Munger’s rule: never hold an opinion unless you know the other side better than they do.

📊 THE PROOF

Google: AI-Powered Attacks Now Self-Modify to Evade Detection. LLMs can identify logical flaws invisible to pattern-matching tools. If fraudsters adopt the same approach, static statistical tests like Benford’s won’t see them coming.

🗞️ WHERE IT’S HEADING

IIA Survey: Only 4 in 10 Audit Functions Ready for AI-Enabled Fraud. The profession knows the threat is evolving. Most don’t feel prepared. Static methods like Benford’s Law won’t close that gap. The shift to behaviour-based detection is already underway.

THAT’S A WRAP