Data Provenance Report

The Cold Hard Truth About Data Quality

Building an identity graph does not happen overnight. It takes years of gathering data sources, defining linkage, and then the most important step: verifying and cleaning the data for accuracy.

So what is the difference between AudienceLab data and everyone else's? Why pay more for an intent list when you can get one for $50 on Fiverr?

The Industry Landscape

There Are 2 Types of Data Companies

Understanding this distinction is the single most important thing you can do before choosing a data partner.

Source-Level Identity Graphs

Data Sourcing: Source the data directly, own the files
Linkage: Manage all identity linkage in-house
Revenue Scale: Typically billions in annual revenue
Volume: Work in large volumes with annual commitments
Accuracy: 98%+ accuracy at the source file level

Derivative Data Resellers (DDRs)

Data Sourcing: Resell or white-label API endpoints
Linkage: No control over linkage quality
Appeared: Many launched virtually overnight since 2022
Access: Purchase 'watered-down' derivative datasets
Accuracy: 33-60% accuracy after multiple derivatives

How It Works

Where Do DDRs Get Their Data?

Major identity graph providers rarely sell in small volumes. Most require annual commitments of $250,000 to $1M+ just for access, and even more to license the data.

This means that a data reseller who cannot afford that commitment has to buy a "derivative" dataset instead: one that has been watered down enough to be classified as a different product.

Real Example

Experian has a core dataset retailing at $3M–$5M/year. The mobile numbers from this raw consumer data may be reused in a co-reg dataset, which is then officially a "derivative product" and can be sold separately.

But here's the problem — the data quality goes down with every derivative and every new build. The decay is rapid.

The Accuracy Waterfall

Data quality degrades with each derivative build

Source File: 98%
Derivative Build 1: 60%
Derivative Build 2: 40%
Derivative Build 3: 33%

This is the core difference between a core dataset worth millions of dollars and one you could grab on Fiverr for $50. It's all about provenance — being as close to the source file as possible.
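
To see how steep the waterfall is, the sketch below computes how much accuracy survives each derivative step, using the accuracy figures this report gives for the source file and three derivative builds (98%, 60%, 40%, 33%). The function name `step_retention` is illustrative, and treating each step as a simple ratio of the previous one is an assumption, not a description of how any vendor measures accuracy.

```python
# Accuracy figures quoted in this report, in percent.
WATERFALL = [
    ("Source", 98.0),
    ("Build 1", 60.0),
    ("Build 2", 40.0),
    ("Build 3", 33.0),
]


def step_retention(stages):
    """Return (label, share of the previous stage's accuracy retained) per step."""
    results = []
    # Pair each stage with the one that follows it.
    for (_, prev_acc), (label, acc) in zip(stages, stages[1:]):
        results.append((label, acc / prev_acc))
    return results


if __name__ == "__main__":
    for label, kept in step_retention(WATERFALL):
        print(f"{label}: keeps {kept:.1%} of the previous stage's accuracy")
    overall = WATERFALL[-1][1] / WATERFALL[0][1]
    print(f"Build 3 retains {overall:.1%} of source accuracy overall")
```

By these figures the first derivative step is the most expensive one: Build 1 keeps only about 61% of the source file's accuracy.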

Why This Matters

Very Few Companies Track Data Provenance

Very few data companies do the due diligence to track the provenance of the dataset and make sure they have the real source — not a derivative.

Derivative datasets have the same labels but often contain data from dozens of different sources — only 10% of which are verifiable.

Data quality from source to derivative:
Source: 98%
Build 1: 60%
Build 2: 40%
Build 3: 33%

The Real Cost

Derivative Reseller vs. Original Identity Graph

The upfront price of derivative data looks cheaper. But when you factor in cleaning, verification, match rates, and wasted spend — the true cost tells a different story.
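
One way to put a number on "wasted spend" is to treat every inaccurate record as outreach that cannot reach the intended person. The sketch below does that. The $10,000 campaign budget is a hypothetical input, the accuracy values are the ones quoted in this report, and the one-to-one link between record accuracy and useful spend is a simplifying assumption.

```python
def wasted_spend(campaign_spend: float, accuracy: float) -> float:
    """Spend that lands on inaccurate records, assuming spend is spread
    evenly across records (a simplifying assumption)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return campaign_spend * (1.0 - accuracy)


if __name__ == "__main__":
    budget = 10_000.0  # hypothetical campaign budget in dollars
    scenarios = {
        "Identity graph (98%)": 0.98,
        "Derivative, best case (60%)": 0.60,
        "Derivative, worst case (33%)": 0.33,
    }
    for label, accuracy in scenarios.items():
        print(f"{label}: ${wasted_spend(budget, accuracy):,.0f} wasted")
```

Under these assumptions a $10,000 campaign loses about $200 on source-level data and between $4,000 and $6,700 on derivative data, before any cleaning or verification costs are counted.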

01. Consumer-Base Accuracy & Data Decay

Consumer data decays at roughly 2-3% per month. Source-level identity graphs refresh continuously. Derivative builds are snapshots that age rapidly, compounding the accuracy loss from being a derivative in the first place.

Identity Graph: 95-98% accuracy, refreshed monthly
DDR / Derivative: 33-60% accuracy, snapshot ages fast

02. Email Validation

Even if an email address is 'valid' (deliverable), it may not belong to the right person. Derivative datasets often have mismatched email-to-person linkages because the original linkage was lost in the derivative process.

Identity Graph: Verified email-to-person linkage
DDR / Derivative: Deliverable ≠ correct person

03. Probabilistic Matching Without Guardrails

Many DDRs use probabilistic matching to fill gaps in their derivative data. Without the original deterministic linkage, they guess — and those guesses compound errors across every record.

Identity Graph: Deterministic + verified linkage
DDR / Derivative: Probabilistic guessing, error compounding

04. Contact Data Accuracy

Phone numbers, job titles, and company associations change constantly. Source-level graphs track these changes. Derivatives inherit stale data from the moment they're created.

Identity Graph: Live-updated contact records
DDR / Derivative: Stale from day one, no update pipeline

05. Intent Data False Negatives

When your contact data is wrong, your intent signals are wrong. You're not just missing leads; you're building campaigns on phantom signals from people who no longer work at those companies.

Identity Graph: Intent matched to verified identities
DDR / Derivative: Intent signals on ghost records

The Evidence

How Do These Numbers Make Sense?

Let's walk through the actual cost to clean and verify data so it's usable — and the realistic identity match rates when you're working off derivatives.

Consumer-Base Accuracy & Data Decay

Consumer data decays 2-3% monthly. With no refresh pipeline, a derivative dataset that starts at 60% accuracy falls to roughly 50% within 6 months at the 3% rate, and below 40% in a little over a year.

Source: 95-98%
Derivative: 33-60%
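
The decay arithmetic can be sketched as follows, assuming the monthly rate compounds multiplicatively and nothing is refreshed. Both function names are illustrative. This models decay alone, so other losses described in this report, such as errors introduced by probabilistic matching, would come on top.

```python
import math


def accuracy_after(start: float, monthly_decay: float, months: int) -> float:
    """Accuracy after `months` with no refresh, assuming decay compounds."""
    return start * (1.0 - monthly_decay) ** months


def months_to_reach(start: float, target: float, monthly_decay: float) -> int:
    """Whole months until accuracy first falls to or below `target`."""
    return math.ceil(math.log(target / start) / math.log(1.0 - monthly_decay))


if __name__ == "__main__":
    for rate in (0.02, 0.03):
        six = accuracy_after(0.60, rate, 6)
        cross = months_to_reach(0.60, 0.40, rate)
        print(f"{rate:.0%}/month: {six:.1%} after 6 months, "
              f"at or below 40% in month {cross}")
```

With a 60% starting point and 3% monthly decay, this gives roughly 50% after six months and crosses 40% in month 14. At 2% it takes until month 21.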

Email Validation

"Valid" only means deliverable. It does not mean the email belongs to the right person. Derivative datasets lose the email-to-identity linkage that makes the data actionable.

Source: Verified linkage
Derivative: Deliverable ≠ correct
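
The gap between "deliverable" and "belongs to the right person" can be illustrated with a small check. Everything here is hypothetical: the `Record` type, the `deliverable` set (standing in for the output of an email validation service) and the `verified_owner` mapping (standing in for a trusted email-to-person linkage). It is a sketch of the idea, not a description of any vendor's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    person_id: str  # hypothetical ID of the person the record claims to describe
    email: str


def classify(record: Record, deliverable: set[str], verified_owner: dict[str, str]) -> str:
    """Separate 'the mailbox exists' from 'the mailbox belongs to this person'."""
    email = record.email.strip().lower()
    if email not in deliverable:
        return "undeliverable"
    # A deliverable address still fails if no verified link ties it to this person.
    if verified_owner.get(email) != record.person_id:
        return "deliverable, wrong or unverified person"
    return "verified"


if __name__ == "__main__":
    deliverable = {"jane@example.com", "sam@example.com"}
    verified_owner = {"jane@example.com": "p-001", "sam@example.com": "p-002"}
    records = [
        Record("p-001", "Jane@Example.com"),
        Record("p-003", "sam@example.com"),
        Record("p-004", "old@example.com"),
    ]
    for rec in records:
        print(rec.person_id, rec.email, "->", classify(rec, deliverable, verified_owner))
```

The second record is the case this section warns about: the address passes validation, but it is attached to the wrong person.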

Probabilistic Matching Without Guardrails

DDRs use probabilistic matching to fill gaps. Without deterministic anchors, every guess compounds errors — turning a 60% dataset into a 40% one after matching.

Source: Deterministic
Derivative: Probabilistic guessing
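
A rough way to reason about that compounding, assuming each probabilistic link is correct with some fixed probability and that links fail independently of one another (both simplifying assumptions): accuracy after matching is the starting accuracy multiplied by the link precision once per inferred hop. The function names below are illustrative.

```python
def accuracy_after_matching(base_accuracy: float, link_precision: float, hops: int = 1) -> float:
    """Share of records still correct after `hops` probabilistic links,
    assuming each link is right with probability `link_precision`
    independently of the others."""
    return base_accuracy * link_precision ** hops


def implied_precision(before: float, after: float) -> float:
    """Per-link precision that would explain a single-hop drop."""
    return after / before


if __name__ == "__main__":
    p = implied_precision(0.60, 0.40)
    print(f"Implied link precision for a 60% to 40% drop: {p:.1%}")
    for hops in (1, 2, 3):
        print(f"{hops} hop(s): {accuracy_after_matching(0.60, p, hops):.1%}")
```

Read this way, the 60% to 40% drop corresponds to guesses that are right about two times in three, and a second inferred hop at the same precision would leave under 27% of records correct.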

Contact Data Accuracy

Phone numbers, job titles, and company associations change constantly. Source graphs track changes in real time. Derivatives inherit stale data from the moment they're created.

Source: Live-updated
Derivative: Stale from day one

Intent Data False Negatives

Wrong contact data means wrong intent signals. You're not just missing leads; you're building campaigns on phantom signals from people who no longer work at those companies.

Source: Verified intent signals
Derivative: Ghost record signals

The AudienceLab Difference

We Track Provenance. We Verify at the Source.

AudienceLab was built from the ground up as an identity-first platform. We don't resell derivative datasets. We source, link, verify, and continuously refresh our data — so you get accuracy that derivative resellers simply cannot match.

98%+ Source-Level Accuracy
600M+ Consumer Profiles
250B Weekly Intent Signals

Side by Side

The Full Comparison

Factor | Identity Graph | DDR / Derivative
Data Source | Owned & managed in-house | Resold / white-labeled API
Base Accuracy | 95-98% | 33-60%
Refresh Rate | Continuous / monthly | Snapshot, ages rapidly
Linkage Type | Deterministic + verified | Probabilistic guessing
Email Accuracy | Verified person-to-email | Deliverable ≠ correct person
Contact Freshness | Live-updated records | Stale from day one
Intent Reliability | Matched to verified IDs | Ghost record signals
Provenance Tracking | Full chain of custody | Unknown / unverifiable
Typical License | $250K–$5M/year | $50–$5K (you get what you pay for)
Cleaning Cost | Minimal (pre-cleaned) | Significant (30-50% of records)

Stop Paying for Derivative Data

Get access to source-level identity data with verified provenance. See the difference in your campaign performance from day one.