Comp Benchmarking With AI: 5-Source Method

When the founder of a 70-person SaaS company told me she'd just lost three engineers in a quarter and couldn't figure out why, her comp data turned out to be a single screenshot from Levels.fyi for "Senior Engineer, San Francisco." Her engineers were remote and senior-to-staff. The screenshot was off by about 30%, and she didn't know.

Why does single-source comp data fail SMBs?

Each source has a different sampling bias. Levels.fyi over-indexes on big-tech IC roles in major US metros. Pave skews to venture-backed startups that opted into data sharing. Carta sees cap-table companies. Radford is enterprise-heavy. Raw scrapes catch whatever the public surface area exposes — usually job ads with optimistic ranges.

Definition: Sampling bias — when the data you can see systematically excludes the data you actually need. Every comp source has it; the question is which way.

A 30-500-employee company hiring a remote senior engineer in Eastern Europe is in none of these sources cleanly. Using any single one gives you a confident number that's wrong in a specific direction — and confident-wrong is the worst possible state for a comp decision.

What does the 5-source method actually look like?

You pull a slice from each source, you normalize them, and you let an AI workflow blend them with explicit weights. The blend isn't the answer — the spread is the answer. A narrow spread means the role is well-defined and the market agrees. A wide spread is a signal to investigate, not to average.

Source 1 — Levels.fyi

Strength: senior IC roles in tech, transparent levels framework. Weakness: US-metro skew, big-tech anchored. Pull: role + level + region.

Source 2 — Pave

Strength: venture-backed startup data, real ranges. Weakness: opt-in dataset, growth-stage skew. Pull: role family + stage + headcount band.

Source 3 — Carta

Strength: cap-table view, broad SMB coverage, equity data. Weakness: equity-heavy compensation may distort the cash band. Pull: role + stage + region + cash-vs-equity split.

Source 4 — Radford / Mercer / similar enterprise survey

Strength: methodologically rigorous, true enterprise data. Weakness: expensive, slow, conservative. Pull: role family + revenue band + geography.

Source 5 — Targeted raw scrape

Strength: real-time market signal from active job ads. Weakness: optimistic ranges, posted-not-paid bias. Pull: 20-30 active postings from comparable companies, then have AI extract the band.
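
When postings state their ranges explicitly, the band extraction for Source 5 can be approximated without a model. Here is a minimal Python sketch, assuming ranges are written like "$150,000 - $180,000" or "$150k-$180k". The function names and the regex are illustrative and not part of any scraping tool, and the result is still a posted range, not a paid one.

```python
import re
from statistics import median

# Assumed posting format: "$150,000 - $180,000", "$150k-$180k", "$150k to $180k".
RANGE_RE = re.compile(
    r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(k?)\s*(?:-|to)\s*\$?\s*(\d[\d,]*(?:\.\d+)?)\s*(k?)",
    re.IGNORECASE,
)


def _to_number(digits, k_suffix):
    """Turn '150,000' or ('150', 'k') into a float amount."""
    value = float(digits.replace(",", ""))
    return value * 1000 if k_suffix else value


def extract_range(posting_text):
    """Return (low, high) for the first salary range found, or None."""
    match = RANGE_RE.search(posting_text)
    if match is None:
        return None
    low = _to_number(match.group(1), match.group(2))
    high = _to_number(match.group(3), match.group(4))
    return (low, high) if low <= high else (high, low)


def scrape_band(postings):
    """Summarize postings as (median low, median high, postings used)."""
    ranges = [r for r in map(extract_range, postings) if r is not None]
    if not ranges:
        raise ValueError("no salary ranges found")
    lows = [low for low, _ in ranges]
    highs = [high for _, high in ranges]
    return median(lows), median(highs), len(ranges)
```

Postings without a stated range are skipped rather than guessed at, so the count in the result tells you how many of your 20-30 postings actually contributed.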

How does the AI workflow blend them?

The workflow does four things: normalizes role titles across sources, normalizes geography and currency, applies a weight matrix you control, and surfaces the spread.
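
The first two steps, title and currency normalization, can be sketched as a lookup plus a conversion. This is a minimal sketch: the title map, the FX rates, and all function names are hypothetical placeholders, and geography adjustment is left out because it depends on a location policy the article does not specify.

```python
# Hypothetical title map: each source's label -> one canonical (function, level).
TITLE_MAP = {
    "senior software engineer": ("ENG", "SENIOR"),
    "software engineer iii": ("ENG", "SENIOR"),
    "sr. swe": ("ENG", "SENIOR"),
    "staff engineer": ("ENG", "STAFF"),
}

# Placeholder FX rates to USD; replace with rates from your own finance source.
FX_TO_USD = {"USD": 1.0, "EUR": 1.10, "GBP": 1.25}


def normalize_title(raw_title):
    """Map a source-specific title to a canonical (function, level) pair."""
    key = " ".join(raw_title.lower().split())  # lowercase, collapse whitespace
    if key not in TITLE_MAP:
        # Fail loudly: an unmapped title should be reviewed, not guessed.
        raise KeyError(f"unmapped title: {raw_title!r}")
    return TITLE_MAP[key]


def to_usd(amount, currency):
    """Convert an amount to USD using the placeholder rate table."""
    return round(amount * FX_TO_USD[currency], 2)


def normalize_row(source, raw_title, p50, currency):
    """Return one source's P50 in canonical role terms and target currency."""
    function, level = normalize_title(raw_title)
    return {
        "source": source,
        "function": function,
        "level": level,
        "p50_usd": to_usd(p50, currency),
    }
```

Raising on an unmapped title is a deliberate choice here: a silent fallback would let two different jobs get blended as one, which is the failure a wide spread is supposed to catch.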

Definition: Spread analysis — the gap between the lowest and highest source median for the same role, expressed as a percentage of the midpoint. Under 15% = converged. 15-30% = investigate. Over 30% = the role is wrong or the market is in flux.
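
The definition above translates into a few lines of arithmetic. This is a minimal sketch that reads "midpoint" as the midpoint between the lowest and highest source median (an assumption); the figures in the example are invented for illustration, and the verdict labels are the ones used in the prompt template later in this article.

```python
def spread_pct(source_p50s):
    """Gap between lowest and highest source median, as % of their midpoint."""
    values = list(source_p50s.values())
    if len(values) < 2:
        raise ValueError("need at least two sources to measure a spread")
    low, high = min(values), max(values)
    midpoint = (low + high) / 2  # assumption: midpoint of the two extremes
    return (high - low) / midpoint * 100


def spread_verdict(pct):
    """Apply the thresholds: under 15, 15-30, over 30."""
    if pct < 15:
        return "CONVERGED"
    if pct <= 30:
        return "INVESTIGATE"
    return "WRONG-ROLE-OR-FLUX"


# Invented P50 base figures in USD, for illustration only.
example = {
    "levels": 180_000,
    "pave": 160_000,
    "carta": 150_000,
    "radford": 155_000,
    "scrape": 170_000,
}
pct = spread_pct(example)
print(f"{pct:.1f}% -> {spread_verdict(pct)}")  # prints "18.2% -> INVESTIGATE"
```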

The weights matter. For a senior engineering hire in a remote-friendly mid-stage SaaS, a defensible weight matrix might be Pave 30%, Carta 25%, Levels 20%, Radford 15%, scrape 10%. For an enterprise sales hire at a 200-person company selling to Fortune 500, Radford goes up to 40% and Levels drops near zero.

If you can't articulate why your weights look the way they do, AI is going to confidently average garbage. The weights are the judgment; the AI is the calculator.
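
The calculator half of that split is just a weighted average. Here is a minimal sketch using the senior-engineering weight matrix from the paragraph above; the P50 figures are invented for illustration and the function name is hypothetical.

```python
# Weight matrix from the senior-engineering example above (sums to 100).
WEIGHTS = {"pave": 30, "carta": 25, "levels": 20, "radford": 15, "scrape": 10}


def weighted_blend(values, weights):
    """Weighted average of per-source values; weights must sum to 100."""
    if set(values) != set(weights):
        raise ValueError("values and weights must cover the same sources")
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights sum to {total}, expected 100")
    return sum(values[source] * weights[source] for source in values) / total


# Invented normalized P50 base figures in USD, for illustration only.
p50s = {
    "pave": 160_000,
    "carta": 150_000,
    "levels": 185_000,
    "radford": 145_000,
    "scrape": 175_000,
}
print(weighted_blend(p50s, WEIGHTS))  # prints 161750.0
```

Running the same function on the P25 and P75 values gives the rest of the band. The function refuses to run when the weights don't sum to 100, which forces the weighting decision to be made explicitly.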

Copy/paste prompt template — Multi-source comp blend

You are blending comp data from 5 sources for a single role.

Role: [TITLE]
Level: [JUNIOR / MID / SENIOR / STAFF / PRINCIPAL]
Function: [ENG / PRODUCT / SALES / DESIGN / OPS / FINANCE / OTHER]
Location: [METRO or REMOTE-region]
Currency target: [USD / EUR / etc.]
Company stage: [SEED / SERIES A / SERIES B / SERIES C+ / PROFITABLE-PRIVATE / OTHER]
Headcount band: [<50 / 50-150 / 150-500 / 500+]

Source data (paste tables or summary rows):
- Levels.fyi: [P25 / P50 / P75 base, bonus, equity]
- Pave: [P25 / P50 / P75 base, bonus, equity]
- Carta: [P25 / P50 / P75 base, bonus, equity]
- Radford or enterprise survey: [P25 / P50 / P75 base, bonus]
- Raw scrape (job ads): [low-high range across N postings]

Weights (must sum to 100):
- Levels: [N]
- Pave: [N]
- Carta: [N]
- Radford: [N]
- Scrape: [N]

Output:
1. Per-source normalized P50 base in target currency.
2. Weighted blend: blended P25 / P50 / P75 base, bonus target, equity target.
3. Spread analysis: gap between the lowest and highest source P50, expressed as % of the midpoint between them.
4. Spread verdict: CONVERGED (<15%) / INVESTIGATE (15-30%) / WRONG-ROLE-OR-FLUX (>30%).
5. Top 3 risks given the spread.

Do NOT recommend a number to offer. Output the band; the decision is human.

That last line matters. The prompt outputs a band and a spread verdict. The people making the hire decide where in the band to land and why.

Tool tip (Course for Business): Most SMBs that fail at comp benchmarking fail because the work sits with one tired generalist who never had time to learn the method. The "Augment, don't replace" framing in our 6-week program puts the comp workflow in the hands of the person actually making the offers (usually the founder, a head of people, or a senior manager) and trains them to run the 5-source blend in under 4 hours per role. The AI Champions ratio (one champion per 15-20 employees) means there's always someone in the building who can help when a wide spread shows up. Walk through the program at https://course.aiadvisoryboard.me/business.

Team scan (what AI champions report after week 1)

Comp workflow ownership: one named person per function (eng, sales, ops) — no orphan roles
Source access: Levels free, Pave or Carta via existing investor network, Radford via partner-share, scrape automated
Time-per-role: under 4 hours including the spread investigation
First 3 roles benchmarked: every blend produced a defensible band with documented weights
Wide-spread roles: 1 of 3 surfaced as "role definition is broken," not a comp issue — caught upstream
Decisions documented: every offer has a written "we landed at P60 because X" note
Manager education: 5 hiring managers ran the prompt themselves; comp conversations got shorter and calmer
Calibration cadence: weekly 30-min review of any new offer outside the band
Equity normalization: Carta equity converted to dollar value using current FMV; comparable across sources
Anti-pattern killed: "Levels says X" with no source-weight context — no longer accepted in hire decks
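
The equity normalization item in the scan above comes down to putting every grant on the same annual-dollar footing. Here is a minimal sketch, assuming a grant is valued at current FMV per unit, net of any strike price, and spread evenly over the vesting period. This is one simple convention, not a valuation method, and the function and parameter names are hypothetical.

```python
def annual_equity_value(units, fmv_per_unit, vesting_years, strike_price=0.0):
    """Annualized dollar value of a grant at current FMV.

    Assumptions: value per unit is FMV minus strike (floored at zero),
    and the grant vests evenly over `vesting_years`.
    """
    if vesting_years <= 0:
        raise ValueError("vesting_years must be positive")
    value_per_unit = max(fmv_per_unit - strike_price, 0.0)
    return units * value_per_unit / vesting_years


# Invented grant figures, for illustration only.
print(annual_equity_value(4_000, 25.0, 4))                    # prints 25000.0
print(annual_equity_value(4_000, 25.0, 4, strike_price=5.0))  # prints 20000.0
```

Whatever convention you pick, apply the same one to every source before blending, so that equity figures from Carta, Pave, and Levels are comparable.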

Micro-case (what changes after 7-14 days)

A 110-person profitable SaaS company was losing senior engineers and didn't know why. Their offers used a single Levels.fyi screenshot. They ran the 5-source method on three open roles. The blended P50 was about 22% above the offer they'd been making. The spread on one of the three roles was 38% — wide enough to flag the role as poorly defined, which it was; it was a "Senior Engineer" req that was really a Staff role with team-lead expectations. They split the role, re-leveled it, and re-benchmarked. Two weeks later they closed a senior engineer at a number that was inside the blended band, defensible to the board, and no longer dependent on a screenshot.

Note on this case: This example is illustrative — based on typical patterns we observe with companies of 30-500 employees, not a single named client. Specific numbers are rounded approximations of common ranges, not guarantees.

Tool tip (Course for Business): The hardest part of teaching comp benchmarking is that hiring managers want a single number to put in the offer letter, not a band with a weight matrix. Shoulder-to-Shoulder hot seats in our 6-week program run a live offer through the 5-source workflow with the actual hiring manager in the chair — by the end of the session they can defend the band to the candidate, to finance, and to the board. That's the moment comp benchmarking stops being a People-Ops black box. Book a 30-min mapping call at https://course.aiadvisoryboard.me/business.

FAQ

Do I need access to all 5 sources? Three minimum, four ideal, five if you want defensibility for a board conversation. Levels and a scrape are free; Pave or Carta are usually accessible via your investors; Radford is the expensive one. Start with three and add as roles get more senior.
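
If you start with fewer than five sources, the weights still have to sum to 100. One simple option is to rescale the remaining weights proportionally. This is an assumption for illustration, not a rule from the method; you may prefer to reassign the missing weight by hand. The function name and the starting weights are hypothetical.

```python
def renormalize_weights(weights, available_sources, minimum_sources=3):
    """Drop unavailable sources and rescale the rest to sum to 100."""
    kept = {s: w for s, w in weights.items() if s in available_sources}
    if len(kept) < minimum_sources:
        raise ValueError(f"need at least {minimum_sources} sources")
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("remaining weights must be positive")
    return {source: weight * 100 / total for source, weight in kept.items()}


# Hypothetical five-source weights, rescaled for a three-source start.
full_weights = {"pave": 30, "carta": 25, "levels": 20, "radford": 15, "scrape": 10}
three_source = renormalize_weights(full_weights, {"levels", "pave", "scrape"})
for source, weight in three_source.items():
    print(f"{source}: {weight:.1f}")
```

Proportional rescaling keeps the relative trust between the sources you do have, but it also lets the free sources dominate, so write down why the three-source weights look the way they do.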

What if the spread is over 30%? Stop benchmarking and re-define the role. A 30%+ spread almost always means the title is doing too much work — it's covering two different jobs in the market, and you're trying to compress them into one offer. Split the role or re-level it before re-running the blend.

How often should I re-benchmark? Quarterly for roles you're actively hiring for, semi-annually for filled roles. The market moves; a band that was right in January is often wrong by July. At about 4 hours per role, a quarterly refresh is cheap.

Is this overkill for sub-50-person companies? No. It matters more under 50, because every hire is a bigger fraction of the comp budget and one bad offer compounds for 18 months. Skip Radford if the cost is prohibitive; the 4-source version is still defensible.

Can the AI just tell me the offer number? No, and don't trust any tool that says it can. The blend produces a band; the position-in-band is the judgment. If you outsource that judgment, you'll over-pay for the wrong reasons and under-pay for the right ones.

Conclusion

Single-source comp data gives you a confident wrong number. Five sources, AI-blended, with documented weights and an honest spread, gives you a band you can defend. Pick three sources you can access by Friday. Run a role you're currently hiring. Look at the spread before you look at the midpoint.

If you want every employee to ship their first AI automation in five days — including the People-Ops generalist running comp without a team — book a 30-min call and we'll map your team's first week at https://course.aiadvisoryboard.me/business.
