CFO anti-patterns with AI — letting LLMs calculate figures

5/9/2026 · 3 views · 9 min read

TL;DR

  • LLMs are not calculators. Letting one do arithmetic in a CFO context is the single most damaging finance anti-pattern.
  • AI miscalculations don't fail loudly — they fail plausibly. Without audit trails, the error lives in the model.
  • A Plan vs Fact vs Gap review of finance workflows surfaces every one of the patterns below within a week.

When the CFO of a 90-person SaaS firm told me their forecast had a $400K error because an LLM "did the math wrong", my reaction was not anger but recognition: I'd seen that exact mistake six times in twelve months. The CFO is the role most exposed to AI's quiet failure modes.

Why CFO mistakes are uniquely dangerous

Finance is the role where wrong-but-confident AI output looks the most like correct AI output. A made-up number with two decimals reads identically to a real one. Gartner's research famously found that CIOs miscalculate AI infrastructure costs by up to 1,000%. The CFO function is supposed to catch that, but only if the CFO hasn't outsourced their own numerical thinking to the same tools.

Anti-pattern 1 — Letting an LLM do the arithmetic

What it looks like: "ChatGPT, model our cash flow for the next two quarters." The model produces a confident, well-formatted spreadsheet. Three of the line totals are wrong by 4-12%. Nobody notices for two weeks.

Why it happens: LLMs feel like analysts. Their output reads like analyst output. They don't say "I can't do reliable arithmetic" — they say "$847,230" with no caveat.

Visible damage: Forecast variance the board can't reconcile. Decisions made on bad numbers. The Builder.ai $1.3B collapse had AI-narrative components, but the more common SMB version is quieter — six-figure forecast errors that don't get caught until quarter-end.

What to do instead: LLMs orchestrate; deterministic tools calculate. Use the LLM to read inputs, shape the question, and explain results — but the actual numerical ops happen in code (Python, SQL, spreadsheet formulas, dedicated agents that call tools). Modern agentic frameworks (function-calling, tool use) make this easy and cheap. The phrase to internalise: "LLMs orchestrate, calculators calculate."
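A minimal sketch of that split, assuming the LLM's only job is to emit a structured tool call while a deterministic function does the arithmetic. The names `project_cash`, `TOOLS`, and `run_tool_call` are hypothetical, and the `call` dictionary stands in for whatever structure a function-calling framework returns; no specific vendor API is implied.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def project_cash(opening_balance, monthly_inflows, monthly_outflows):
    """Return the closing cash balance for each month, computed deterministically."""
    if len(monthly_inflows) != len(monthly_outflows):
        raise ValueError("inflows and outflows must cover the same months")
    balance = Decimal(opening_balance)
    closing = []
    for inflow, outflow in zip(monthly_inflows, monthly_outflows):
        # Decimal avoids binary floating-point rounding on currency amounts.
        balance += Decimal(inflow) - Decimal(outflow)
        closing.append(balance.quantize(CENT, rounding=ROUND_HALF_UP))
    return closing


# Hypothetical tool registry: the model may only pick a tool and pass arguments.
TOOLS = {"project_cash": project_cash}


def run_tool_call(call):
    """Execute a structured call of the form {"name": ..., "arguments": {...}}."""
    func = TOOLS[call["name"]]  # raises KeyError for an unknown tool
    return func(**call["arguments"])


# Illustrative figures only (not from any real forecast).
call = {
    "name": "project_cash",
    "arguments": {
        "opening_balance": "250000.00",
        "monthly_inflows": ["120000.00", "135000.50", "128000.00"],
        "monthly_outflows": ["140000.00", "131000.25", "150000.00"],
    },
}
print(run_tool_call(call))  # closing balances: 230000.00, 234000.25, 212000.25
```

The point of the design is that the number the board sees never passes through the model's text generation: the model can choose what to compute and explain the result, but the totals come from code that gives the same answer every time.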

Definition: Plausible-wrong output — an AI answer that is well-formatted, confident, and structurally correct but numerically off. Hardest to catch in finance because the form is what people verify.

Anti-pattern 2 — Deploying AI without an audit trail

What it looks like: AI-assisted journal entries, AI-suggested expense categorisations, AI-drafted variance commentary — all with no record of which model, which prompt, which inputs produced the output.

Why it happens: Speed. Adding logging feels like overhead. The CFO trusts the senior analyst who's running the AI workflow.

Visible damage: The first time auditors ask "how did you arrive at this allocation?", the answer is silence. EU AI Act fines reach up to €35M or 7% of global annual turnover for the most serious violations, and up to €15M or 3% for breaches of high-risk system obligations. SOX, GDPR, and country-specific tax authorities all expect you to reproduce the decision trail.

What to do instead: Every AI-assisted finance output gets logged: model version, prompt, inputs, raw output, human review step, final entry. Tools to do this exist; the discipline is the bottleneck. If the workflow can't produce a trail, it shouldn't be in finance.
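As a rough sketch of what one logged entry could look like, the record below carries the six fields listed above plus a timestamp, and each log line includes a SHA-256 digest so later edits are detectable. The class and function names (`AuditRecord`, `make_record`, `to_log_line`, `verify_log_line`) and the sample values are hypothetical; a real deployment would also need access control and retention rules, which are out of scope here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    model_version: str
    prompt: str
    inputs: dict       # must be JSON-serialisable
    raw_output: str
    reviewed_by: str   # the human review step
    final_entry: str
    logged_at: str     # UTC timestamp, ISO 8601


def make_record(model_version, prompt, inputs, raw_output, reviewed_by, final_entry):
    """Build a record and stamp it with the current UTC time."""
    return AuditRecord(model_version, prompt, inputs, raw_output, reviewed_by,
                       final_entry, datetime.now(timezone.utc).isoformat())


def _digest(record_dict):
    # Sorted keys give a stable serialisation, so the hash is reproducible.
    payload = json.dumps(record_dict, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def to_log_line(record):
    """Serialise a record as one JSON line with a digest of its contents."""
    record_dict = asdict(record)
    return json.dumps({"record": record_dict, "sha256": _digest(record_dict)},
                      sort_keys=True)


def verify_log_line(line):
    """Return True if the stored digest still matches the record contents."""
    entry = json.loads(line)
    return _digest(entry["record"]) == entry["sha256"]


# Hypothetical example values.
record = make_record(
    model_version="example-model-2026-01",
    prompt="Categorise this expense",
    inputs={"vendor": "Acme Travel", "amount": "412.50"},
    raw_output="Travel",
    reviewed_by="j.doe",
    final_entry="6200 Travel",
)
line = to_log_line(record)
print(verify_log_line(line))  # True for an unmodified line
```

Appending one such line per AI-assisted output to a write-once store is usually enough to answer "how did you arrive at this?" months later.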

Anti-pattern 3 — Miscalculating AI infrastructure cost

What it looks like: The CFO greenlights an AI platform based on a per-seat licence price. Six months later actual costs are 3-10× higher because of token consumption, vector database storage, embedding regeneration, and integration engineering.

Why it happens: Vendors quote licence prices. They don't quote token bills, retraining cycles, or the engineering time to keep the system fed. Gartner's 1,000% miscalculation figure didn't come from nowhere.

Visible damage: A line item that grows quietly in cloud spend. By the time it's a board topic, the platform is too embedded to walk away from.

What to do instead: Build a total cost of ownership model with three parts: licence + variable usage + integration/maintenance. For LLM workloads specifically, the Anthropic Batch API is 50% cheaper than per-call pricing for non-realtime work, and that single decision can halve a meaningful budget line. Build cost monitoring into the deployment from day one, not after the first surprise bill.
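The three-part model can be sketched in a few lines. This is an illustration under stated assumptions, not a pricing tool: the function name `annual_tco`, the token volume, the $10 per million tokens price, and the $30K integration figure are all hypothetical, chosen so that usage comes to $11K per month against a $48K licence, matching the digest example later in this article. `batch_share` is the assumed fraction of usage that can run as non-realtime batch work at the 50% discount mentioned above.

```python
from decimal import Decimal


def annual_tco(licence_per_year, tokens_per_month, price_per_million_tokens,
               integration_per_year, batch_share="0", batch_discount="0.5"):
    """Annual cost split into licence, variable usage, and integration/maintenance."""
    licence = Decimal(licence_per_year)
    integration = Decimal(integration_per_year)
    monthly_usage = (Decimal(tokens_per_month) / Decimal(1_000_000)
                     * Decimal(price_per_million_tokens))
    # Only the batchable share of usage receives the discount.
    multiplier = 1 - Decimal(batch_share) * Decimal(batch_discount)
    variable = monthly_usage * multiplier * 12
    return {
        "licence": licence,
        "variable_usage": variable,
        "integration": integration,
        "total": licence + variable + integration,
    }


# Hypothetical inputs: 1.1 billion tokens per month at $10 per million tokens.
baseline = annual_tco("48000", 1_100_000_000, "10", "30000")
batched = annual_tco("48000", 1_100_000_000, "10", "30000", batch_share="0.6")
print(baseline["total"])  # 210000
print(batched["total"])   # 170400.00
```

Even in this toy case the licence is under a quarter of the total, which is the gap a per-seat quote hides at procurement time.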

Definition: Variable AI cost — the per-token, per-query, per-vector cost that accumulates with usage and is invisible at procurement time but dominant at month 6.

Anti-pattern 4 — Single-month ROI thinking

What it looks like: The CFO measures AI ROI at month 1 or month 3. Numbers are unimpressive. Programme cut.

Why it happens: Finance instincts demand fast feedback. Most other capex evaluates well at 90 days.

Visible damage: Cancelling productive AI investments before they exit the productivity dip. Microsoft's internal data shows 89% of users who push past the productivity dip stay active 20 weeks later. Cut at week 8 and you lose them; measure at week 24 and you keep them.

What to do instead: AI ROI is a 6-12 month curve, not a single point. Measure adoption, then time-saved, then quality, then financial impact — in that order, with appropriate horizons. Build a 12-month evaluation cycle into the contract, not a 90-day one.

Anti-pattern 5 — Skipping CFO-level AI training

What it looks like: The CFO has not personally used the AI tools they're funding. They sign cheques on demos and team summaries.

Why it happens: Time. The CFO's calendar is the most constrained at the executive table.

Visible damage: Finance becomes the function least equipped to challenge or extend AI plans. BCG's 5-hour training threshold says programmes under five hours don't change behaviour — a CFO with zero hours has zero ability to spot the LLM-doing-arithmetic anti-pattern when it walks past them in a deck.

What to do instead: Five hours, on your own workflows. Run a variance analysis with AI assistance. Draft one investor letter. Reconcile one cycle. The literacy compounds; the AI Tax (~37% rework) drops fastest in finance because verification is your native skill.

Manager scan (2-minute digest example)

  • Plan: Q3 forecast assembled with AI assistance. Target: 3-day cycle vs 7.
  • Fact: Cycle 4 days. Three line items 4-12% off vs source data.
  • Gap: LLM did arithmetic on raw data — no calculator-tool integration. No audit log on how outputs were derived.
  • Plan: AI expense categorisation for travel & hospitality.
  • Fact: 67% accuracy. 33% needs rework.
  • Gap: Training data didn't include 2024 vendor rebrands. Rework rate is close to the ~37% AI Tax.
  • Plan: AI infra budget: $48K/year per platform contract.
  • Fact: $11K/month token bill in month 5. Annualised $132K.
  • Gap: No variable-cost model at procurement. No batch API consideration for non-realtime workloads.
  • Plan: AI-drafted month-close commentary.
  • Fact: Drafts reviewed and rewritten 60% of the time.
  • Gap: No audit trail; same prompts used across teams without finance context.
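The "Fact" lines in a digest like this come from comparing AI-assisted output against source data. A minimal sketch of that check, assuming both sides are available as line-item dictionaries: the function name `find_gaps`, the 1% tolerance, and the sample figures are hypothetical.

```python
from decimal import Decimal


def find_gaps(reported, source, tolerance=Decimal("0.01")):
    """Return line items whose reported value is off from source by more than tolerance.

    Values are relative deviations (0.08 means 8% above source). None marks an
    item that is missing from the report, or a zero source with a non-zero report.
    """
    gaps = {}
    for item, raw_source in source.items():
        source_value = Decimal(raw_source)
        if item not in reported:
            gaps[item] = None
            continue
        reported_value = Decimal(reported[item])
        if source_value == 0:
            if reported_value != 0:
                gaps[item] = None  # relative deviation is undefined here
            continue
        deviation = (reported_value - source_value) / abs(source_value)
        if abs(deviation) > tolerance:
            gaps[item] = deviation
    return gaps


# Hypothetical ledger totals vs the figures in an AI-assisted forecast.
source = {"payroll": "400000", "hosting": "50000", "travel": "20000"}
reported = {"payroll": "400000", "hosting": "54000", "travel": "17600"}
print(find_gaps(reported, source))  # hosting is 8% over, travel is 12% under
```

Run against every AI-assisted report before it leaves the finance team, a check like this turns "nobody notices for two weeks" into a same-day flag.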

Tool tip (AIAdvisoryBoard.me): AI Advisory Board's Plan → Fact → Gap diagnostic catches the CFO anti-patterns before they hit the P&L. The daily digest shows where AI workflows are producing variance, where audit gaps exist, and where infrastructure costs are diverging from forecast — across finance, ops, and the rest of the business. It's the cheapest second pair of eyes a finance function can buy. See it: https://aiadvisoryboard.me/?lang=en

Micro-case (what changes after 7-14 days)

A 140-person professional services firm had a CFO who'd approved three AI tools across the company over 18 months. None of them had audit trails. None of them had variable-cost monitoring. The diagnostic surfaced two things in week one: the FP&A team was producing forecasts via LLM that had a 6-9% line-level variance against source systems, and the actual platform spend was running 4× the budgeted licence cost because of token volume. Total monthly correction: ~$25K of saved spend, plus a forecast reconciliation that prevented an over-hire decision in Q4.

Note on this case: This example is illustrative — based on typical patterns we observe with companies of 30-500 employees, not a single named client. Specific numbers are rounded approximations of common ranges, not guarantees.

Tool tip (AIAdvisoryBoard.me): Run a Plan → Fact → Gap diagnostic on your finance team for one week before the next AI tool decision. The output: where forecasts diverge from source data, where audit trails are missing, and which workflows are consuming variable cost faster than budgeted. AI Advisory Board surfaces this without finance integration projects: https://aiadvisoryboard.me/?lang=en — it's a fraction of one quarter's worth of forecast-error damage.

FAQ

Q: We use Excel formulas with AI assistance. Is that the same as letting the LLM do arithmetic?
A: No, that's the right pattern. The LLM helps you write or audit the formula; the formula does the math deterministically. The anti-pattern is when the LLM produces the number directly, in chat, without a calculator step.

Q: How do I evaluate AI tools that promise finance ROI?
A: Ask three questions: where's the audit trail, what's the variable-cost model at our usage, and what's the 12-month ROI curve. If the vendor can't answer all three concretely, you don't have enough to sign.

Q: Should I build an internal AI finance team?
A: For 30-500-person companies, usually no. One AI-literate FP&A lead plus a clear vendor architecture beats an in-house team that doesn't have enough volume to specialise. Hire the literacy, buy the platforms.

Q: Is there a regulatory deadline I should know about?
A: The EU AI Act is phasing in through 2026-2027, and high-risk finance systems face strict documentation and audit requirements. Even non-EU firms with EU customers are exposed. Start the audit-trail discipline now; retrofitting it is expensive.

Conclusion

The CFO's AI job is not to use the most AI. It's to ensure every AI workflow that touches a number can be reproduced, audited, and budgeted. The five anti-patterns above all share a fix: visibility into what AI is actually doing in finance workflows, daily, in plain language.

If you want a system that surfaces the Plan → Fact → Gap automatically — every day, across finance and ops — see how the 7-day diagnostic works: https://aiadvisoryboard.me/?lang=en
