AI in Finance

Why Finance AI Hallucinates — And How the Deterministic Layer Fixes It

by Matt Kopiec

June 5, 2026


The CFO asks the question everyone in the room is thinking: “What is our EBITDA for Q1?” The AI tool responds instantly. The number is wrong by 14%. Nobody catches it until the board meeting. The demo looked flawless. The production deployment did not.

This is not a story about bad AI. It is a story about what the AI was sitting on. And understanding the difference between those two things is the most important thing a CFO can do before signing any AI-in-finance contract.

The real reason finance AI hallucinates

When an AI finance tool gives you a wrong number, the instinct is to blame the model. But in almost every case we have investigated at incro, the model was not the problem. The problem was upstream. Specifically, it was one or more of these three failure modes:

Failure mode 1: The AI computed the number. Most AI tools — language models, BI-integrated assistants, natural-language query interfaces — do not retrieve a pre-calculated metric. They compute it on the fly by interpreting whatever data they can reach. If your GL accounts are not consistently mapped across entities, if your intercompany eliminations have not been run, or if your revenue recognition policies differ between subsidiaries, the AI will produce a number that is internally consistent with the data it accessed and entirely inconsistent with reality.

Failure mode 2: The data is not consolidation-grade. A single-entity P&L is hard enough to get right. A multi-entity consolidation is considerably harder, because every entity brings its own GL structure, its own close timing, and its own treatment of shared costs. AI on top of this produces confident-sounding numbers that reflect the chaos underneath, not the business.

Failure mode 3: There is no audit trail. The AI gives you a number. You ask where it came from. The answer is a plausible-sounding explanation that you cannot verify. This is not a security problem. It is a structural problem. If the number cannot be traced back to a source transaction, it cannot be trusted — regardless of how confident the AI sounds.

The fix: separate computation from explanation

The solution is architecturally simple even if it is operationally demanding to implement. You need to separate the thing that computes the number from the thing that explains it.

This is what we call the deterministic layer — a semantic layer anchored to your GL that pre-computes every financial metric using explicit, auditable formulas. EBITDA is not interpreted by the AI. It is calculated by the deterministic layer according to a defined rule set that reconciles to source. The AI sits on top of this layer and explains, narrates, and reasons — but it never calculates.

The phrase we use in every client engagement is this:

“We compute the number deterministically. We let AI explain it.”

This distinction eliminates hallucination at the source. The AI cannot produce a wrong EBITDA because it does not produce EBITDA at all. It receives EBITDA from the deterministic layer, traces the components, and then explains what drove the change.
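As a rough illustration of what "computed deterministically" can mean, here is a minimal Python sketch. It is not incro's implementation: the category names, the `GLLine` and `Metric` types, and the rule set are all hypothetical, chosen only to show a metric produced by a fixed formula that records which source lines it used.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical rule set: sign applied to each group-level category in EBITDA.
EBITDA_SIGNS = {"revenue": Decimal(1), "cogs": Decimal(-1), "opex": Decimal(-1)}
# Categories deliberately left out of EBITDA.
EXCLUDED = {"depreciation", "amortisation", "interest", "tax"}


@dataclass(frozen=True)
class GLLine:
    line_id: str      # identifier of the source journal line
    entity: str
    category: str     # group-level category after GL mapping
    amount: Decimal   # positive magnitude


@dataclass(frozen=True)
class Metric:
    name: str
    value: Decimal
    formula: str
    source_line_ids: tuple


def compute_ebitda(lines):
    """Apply the fixed rule set; fail loudly on anything the rules do not cover."""
    value = Decimal(0)
    used = []
    for line in sorted(lines, key=lambda l: l.line_id):  # order-independent result
        if line.category in EXCLUDED:
            continue
        if line.category not in EBITDA_SIGNS:
            raise ValueError(f"no rule for category {line.category!r} ({line.line_id})")
        value += EBITDA_SIGNS[line.category] * line.amount
        used.append(line.line_id)
    return Metric("EBITDA", value, "revenue - cogs - opex", tuple(used))


lines = [
    GLLine("JE-001", "A", "revenue", Decimal("1000.00")),
    GLLine("JE-002", "A", "cogs", Decimal("400.00")),
    GLLine("JE-003", "A", "opex", Decimal("250.00")),
    GLLine("JE-004", "A", "depreciation", Decimal("50.00")),
]
ebitda = compute_ebitda(lines)
print(ebitda.value, ebitda.source_line_ids)
```

The same inputs give the same `Metric` regardless of the order in which lines arrive, and a category with no rule raises an error instead of being guessed at. In this sketch the AI layer would only ever receive the finished `Metric`.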

The structural error in most AI-in-finance tools is asking the model to do two fundamentally different jobs at once: compute the number and explain it. Computation requires deterministic logic: the same inputs producing the same output, every time, traceable to source. Explanation benefits from natural-language reasoning. Conflating the two is precisely why an impressive demo breaks on contact with real data. The diagram below shows the difference between the two architectures.

The architecture in practice

The diagram below shows how the three layers of the incro Finance Foundry interact. Every source system — ERP, CRM, billing, banking, ops data — feeds into a GL-anchored semantic layer. The semantic layer is where all computation happens: deterministically, according to rules that your finance team can inspect and your auditors can verify. The AI layer sits entirely above the computation, reasoning on pre-calculated outputs.

Layer 01 — Foundation: Source systems (ERP/GL, CRM, Billing, Ops, Banking) are cleaned and reconciled entity by entity. The GL becomes the single anchor point for all downstream calculations.

Layer 02 — Deterministic Semantic Layer: All financial metrics — EBITDA, ARR/MRR, cash position, KPI matrix — are pre-computed using explicit formulas anchored to the GL. Nothing is inferred. Every number has a formula, a source, and an audit trail.

Layer 03 — AI Reasoning Layer: AI agents receive pre-calculated numbers and perform reasoning tasks: variance commentary, anomaly detection, proforma scenarios, CFO Q&A. The AI explains. It does not calculate. The result: zero hallucination risk, every answer source-traceable.

The key property of this architecture is that every number the AI touches has a source address. When a CFO asks “why did EBITDA drop by 8% in March?”, the AI does not guess. It accesses the variance breakdown from the semantic layer, identifies the cost lines that moved, and narrates the explanation. The number was pre-computed. The explanation is genuine reasoning on top of a clean foundation.
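The March example can be sketched in a few lines of Python. The line names and figures below are invented for illustration, and `build_context` is a hypothetical helper: it assembles the pre-computed facts an AI layer could be handed to narrate, and contains no model call.

```python
from decimal import Decimal


def variance_breakdown(prior, current):
    """Per-line change between two periods, largest absolute movement first."""
    rows = []
    for line in sorted(set(prior) | set(current)):
        before = prior.get(line, Decimal(0))
        after = current.get(line, Decimal(0))
        rows.append({"line": line, "prior": before, "current": after,
                     "change": after - before})
    rows.sort(key=lambda r: (-abs(r["change"]), r["line"]))
    return rows


def build_context(metric, prior, current):
    """Turn pre-computed numbers into plain facts for the AI layer to explain."""
    rows = variance_breakdown(prior, current)
    prior_total = sum(prior.values(), Decimal(0))
    current_total = sum(current.values(), Decimal(0))
    # Assumes a non-zero prior total; a real layer would handle that case.
    pct = (current_total - prior_total) / prior_total * 100
    facts = [f"{metric}: {prior_total} -> {current_total} ({pct:+.1f}%)"]
    facts += [f"{r['line']}: {r['change']:+}" for r in rows if r["change"] != 0]
    return "\n".join(facts)


# Signed contributions to EBITDA (costs negative); illustrative numbers only.
february = {"revenue": Decimal("1000"), "cogs": Decimal("-400"),
            "payroll": Decimal("-300"), "marketing": Decimal("-50")}
march = {"revenue": Decimal("1000"), "cogs": Decimal("-400"),
         "payroll": Decimal("-310"), "marketing": Decimal("-60")}

context = build_context("EBITDA", february, march)
print(context)
```

Every figure in `context` was produced by arithmetic on the semantic layer's outputs, so the model's job is reduced to wording the explanation around numbers it did not create.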

What this requires upstream

Building the deterministic layer is not a software installation. It is a data foundation project. Before the semantic layer can compute reliably, three things have to be true:

The chart of accounts must be consolidation-grade. Every entity in the group needs to map to a common GL hierarchy. Accounts that mean different things in different entities will produce silently wrong metrics at the group level. This is the most common failure point we find in diagnostic work — and the most impactful fix.

Intercompany transactions must be eliminated before aggregation. Revenue from entity A to entity B is internal flow, not group revenue. If this elimination is not automated and systematic, every group-level metric is overstated in a way the AI will confidently reproduce.

Close timing must be standardised across entities. An AI asked for group cash position on the 5th of the month will return a mixture of data from entities that have closed and entities that have not. The answer will be numerically precise and factually misleading. Near-real-time finance requires that close timing is aligned, or that the semantic layer explicitly handles the lag.
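The three conditions above can be expressed as checks that run before any group-level number is produced. The following Python sketch assumes a hypothetical `Entry` record, a mapping from (entity, local account) to a group category, and a set of entities that have closed the period. Refusing to compute is only one possible way of handling the lag explicitly.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Entry:
    entity: str
    account: str                        # local account code
    amount: Decimal
    counterparty: Optional[str] = None  # set when the other side is a group entity


def unmapped_accounts(entries, mapping):
    """Accounts with no place in the common GL hierarchy."""
    return {(e.entity, e.account) for e in entries
            if (e.entity, e.account) not in mapping}


def group_revenue(entries, mapping, group_entities, closed):
    """Group revenue, computed only if mapping and close status allow it."""
    missing = unmapped_accounts(entries, mapping)
    if missing:
        raise ValueError(f"unmapped accounts: {sorted(missing)}")
    not_closed = sorted(group_entities - closed)
    if not_closed:
        raise ValueError(f"period not closed for: {not_closed}")
    total = Decimal(0)
    for e in entries:
        if mapping[(e.entity, e.account)] != "revenue":
            continue
        if e.counterparty in group_entities:
            continue  # intercompany sale: internal flow, not group revenue
        total += e.amount
    return total


group = {"A", "B"}
mapping = {("A", "4000"): "revenue", ("B", "R-10"): "revenue"}
entries = [
    Entry("A", "4000", Decimal("1000")),                    # external sale
    Entry("A", "4000", Decimal("200"), counterparty="B"),   # sale from A to B
    Entry("B", "R-10", Decimal("500")),                     # external sale
]
print(group_revenue(entries, mapping, group, closed={"A", "B"}))
```

Without the elimination step the same entries would sum to 1700 rather than 1500, which is the kind of overstatement an AI layer would repeat without any visible error.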

None of these requirements are new. Finance teams have known about them for years. What has changed is the consequence of ignoring them. When the interface was a spreadsheet or a dashboard, a wrong number was a visible artefact that a skilled analyst would catch. When the interface is a conversational AI that responds instantly and confidently, the wrong number reaches the decision-maker before anyone has had time to check it.

The questions to ask any AI-in-finance vendor

If you are evaluating AI-in-finance tools — and you should be, because the category is genuinely useful when implemented correctly — here are the four questions that will immediately distinguish vendors with a real architecture from vendors with an impressive demo:

1. Where is EBITDA computed? If the answer involves the AI model, walk away. EBITDA should be computed by a deterministic layer according to your chart of accounts rules, not inferred by a language model.

2. Can you show me the audit trail? Every number the AI surfaces should trace back to a source transaction or a pre-calculated metric with a defined formula. If the answer is “the AI explains the reasoning,” that is not an audit trail.

3. What happens with multi-entity consolidation? Any vendor who has not solved the intercompany elimination problem at the data layer — before the AI touches it — has not solved the problem at all.

4. What does the GL mapping look like? This question usually produces silence or deflection from vendors whose product is UI-first. It is the most reliable filter we know.
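On the second question, one practical way to tell an audit trail from a narrated explanation is to check whether the figure can be recomputed from the transactions it cites. The Python sketch below is illustrative only: the `reconcile` function, the ledger shape (transaction id mapped to a signed amount), and the ids are assumptions, not any vendor's API.

```python
from decimal import Decimal


def reconcile(reported, cited_ids, ledger):
    """Recompute a reported figure from the transactions it cites.

    Returns (ok, detail). `ledger` maps transaction id -> signed amount.
    """
    missing = sorted(tx for tx in cited_ids if tx not in ledger)
    if missing:
        return False, f"cited transactions not in ledger: {missing}"
    recomputed = sum((ledger[tx] for tx in cited_ids), Decimal(0))
    if recomputed != reported:
        return False, f"reported {reported}, recomputed {recomputed}"
    return True, "reconciles to source"


ledger = {"TX-1": Decimal("1000"), "TX-2": Decimal("-400"), "TX-3": Decimal("-250")}

print(reconcile(Decimal("350"), ["TX-1", "TX-2", "TX-3"], ledger))
print(reconcile(Decimal("399"), ["TX-1", "TX-2", "TX-3"], ledger))
print(reconcile(Decimal("350"), ["TX-1", "TX-9"], ledger))
```

A tool that can only offer prose in place of `cited_ids` fails this check by construction, however plausible the prose sounds.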

The order matters

The incro Finance Foundry has three layers for a reason. Foundation comes first: clean the data, entity by entity, system by system, until the consolidation is deterministic. Intelligence comes second: build the reporting, KPIs, and board pack on the clean foundation. Agents come third: deploy AI on top of verified numbers.

Skipping Foundation is why most AI-in-finance projects fail. The tool does not fail — the data underneath it fails, and the tool faithfully reproduces that failure at scale, with confidence, and without a visible error message.

The CFO who asked about Q1 EBITDA at the start of this article was not failed by AI. She was failed by a vendor who sold Layer 03 to a business that had not yet built Layer 01. The fix is not a better model. The fix is doing the boring work first.

If you want to understand what that work looks like for your group, the starting point is a conversation about your GL — not a product demo.

AUTHOR

Matt Kopiec

Co-founder

Matt co-founded incro in 2020. Since then, he has worked with over 100 companies, from early-stage startups to PE-backed businesses, across a wide range of industries.

He is an expert in corporate finance, FP&A set-up, and forecasting, and has built a reputation for turning complex financial functions into clear, board- and investor-ready systems.

At incro, Matt drives company growth while designing the analytical frameworks and solutions that define how the firm delivers transformation — including co-creating CFO Studio, incro's AI-powered financial analysis platform.

He holds a degree from SGH Warsaw School of Economics and is the author of Methods and Procedures in the Assessment of Capital Investments (Difin, 2023), a rigorous framework for evaluating investment profitability, now used as an academic reference. Matt was named to the Forbes 30 Under 30 Poland list in 2025.


