
AI Signals — Weekend Read: Do AI Models Think — or Just Pattern-Match?

2026-04-19 · editorial · written by Claude
Summary
  • GPT rounds 99% of its margin assumptions to whole numbers — the same cognitive bias documented in human analysts (Herrmann & Thomas 2005)
  • All five models correlate 0.81–0.95 despite different architectures — the 'panel of independent analysts' is closer to a group of like-minded colleagues
  • Gemini and Grok form a temporal cluster: when one reverses direction, the other follows within 1–2 days. Claude is the most independent model
  • WACC is the only parameter where rounding drops (to 10%) — because the prompt provides a decimal anchor. Prompt design directly affects output precision
  • Five-model consensus is more than one opinion but less than five. Dispersion remains the most honest signal

Weekend Read #6 — April 19, 2026

AI Investor Barometer tracks how five LLMs generate DCF assumptions for 24 listed companies — daily, independently, with identical inputs.

Five different AI models, five different architectures, five different providers. The intuition says: five independent opinions. But 3,900 valuations later, the data tells a different story.

---

GPT Rounds Like a Human

Each model outputs decimal numbers — 7.3% or 12.8% would be just as valid as 7.0% or 13.0%. But round numbers dominate.

Model      Margin at whole % (x.0%)   CAGR at whole % (x.0%)
GPT        99%                        91%
Grok       90%                        82%
Claude     85%                        78%
Gemini     75%                        51%
DeepSeek   66%                        55%

99% of GPT's margin assumptions are whole numbers: 14.0%, 32.0%, 45.0%. Not 14.3% or 31.8%. Almost never.
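
Measuring the bias is straightforward. A minimal sketch, assuming the daily assumption rows live in a CSV with model and margin columns (hypothetical file name and schema):

```python
import pandas as pd

# Hypothetical schema: one row per model/ticker/day, with the margin
# assumption stored as a percentage figure such as 14.0 or 31.8.
df = pd.read_csv("assumptions.csv")  # columns: model, ticker, date, margin

# A value counts as "round" when it lands on a whole percent (x.0%);
# round to one decimal first to absorb floating-point noise.
df["is_round"] = df["margin"].round(1) % 1 == 0

# Share of whole-percent margins per model, as in the table above.
print(df.groupby("model")["is_round"].mean().sort_values(ascending=False))
```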

Round number bias is a well-documented phenomenon among human analysts. Herrmann and Thomas (2005) showed that analysts systematically round EPS forecasts to the nearest five cents. Boulland and Dessaint (2017) documented the same pattern in price targets. The reason is cognitive: humans estimate rather than calculate, and estimation produces round numbers because precise decimals would imply false accuracy.

LLMs do exactly the same thing. They don't compute margins from financial statement line items — they estimate them linguistically, drawing on the same literature where human analysts use round numbers. The training data's rounding habits are inherited.

The one exception is WACC. GPT's rounding rate drops to 10% for this parameter. The likely reason: the prompt provides the risk-free rate with decimal precision (3.0% for Finland, 4.5% for the US), anchoring the model to a specific number rather than a round estimate. This suggests that numerical anchoring in the prompt reduces rounding — a design choice that affects output precision.

When an LLM says "margin 14%", it doesn't mean "I analysed the business and arrived at 14.0%." It means "about fourteen percent feels right."

Five Models, One Worldview

Our previous article examined how much individual models change their minds over time. This time we look from the opposite direction: how much do models differ from each other?

Model Pair           Correlation
Gemini / Grok        0.954
Claude / Grok        0.953
DeepSeek / Grok      0.948
Claude / Gemini      0.936
Claude / DeepSeek    0.925
GPT / Gemini         0.852
Claude / GPT         0.821
GPT / DeepSeek       0.810

Every pair correlates above 0.81. The strongest pair (Gemini/Grok at 0.954) is near-identical.
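
The pairwise figures take only a few lines to reproduce. A sketch, assuming a DataFrame with one valuation-gap column per model (hypothetical file and column names):

```python
from itertools import combinations

import pandas as pd

# Hypothetical input: one valuation-gap column per model,
# one row per (ticker, date) observation.
gaps = pd.read_csv("valuation_gaps.csv", index_col=["ticker", "date"])

# Pearson correlation for every model pair, strongest first.
pairs = {(a, b): gaps[a].corr(gaps[b]) for a, b in combinations(gaps.columns, 2)}
for (a, b), r in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(f"{a} / {b}: {r:.3f}")
```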

The models don't copy each other in real time — they never see each other's outputs. Yet they converge. Three factors explain this:

Same input. All five models receive identical Yahoo Finance data. When the starting data is the same, convergent outputs are to be expected.

Same training corpus. LLMs are trained predominantly on the same internet data: analyst reports, financial news, earnings analyses. Similar sources produce similar views.

Rounding pulls toward the same point. When all models anchor to round numbers and the assumption space is bounded (by sector profile constraints), few distinct values remain. If margin can be 12%, 13%, 14%, or 15% and every model rounds, convergence is mechanical.
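
The mechanical effect is easy to simulate. In the toy model below (illustrative parameters, not fitted to the barometer's data), two "models" estimate the same underlying margin with independent noise; rounding alone multiplies their exact-agreement rate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Shared input: both models estimate the same true margin,
# each with its own independent estimation noise.
true_margin = rng.uniform(12.0, 15.0, n)
model_a = true_margin + rng.normal(0.0, 0.4, n)
model_b = true_margin + rng.normal(0.0, 0.4, n)

# Exact agreement is rare for raw decimals, common once both round.
raw_agree = np.mean(np.abs(model_a - model_b) < 0.05)
rounded_agree = np.mean(np.round(model_a) == np.round(model_b))

print(f"agreement on raw estimates: {raw_agree:.1%}")
print(f"agreement after rounding:   {rounded_agree:.1%}")
```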

In finance research, analyst herding is well-documented (Welch 2000; Hong, Kubik & Solomon 2000). LLMs converge through a different mechanism — not social pressure but shared training data — yet the outcome is the same: apparent independence, actual conformity.

Where the Real Differences Lie

With every pairwise correlation above 0.81, the models rarely disagree on the direction of individual stocks. The differences lie in three other dimensions:

Bias level. Claude sees the universe as roughly 1% undervalued on average, DeepSeek as 4.4% overvalued. But on individual stocks they largely agree: if Claude considers Nokia overvalued, so does DeepSeek. The difference is in magnitude, not direction.

Sector-specific risk assessment. For telecom, all models assign WACC between 6.6% and 7.0% — practically identical. For financials, DeepSeek assigns 11.5% while Claude assigns 10.3%. A 1.2-percentage-point gap that compounds meaningfully through the DCF calculation. DeepSeek is systematically more conservative in its risk assessment, which partly explains its persistent bearish bias.
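
How much does that compound? A toy Gordon-growth terminal value with illustrative inputs (not the barometer's actual DCF) shows the order of magnitude:

```python
# Toy terminal value: TV = FCF * (1 + g) / (WACC - g).
# Illustrative cash flow and growth; not the barometer's actual model.
fcf, growth = 100.0, 0.02

for wacc in (0.103, 0.115):  # Claude vs DeepSeek financials WACC
    tv = fcf * (1 + growth) / (wacc - growth)
    print(f"WACC {wacc:.1%}: terminal value {tv:,.0f}")

# 10.3% -> ~1,229 vs 11.5% -> ~1,074: the higher discount rate
# cuts the terminal value by roughly 13%.
```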

Temporal behaviour. Gemini and Grok form a "cluster" that moves together — when Gemini reverses direction on a stock, Grok follows within 1–2 days (7 instances in 34 trading days). Claude is the most independent: it rarely leads and never follows. No one follows GPT because its reversals are too sudden to be informative — its largest single-day swing is 84 percentage points.
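
The lead-follow pattern can be counted mechanically. A sketch, assuming daily per-model gap series for one ticker (hypothetical file and column names):

```python
import pandas as pd

# Hypothetical input: one daily valuation-gap column per model.
s = pd.read_csv("daily_gaps.csv", index_col="date", parse_dates=True)

def reversals(series: pd.Series) -> pd.Series:
    """True on days where the day-over-day change flips sign."""
    delta = series.diff()
    return (delta * delta.shift(1)) < 0

lead, follow = reversals(s["gemini"]), reversals(s["grok"])

# Leader reversals echoed by the follower within 1-2 trading days.
echo = follow.shift(-1, fill_value=False) | follow.shift(-2, fill_value=False)
print(f"Gemini reversals followed by Grok within 2 days: {(lead & echo).sum()}")
```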

What This Means

Consensus is more valuable than any single model, but it is not five independent views. It is one view with five small variations. Correlation of 0.81–0.95 means the "panel of independent analysts" is closer to "a group of like-minded colleagues, one of whom is somewhat erratic."

Round numbers are a warning sign. When a model outputs "CAGR 10%, margin 15%, WACC 9%", that is not an analytical conclusion — it is a heuristic estimate. Precision doesn't improve by adding more models; it improves when each model is nudged toward non-round estimates. Numerical anchoring in the prompt (as demonstrated by the WACC exception) is one lever.

The dispersion signal is real, even though the consensus is more uniform than it appears. When models disagree (Nokia dispersion 10.2%, Meta 13.5%), it reflects genuine uncertainty in the assumption space — not just random noise. Dispersion is a signal; consensus is a filtered version of one shared worldview.
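
Dispersion itself is a one-liner. A sketch, assuming today's per-model gap per ticker (hypothetical schema):

```python
import pandas as pd

# Hypothetical input: today's valuation gap (% vs. fair value)
# in one column per model, one row per ticker.
today = pd.read_csv("today_gaps.csv", index_col="ticker")

# Dispersion = cross-model standard deviation of the gap per ticker;
# wide dispersion flags genuine disagreement rather than noise.
print(today.std(axis=1).sort_values(ascending=False).head())
```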

---

Next week brings Q1/2026 earnings: Elisa (Nasdaq Helsinki, telecom) on Monday, Tesla on Tuesday, Nokia on Thursday. For the first time we will see whether models react to fresh financial data — and whether those round numbers become more precise when real results are available.

---

AI Investor Barometer tracks daily how 5 AI models form stock valuation estimates — and where they diverge. This is an experimental research tool, not investment advice.

Want these insights weekly?
Subscribe to AI Signals →