
AI Signals — Weekend Read: Claude vs GPT — Two AI Analysts, Two Very Different Views

2026-03-21 · editorial · written by Claude
Summary
  • Claude (Sonnet 4.6) sees stocks as roughly fairly valued (−1.8% avg bias); GPT (4o-mini) sees them as significantly overpriced (−13.1%)
  • GPT’s bearish tilt nearly doubles for US stocks (−16.1%) vs Finnish stocks (−10.1%); Claude stays neutral regardless of market
  • Claude is the steadiest model (1.5%/day change) but fails JSON parsing more often; GPT is reactive (3.0%/day) but more reliable in production
  • 14 days of data across 24 stocks: if you want to understand how AI thinks about value, one model is not enough

Claude vs GPT: Two AI Analysts, Two Very Different Views on Stocks

After 14 trading days and over 1,500 model outputs, a clear pattern has emerged: Claude and GPT approach stock valuation like two fundamentally different types of analysts. One is cautious and steady; the other is consistently pessimistic. Neither is objectively "right" — but their divergence tells us something important about how large language models form financial judgments.

The Numbers

| Metric | Claude (Sonnet 4.6) | GPT (4o-mini) |
| --- | --- | --- |
| Average valuation gap | −1.8% | −13.1% |
| Median gap | −7.2% | −15.2% |
| Bullish calls (% of estimates) | 40.4% | 21.5% |
| Daily estimate volatility | 1.5%/day | 3.0%/day |
| Personality trait | Steady | Reactive |
| Bias classification | Neutral | Bearish |

The gap between these two is striking. Claude sees the market as roughly fairly valued, with a slight lean toward overpriced. GPT sees it as significantly overpriced — by an average of 13%. Over 14 days, this gap has been remarkably stable. It is not noise.
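The metrics in the table can be sketched in a few lines of Python. This is a minimal illustration, not the barometer's actual pipeline: the record fields (`estimate`, `price`) and function names are hypothetical, and the valuation gap is assumed to be the signed difference between the model's fair-value estimate and the market price, relative to the market price.

```python
from statistics import mean, median

def gap(estimate, price):
    # Signed valuation gap: negative means the model sees the stock as overpriced.
    return (estimate - price) / price

def summarize(records):
    # records: [{"estimate": ..., "price": ...}, ...] for one model (field names hypothetical)
    gaps = [gap(r["estimate"], r["price"]) for r in records]
    return {
        "avg_gap": mean(gaps),
        "median_gap": median(gaps),
        # Share of estimates sitting above the market price ("bullish calls").
        "bullish_share": sum(g > 0 for g in gaps) / len(gaps),
    }

def daily_volatility(series):
    # series: one stock's estimates in date order;
    # mean absolute day-over-day change, e.g. "1.5%/day" for Claude.
    return mean(abs(b - a) / a for a, b in zip(series, series[1:]))
```

Under these definitions, a model can show a near-zero average gap alongside a clearly negative median, as Claude does: a handful of strongly bullish calls pulls the mean up while most estimates sit slightly below market.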

The Steady Hand vs The Bear

Claude behaves like a cautious senior analyst. Its estimates change slowly (1.5% per day), it produces the smallest bias of any model, and it is bullish on nearly 4 in 10 stocks. When Claude calls a stock overpriced, the gap tends to be modest. When it calls a stock underpriced, it does so with conviction — ORNBV (+42%), TIETO (+47%), GOOGL (+37%).

GPT behaves like a risk-focused compliance officer. It is bearish on nearly 80% of stocks, and deeply bearish on many. The hardest-hit names span both markets: NESTE −43%, XOM −29%, NOKIA −22%, TSLA −14%. GPT rarely sees upside — only 21.5% of its estimates sit above the market price.

The Finnish vs American Divide

Both models are more bearish on US stocks, but the magnitude differs:

| Market | Claude | GPT | Gap |
| --- | --- | --- | --- |
| Finland (OMXH) | −0.8% | −10.1% | 9.3pp |
| USA (S&P 500) | −2.7% | −16.1% | 13.4pp |

GPT's bearish tilt nearly doubles for US stocks. Claude stays relatively neutral regardless of market. This suggests GPT's pessimism is partly structural — it may be responding to the higher valuation multiples of US mega-caps, which DCF models inherently struggle to justify.
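The per-market split is a simple grouped average of the same valuation gaps. A sketch, again with hypothetical field names (`market` tagging each record as OMXH or S&P 500):

```python
from collections import defaultdict
from statistics import mean

def gap_by_market(records):
    # records: [{"market": "OMXH" | "SP500", "estimate": ..., "price": ...}, ...]
    # Returns the average signed valuation gap per market.
    buckets = defaultdict(list)
    for r in records:
        buckets[r["market"]].append((r["estimate"] - r["price"]) / r["price"])
    return {market: mean(gaps) for market, gaps in buckets.items()}
```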

Reliability: A Paradox

Here is an irony: Claude is the "smarter" model in most benchmarks, but it fails more often in production. Claude produces invalid JSON (unparseable responses) for roughly 2-4 companies per day — consistently the same companies (financial sector stocks like Nordea and Sampo). GPT, despite being a smaller model, has a higher validity rate.

This paradox may reveal something about model architectures: more capable models attempt more complex outputs, which increases the chance of formatting errors. GPT's simpler, more rigid responses parse cleanly — even if the estimates themselves are more extreme.
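The validity rate referred to here is simply the share of responses that parse into usable JSON. A minimal sketch of how such a check might work — the expected response shape (`{"fair_value": ...}`) and the regex fallback for JSON wrapped in prose are assumptions, not the barometer's actual parser:

```python
import json
import re

def parse_estimate(raw):
    # Try strict JSON first; if the model wrapped its answer in prose,
    # fall back to extracting the first {...} block from the text.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                return None
        return None

def validity_rate(responses):
    # Share of model responses that yield parseable JSON.
    return sum(parse_estimate(r) is not None for r in responses) / len(responses)
```

A fallback like this would mask some formatting failures, which is worth keeping in mind when comparing validity rates across models: a strict parser and a lenient one can tell quite different reliability stories.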

What Does This Mean?

Neither model is "better." They are different instruments measuring different things:

  • Claude may be more useful as a calibration anchor — its estimates are closest to analyst consensus, making it a good baseline for comparison.
  • GPT may be more useful as a contrarian signal — when even GPT turns bullish on a stock, that signal carries weight precisely because GPT is almost never bullish.

The fact that these two models, given identical data, produce such different views is itself a finding. It suggests that LLM "reasoning" in financial contexts is not a converged consensus — it is a reflection of each model's training data, reinforcement tuning, and implicit risk preferences.

We are only 14 days into this experiment. Many of these patterns may evolve, especially as we approach the first earnings season in April. But so far, the message is clear: if you want to understand how AI thinks about value, looking at one model is not enough.

---

Data: AI Investor Barometer, 14 trading days (3–20 March 2026), 24 companies × 5 models. This analysis describes AI model behavior patterns — it is not investment advice.

Want these insights weekly?
Subscribe to AI Signals →