AI Signals — Weekend Read: One Month In — What 2,760 AI Valuations Taught Us
- After 24 trading days and 2,760 estimates, we cannot separate methodology effects from genuine model behavior — every engine or prompt change moved the numbers
- Five distinct model personalities emerged: Claude is the only optimist (+1.0%), GPT has best directional accuracy (52.7%) but highest volatility, DeepSeek achieves 100% reliability at 1/15th the cost
- XOM's AI consensus estimate dropped 31% in one day after all five models reacted to Iran de-escalation signals, even as nine major banks raised their XOM price targets. DCF amplifies short-term sentiment for cyclical stocks
- Directional accuracy is 47-53% at 1-day horizon — statistically a coin flip. The real test begins at 3 months (July) and 12 months (March 2027)
- Model-specific calibration coming in late April when 30 days of v7 data is available
One Month In — What 2,760 AI Valuations Taught Us
For one month, five AI models have valued 23 stocks every trading day. That's 2,760 estimates (5 models × 23 stocks × 24 trading days) and roughly $50 in API costs. Here's what we've learned, and what we haven't.
The big picture: all models are bearish, and it got worse
When we started in early March, the combined AI valuation gap was around -15% — meaning the models collectively valued stocks about 15% below market prices. Over the next two weeks, that gap narrowed steadily to around -6%. We thought we were witnessing convergence toward market prices.
Then it reversed. The gap widened back to -12%. The improvement wasn't the models getting smarter — it was a combination of our own methodology changes (Engine v7, Bayesian shrinkage, temperature harmonization) and the market itself declining. When the market stabilized, the underlying bearish bias reasserted itself.
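For reference, the combined gap is just an aggregate of simple per-stock gaps. A minimal sketch in Python, where the tickers, values, and prices are illustrative stand-ins rather than our live data:

```python
from statistics import median

def valuation_gap(ai_value: float, market_price: float) -> float:
    """Per-stock gap: negative means the models value the stock below market."""
    return (ai_value - market_price) / market_price

# Hypothetical AI consensus values and market prices for three of the 23 stocks
ai_values = {"XOM": 82.0, "AAPL": 195.0, "NESTE": 14.0}
prices    = {"XOM": 161.0, "AAPL": 210.0, "NESTE": 15.5}

combined = median(valuation_gap(ai_values[t], prices[t]) for t in ai_values)
print(f"Combined AI valuation gap: {combined:+.1%}")  # -9.7% for this toy sample
```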
This is the most honest takeaway from one month of data: we cannot separate our methodology effects from genuine model behavior. Every time we changed something in the engine, prompt, or temperature, it moved the numbers. The signal we're looking for — how LLMs independently reason about value — is tangled up with the system we built to measure it.
Five models, five personalities — but are they real?
The models have developed what look like consistent personalities:
Claude is the optimist. It's the only model with a positive average bias (+1.0%), meaning it sees stocks as slightly undervalued on average. It's also the most consistent day to day: its estimates change only 1.3% between trading days. If Claude were a human analyst, it would be the conservative senior analyst who doesn't chase momentum.
GPT is the contrarian. It has the best directional accuracy (52.7%) but also the highest daily volatility (3.9%). It swings more than any other model, and when we changed temperature from 1.0 to 0.4, it flipped from most bearish to most optimistic before settling back. GPT seems most sensitive to parameter changes, which raises the question: is its "personality" real or just an artifact of how it processes constrained prompts?
DeepSeek is the workhorse. 100% valid JSON output across 529 runs — not a single parsing failure. It costs $1.10 per month compared to Claude's $17.20. Its accuracy is middling, but its reliability is unmatched; what counts as a "valid run" is sketched after these five profiles.
Gemini has the widest variance in growth assumptions and the lowest directional accuracy (47.0%). It disagrees with itself as much as with other models.
Grok is the fastest (7.3 seconds average) but gets its raw estimates capped by safety limits 46% of the time — more than any other model. It thinks big but the engine reins it in.
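For transparency, here is roughly what "valid" means in DeepSeek's 100% figure. A minimal sketch in which the required fields are a hypothetical schema, not our exact one:

```python
import json

REQUIRED_KEYS = {"fair_value", "growth_rate", "margin_target"}  # hypothetical schema

def is_valid_run(raw_response: str) -> bool:
    """A run counts as valid only if the response parses as JSON and
    every required field is present and numeric."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    return (isinstance(data, dict)
            and REQUIRED_KEYS <= data.keys()
            and all(isinstance(data[k], (int, float)) for k in REQUIRED_KEYS))

runs = ['{"fair_value": 82.0, "growth_rate": 0.02, "margin_target": 0.11}',
        "Sure! Here is the valuation you asked for..."]  # a typical failure mode
print(sum(is_valid_run(r) for r in runs) / len(runs))    # 0.5 for this toy sample
```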
The uncomfortable question: we tightened the prompt with sector-specific ranges, lowered temperature to 0.4 for all models, and applied Bayesian shrinkage. The spread between models has narrowed from 11 percentage points to 3-5. At what point are we measuring our own constraints rather than genuine model differences?
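To make that question concrete, the shrinkage step has roughly this shape; the 0.3 weight and the raw estimates are illustrative assumptions, not our production values:

```python
def shrink(raw_estimate: float, prior: float, weight: float = 0.3) -> float:
    """Pull a model's raw fair-value estimate toward a prior, here the
    cross-model mean. weight=0 keeps the raw estimate untouched; weight=1
    collapses every model onto the prior and erases "personality" entirely."""
    return (1 - weight) * raw_estimate + weight * prior

# Hypothetical raw estimates for one stock from the five models
raw = {"claude": 118.0, "gpt": 96.0, "deepseek": 101.0, "gemini": 88.0, "grok": 132.0}
prior = sum(raw.values()) / len(raw)  # 107.0

shrunk = {model: shrink(value, prior) for model, value in raw.items()}
# Spread narrows from 44 (88..132) to about 31 (93.7..124.5), by construction
```

Mechanically, raising `weight` narrows the cross-model spread regardless of what the models believe, which is exactly why the narrowing from 11 points to 3-5 is hard to interpret.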
The XOM incident: when geopolitics meets DCF
On April 2nd, ExxonMobil's AI consensus estimate dropped from $118 to $82 in a single day — a 31% decline and the largest single-stock move since we started tracking. All five models simultaneously cut their growth expectations from ~4% to 1-3% and lowered margin targets from 14% to 11-12%.
The real-world context: on April 1st, XOM's stock fell 5.7% — its largest single-day drop in a year — after President Trump signaled a potential end to the Iran conflict. Oil markets have been in turmoil since the Strait of Hormuz closure sent Brent crude toward $120/bbl, and any hint of de-escalation triggers sharp reversals in energy stocks. CNN described it as "whiplash" — markets swinging on every new Iran headline.
Our AI models picked up this signal through updated Yahoo Finance data — analyst revisions, price movements, and commodity signals — and all five independently reached the same bearish conclusion on the same day. That unanimity is noteworthy: five separate API calls, no shared memory, same direction.
But the models overreacted. XOM's trailing P/E is 23x while our energy sector cap is 18x. The DCF model produces a low raw estimate, then the P/E cap pushes it even lower. Meanwhile, nine major banks — Piper Sandler, Wells Fargo, Barclays, Citi, and others — have actually raised their XOM price targets. The analysts see long-term value in a diversified energy company; the DCF sees a cyclical stock trading above fair value.
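The cap mechanics look roughly like this; the 18x energy cap is from the engine as described above, while the EPS and raw DCF figures are illustrative:

```python
def apply_pe_cap(dcf_value: float, trailing_eps: float, sector_pe_cap: float) -> float:
    """Cap the raw DCF fair value at sector_pe_cap * trailing EPS."""
    return min(dcf_value, trailing_eps * sector_pe_cap)

trailing_eps = 7.00                 # illustrative
market_price = 23 * trailing_eps    # 161.00 at a 23x trailing P/E
raw_dcf      = 135.00               # illustrative raw DCF output, already below market

capped = apply_pe_cap(raw_dcf, trailing_eps, sector_pe_cap=18)  # min(135, 126) = 126
gap = (capped - market_price) / market_price                    # about -22%
```

For any stock trading above its sector multiple, the cap can only pull the estimate further below the market price, compounding whatever bearishness the models already produced.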
XOM now sits at -49.3% — the models think ExxonMobil is worth half its market price. That tells us three things: DCF has structural blind spots for cyclical commodity stocks, AI models amplify short-term sentiment when the input data shifts, and the gap between AI and analyst views can be a signal in itself.
What accuracy means (and doesn't) at 24 days
Our one-day directional accuracy ranges from 47% to 53% across models. Statistically, this is indistinguishable from a coin flip.
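The coin-flip claim is checkable with a two-sided binomial test. The sample size below is an assumption (roughly 23 stocks × 24 days of one-day calls per model), not an exact count from our logs:

```python
from scipy.stats import binomtest

n = 23 * 24              # assumed number of one-day directional calls per model (552)
hits = round(0.527 * n)  # GPT's 52.7% accuracy -> 291 correct calls

result = binomtest(hits, n, p=0.5, alternative="two-sided")
print(f"n={n}, hits={hits}, p-value={result.pvalue:.2f}")  # p-value around 0.2
```

A p-value around 0.2 means even the best model's edge could easily be luck at this horizon.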
But this metric is measuring the wrong thing. The models produce 12-month DCF valuations — asking whether they predicted tomorrow's price movement is like judging a marathon runner by their first 100 meters. The real accuracy test begins at 3 months (July 2026) and becomes meaningful at 12 months (March 2027).
What the short-term data does tell us about is model behavior, not prediction quality. Claude's near-zero bias (+1.0%) means it's the best calibrated to current market prices. GPT and DeepSeek, both at -5.1%, systematically underestimate. These behavioral signatures are consistent and likely genuine.
Where AI and analysts disagree
Our Disagreement Map reveals three distinct clusters:
Consensus zone (7 stocks): BRK-B, ELISA, JNJ, KNEBV, NDA1V, NOKIA, WRT1V — mostly Finnish defensives and stable US names. AI models agree with each other and with analyst targets. These are the well-understood companies where DCF works well.
AI agrees, analysts differ (13 stocks): most US large caps — AAPL, AMZN, MSFT, NVDA. AI models are internally consistent but systematically different from analyst consensus. This is the DCF vs. momentum gap: fundamentals-based models don't price in growth optionality the way sell-side analysts do.
Full uncertainty (3 stocks): NESTE, UPM, and now GOOGL. The AI models disagree both with each other and with analysts. NESTE has been stuck in this quadrant for three weeks — nobody knows how to value an oil refiner pivoting to renewable fuels during a geopolitical energy crisis.
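For the curious, here is a hypothetical sketch of how such a map can be computed from two dispersions. The 5% thresholds, and the fourth label the quadrant logic implies, are illustrative rather than the exact rules behind our map:

```python
from statistics import mean, pstdev

def classify(ai_estimates: list[float], analyst_target: float,
             spread_cut: float = 0.05, gap_cut: float = 0.05) -> str:
    """Place a stock by (a) how much the five AI models disagree with each
    other and (b) how far their mean sits from the analyst consensus target."""
    m = mean(ai_estimates)
    ai_spread = pstdev(ai_estimates) / m                    # internal AI disagreement
    analyst_gap = abs(m - analyst_target) / analyst_target  # AI vs. analyst distance
    if ai_spread < spread_cut and analyst_gap < gap_cut:
        return "consensus zone"
    if ai_spread < spread_cut:
        return "AI agrees, analysts differ"
    if analyst_gap < gap_cut:
        return "AI split, analysts near the AI mean"
    return "full uncertainty"

print(classify([118, 96, 101, 88, 132], analyst_target=140.0))  # full uncertainty
```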
What we got wrong
Temperature effects were larger than expected. Changing from 1.0 to 0.4 didn't just reduce randomness — it changed directional bias. We did this at the same time as prompt changes, so we can't isolate the cause. A proper A/B test is needed.
The universe is too small for robust statistics. 23 stocks means one outlier (XOM) can move the median gap by 6 percentage points in a day. We need at least 50-100 stocks for the aggregate indices to be stable.
DCF has structural blind spots. High-P/E growth stocks (GOOGL at +42.6%) and cyclical commodity stocks (XOM at -49.3%) sit at opposite extremes not because of AI insight but because of methodology limitations. The model works best for mature, predictable companies with moderate valuations.
What comes next
Model-specific calibration (late April). After 30 days of data, we'll activate per-model Bayesian shrinkage. Claude, with its +1.0% bias, will get more weight than GPT or DeepSeek at -5.1%. This should improve consensus quality.
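A minimal sketch of one bias-aware weighting rule; the inverse-absolute-bias form and the Gemini/Grok figures are illustrative assumptions, not the calibration we will ship:

```python
def bias_weights(biases: dict[str, float], floor: float = 0.005) -> dict[str, float]:
    """Weight each model inversely to the size of its historical bias, so
    well-calibrated models count more. floor keeps a near-zero bias from
    swallowing all the weight."""
    inv = {m: 1.0 / max(abs(b), floor) for m, b in biases.items()}
    total = sum(inv.values())
    return {m: w / total for m, w in inv.items()}

weights = bias_weights({
    "claude": +0.010, "gpt": -0.051, "deepseek": -0.051,  # biases from our data
    "gemini": -0.032, "grok": -0.024,                     # illustrative figures
})
# Claude ends up with roughly half the total weight in this toy example
```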
The earnings test (May-June). Q1 2026 results will start flowing in. This is the first real test: do models adjust their assumptions when they see new financial data, or do they anchor to stale estimates?
3-month accuracy check (July). The first statistically meaningful comparison between AI estimates and actual price movements. Also the first chance to compare AI accuracy against analyst accuracy on the same stocks.
One month of data has taught us more about our own methodology than about AI's ability to value stocks. That's not a failure — it's the honest starting point for understanding what these models actually do when you ask them to think about financial value.