Questions & Answers

Common questions about AI Investor Barometer — how it works, what the numbers mean, and what it is (and isn't) designed to do.

Methodology & Calculation

What is AI Investor Barometer?
An experimental observatory that tracks how 5 independent AI models (GPT, Claude, Gemini, DeepSeek, Grok) form stock valuation assumptions. A deterministic valuation engine then computes estimates from those assumptions. The site visualizes where the models agree, where they disagree, and how their views change over time. It is not an investment advisory service.
Why do you use a DCF model instead of P/E multiples?
DCF forces each AI model to explicitly state its assumptions about growth, profitability, and risk — rather than just picking a comparable multiple. This makes differences between models transparent and measurable. A forward P/E cap is applied as a safety check to prevent extreme valuations.
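Mechanically, the cap is just an upper bound on the DCF output. A minimal sketch, assuming a simple clamp (the 25× limit and function names here are illustrative, not the engine's actual parameters):

```python
def apply_pe_cap(dcf_value: float, forward_eps: float, pe_cap: float = 25.0) -> float:
    """Clamp a DCF estimate to a forward P/E ceiling (illustrative cap value)."""
    ceiling = forward_eps * pe_cap
    return min(dcf_value, ceiling)

# Example: a DCF value of 180 with forward EPS of 6.0 is capped at 6.0 * 25 = 150.
print(apply_pe_cap(180.0, 6.0))  # -> 150.0
```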
Why do financial companies use a different model?
For banks and insurance companies (Sampo, Nordea, JPMorgan, Berkshire), debt is a raw material of the business, not a financing item, so traditional FCFF-DCF produces systematically misleading results for these companies. We use an Excess Return on Equity model instead, which values a company based on its return on equity relative to the cost of equity.
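In its general form, an excess-return model values equity as current book value plus the present value of future excess returns. A minimal single-stage sketch (this simplification and all numbers are illustrative, not the engine's exact specification):

```python
def excess_return_value(book_equity: float, roe: float,
                        cost_of_equity: float, growth: float) -> float:
    """Single-stage excess return on equity model (illustrative simplification).

    Value = book equity + PV of perpetual excess returns,
    where excess return = (ROE - cost of equity) * book equity.
    """
    excess = (roe - cost_of_equity) * book_equity
    return book_equity + excess / (cost_of_equity - growth)

# Example: ROE 12%, cost of equity 9%, 2% long-term growth.
print(excess_return_value(book_equity=100.0, roe=0.12,
                          cost_of_equity=0.09, growth=0.02))  # ~142.9
```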
What is the 'deterministic calculator' and why does it matter?
All five AI models feed their assumptions into exactly the same mathematical formula. This means any differences in estimates come purely from the models' different views — not from one model 'calculating better'. Without this separation, comparing models would be impossible.
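Schematically, the separation looks like one fixed function applied to five different input sets. The toy formula below is a stand-in, not the real engine; only the structure is the point:

```python
from dataclasses import dataclass

@dataclass
class Assumptions:
    revenue_cagr: float  # set by the AI model
    ebit_margin: float   # set by the AI model
    wacc: float          # set by the AI model

def dcf_target_price(a: Assumptions, base_revenue: float = 100.0) -> float:
    """Toy stand-in for the shared deterministic formula: discount a
    5-year EBIT stream. The real engine is more elaborate, but the math
    is identical for every model."""
    value = 0.0
    revenue = base_revenue
    for year in range(1, 6):
        revenue *= 1 + a.revenue_cagr
        value += (revenue * a.ebit_margin) / (1 + a.wacc) ** year
    return value

# Identical formula, model-specific inputs (numbers are illustrative):
inputs = {
    "gpt":    Assumptions(0.04, 0.18, 0.10),
    "claude": Assumptions(0.05, 0.20, 0.09),
}
print({m: round(dcf_target_price(a), 1) for m, a in inputs.items()})
```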
How is the discount rate (WACC) determined?
WACC is built from market-specific and sector-specific components. The risk-free rate is 3.0% for Finland and 4.5% for the USA. Equity risk premium is 4.5% (Finland) and 5.0% (USA). Each AI model estimates a company-specific beta component. Sector-specific WACC ranges guide the models (e.g., Telecom 6.5–7.5%, Technology 9–11%).
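Read together, these components compose a CAPM-style rate that is then kept within the sector band. A hedged sketch using the figures quoted above (the clamping step and beta handling are assumptions about the engine, not documented behavior):

```python
RISK_FREE = {"FI": 0.030, "US": 0.045}            # rates from the answer above
EQUITY_RISK_PREMIUM = {"FI": 0.045, "US": 0.050}
SECTOR_WACC_RANGE = {"telecom": (0.065, 0.075), "technology": (0.09, 0.11)}

def discount_rate(market: str, sector: str, beta: float) -> float:
    """CAPM-style rate, clamped to the sector guidance band (assumed behavior)."""
    rate = RISK_FREE[market] + beta * EQUITY_RISK_PREMIUM[market]
    low, high = SECTOR_WACC_RANGE[sector]
    return min(max(rate, low), high)

# Example: a US tech company with a model-estimated beta of 1.2.
print(discount_rate("US", "technology", beta=1.2))  # 0.045 + 1.2*0.05 = 0.105
```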
What is 'Bayesian calibration' (shrinkage)?
The final estimate is a blend of 70% DCF model output and 30% analyst consensus. This 'shrinkage' reduces the systematic bias that DCF models tend to have (usually too pessimistic). The pre-calibration DCF signal is preserved separately for the AI indices (ACDI, ADI).
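The blend itself is a one-line weighted average. An illustration with the stated 70/30 weights (the prices are made up):

```python
def calibrated_estimate(dcf_value: float, consensus: float,
                        dcf_weight: float = 0.70) -> float:
    """Shrink the DCF output toward analyst consensus (70/30 per the text)."""
    return dcf_weight * dcf_value + (1 - dcf_weight) * consensus

# A pessimistic DCF value of 90 against a consensus of 110:
print(calibrated_estimate(90.0, 110.0))  # 0.7*90 + 0.3*110 = 96.0
```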
What does 'Agreement score' mean?
It measures how many of the 5 models agree on the direction, i.e., whether a stock's estimated value is above or below the current price. A score of 1.00 means all models point the same way. A low score (e.g., 0.65) indicates significant disagreement between models.
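One plausible reading is the share of models on the majority side of the current price; the site's exact formula may differ, so treat this sketch as illustrative:

```python
def agreement_score(estimates: list[float], spot_price: float) -> float:
    """Fraction of models on the majority side of the spot price
    (illustrative definition; the site's exact formula may differ)."""
    above = sum(1 for e in estimates if e > spot_price)
    return max(above, len(estimates) - above) / len(estimates)

# 4 of 5 models above the current price of 100 -> 0.8
print(agreement_score([105, 110, 98, 102, 120], spot_price=100.0))
```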
What is terminal growth and how is it set?
Terminal growth is the assumed long-term growth rate after the explicit forecast period. It is NOT set by AI models — it is deterministic, based on a market × sector matrix (e.g., Finnish technology 2.0%, US healthcare 2.5%). This prevents LLMs from guessing unrealistic long-term rates.
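Because terminal growth is deterministic, it can be pictured as a simple lookup table. A sketch using the two rates quoted above (the structure and default are assumptions; only those two cells come from the text):

```python
# Market x sector matrix. The 2.0% and 2.5% cells come from the answer
# above; any other values would be placeholders, not the engine's table.
TERMINAL_GROWTH = {
    ("FI", "technology"): 0.020,
    ("US", "healthcare"): 0.025,
}

def terminal_growth(market: str, sector: str, default: float = 0.020) -> float:
    """Deterministic lookup; never chosen by the AI models."""
    return TERMINAL_GROWTH.get((market, sector), default)

print(terminal_growth("US", "healthcare"))  # 0.025
```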

Indices & Metrics

What does ACDI (AI Valuation Gap) measure?
The median valuation gap across all AI model estimates versus market prices for the entire stock universe. A negative value means models see stocks as overpriced on average relative to their calculated values.
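In code terms, ACDI is a median of per-stock gaps. A sketch of the stated definition (taking the median model estimate per stock is an assumption about the aggregation step):

```python
from statistics import median

def acdi(estimates_by_stock: dict[str, list[float]],
         spot_prices: dict[str, float]) -> float:
    """Median valuation gap across the universe (sketch of the stated
    definition). Gap = (median model estimate - price) / price."""
    gaps = [(median(est) - spot_prices[t]) / spot_prices[t]
            for t, est in estimates_by_stock.items()]
    return median(gaps)

# Two stocks, five model estimates each (illustrative numbers):
print(acdi({"NOKIA": [3.6, 3.8, 3.5, 3.9, 3.7],
            "AAPL": [170, 180, 165, 175, 168]},
           {"NOKIA": 4.0, "AAPL": 190}))  # negative: models below prices
```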
What does ADI (Model Disagreement) measure?
How much the AI models disagree with each other — not their relationship to the market price. High ADI can indicate uncertainty, but also that one model is particularly aggressive or conservative on a given day.
What does ANM (Sentiment Shift) measure?
The day-over-day change in the models' collective view. It describes dynamics rather than an absolute level: whether the AI models are moving in a more optimistic or more pessimistic direction.
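Read as stated, ANM is a first difference of the collective-view series. A minimal sketch, assuming the underlying level is the daily valuation gap (the text does not specify the exact series):

```python
def anm(level_today: float, level_yesterday: float) -> float:
    """Day-over-day shift in the models' collective view (sketch; the
    underlying level series is an assumption, not documented)."""
    return level_today - level_yesterday

# Collective gap moved from -10.0% to -8.5%: a +1.5pp optimistic shift.
print(anm(-0.085, -0.100))  # 0.015
```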
Why do all models currently show a negative bias?
This is one of the project's most interesting findings. Possible reasons: DCF models are sensitive to discount rates, and a high-interest-rate environment pushes values down. Additionally, models' training data may reflect historically lower valuation levels. We track this systematically on the Bias Index page.
What does 'Calibration score' actually mean?
It measures how close a model gets to analyst consensus without external corrections. A higher score does NOT mean the model is 'right' — consensus can be wrong too. It primarily shows how well calibrated a model is relative to the market's general view.

AI Model Behavior

Why is GPT systematically the most bearish?
We don't know exactly — this is one of the project's research questions. Possible explanations: GPT may weight risks more conservatively, or its training data reflects a different market cycle. The systematic bias is documented and visible in the Bias Index history.
Is Claude 'better' because its bias is smallest?
Not necessarily. Low bias relative to market price can mean good calibration — or it can mean the model has learned to 'follow the price' rather than assess fundamentals. The Accuracy page tracks which model best predicts direction over time.
Why do estimates change day to day even when fundamentals don't?
AI models are not deterministic — the same input can produce slightly different output on different runs. Additionally, the pipeline fetches the latest market data daily, which can affect margin or growth assumptions. The 'Steady' vs 'Reactive' classification in Model Traits describes this variation at the model level.
Can models influence or learn from each other?
No. Each model receives exactly the same input and runs completely independently. There is no communication between models. Results are aggregated only after the calculation.
What specific AI model versions are used?
Currently: GPT-4o-mini (OpenAI), Claude Sonnet 4.6 (Anthropic), Gemini 2.5 Flash (Google), DeepSeek Chat (DeepSeek), and Grok-3 (xAI). Model versions are updated as providers release new capabilities. Version changes are documented in the Changelog.
How are the prompts structured? Is this zero-shot, few-shot, or RAG?
Pure zero-shot structured output. Each model receives the same prompt containing: company financials (income statement, balance sheet, cash flow from Yahoo Finance), recent news headlines (IR feeds + Yahoo News), analyst consensus target price, sector-specific guidance (margin ranges, growth expectations, WACC context — 10 sector profiles), and instructions to output 3 valuation parameters (revenue CAGR, EBIT margin target, WACC). Temperature is set to 0.4 for all models. No few-shot examples, no retrieval-augmented generation. The prompt does not contain previous model outputs or historical AI estimates.
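The requested structured output can be pictured as a small JSON object. A hypothetical sketch of the three parameters named above (field names and values are illustrative, not the actual schema):

```python
import json

# Hypothetical shape of the structured output each model must return.
# The three parameters come from the text; field names are illustrative.
example_output = {
    "revenue_cagr": 0.045,       # multi-year revenue growth assumption
    "ebit_margin_target": 0.16,  # target EBIT margin
    "wacc": 0.095,               # company-specific discount rate
}
print(json.dumps(example_output, indent=2))
```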
What happens under the hood every day?
Every weekday at 06:00 EET, an automated pipeline: (1) fetches fresh financial data from Yahoo Finance for all tracked companies — income statements, balance sheets, analyst consensus, and news headlines; (2) sends the same data to all 5 AI models in parallel, each producing valuation assumptions (revenue growth, EBIT margin, WACC); (3) runs a deterministic DCF engine that calculates target prices from these assumptions; (4) aggregates consensus, disagreement scores, and market indices; (5) checks for alerts and generates a weekly report. The full methodology is documented on the Methodology page.
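As an orchestration sketch, the five steps map onto a simple loop. Every function below is a hypothetical stub standing in for a pipeline stage, not the project's actual code:

```python
# Schematic of the daily run; each stub stands in for a real stage.
def fetch_yahoo_finance(ticker): return {"ticker": ticker}       # step 1
def query_model(model, data):    return {"wacc": 0.09}           # step 2
def run_dcf_engine(assumption):  return 100.0                    # step 3
def aggregate_indices(targets):  return {"ACDI": -0.08}          # step 4
def check_alerts(indices):       pass                            # step 5

def run_daily_pipeline(tickers, models):
    data = {t: fetch_yahoo_finance(t) for t in tickers}
    assumptions = {m: query_model(m, data) for m in models}      # parallel in practice
    targets = {m: run_dcf_engine(a) for m, a in assumptions.items()}
    indices = aggregate_indices(targets)
    check_alerts(indices)
    return indices

print(run_daily_pipeline(["NOKIA"], ["gpt", "claude"]))
```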

Data & Limitations

How fresh is the data?
The pipeline runs automatically on business days (Mon–Fri). Spot prices and market data come from the previous trading day's closing prices. Analyst consensus may update with a lag depending on the source. On weekends and holidays, the site shows the latest available data.
How reliable are the model estimates?
The estimates are experimental and should not be used as investment advice. In the early stage, all 5 models show systematic bearish bias (undervaluing stocks relative to market prices). Output validity is approximately 100% (all models produce valid, parseable outputs daily). However, directional accuracy — whether models correctly predict if a stock goes up or down — requires significantly more data to assess reliably. Early-stage accuracy metrics are available on the Accuracy page. The key value is not in individual estimates but in comparing how different AI models form views and where they disagree.
Why do some companies show fewer than 5 models?
Some models may return an invalid or unparseable response for a particular company on a given day — for example, if a model refuses to estimate or produces a number that fails validation checks. Such results are automatically rejected. A model whose output is rejected falls back to the spot price and is excluded from that day's consensus.
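In effect, validation acts as a per-model gate with a fallback. A sketch of that logic (the rejection threshold and names are assumptions):

```python
def validate_estimate(estimate: float | None, spot: float,
                      max_gap: float = 0.60) -> tuple[float, bool]:
    """Gate a model's output (illustrative rule: reject missing values or
    gaps beyond +/-60% of spot). Rejected outputs fall back to the spot
    price and are flagged for exclusion from the consensus."""
    if estimate is None or abs(estimate - spot) / spot > max_gap:
        return spot, False   # fallback, excluded from consensus
    return estimate, True    # valid, included

print(validate_estimate(None, spot=100.0))   # (100.0, False)
print(validate_estimate(95.0, spot=100.0))   # (95.0, True)
```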
Why only 24 companies?
The universe is intentionally limited: 12 Finnish OMXH stocks and 12 US S&P 500 stocks. The project is in active development, and a smaller universe makes building, configuring, and quality assurance easier and more affordable. Each company requires sector-specific configuration and validation. Expansion is planned as the methodology matures.
Why do some companies show extreme gaps (+40% or -44%)?
Large gaps usually have a structural explanation. For example, DCF models tend to undervalue high-P/E growth stocks (like mega-cap US tech) and may overvalue defensive stocks with stable cash flows. Sector-specific prompt instructions and engine calibration are being developed to reduce these extremes over time.
What do the colored dots next to model names mean?
Each AI model has a signature color for consistent identification across all charts and tables. GPT is cyan, Claude is orange, Gemini is purple, DeepSeek is green, and Grok is yellow. The same colors are used in the Model Spread chart, Bias Index, and model detail pages.
What does σ (sigma/dispersion) mean?
Sigma (σ) measures how spread out the model estimates are relative to their median. A low sigma means models agree closely on a value. A high sigma means wide disagreement. It is calculated as the interquartile range divided by the median.
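Given that definition, sigma is straightforward to compute. A sketch using quartiles (the quartile method is an implementation detail assumed here):

```python
from statistics import median, quantiles

def dispersion_sigma(estimates: list[float]) -> float:
    """Sigma as defined above: interquartile range / median."""
    q1, _, q3 = quantiles(estimates, n=4)  # quartile method is an assumption
    return (q3 - q1) / median(estimates)

# Tight agreement vs. wide disagreement (illustrative numbers):
print(dispersion_sigma([98, 100, 101, 102, 103]))  # small sigma
print(dispersion_sigma([60, 85, 100, 130, 170]))   # large sigma
```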

Use & Disclaimer

Is the data free to use?
The website is freely accessible. Data is generated by AI models and should not be treated as financial research. If you wish to reference the data in academic or journalistic work, please credit 'AI Investor Barometer (aiinvestorbarometer.com)'. A public API is under consideration.
Can I use this for investment decisions?
The tool is designed for comparing AI models and observing systematic patterns — not for buy or sell recommendations. If you use the data in your own analysis, note that the data history is short and models can be systematically wrong. See our Terms of Use for full details.
Is this regulated investment research under MiFID II?
No. The tool does not meet the definition of regulated investment research: it does not give recommendations, it does not comment on buy or sell decisions, and it is not intended to support individual investment decisions. It is a comparison tool for analyzing AI model behavior.
How is the service evolving?
AI Investor Barometer is developed incrementally as data accumulates. The valuation engine, prompts, and model calibration are improved based on empirical findings from production data. Recent milestones include Engine v7 with sector-specific ROIC calibration, temperature harmonization across all models, and 10 dedicated sector prompts. You can follow development progress on the Changelog page. To stay informed about new features and findings, subscribe to email updates — we send weekly AI Signals reports and notify about significant platform changes.

Research & Preliminary Findings

Why are all models currently bearish — is this a bug?
No. All 5 models consistently produce estimates below market prices (median gap approximately −8% to −17%). This appears to be a combination of two factors: (1) DCF models are structurally sensitive to discount rates, and the current high-interest-rate environment pushes calculated values down; (2) LLMs may weight downside risks more heavily than upside opportunities in their assumptions. We track this systematically and consider it one of the most important research questions.
Why are models more bearish on US stocks than Finnish stocks?
All models show 5–7 percentage points more negative bias for US stocks compared to Finnish stocks. The likely explanation is that US mega-cap stocks trade at significantly higher P/E multiples (often >30×), which DCF models struggle to justify through cash flow projections alone. Finnish stocks, trading at lower multiples, are closer to what DCF models naturally produce. This suggests DCF may have a structural limitation for high-growth, high-multiple markets.
Do models have distinct 'personalities'?
Yes — this is one of the clearest findings. Claude behaves like a cautious senior analyst: low volatility (1.6%/day change), smallest bias, highest consistency. GPT is conservative and risk-focused: highest bearish bias, moderate volatility. Gemini is the most impulsive: largest day-to-day swings. DeepSeek follows the group efficiently at the lowest cost. Grok is fast and bold, with estimates hitting the safety caps most often. These 'personalities' have been stable across all 13+ trading days observed.
Can high agreement (1.00) be misleading?
Yes. When all 5 models show perfect agreement (score = 1.00), it can mean two different things: genuine consensus based on aligned views, or artificial convergence caused by safety caps forcing all models into the same band. For example, if the analyst target price cap limits all models to the same range, they appear to 'agree' even though their underlying DCF estimates may have been very different. We are developing ways to distinguish genuine from cap-induced agreement.
Whose analysis style do LLMs actually follow?
This is an open research question. LLMs are trained on vast amounts of financial text — analyst reports, news, academic papers, forums. They may have learned a blend of institutional analyst conventions (like those of Goldman Sachs or Morgan Stanley) mixed with retail investor perspectives. One hypothesis: their tendency toward bearish estimates may reflect the conservative bias common in published analyst reports, where overestimating (being 'too bullish') carries more reputational risk than underestimating.
Are LLMs actually good at stock valuation?
Too early to tell definitively — we need at least 90 days of data for meaningful statistical analysis. However, preliminary observations suggest that: (1) the 5-model consensus is more stable than any single model; (2) LLMs struggle most with high-P/E growth stocks; (3) they perform better on European stocks with lower multiples; (4) their value may lie not in accurate price targets but in revealing systematic patterns and biases in how AI 'thinks' about value.
How do models react to new information like earnings reports?
We don't have earnings season data yet (first Q1/2026 results expected April–May). This will be a crucial test: do some models react faster to new data while others maintain their prior view? The 'Steady' vs 'Reactive' model trait classification predicts that Claude will be slow to change while Gemini may overreact. We plan to publish specific analysis after the first earnings cycle.