Questions & Answers

Common questions about AI Investor Barometer — how it works, what the numbers mean, and what it is (and isn't) designed to do.

Methodology & Calculation

What is AI Investor Barometer?
An experimental observatory that tracks how 5 independent AI models (GPT, Claude, Gemini, DeepSeek, Grok) form stock valuation assumptions. A deterministic valuation engine then computes estimates from those assumptions. The site visualizes where the models agree, where they disagree, and how their views change over time. It is not an investment advisory service.
Why do you use a DCF model instead of P/E multiples?
DCF forces each AI model to explicitly state its assumptions about growth, profitability, and risk — rather than just picking a comparable multiple. This makes differences between models transparent and measurable. A forward P/E cap is applied as a safety check to prevent extreme valuations.
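Mechanically, the cap is just an upper bound on the DCF output. A minimal sketch, assuming a simple clamp (the 25× limit and function names here are illustrative, not the engine's actual parameters):

```python
def apply_pe_cap(dcf_value: float, forward_eps: float, pe_cap: float = 25.0) -> float:
    """Clamp a DCF estimate to a forward P/E ceiling (illustrative cap value)."""
    ceiling = forward_eps * pe_cap
    return min(dcf_value, ceiling)

# Example: a DCF value of 180 with forward EPS of 6.0 is capped at 6.0 * 25 = 150.
print(apply_pe_cap(180.0, 6.0))  # -> 150.0
```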
Why do financial companies use a different model?
For banks and insurance companies (Sampo, Nordea, JPMorgan, Berkshire), debt is a raw material of the business, not a financing item, so traditional FCFF-DCF produces systematically misleading results for these companies. We use an Excess Return on Equity model instead, which values a company based on its return on equity relative to the cost of equity.
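In its general form, an excess-return model values equity as current book value plus the present value of future excess returns. A minimal single-stage sketch (this simplification and all numbers are illustrative, not the engine's exact specification):

```python
def excess_return_value(book_equity: float, roe: float,
                        cost_of_equity: float, growth: float) -> float:
    """Single-stage excess return on equity model (illustrative simplification).

    Value = book equity + PV of perpetual excess returns,
    where excess return = (ROE - cost of equity) * book equity.
    """
    excess = (roe - cost_of_equity) * book_equity
    return book_equity + excess / (cost_of_equity - growth)

# Example: ROE 12%, cost of equity 9%, 2% long-term growth.
print(excess_return_value(book_equity=100.0, roe=0.12,
                          cost_of_equity=0.09, growth=0.02))  # ~142.9
```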
What is the 'deterministic calculator' and why does it matter?
All five AI models feed their assumptions into exactly the same mathematical formula. This means any differences in estimates come purely from the models' different views — not from one model 'calculating better'. Without this separation, comparing models would be impossible.
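Schematically, the separation looks like one fixed function applied to five different input sets. The toy formula below is a stand-in, not the real engine; only the structure is the point:

```python
from dataclasses import dataclass

@dataclass
class Assumptions:
    revenue_cagr: float  # set by the AI model
    ebit_margin: float   # set by the AI model
    wacc: float          # set by the AI model

def dcf_target_price(a: Assumptions, base_revenue: float = 100.0) -> float:
    """Toy stand-in for the shared deterministic formula: discount a
    5-year EBIT stream. The real engine is more elaborate, but the math
    is identical for every model."""
    value = 0.0
    revenue = base_revenue
    for year in range(1, 6):
        revenue *= 1 + a.revenue_cagr
        value += (revenue * a.ebit_margin) / (1 + a.wacc) ** year
    return value

# Identical formula, model-specific inputs (numbers are illustrative):
inputs = {
    "gpt":    Assumptions(0.04, 0.18, 0.10),
    "claude": Assumptions(0.05, 0.20, 0.09),
}
print({m: round(dcf_target_price(a), 1) for m, a in inputs.items()})
```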
How is the discount rate (WACC) determined?
WACC is built from market-specific and sector-specific components. The risk-free rate is 3.0% for Finland and 4.5% for the USA. Equity risk premium is 4.5% (Finland) and 5.0% (USA). Each AI model estimates a company-specific beta component. Sector-specific WACC ranges guide the models (e.g., Telecom 6.5–7.5%, Technology 9–11%).
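Read together, these components compose a CAPM-style rate that is then kept within the sector band. A hedged sketch using the figures quoted above (the clamping step and beta handling are assumptions about the engine, not documented behavior):

```python
RISK_FREE = {"FI": 0.030, "US": 0.045}            # rates from the answer above
EQUITY_RISK_PREMIUM = {"FI": 0.045, "US": 0.050}
SECTOR_WACC_RANGE = {"telecom": (0.065, 0.075), "technology": (0.09, 0.11)}

def discount_rate(market: str, sector: str, beta: float) -> float:
    """CAPM-style rate, clamped to the sector guidance band (assumed behavior)."""
    rate = RISK_FREE[market] + beta * EQUITY_RISK_PREMIUM[market]
    low, high = SECTOR_WACC_RANGE[sector]
    return min(max(rate, low), high)

# Example: a US tech company with a model-estimated beta of 1.2.
print(discount_rate("US", "technology", beta=1.2))  # 0.045 + 1.2*0.05 = 0.105
```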
What is 'Bayesian calibration' (shrinkage)?
The final estimate is a blend of 70% DCF model output and 30% analyst consensus. This 'shrinkage' reduces the systematic bias that DCF models tend to have (usually too pessimistic). The pre-calibration DCF signal is preserved separately for the AI indices (ACDI, ADI).
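The blend itself is a one-line weighted average. An illustration with the stated 70/30 weights (the prices are made up):

```python
def calibrated_estimate(dcf_value: float, consensus: float,
                        dcf_weight: float = 0.70) -> float:
    """Shrink the DCF output toward analyst consensus (70/30 per the text)."""
    return dcf_weight * dcf_value + (1 - dcf_weight) * consensus

# A pessimistic DCF value of 90 against a consensus of 110:
print(calibrated_estimate(90.0, 110.0))  # 0.7*90 + 0.3*110 = 96.0
```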
What does 'Agreement score' mean?
It measures how many of the 5 models agree on the direction, i.e., whether a stock's estimated value is above or below the current price. A score of 1.00 means all models point the same way. A low score (e.g., 0.65) indicates significant disagreement between models.
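One plausible reading is the share of models on the majority side of the current price; the site's exact formula may differ, so treat this sketch as illustrative:

```python
def agreement_score(estimates: list[float], spot_price: float) -> float:
    """Fraction of models on the majority side of the spot price
    (illustrative definition; the site's exact formula may differ)."""
    above = sum(1 for e in estimates if e > spot_price)
    return max(above, len(estimates) - above) / len(estimates)

# 4 of 5 models above the current price of 100 -> 0.8
print(agreement_score([105, 110, 98, 102, 120], spot_price=100.0))
```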
What is terminal growth and how is it set?
Terminal growth is the assumed long-term growth rate after the explicit forecast period. It is NOT set by AI models — it is deterministic, based on a market × sector matrix (e.g., Finnish technology 2.0%, US healthcare 2.5%). This prevents LLMs from guessing unrealistic long-term rates.
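Because terminal growth is deterministic, it can be pictured as a simple lookup table. A sketch using the two rates quoted above (the structure and default are assumptions; only those two cells come from the text):

```python
# Market x sector matrix. The 2.0% and 2.5% cells come from the answer
# above; any other values would be placeholders, not the engine's table.
TERMINAL_GROWTH = {
    ("FI", "technology"): 0.020,
    ("US", "healthcare"): 0.025,
}

def terminal_growth(market: str, sector: str, default: float = 0.020) -> float:
    """Deterministic lookup; never chosen by the AI models."""
    return TERMINAL_GROWTH.get((market, sector), default)

print(terminal_growth("US", "healthcare"))  # 0.025
```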

Indices & Metrics

What does ACDI (AI Valuation Gap) measure?
The median valuation gap across all AI model estimates versus market prices for the entire stock universe. A negative value means models see stocks as overpriced on average relative to their calculated values.
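In code terms, ACDI is a median of per-stock gaps. A sketch of the stated definition (taking the median model estimate per stock is an assumption about the aggregation step):

```python
from statistics import median

def acdi(estimates_by_stock: dict[str, list[float]],
         spot_prices: dict[str, float]) -> float:
    """Median valuation gap across the universe (sketch of the stated
    definition). Gap = (median model estimate - price) / price."""
    gaps = [(median(est) - spot_prices[t]) / spot_prices[t]
            for t, est in estimates_by_stock.items()]
    return median(gaps)

# Two stocks, five model estimates each (illustrative numbers):
print(acdi({"NOKIA": [3.6, 3.8, 3.5, 3.9, 3.7],
            "AAPL": [170, 180, 165, 175, 168]},
           {"NOKIA": 4.0, "AAPL": 190}))  # negative: models below prices
```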
What does ADI (Model Disagreement) measure?
How much the AI models disagree with each other — not their relationship to the market price. High ADI can indicate uncertainty, but also that one model is particularly aggressive or conservative on a given day.
What does ANM (Sentiment Shift) measure?
The day-over-day change in the models' collective view. It describes dynamics rather than an absolute level: whether the AI models are moving in a more optimistic or more pessimistic direction.
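Read as stated, ANM is a first difference of the collective-view series. A minimal sketch, assuming the underlying level is the daily valuation gap (the text does not specify the exact series):

```python
def anm(level_today: float, level_yesterday: float) -> float:
    """Day-over-day shift in the models' collective view (sketch; the
    underlying level series is an assumption, not documented)."""
    return level_today - level_yesterday

# Collective gap moved from -10.0% to -8.5%: a +1.5pp optimistic shift.
print(anm(-0.085, -0.100))  # 0.015
```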
Why do all models currently show a negative bias?
This is one of the project's most interesting findings. Possible reasons: DCF models are sensitive to discount rates, and a high-interest-rate environment pushes values down. Additionally, models' training data may reflect historically lower valuation levels. We track this systematically on the Bias Index page.
What does 'Calibration score' actually mean?
It measures how close a model gets to analyst consensus without external corrections. A higher score does NOT mean the model is 'right' — consensus can be wrong too. It primarily shows how well calibrated a model is relative to the market's general view.

AI Model Behavior

Why is GPT systematically the most bearish?
We don't know exactly — this is one of the project's research questions. Possible explanations: GPT may weight risks more conservatively, or its training data reflects a different market cycle. The systematic bias is documented and visible in the Bias Index history.
Is Claude 'better' because its bias is smallest?
Not necessarily. Low bias relative to market price can mean good calibration — or it can mean the model has learned to 'follow the price' rather than assess fundamentals. The Accuracy page tracks which model best predicts direction over time.
Why do estimates change day to day even when fundamentals don't?
AI models are not deterministic — the same input can produce slightly different output on different runs. Additionally, the pipeline fetches the latest market data daily, which can affect margin or growth assumptions. The 'Steady' vs 'Reactive' classification in Model Traits describes this variation at the model level.
Can models influence or learn from each other?
No. Each model receives exactly the same input and runs completely independently. There is no communication between models. Results are aggregated only after the calculation.
What specific AI model versions are used?
Currently: GPT-4o-mini (OpenAI), Claude Sonnet 4.6 (Anthropic), Gemini 2.5 Flash (Google), DeepSeek Chat (DeepSeek), and Grok-3 (xAI). Model versions are updated as providers release new capabilities. Version changes are documented in the Changelog.
How are the prompts structured? Is this zero-shot, few-shot, or RAG?
Pure zero-shot structured output. Each model receives the same prompt containing: company financials (income statement, balance sheet, cash flow from Yahoo Finance), recent news headlines (IR feeds + Yahoo News), analyst consensus target price, sector-specific guidance (margin ranges, growth expectations, WACC context — 10 sector profiles), and instructions to output 3 valuation parameters (revenue CAGR, EBIT margin target, WACC). Temperature is set to 0.4 for all models. No few-shot examples, no retrieval-augmented generation. The prompt does not contain previous model outputs or historical AI estimates.
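The requested structured output can be pictured as a small JSON object. A hypothetical sketch of the three parameters named above (field names and values are illustrative, not the actual schema):

```python
import json

# Hypothetical shape of the structured output each model must return.
# The three parameters come from the text; field names are illustrative.
example_output = {
    "revenue_cagr": 0.045,       # multi-year revenue growth assumption
    "ebit_margin_target": 0.16,  # target EBIT margin
    "wacc": 0.095,               # company-specific discount rate
}
print(json.dumps(example_output, indent=2))
```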
What happens under the hood every day?
Every weekday at 06:00 EET, an automated pipeline: (1) fetches fresh financial data from Yahoo Finance for all tracked companies — income statements, balance sheets, analyst consensus, and news headlines; (2) sends the same data to all 5 AI models in parallel, each producing valuation assumptions (revenue growth, EBIT margin, WACC); (3) runs a deterministic DCF engine that calculates target prices from these assumptions; (4) aggregates consensus, disagreement scores, and market indices; (5) checks for alerts and generates a weekly report. The full methodology is documented on the Methodology page.
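As an orchestration sketch, the five steps map onto a simple loop. Every function below is a hypothetical stub standing in for a pipeline stage, not the project's actual code:

```python
# Schematic of the daily run; each stub stands in for a real stage.
def fetch_yahoo_finance(ticker): return {"ticker": ticker}       # step 1
def query_model(model, data):    return {"wacc": 0.09}           # step 2
def run_dcf_engine(assumption):  return 100.0                    # step 3
def aggregate_indices(targets):  return {"ACDI": -0.08}          # step 4
def check_alerts(indices):       pass                            # step 5

def run_daily_pipeline(tickers, models):
    data = {t: fetch_yahoo_finance(t) for t in tickers}
    assumptions = {m: query_model(m, data) for m in models}      # parallel in practice
    targets = {m: run_dcf_engine(a) for m, a in assumptions.items()}
    indices = aggregate_indices(targets)
    check_alerts(indices)
    return indices

print(run_daily_pipeline(["NOKIA"], ["gpt", "claude"]))
```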

Data & Limitations

How fresh is the data?
The pipeline runs automatically on business days (Mon–Fri). Spot prices and market data come from the previous trading day's closing prices. Analyst consensus may update with a lag depending on the source. On weekends and holidays, the site shows the latest available data.
How reliable are the model estimates?
The estimates are experimental and should not be used as investment advice. In the early stage, all 5 models show systematic bearish bias (undervaluing stocks relative to market prices). Output validity is approximately 100% (all models produce valid, parseable outputs daily). However, directional accuracy — whether models correctly predict if a stock goes up or down — requires significantly more data to assess reliably. Early-stage accuracy metrics are available on the Accuracy page. The key value is not in individual estimates but in comparing how different AI models form views and where they disagree.
Why do some companies show fewer than 5 models?
Some models may return an invalid or unparseable response for a particular company on a given day — for example, if a model refuses to estimate or produces a number that fails validation checks. Such results are automatically rejected. A model whose output is rejected falls back to the spot price and is excluded from that day's consensus.
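In effect, validation acts as a per-model gate with a fallback. A sketch of that logic (the rejection threshold and names are assumptions):

```python
def validate_estimate(estimate: float | None, spot: float,
                      max_gap: float = 0.60) -> tuple[float, bool]:
    """Gate a model's output (illustrative rule: reject missing values or
    gaps beyond +/-60% of spot). Rejected outputs fall back to the spot
    price and are flagged for exclusion from the consensus."""
    if estimate is None or abs(estimate - spot) / spot > max_gap:
        return spot, False   # fallback, excluded from consensus
    return estimate, True    # valid, included

print(validate_estimate(None, spot=100.0))   # (100.0, False)
print(validate_estimate(95.0, spot=100.0))   # (95.0, True)
```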
Why only 24 companies?
The universe is intentionally limited: 12 Finnish OMXH stocks and 12 US S&P 500 stocks. The project is in active development, and a smaller universe makes building, configuring, and quality assurance easier and more affordable. Each company requires sector-specific configuration and validation. Expansion is planned as the methodology matures.
Why do some companies show extreme gaps (+40% or -44%)?
Large gaps usually have a structural explanation. For example, DCF models tend to undervalue high-P/E growth stocks (like mega-cap US tech) and may overvalue defensive stocks with stable cash flows. Sector-specific prompt instructions and engine calibration are being developed to reduce these extremes over time.
What do the colored dots next to model names mean?
Each AI model has a signature color for consistent identification across all charts and tables. GPT is cyan, Claude is orange, Gemini is purple, DeepSeek is green, and Grok is yellow. The same colors are used in the Model Spread chart, Bias Index, and model detail pages.
What does σ (sigma/dispersion) mean?
Sigma (σ) measures how spread out the model estimates are relative to their median. A low sigma means models agree closely on a value. A high sigma means wide disagreement. It is calculated as the interquartile range divided by the median.
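Given that definition, sigma is straightforward to compute. A sketch using quartiles (the quartile method is an implementation detail assumed here):

```python
from statistics import median, quantiles

def dispersion_sigma(estimates: list[float]) -> float:
    """Sigma as defined above: interquartile range / median."""
    q1, _, q3 = quantiles(estimates, n=4)  # quartile method is an assumption
    return (q3 - q1) / median(estimates)

# Tight agreement vs. wide disagreement (illustrative numbers):
print(dispersion_sigma([98, 100, 101, 102, 103]))  # small sigma
print(dispersion_sigma([60, 85, 100, 130, 170]))   # large sigma
```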

Use & Disclaimer

Is the data free to use?
The website is freely accessible. Data is generated by AI models and should not be treated as financial research. If you wish to reference the data in academic or journalistic work, please credit 'AI Investor Barometer (aiinvestorbarometer.com)'. A public API is under consideration.
Can I use this for investment decisions?
The tool is designed for comparing AI models and observing systematic patterns — not for buy or sell recommendations. If you use the data in your own analysis, note that the data history is short and models can be systematically wrong. See our Terms of Use for full details.
Is this regulated investment research under MiFID II?
No. The tool does not meet the definition of regulated investment research: it does not give recommendations, it does not comment on buy or sell decisions, and it is not intended to support individual investment decisions. It is a comparison tool for analyzing AI model behavior.
How is the service evolving?
AI Investor Barometer is developed incrementally as data accumulates. The valuation engine, prompts, and model calibration are improved based on empirical findings from production data. Recent milestones include Engine v7 with sector-specific ROIC calibration, temperature harmonization across all models, and 10 dedicated sector prompts. You can follow development progress on the Changelog page. To stay informed about new features and findings, subscribe to email updates — we send weekly AI Signals reports and notify about significant platform changes.

Research & Preliminary Findings

Why are all models currently bearish — is this a bug?
No. All 5 models consistently produce estimates below market prices (median gap approximately −8% to −17%). This appears to be a combination of two factors: (1) DCF models are structurally sensitive to discount rates, and the current high-interest-rate environment pushes calculated values down; (2) LLMs may weight downside risks more heavily than upside opportunities in their assumptions. We track this systematically and consider it one of the most important research questions.
Why are models more bearish on US stocks than Finnish stocks?
All models show 5–7 percentage points more negative bias for US stocks compared to Finnish stocks. The likely explanation is that US mega-cap stocks trade at significantly higher P/E multiples (often >30×), which DCF models struggle to justify through cash flow projections alone. Finnish stocks, trading at lower multiples, are closer to what DCF models naturally produce. This suggests DCF may have a structural limitation for high-growth, high-multiple markets.
Do models have distinct 'personalities'?
Yes — this is one of the clearest findings. Claude behaves like a cautious senior analyst: low volatility (1.6%/day change), smallest bias, highest consistency. GPT is conservative and risk-focused: highest bearish bias, moderate volatility. Gemini is the most impulsive: largest day-to-day swings. DeepSeek follows the group efficiently at the lowest cost. Grok is fast and bold, with estimates hitting the safety caps most often. These 'personalities' have been stable across all 13+ trading days observed.
Can high agreement (1.00) be misleading?
Yes. When all 5 models show perfect agreement (score = 1.00), it can mean two different things: genuine consensus based on aligned views, or artificial convergence caused by safety caps forcing all models into the same band. For example, if the analyst target price cap limits all models to the same range, they appear to 'agree' even though their underlying DCF estimates may have been very different. We are developing ways to distinguish genuine from cap-induced agreement.
Whose analysis style do LLMs actually follow?
This is an open research question. LLMs are trained on vast amounts of financial text — analyst reports, news, academic papers, forums. They may have learned a blend of institutional analyst conventions (like those of Goldman Sachs or Morgan Stanley) mixed with retail investor perspectives. One hypothesis: their tendency toward bearish estimates may reflect the conservative bias common in published analyst reports, where overestimating (being 'too bullish') carries more reputational risk than underestimating.
Are LLMs actually good at stock valuation?
Too early to tell definitively — we need at least 90 days of data for meaningful statistical analysis. However, preliminary observations suggest that: (1) the 5-model consensus is more stable than any single model; (2) LLMs struggle most with high-P/E growth stocks; (3) they perform better on European stocks with lower multiples; (4) their value may lie not in accurate price targets but in revealing systematic patterns and biases in how AI 'thinks' about value.
How do models react to new information like earnings reports?
We don't have earnings season data yet (first Q1/2026 results expected April–May). This will be a crucial test: do some models react faster to new data while others maintain their prior view? The 'Steady' vs 'Reactive' model trait classification predicts that Claude will be slow to change while Gemini may overreact. We plan to publish specific analysis after the first earnings cycle.