Your AI, Not Your View: The Bias of LLMs in Investment Analysis
- URL: http://arxiv.org/abs/2507.20957v4
- Date: Thu, 16 Oct 2025 18:06:41 GMT
- Title: Your AI, Not Your View: The Bias of LLMs in Investment Analysis
- Authors: Hoyoung Lee, Junhyuk Seo, Suhwan Park, Junhyeong Lee, Wonbin Ahn, Chanyeol Choi, Alejandro Lopez-Lira, Yongjae Lee
- Abstract summary: In finance, Large Language Models (LLMs) face frequent knowledge conflicts arising from discrepancies between their pre-trained parametric knowledge and real-time market data. These conflicts are especially problematic in real-world investment services, where a model's inherent biases can misalign with institutional objectives. We propose an experimental framework to investigate emergent behaviors in such conflict scenarios, offering a quantitative analysis of bias in investment analysis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In finance, Large Language Models (LLMs) face frequent knowledge conflicts arising from discrepancies between their pre-trained parametric knowledge and real-time market data. These conflicts are especially problematic in real-world investment services, where a model's inherent biases can misalign with institutional objectives, leading to unreliable recommendations. Despite this risk, the intrinsic investment biases of LLMs remain underexplored. We propose an experimental framework to investigate emergent behaviors in such conflict scenarios, offering a quantitative analysis of bias in LLM-based investment analysis. Using hypothetical scenarios with balanced and imbalanced arguments, we extract the latent biases of models and measure their persistence. Our analysis, centered on sector, size, and momentum, reveals distinct, model-specific biases. Across most models, a tendency to prefer technology stocks, large-cap stocks, and contrarian strategies is observed. These foundational biases often escalate into confirmation bias, causing models to cling to initial judgments even when faced with increasing counter-evidence. A public leaderboard benchmarking bias across a broader set of models is available at https://linqalpha.com/leaderboard
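The probe the abstract describes (balanced arguments to elicit a latent preference, then stacked counter-evidence to measure how long the model clings to its initial pick) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the prompt wording, function names, and the stub model are all hypothetical.

```python
import random

def make_prompt(option_a, option_b, counter_args=0, pick=None):
    """Build a forced-choice prompt; optionally stack counter-arguments
    against a previously chosen option to test persistence."""
    prompt = (
        f"Choose exactly one stock to overweight: {option_a} or {option_b}. "
        f"Both have equally strong fundamentals."
    )
    if pick and counter_args:
        prompt += f" Note: {counter_args} new analyst reports argue against {pick}."
    return prompt

def latent_bias(query_model, option_a, option_b, trials=50, seed=0):
    """Estimate the preference rate for option_a under balanced arguments,
    randomizing presentation order to cancel position bias."""
    rng = random.Random(seed)
    picks_a = 0
    for _ in range(trials):
        a, b = (option_a, option_b) if rng.random() < 0.5 else (option_b, option_a)
        picks_a += query_model(make_prompt(a, b)) == option_a
    return picks_a / trials

def persistence(query_model, initial_pick, alternative, max_counter_args=5):
    """Count how many stacked counter-arguments the model withstands before
    abandoning its initial pick (a simple confirmation-bias proxy)."""
    for k in range(1, max_counter_args + 1):
        prompt = make_prompt(initial_pick, alternative,
                             counter_args=k, pick=initial_pick)
        if query_model(prompt) != initial_pick:
            return k - 1
    return max_counter_args

# Stub "model": prefers the tech name and stays stubborn up to 3 reports.
def stub_model(prompt):
    if "reports" in prompt:
        n = int(prompt.split(" new analyst")[0].rsplit(" ", 1)[-1])
        return "NVDA" if n <= 3 else "XOM"
    return "NVDA" if "NVDA" in prompt else "AAPL"

print(latent_bias(stub_model, "NVDA", "XOM"))   # 1.0 for this stub
print(persistence(stub_model, "NVDA", "XOM"))   # 3
```

Replacing `stub_model` with a call to an actual LLM API would yield the kind of preference-rate and persistence numbers the paper aggregates over sector, size, and momentum scenarios.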
Related papers
- $φ$-DPO: Fairness Direct Preference Optimization Approach to Continual Learning in Large Multimodal Models
This paper presents a novel Fairness Direct Preference Optimization (FaiDPO, or $φ$-DPO) framework for continual learning in LMMs. We first propose a new continual learning paradigm based on Direct Preference Optimization (DPO) to mitigate catastrophic forgetting by aligning learning with pairwise preference signals. Extensive experiments and ablation studies show the proposed $φ$-DPO achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2026-02-26T04:14:33Z) - Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection
Large language models (LLMs) have been widely applied across various domains of finance. Behavioral biases can lead to instability and uncertainty in decision-making. MFMDScen is a benchmark for evaluating behavioral biases in multilingual financial misinformation detection (MFMD) across diverse economic scenarios.
arXiv Detail & Related papers (2026-01-08T22:00:32Z) - Uncovering Representation Bias for Investment Decisions in Open-Source Large Language Models
This paper focuses on representation bias in open-source Qwen models. Using statistical tests and variance analysis, we find that firm size and valuation consistently increase model confidence. When models are prompted for specific financial categories, their confidence rankings align best with fundamental data, moderately with technical signals, and least with growth indicators.
arXiv Detail & Related papers (2025-10-07T09:10:13Z) - To Trade or Not to Trade: An Agentic Approach to Estimating Market Risk Improves Trading Decisions
Large language models (LLMs) are increasingly deployed in agentic frameworks. We develop an agentic system that uses LLMs to iteratively discover differential equations for financial time series. We find that model-informed trading strategies outperform standard LLM-based agents.
arXiv Detail & Related papers (2025-07-11T13:29:32Z) - Relative Bias: A Comparative Framework for Quantifying Bias in LLMs
Relative Bias is a method designed to assess how an LLM's behavior deviates from other LLMs within a specified target domain. We introduce two complementary methodologies: (1) Embedding Transformation analysis, which captures relative bias patterns through sentence representations over the embedding space, and (2) LLM-as-a-Judge, which employs a language model to evaluate outputs comparatively. Applying our framework to several case studies on bias and alignment scenarios, followed by statistical tests for validation, we find strong agreement between the two scoring methods.
arXiv Detail & Related papers (2025-05-22T01:59:54Z) - LLM-Enhanced Black-Litterman Portfolio Optimization
This study proposes and validates a systematic framework that translates return forecasts and predictive uncertainty from Large Language Models into the core inputs for the Black-Litterman model. Through a backtest on S&P 500 constituents, we demonstrate that portfolios driven by top-performing LLMs significantly outperform traditional baselines in both absolute and risk-adjusted terms.
arXiv Detail & Related papers (2025-04-19T16:26:14Z) - Preference Leakage: A Contamination Problem in LLM-as-a-judge
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z) - Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs
Large Language Models (LLMs) are being adopted across a wide range of tasks.
Recent research indicates that LLMs can harbor implicit biases even when they pass explicit bias evaluations.
This study highlights that newer or larger language models do not automatically exhibit reduced bias.
arXiv Detail & Related papers (2024-10-13T03:43:18Z) - Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Despite the excellence of LLM-as-a-Judge in many domains, its potential issues are under-explored, undermining its reliability and the scope of its utility.
We identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which quantifies and analyzes each type of bias in LLM-as-a-Judge.
Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.
arXiv Detail & Related papers (2024-10-03T17:53:30Z) - Identifying and Mitigating Social Bias Knowledge in Language Models
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - Financial Statement Analysis with Large Language Models
We provide standardized and anonymous financial statements to GPT-4 and instruct the model to analyze them. The model outperforms financial analysts in its ability to predict earnings changes directionally. Our trading strategies based on GPT-4's predictions yield higher Sharpe ratios and alphas than strategies based on other models.
arXiv Detail & Related papers (2024-07-25T08:36:58Z) - The Economic Implications of Large Language Model Selection on Earnings and Return on Investment: A Decision Theoretic Model
We use a decision-theoretic approach to compare the financial impact of different language models.
The study reveals how the superior accuracy of more expensive models can, under certain conditions, justify a greater investment.
This article provides a framework for companies looking to optimize their technology choices.
arXiv Detail & Related papers (2024-05-27T20:08:41Z) - Are LLMs Rational Investors? A Study on Detecting and Reducing the Financial Bias in LLMs
Large Language Models (LLMs) are increasingly adopted in financial analysis for interpreting complex market data and trends.
Financial Bias Indicators (FBI) is a framework with components like Bias Unveiler, Bias Detective, Bias Tracker, and Bias Antidote.
We evaluate 23 leading LLMs and propose a de-biasing method based on financial causal knowledge.
arXiv Detail & Related papers (2024-02-20T04:26:08Z) - Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models
Large Language Models (LLMs) pre-trained on extensive corpora have demonstrated superior performance across various NLP tasks.
We introduce a retrieval-augmented LLMs framework for financial sentiment analysis.
Our approach achieves performance gains of 15% to 48% in accuracy and F1 score.
arXiv Detail & Related papers (2023-10-06T05:40:23Z) - Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models
Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks.
We propose the first open-source comprehensive framework for exploring LLMs for credit scoring.
We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
arXiv Detail & Related papers (2023-10-01T03:50:34Z) - Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z) - Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics in Limit-Order Book Markets
Traditional time-series econometric methods often appear incapable of capturing the true complexity of the multi-level interactions driving the price dynamics.
By adopting a state-of-the-art second-order optimization algorithm, we train a Bayesian bilinear neural network with temporal attention.
By addressing the use of predictive distributions to analyze errors and uncertainties associated with the estimated parameters and model forecasts, we thoroughly compare our Bayesian model with traditional ML alternatives.
arXiv Detail & Related papers (2022-03-07T18:59:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.