Related papers: Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

URL: http://arxiv.org/abs/2601.05403v1
Date: Thu, 08 Jan 2026 22:00:32 GMT
Title: Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection
Authors: Zhiwei Liu, Yupen Cao, Yuechen Jiang, Mohsinul Kabir, Polydoros Giannouris, Chen Xu, Ziyang Xu, Tianlei Zhu, Tariquzzaman Faisal, Triantafillos Papadopoulos, Yan Wang, Lingfei Qian, Xueqing Peng, Zhuohan Xie, Ye Yuan, Saeed Almheiri, Abdulrazzaq Alnajjar, Mingbin Chen, Harry Stuart, Paul Thompson, Prayag Tiwari, Alejandro Lopez-Lira, Xue Liu, Jimin Huang, Sophia Ananiadou
Abstract summary: Large language models (LLMs) have been widely applied across various domains of finance. Behavioral biases can lead to instability and uncertainty in decision-making. MFMDScen is a benchmark for evaluating behavioral biases in MFMD across diverse economic scenarios.
Score: 64.75447949495307
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range of human biases. Behavioral biases can lead to instability and uncertainty in decision-making, particularly when processing financial information. However, existing research on LLM bias has mainly focused on direct questioning or simplified, general-purpose settings, with limited consideration of the complex real-world financial environments and high-risk, context-sensitive, multilingual financial misinformation detection tasks (MFMD). In this work, we propose MFMDScen, a comprehensive benchmark for evaluating behavioral biases of LLMs in MFMD across diverse economic scenarios. In collaboration with financial experts, we construct three types of complex financial scenarios: (i) role- and personality-based, (ii) role- and region-based, and (iii) role-based scenarios incorporating ethnicity and religious beliefs. We further develop a multilingual financial misinformation dataset covering English, Chinese, Greek, and Bengali. By integrating these scenarios with misinformation claims, MFMDScen enables a systematic evaluation of 22 mainstream LLMs. Our findings reveal that pronounced behavioral biases persist across both commercial and open-source models. This project will be available at https://github.com/lzw108/FMD.

Related papers

Evaluating LLMs in Finance Requires Explicit Bias Consideration [88.38155218924999]
Finance-specific biases can inflate performance, contaminate backtests, and make reported results useless for deployment claims. No single bias is discussed in more than 28 percent of studies. We propose a Structural Validity Framework and an evaluation checklist with minimal requirements for bias diagnosis and future system design.
arXiv Detail & Related papers (2026-02-15T17:02:01Z)
The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems [54.12165004393043]
FinMMEval 2026 offers three interconnected tasks that span financial understanding, reasoning, and decision-making. The lab aims to promote the development of robust, transparent, and globally inclusive financial AI systems.
arXiv Detail & Related papers (2026-02-11T14:14:06Z)
UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos [22.530796761115766]
We propose UniFinEval, the first unified multimodal benchmark for high-information-density financial environments. UniFinEval systematically constructs five core financial scenarios grounded in real-world financial systems. Gemini-3-pro-preview achieves the best overall performance, yet still exhibits a substantial gap compared to financial experts.
arXiv Detail & Related papers (2026-01-09T10:15:32Z)
Fine-tuning of lightweight large language models for sentiment classification on heterogeneous financial textual data [0.8921166277011348]
We investigate the ability of lightweight open-source large language models (LLMs) to generalize sentiment understanding from financial datasets. We find that LLMs, especially Qwen3 8B and Llama3 8B, perform best in most scenarios, even when using only 5% of the available training data.
arXiv Detail & Related papers (2025-11-30T15:58:22Z)
Your AI, Not Your View: The Bias of LLMs in Investment Analysis [62.388554963415906]
In finance, Large Language Models (LLMs) face frequent knowledge conflicts arising from discrepancies between their pre-trained parametric knowledge and real-time market data. These conflicts are especially problematic in real-world investment services, where a model's inherent biases can misalign with institutional objectives. We propose an experimental framework to investigate emergent behaviors in such conflict scenarios, offering a quantitative analysis of bias in investment analysis.
arXiv Detail & Related papers (2025-07-28T16:09:38Z)
QuantMCP: Grounding Large Language Models in Verifiable Financial Reality [0.43512163406552007]
Large Language Models (LLMs) hold immense promise for revolutionizing financial analysis and decision-making. However, their direct application is often hampered by issues of data hallucination and lack of access to real-time, verifiable financial information. This paper introduces QuantMCP, a novel framework designed to rigorously ground LLMs in financial reality.
arXiv Detail & Related papers (2025-06-07T01:52:39Z)
SNFinLLM: Systematic and Nuanced Financial Domain Adaptation of Chinese Large Language Models [6.639972934967109]
Large language models (LLMs) have become powerful tools for advancing natural language processing applications in the financial industry. We propose a novel large language model specifically designed for the Chinese financial domain, named SNFinLLM. SNFinLLM excels in domain-specific tasks such as answering questions, summarizing financial research reports, analyzing sentiment, and executing financial calculations.
arXiv Detail & Related papers (2024-08-05T08:24:24Z)
RiskLabs: Predicting Financial Risk Using Large Language Model based on Multimodal and Multi-Sources Data [7.76579913330606]
We introduce RiskLabs, a novel framework that leverages large language models (LLMs) to analyze and predict financial risks. Empirical results demonstrate RiskLabs' effectiveness in forecasting both market volatility and variance.
arXiv Detail & Related papers (2024-04-11T03:14:50Z)
Are LLMs Rational Investors? A Study on Detecting and Reducing the Financial Bias in LLMs [44.53203911878139]
Large Language Models (LLMs) are increasingly adopted in financial analysis for interpreting complex market data and trends. Financial Bias Indicators (FBI) is a framework with components like Bias Unveiler, Bias Detective, Bias Tracker, and Bias Antidote. We evaluate 23 leading LLMs and propose a de-biasing method based on financial causal knowledge.
arXiv Detail & Related papers (2024-02-20T04:26:08Z)
Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models [53.620827459684094]
Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks. We propose the first open-source comprehensive framework for exploring LLMs for credit scoring. We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks.
arXiv Detail & Related papers (2023-10-01T03:50:34Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLM) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.