Deficiency of Large Language Models in Finance: An Empirical Examination
of Hallucination
- URL: http://arxiv.org/abs/2311.15548v1
- Date: Mon, 27 Nov 2023 05:27:13 GMT
- Title: Deficiency of Large Language Models in Finance: An Empirical Examination
of Hallucination
- Authors: Haoqiang Kang and Xiao-Yang Liu
- Abstract summary: Hallucination is recognized as a fundamental deficiency of large language models (LLMs).
This paper empirically investigates LLMs' ability to explain financial concepts and terminology.
We evaluate the efficacy of four practical mitigation methods: few-shot learning, Decoding by Contrasting Layers (DoLa), the Retrieval-Augmented Generation (RAG) method, and prompt-based tool learning, in which the model generates a query command for an external function.
- Score: 7.627664978437055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The hallucination issue is recognized as a fundamental deficiency of large
language models (LLMs), especially when applied to fields such as finance,
education, and law. Despite the growing concerns, there has been a lack of
empirical investigation. In this paper, we provide an empirical examination of
LLMs' hallucination behaviors in financial tasks. First, we empirically
investigate LLMs' ability to explain financial concepts and terminology.
Second, we assess LLMs' capacity to query historical stock prices. Third, to
alleviate the hallucination issue, we evaluate the efficacy of four practical
methods: few-shot learning, Decoding by Contrasting Layers (DoLa), the
Retrieval-Augmented Generation (RAG) method, and prompt-based tool learning,
in which the model generates a query command for an external function.
Finally, our major finding is that off-the-shelf LLMs exhibit serious
hallucination behaviors in financial tasks. There is therefore an urgent need
for research efforts to mitigate LLMs' hallucination.
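The paper itself does not include code; the sketch below is only a rough illustration of the prompt-based tool-learning idea mentioned in the abstract, in which the model emits a query command that is executed against an external data source instead of answering from parametric memory. The prompt format, the `lookup_close_price` function, the placeholder price, and the `fake_llm` stub are hypothetical, not the authors' implementation.

```python
from datetime import date

# Hypothetical historical-price table standing in for a real market-data source
# (the figure is a placeholder, not verified data).
PRICES = {("AAPL", date(2023, 11, 24)): 189.97}

def lookup_close_price(ticker: str, day: date) -> float:
    """Return the closing price from the stub data source."""
    return PRICES[(ticker, day)]

# Hypothetical prompt asking the model to emit a query command instead of a number.
TOOL_PROMPT = (
    "Do not answer from memory. Emit exactly one command of the form:\n"
    "QUERY <TICKER> <YYYY-MM-DD>\n"
    "Question: {question}\n"
)

def answer_with_tool(question: str, llm) -> str:
    """Ask the LLM for a query command, execute it, and ground the answer in the result."""
    command = llm(TOOL_PROMPT.format(question=question)).strip()
    action, ticker, day = command.split()
    assert action == "QUERY", f"unexpected command: {command}"
    price = lookup_close_price(ticker, date.fromisoformat(day))
    return f"{ticker} closed at {price:.2f} USD on {day}."

# Canned stand-in for a real LLM call so the sketch runs end to end.
def fake_llm(prompt: str) -> str:
    return "QUERY AAPL 2023-11-24"

if __name__ == "__main__":
    print(answer_with_tool("What was AAPL's closing price on 2023-11-24?", fake_llm))
```

The point of this design is that the numeric answer comes from the external data source rather than the model's memory, which is the failure mode the paper documents for historical stock prices.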
Related papers
- Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning [16.883679810267342]
This paper introduces a novel approach called Iterative Model-level Contrastive Learning (Iter-AHMCL) to address hallucination.
arXiv Detail & Related papers (2024-10-16T00:15:40Z)
- SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection [10.54378596443678]
Large language models (LLMs) are highly capable but face latency challenges in real-time applications.
This study optimizes real-time, interpretable hallucination detection by introducing effective prompting techniques.
arXiv Detail & Related papers (2024-08-22T22:13:13Z)
- LLM Internal States Reveal Hallucination Risk Faced With a Query [62.29558761326031]
Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries.
This paper investigates whether Large Language Models can estimate their own hallucination risk before response generation.
Using a probing estimator, we leverage LLM self-assessment, achieving an average hallucination estimation accuracy of 84.32% at run time (a generic sketch of such a probing setup appears after this list).
arXiv Detail & Related papers (2024-07-03T17:08:52Z)
- Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models [12.27217471495276]
Hallucinations in large language models (LLMs) produce responses that are coherent but factually inaccurate.
We present MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection.
We also present HELM, a new benchmark for evaluating hallucination detection across multiple LLMs.
arXiv Detail & Related papers (2024-03-11T05:51:03Z)
- Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem [58.3723958800254]
Large language models (LLMs) are highly effective in various natural language processing (NLP) tasks.
They are susceptible to producing unreliable conjectures in ambiguous contexts, a phenomenon known as hallucination.
This paper presents a new method for evaluating LLM hallucination in Question Answering (QA) based on the unanswerable math word problem (MWP).
arXiv Detail & Related papers (2024-03-06T09:06:34Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models [134.6697160940223]
Hallucination poses a great challenge to the trustworthy and reliable deployment of large language models.
Three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation).
This work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation.
arXiv Detail & Related papers (2024-01-06T12:40:45Z)
- AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation [58.19101663976327]
Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations.
Evaluating MLLMs' hallucinations is becoming increasingly important for model improvement and practical deployment.
We propose AMBER, an LLM-free multi-dimensional benchmark that can be used to evaluate both generative and discriminative tasks.
arXiv Detail & Related papers (2023-11-13T15:25:42Z)
- A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [40.79317187623401]
The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP).
LLMs are prone to hallucination, generating plausible yet nonfactual content.
This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval systems.
arXiv Detail & Related papers (2023-11-09T09:25:37Z)
- Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [116.01843550398183]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks.
LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)
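Several of the related papers above detect hallucination from a model's internal states using a probing estimator. The sketch below is a generic, simplified illustration of that probing idea, not the method of any cited paper: a linear probe is trained on synthetic, placeholder "hidden state" vectors to predict whether the model will hallucinate on a query.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for last-token hidden states (a real LLM would give e.g.
# 4096-dimensional vectors); label 1 = the model went on to hallucinate, 0 = it did not.
hidden_dim, n_queries = 64, 500
X = rng.normal(size=(n_queries, hidden_dim))
w = rng.normal(size=hidden_dim)                 # pretend separating direction
y = (X @ w + 0.5 * rng.normal(size=n_queries) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# The probe: a simple linear classifier trained on the frozen hidden states.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out hallucination-risk accuracy: {probe.score(X_te, y_te):.2%}")
```

In a real setup the feature vectors would come from the LLM's hidden layers on held-out queries, with hallucination labels obtained from human annotation or automatic fact checking.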
This list is automatically generated from the titles and abstracts of the papers on this site.