Related papers: EconNLI: Evaluating Large Language Models on Economics Reasoning

URL: http://arxiv.org/abs/2407.01212v1
Date: Mon, 1 Jul 2024 11:58:24 GMT
Title: EconNLI: Evaluating Large Language Models on Economics Reasoning
Authors: Yue Guo, Yi Yang
Abstract summary: Large Language Models (LLMs) are widely used for writing economic analysis reports or providing financial advice. We propose a new dataset, natural language inference on economic events (EconNLI), to evaluate LLMs' knowledge and reasoning abilities in the economic domain. Our experiments reveal that LLMs are not sophisticated in economic reasoning and may generate wrong or hallucinated answers.
Score: 22.754757518792395
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are widely used for writing economic analysis reports or providing financial advice, but their ability to understand economic knowledge and reason about potential results of specific economic events lacks systematic evaluation. To address this gap, we propose a new dataset, natural language inference on economic events (EconNLI), to evaluate LLMs' knowledge and reasoning abilities in the economic domain. We evaluate LLMs on (1) their ability to correctly classify whether a premise event will cause a hypothesis event and (2) their ability to generate reasonable events resulting from a given premise. Our experiments reveal that LLMs are not sophisticated in economic reasoning and may generate wrong or hallucinated answers. Our study raises awareness of the limitations of using LLMs for critical decision-making involving economic reasoning and analysis. The dataset and codes are available at https://github.com/Irenehere/EconNLI.

Related papers

Left Leaning Models: AI Assumptions on Economic Policy [0.0]
This paper uses a conjoint experiment to tease out the main factors influencing large language models' evaluation of economic policy. It finds that LLMs are most sensitive to unemployment, inequality, financial stability, and environmental harm and less sensitive to traditional macroeconomic concerns such as economic growth, inflation, and government debt.
arXiv Detail & Related papers (2025-07-21T16:27:16Z)
Revealing economic facts: LLMs know more than they say [1.433758865948252]
We investigate whether the hidden states of large language models (LLMs) can be used to estimate and impute economic statistics. We show that a simple linear model trained on the hidden states of open-source LLMs outperforms the models' text outputs.
arXiv Detail & Related papers (2025-05-13T15:24:08Z)
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models [51.85792055455284]
Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to perform complex reasoning tasks. System 1 reasoning is computationally efficient but leads to suboptimal performance. System 2 reasoning often incurs substantial computational costs due to its slow thinking nature and inefficient or unnecessary reasoning behaviors.
arXiv Detail & Related papers (2025-03-31T17:58:07Z)
Gender Bias of LLM in Economics: An Existentialism Perspective [1.024113475677323]
This paper investigates gender bias in large language models (LLMs). LLMs reinforce gender stereotypes even without explicit gender markers. We argue that bias in LLMs is not an unintended flaw but a systematic result of their rational processing.
arXiv Detail & Related papers (2024-10-14T01:42:01Z)
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments [19.366120861935105]
Large Language Models (LLMs) show significant potential in economic and strategic interactions. These questions become crucial concerning the economic and societal implications of integrating LLM-based agents into real-world data-driven systems. We introduce a benchmark for standardizing research on two-player, sequential, language-based games.
arXiv Detail & Related papers (2024-10-07T17:55:35Z)
LLM economicus? Mapping the Behavioral Biases of LLMs via Utility Theory [20.79199807796242]
Utility theory is an approach to evaluate the economic biases of large language models. We find that the economic behavior of current LLMs is neither entirely human-like nor entirely economicus-like.
arXiv Detail & Related papers (2024-08-05T19:00:43Z)
Understanding Intrinsic Socioeconomic Biases in Large Language Models [4.276697874428501]
We introduce a novel dataset of one million English sentences to quantify socioeconomic biases. Our findings reveal pervasive socioeconomic biases in both established models like GPT-2 and state-of-the-art models like Llama 2 and Falcon.
arXiv Detail & Related papers (2024-05-28T23:54:44Z)
Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) can estimate causal effects under interventions on different parts of a system. We conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning.
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks. We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs). Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models. We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
Is Knowledge All Large Language Models Needed for Causal Reasoning? [11.476877330365664]
This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence. We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenarios.
arXiv Detail & Related papers (2023-12-30T04:51:46Z)
CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark. In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship. We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and a "chain-of-thought" knowledge distillation fine-tuning technique to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines. We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.