Can Language Representation Models Think in Bets?
- URL: http://arxiv.org/abs/2210.07519v1
- Date: Fri, 14 Oct 2022 05:01:04 GMT
- Title: Can Language Representation Models Think in Bets?
- Authors: Zhisheng Tang, Mayank Kejriwal
- Abstract summary: Transformer-based language representation models (LRMs) have achieved state-of-the-art results on difficult natural language understanding problems.
This article investigates LRMs' rational decision-making ability through a carefully designed set of decision-making benchmarks and experiments.
- Score: 8.185725740857594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, transformer-based language representation models (LRMs) have
achieved state-of-the-art results on difficult natural language understanding
problems, such as question answering and text summarization. As these models
are integrated into real-world applications, evaluating their ability to make
rational decisions is an important research agenda, with practical
ramifications. This article investigates LRMs' rational decision-making ability
through a carefully designed set of decision-making benchmarks and experiments.
Inspired by classic work in cognitive science, we model the decision-making
problem as a bet. We then investigate an LRM's ability to choose outcomes that
have optimal, or at minimum, positive expected gain. Through a robust body of
experiments on four established LRMs, we show that a model is only able to
'think in bets' if it is first fine-tuned on bet questions with an identical
structure. Modifying the bet question's structure, while still retaining its
fundamental characteristics, decreases an LRM's performance by more than 25%,
on average, although absolute performance remains well above random. LRMs are
also found to be more rational when selecting outcomes with non-negative
expected gain, rather than optimal or strictly positive expected gain. Our
results suggest that LRMs could potentially be applied to tasks that rely on
cognitive decision-making skills, but that more research is necessary before
they can robustly make rational decisions.
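To make the expected-gain framing concrete, the sketch below is a minimal illustration (not taken from the paper; all option names, probabilities, and payoffs are invented) of how a bet question can be scored and how the three rationality criteria in the abstract differ: choosing the optimal outcome, any strictly positive outcome, or any non-negative outcome.
```python
# Illustrative sketch: scoring bet options by expected gain.
# All outcome values and probabilities below are hypothetical.

def expected_gain(outcomes):
    """Expected gain of a bet: sum of probability-weighted payoffs."""
    return sum(p * payoff for p, payoff in outcomes)

# Each option is a list of (probability, payoff) pairs.
options = {
    "A": [(0.5, +10), (0.5, -2)],   # EV = +4.0
    "B": [(0.9, +1),  (0.1, -5)],   # EV = +0.4
    "C": [(0.2, +3),  (0.8, -4)],   # EV = -2.6
}

evs = {name: expected_gain(o) for name, o in options.items()}

optimal = max(evs, key=evs.get)                       # highest expected gain
positive = [n for n, v in evs.items() if v > 0]       # strictly positive gain
non_negative = [n for n, v in evs.items() if v >= 0]  # non-negative gain

print(evs)           # {'A': 4.0, 'B': 0.4, 'C': -2.6}
print(optimal)       # 'A'
print(positive)      # ['A', 'B']
print(non_negative)  # ['A', 'B']
```
A model that "thinks in bets" in this sense would pick option A under the optimal criterion, and either A or B under the positive or non-negative criteria.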
Related papers
- Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown [20.753374166695494]
We propose Uncertainty-aware RM (URM) and Uncertainty-aware RM Ensemble (URME) to incorporate and manage uncertainty in reward modeling.
URM can model the distribution of disentangled attributes within human preferences, while URME quantifies uncertainty through discrepancies in the ensemble.
Experiment results indicate that the proposed URM achieves state-of-the-art performance compared to models of the same size.
arXiv Detail & Related papers (2024-10-01T16:29:59Z) - Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework [77.45983464131977]
We focus on how likely it is that a RAG model's prediction is incorrect, which can lead to uncontrollable risks in real-world applications.
Our research identifies two critical latent factors affecting RAG's confidence in its predictions.
We develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers.
arXiv Detail & Related papers (2024-09-24T14:52:14Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) can estimate causal effects under interventions on different parts of a system.
We conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - The ART of LLM Refinement: Ask, Refine, and Trust [85.75059530612882]
We propose a reasoning-with-refinement objective called ART: Ask, Refine, and Trust.
It asks necessary questions to decide when an LLM should refine its output.
It achieves a performance gain of +5 points over self-refinement baselines.
arXiv Detail & Related papers (2023-11-14T07:26:32Z) - Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making [24.906886146275127]
We propose a unified two-stage framework known as Self-Attribution and Decision-Making (SADM).
We demonstrate that our framework not only establishes a more reliable link between the generated rationale and model decision but also achieves competitive results in task performance and the quality of rationale.
arXiv Detail & Related papers (2023-10-20T15:59:57Z) - Rational Decision-Making Agent with Internalized Utility Judgment [91.80700126895927]
Large language models (LLMs) have demonstrated remarkable advancements, attracting significant efforts to develop them into agents capable of executing intricate multi-step decision-making tasks beyond traditional NLP applications.
This paper proposes RadAgent, which fosters the development of its rationality through an iterative framework involving Experience Exploration and Utility Learning.
Experimental results on the ToolBench dataset demonstrate RadAgent's superiority over baselines, achieving over 10% improvement in Pass Rate on diverse tasks.
arXiv Detail & Related papers (2023-08-24T03:11:45Z) - Learning Optimal Features via Partial Invariance [18.552839725370383]
Invariant Risk Minimization (IRM) is a popular framework that aims to learn robust models from multiple environments.
We show that IRM can over-constrain the predictor and, to remedy this, we propose a relaxation via partial invariance.
Several experiments, conducted both in linear settings and with deep neural networks on language and image tasks, allow us to verify our conclusions.
arXiv Detail & Related papers (2023-01-28T02:48:14Z) - Invariant Rationalization [84.1861516092232]
A typical rationalization criterion, i.e. maximum mutual information (MMI), finds the rationale that maximizes the prediction performance based only on the rationale.
We introduce a game-theoretic invariant rationalization criterion where the rationales are constrained to enable the same predictor to be optimal across different environments.
We show both theoretically and empirically that the proposed rationales can rule out spurious correlations, generalize better to different test scenarios, and align better with human judgments.
arXiv Detail & Related papers (2020-03-22T00:50:27Z) - Causal Strategic Linear Regression [5.672132510411465]
In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model that accounts for agents' propensity to "game" the decision rule.
We join concurrent work in modeling agents' outcomes as a function of their changeable attributes.
We provide efficient algorithms for learning decision rules that optimize three distinct decision-maker objectives.
arXiv Detail & Related papers (2020-02-24T03:57:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.