View From Above: A Framework for Evaluating Distribution Shifts in Model Behavior
- URL: http://arxiv.org/abs/2407.00948v3
- Date: Sat, 28 Sep 2024 00:07:27 GMT
- Title: View From Above: A Framework for Evaluating Distribution Shifts in Model Behavior
- Authors: Tanush Chopra, Michael Li, Jacob Haimes
- Abstract summary: Large language models (LLMs) are asked to perform certain tasks.
How can we be sure that their learned representations align with reality?
We propose a domain-agnostic framework for systematically evaluating distribution shifts.
- Score: 0.9043709769827437
- Abstract: When large language models (LLMs) are asked to perform certain tasks, how can we be sure that their learned representations align with reality? We propose a domain-agnostic framework for systematically evaluating distribution shifts in LLMs' decision-making processes, where they are given control of mechanisms governed by pre-defined rules. While individual LLM actions may appear consistent with expected behavior, statistically significant distribution shifts can emerge across a large number of trials. To test this, we construct a well-defined environment with known outcome logic: blackjack. In more than 1,000 trials, we uncover statistically significant evidence suggesting behavioral misalignment in the learned representations of LLMs.
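The abstract does not say which statistical test the authors use. As a rough illustration only, the idea of detecting a distribution shift over many trials can be sketched with a chi-square goodness-of-fit test on card values dealt by a model. Everything here is an assumption made for the sketch: the infinite-deck probabilities, the 0.05 significance level, and the hypothetical names `EXPECTED_PROBS`, `chi_square_statistic`, and `shows_shift`.

```python
from collections import Counter

# Assumed reference distribution: an infinite deck, where ten, jack, queen
# and king all count as 10, and the ace is recorded as 1.
EXPECTED_PROBS = {value: 1 / 13 for value in range(1, 10)}
EXPECTED_PROBS[10] = 4 / 13

# Chi-square critical value for 9 degrees of freedom at alpha = 0.05
# (10 card-value categories minus 1).
CHI2_CRITICAL_DF9_ALPHA_05 = 16.919


def chi_square_statistic(draws):
    """Return the chi-square goodness-of-fit statistic for a list of card values.

    Every draw is assumed to be an integer from 1 to 10.
    """
    counts = Counter(draws)
    total = len(draws)
    statistic = 0.0
    for value, prob in EXPECTED_PROBS.items():
        expected = total * prob
        statistic += (counts.get(value, 0) - expected) ** 2 / expected
    return statistic


def shows_shift(draws):
    """True if the draws deviate significantly from the assumed distribution."""
    return chi_square_statistic(draws) > CHI2_CRITICAL_DF9_ALPHA_05


# A sample that matches the reference distribution exactly.
matching = [v for v in range(1, 10) for _ in range(100)] + [10] * 400
# A sample that treats all ten card values as equally likely.
uniform = [v for v in range(1, 11) for _ in range(130)]

print(shows_shift(matching))  # no significant deviation
print(shows_shift(uniform))   # significant deviation
```

A single hand drawn from the second sample would look unremarkable; the deviation only becomes visible once the counts are aggregated over many draws, which is the point the abstract makes about individual actions versus large numbers of trials.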
Related papers
- Understanding Multimodal LLMs Under Distribution Shifts: An Information-Theoretic Approach [33.463823493423554]
Multimodal large language models (MLLMs) have shown promising capabilities but struggle under distribution shifts.
We argue that establishing a formal framework that can characterize and quantify the risk of MLLMs is necessary to ensure the safe and reliable application of MLLMs in the real world.
arXiv Detail & Related papers (2025-02-01T22:06:56Z) - Benchmarking Distributional Alignment of Large Language Models [43.0198231524816]
Language models (LMs) are increasingly used as simulacra for people, yet their ability to match the distribution of views of a specific demographic group remains uncertain.
We construct a dataset expanding beyond political values, create human baselines for this task, and evaluate the extent to which an LM can align with a particular group's opinion distribution.
Our analysis reveals open problems regarding if, and how, LMs can be used to simulate humans, and that LLMs can more accurately describe the opinion distribution than simulate such distributions.
arXiv Detail & Related papers (2024-11-08T08:41:17Z) - Fair In-Context Learning via Latent Concept Variables [17.216196320585922]
Large language models (LLMs) can inherit social bias and discrimination from their pre-training data.
We design data augmentation strategies that reduce correlation between predictive outcomes and sensitive variables, helping to promote fairness during latent concept learning.
arXiv Detail & Related papers (2024-11-04T23:10:05Z) - Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z) - Evaluating Human Alignment and Model Faithfulness of LLM Rationale [66.75309523854476]
We study how well large language models (LLMs) explain their generations through rationales.
We show that prompting-based methods are less "faithful" than attribution-based explanations.
arXiv Detail & Related papers (2024-06-28T20:06:30Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation [73.58618024960968]
An increasing number of studies are employing large language models (LLMs) as agents to emulate the sequential decision-making processes of humans.
This arouses curiosity regarding the capacity of LLM agents to comprehend probability distributions.
Our analysis indicates that LLM agents can understand probabilities, but they struggle with probability sampling.
arXiv Detail & Related papers (2024-04-13T16:59:28Z) - Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks.
In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types.
These benchmarks allow us to isolate the ability of LLMs to accurately predict the changes resulting from an intervention, as distinct from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z) - Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification [7.696798306913988]
We introduce a framework outlining fairness regulations aligned with various fairness definitions.
We explore the configuration for in-context learning and the procedure for selecting in-context demonstrations using RAG.
Experiments conducted with different LLMs indicate that GPT-4 delivers superior results in terms of both accuracy and fairness compared to other models.
arXiv Detail & Related papers (2024-02-28T17:29:27Z) - Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts [9.399159332152013]
This study investigates the behaviors of Large Language Models (LLMs) when faced with conflicting prompts versus their internal memory.
This will help to understand LLMs' decision mechanisms and also benefit real-world applications, such as retrieval-augmented generation (RAG).
arXiv Detail & Related papers (2023-09-29T17:26:03Z)