Limitations of Agents Simulated by Predictive Models
- URL: http://arxiv.org/abs/2402.05829v1
- Date: Thu, 8 Feb 2024 17:08:08 GMT
- Title: Limitations of Agents Simulated by Predictive Models
- Authors: Raymond Douglas, Jacek Karwowski, Chan Bae, Andis Draguns, Victoria
Krakovna
- Abstract summary: We outline two structural reasons for why predictive models can fail when turned into agents.
We show that both of those failures are fixed by including a feedback loop from the environment.
Our treatment provides a unifying view of those failure modes and informs the question of why fine-tuning offline-learned policies with online learning makes them more effective.
- Score: 1.6649383443094403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is increasing focus on adapting predictive models into agent-like
systems, most notably AI assistants based on language models. We outline two
structural reasons why these models can fail when turned into agents.
First, we discuss auto-suggestive delusions. Prior work has shown theoretically
that models fail to imitate agents that generated the training data if the
agents relied on hidden observations: the hidden observations act as
confounding variables, and the models treat actions they generate as evidence
for nonexistent observations. Second, we introduce and formally study a
related, novel limitation: predictor-policy incoherence. When a model generates
a sequence of actions, the model's implicit prediction of the policy that
generated those actions can serve as a confounding variable. The result is that
models choose actions as if they expect future actions to be suboptimal,
causing them to be overly conservative. We show that both of those failures are
fixed by including a feedback loop from the environment, that is, re-training
the models on their own actions. We give simple demonstrations of both
limitations using Decision Transformers and confirm that empirical results
agree with our conceptual and formal analysis. Our treatment provides a
unifying view of those failure modes, and informs the question of why
fine-tuning offline-learned policies with online learning makes them more
effective.
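The predictor-policy incoherence mechanism can be seen in a small tabular setting. The sketch below is a hypothetical numerical example under simplifying assumptions (the names P_OPTIMAL and HORIZON are invented for illustration), not the paper's Decision Transformer experiment: a predictor trained on a mixture of an optimal and a uniformly random policy values a multi-step "risky" plan as if its own future actions will also be drawn from that mixture, and therefore prefers a safe but suboptimal action.

```python
# Toy sketch of predictor-policy incoherence (illustrative assumptions only,
# not the paper's Decision Transformer setup).
#
# Environment: choose "safe" (reward 0.5) or "risky"; the risky branch pays
# reward 1.0 only if HORIZON consecutive "correct" actions follow, else 0.0.
#
# Training data mixes an optimal policy (always "correct") with a uniformly
# random policy. A predictive model fit to this data implicitly marginalises
# over which policy produced the trajectory, so the policy it simulates at
# every step is the mixture, not the optimal component.

P_OPTIMAL = 0.5   # assumed fraction of trajectories from the optimal policy
HORIZON = 3       # correct actions required on the risky branch

# Per-step probability that the mixture policy picks "correct".
p_correct = P_OPTIMAL * 1.0 + (1 - P_OPTIMAL) * 0.5          # 0.75

# The predictor scores "risky" as if its future actions come from the mixture,
# even though it is capable of emulating the optimal policy at every step.
value_risky_incoherent = p_correct ** HORIZON                # ~0.42
value_risky_coherent = 1.0    # value if future actions were chosen optimally
value_safe = 0.5

print(f"incoherent value of risky: {value_risky_incoherent:.2f}")   # 0.42
print(f"coherent value of risky:   {value_risky_coherent:.2f}")     # 1.00
print(f"value of safe:             {value_safe:.2f}")               # 0.50
```

In this toy setting the incoherent predictor chooses "safe" (0.42 < 0.50), the overly conservative behaviour described above. Re-training the model on its own rollouts shifts the implicit policy mixture toward the better-performing component, which corresponds to the feedback-loop fix the abstract describes.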
Related papers
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that, given only a few examples, pre-trained language models exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate this prediction bias, our analysis shows that models improve performance by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may induce pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Exploring Social Posterior Collapse in Variational Autoencoder for
Interaction Modeling [26.01824780050843]
The Variational Autoencoder (VAE) has been widely applied in multi-agent interaction modeling.
However, the VAE is prone to ignoring historical social context when predicting the future trajectory of an agent.
We propose a novel sparse graph attention message-passing layer, which helps us detect social posterior collapse.
arXiv Detail & Related papers (2021-12-01T06:20:58Z) - Shaking the foundations: delusions in sequence models for interaction
and control [45.34593341136043]
We show that sequence models "lack the understanding of the cause and effect of their actions", which leads them to draw incorrect inferences due to auto-suggestive delusions.
We show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals, respectively.
arXiv Detail & Related papers (2021-10-20T23:31:05Z) - You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory
Prediction [52.442129609979794]
Recent deep learning approaches for trajectory prediction show promising performance.
It remains unclear which features such black-box models actually learn to use for making predictions.
This paper proposes a procedure that quantifies the contributions of different cues to model performance.
arXiv Detail & Related papers (2021-10-11T14:24:15Z) - Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Learning Opinion Dynamics From Social Traces [25.161493874783584]
We propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces.
We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart.
We apply our model to real-world data from Reddit to explore the long-standing question of the impact of the backfire effect.
arXiv Detail & Related papers (2020-06-02T14:48:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.