Limitations of Agents Simulated by Predictive Models
- URL: http://arxiv.org/abs/2402.05829v1
- Date: Thu, 8 Feb 2024 17:08:08 GMT
- Title: Limitations of Agents Simulated by Predictive Models
- Authors: Raymond Douglas, Jacek Karwowski, Chan Bae, Andis Draguns, Victoria
Krakovna
- Abstract summary: We outline two structural reasons for why predictive models can fail when turned into agents.
We show that both of those failures are fixed by including a feedback loop from the environment.
Our treatment provides a unifying view of those failure modes and informs the question of why fine-tuning offline-learned policies with online learning makes them more effective.
- Score: 1.6649383443094403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is increasing focus on adapting predictive models into agent-like
systems, most notably AI assistants based on language models. We outline two
structural reasons why these models can fail when turned into agents.
First, we discuss auto-suggestive delusions. Prior work has shown theoretically
that models fail to imitate agents that generated the training data if the
agents relied on hidden observations: the hidden observations act as
confounding variables, and the models treat actions they generate as evidence
for nonexistent observations. Second, we introduce and formally study a
related, novel limitation: predictor-policy incoherence. When a model generates
a sequence of actions, the model's implicit prediction of the policy that
generated those actions can serve as a confounding variable. The result is that
models choose actions as if they expect future actions to be suboptimal,
causing them to be overly conservative. We show that both of those failures are
fixed by including a feedback loop from the environment, that is, re-training
the models on their own actions. We give simple demonstrations of both
limitations using Decision Transformers and confirm that empirical results
agree with our conceptual and formal analysis. Our treatment provides a
unifying view of those failure modes, and informs the question of why
fine-tuning offline-learned policies with online learning makes them more
effective.
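The predictor-policy incoherence mechanism can be seen in a small tabular setting. The sketch below is a hypothetical numerical example under simplifying assumptions (the names P_OPTIMAL and HORIZON are invented for illustration), not the paper's Decision Transformer experiment: a predictor trained on a mixture of an optimal and a uniformly random policy values a multi-step "risky" plan as if its own future actions will also be drawn from that mixture, and therefore prefers a safe but suboptimal action.

```python
# Toy sketch of predictor-policy incoherence (illustrative assumptions only,
# not the paper's Decision Transformer setup).
#
# Environment: choose "safe" (reward 0.5) or "risky"; the risky branch pays
# reward 1.0 only if HORIZON consecutive "correct" actions follow, else 0.0.
#
# Training data mixes an optimal policy (always "correct") with a uniformly
# random policy. A predictive model fit to this data implicitly marginalises
# over which policy produced the trajectory, so the policy it simulates at
# every step is the mixture, not the optimal component.

P_OPTIMAL = 0.5   # assumed fraction of trajectories from the optimal policy
HORIZON = 3       # correct actions required on the risky branch

# Per-step probability that the mixture policy picks "correct".
p_correct = P_OPTIMAL * 1.0 + (1 - P_OPTIMAL) * 0.5          # 0.75

# The predictor scores "risky" as if its future actions come from the mixture,
# even though it is capable of emulating the optimal policy at every step.
value_risky_incoherent = p_correct ** HORIZON                # ~0.42
value_risky_coherent = 1.0    # value if future actions were chosen optimally
value_safe = 0.5

print(f"incoherent value of risky: {value_risky_incoherent:.2f}")   # 0.42
print(f"coherent value of risky:   {value_risky_coherent:.2f}")     # 1.00
print(f"value of safe:             {value_safe:.2f}")               # 0.50
```

In this toy setting the incoherent predictor chooses "safe" (0.42 < 0.50), the overly conservative behaviour described above. Re-training the model on its own rollouts shifts the implicit policy mixture toward the better-performing component, which corresponds to the feedback-loop fix the abstract describes.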
Related papers
- Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that, given only a few examples, pre-trained language models exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate this prediction bias, our analysis shows that models improve performance by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may induce pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Exploring Social Posterior Collapse in Variational Autoencoder for
Interaction Modeling [26.01824780050843]
The Variational Autoencoder (VAE) has been widely applied in multi-agent interaction modeling.
However, the VAE is prone to ignoring historical social context when predicting the future trajectory of an agent.
We propose a novel sparse graph attention message-passing layer, which helps us detect social posterior collapse.
arXiv Detail & Related papers (2021-12-01T06:20:58Z) - Shaking the foundations: delusions in sequence models for interaction
and control [45.34593341136043]
We show that sequence models "lack the understanding of the cause and effect of their actions", which leads them to draw incorrect inferences due to auto-suggestive delusions.
We show that in supervised learning, one can teach a system to condition or intervene on data by training with factual and counterfactual error signals, respectively.
arXiv Detail & Related papers (2021-10-20T23:31:05Z) - You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory
Prediction [52.442129609979794]
Recent deep learning approaches for trajectory prediction show promising performance.
It remains unclear which features such black-box models actually learn to use for making predictions.
This paper proposes a procedure that quantifies the contributions of different cues to model performance.
arXiv Detail & Related papers (2021-10-11T14:24:15Z) - Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - Learning Opinion Dynamics From Social Traces [25.161493874783584]
We propose an inference mechanism for fitting a generative, agent-like model of opinion dynamics to real-world social traces.
We showcase our proposal by translating a classical agent-based model of opinion dynamics into its generative counterpart.
We apply our model to real-world data from Reddit to explore the long-standing question of the impact of the backfire effect.
arXiv Detail & Related papers (2020-06-02T14:48:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.