Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
- URL: http://arxiv.org/abs/2009.12576v2
- Date: Fri, 30 Oct 2020 07:09:41 GMT
- Title: Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
- Authors: Minhae Kwon, Saurabh Daptardar, Paul Schrater, Xaq Pitkow
- Abstract summary: A fundamental question in neuroscience is how the brain creates an internal model of the world to guide actions using sequences of ambiguous sensory information.
This problem can be solved by control theory, which allows us to find the optimal actions for a given system dynamics and objective function.
We hypothesize that animals have their own flawed internal model of the world, and choose actions with the highest expected subjective reward according to that flawed model.
Our contribution here generalizes past work on Inverse Rational Control which solved this problem for discrete control in partially observable Markov decision processes.
- Score: 6.65264113799989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental question in neuroscience is how the brain creates an internal
model of the world to guide actions using sequences of ambiguous sensory
information. This is naturally formulated as a reinforcement learning problem
under partial observations, where an agent must estimate relevant latent
variables in the world from its evidence, anticipate possible future states,
and choose actions that optimize total expected reward. This problem can be
solved by control theory, which allows us to find the optimal actions for a
given system dynamics and objective function. However, animals often appear to
behave suboptimally. Why? We hypothesize that animals have their own flawed
internal model of the world, and choose actions with the highest expected
subjective reward according to that flawed model. We describe this behavior as
rational but not optimal. The problem of Inverse Rational Control (IRC) aims to
identify which internal model would best explain an agent's actions. Our
contribution here generalizes past work on Inverse Rational Control which
solved this problem for discrete control in partially observable Markov
decision processes. Here we accommodate continuous nonlinear dynamics and
continuous actions, and impute sensory observations corrupted by unknown noise
that is private to the animal. We first build an optimal Bayesian agent that
learns an optimal policy generalized over the entire model space of dynamics
and subjective rewards using deep reinforcement learning. Crucially, this
allows us to compute a likelihood over models for experimentally observable
action trajectories acquired from a suboptimal agent. We then find the model
parameters that maximize the likelihood using gradient ascent.
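To make the inference loop concrete, below is a minimal PyTorch sketch of the final step described in the abstract: gradient ascent on the likelihood of observed actions under a policy conditioned on candidate model parameters theta. Every name here (ThetaPolicy, action_log_likelihood, the random stand-in data) is a hypothetical placeholder, not the authors' code; in the paper, beliefs come from a Bayesian filter over imputed noisy observations, and the theta-conditioned policy is pretrained with deep reinforcement learning over the whole model space.

```python
import torch

class ThetaPolicy(torch.nn.Module):
    """Maps (belief, candidate model parameters theta) to a distribution
    over continuous actions. Stands in for the deep-RL policy that the
    paper trains once over the entire model space."""
    def __init__(self, belief_dim, theta_dim, action_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(belief_dim + theta_dim, 64),
            torch.nn.Tanh(),
            torch.nn.Linear(64, 2 * action_dim),  # action mean and log-std
        )

    def forward(self, belief, theta):
        mean, log_std = self.net(torch.cat([belief, theta], dim=-1)).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.exp())

def action_log_likelihood(policy, beliefs, actions, theta):
    """Sum over a trajectory of log p(a_t | b_t, theta)."""
    theta_rep = theta.expand(beliefs.shape[0], -1)  # broadcast theta over time
    return policy(beliefs, theta_rep).log_prob(actions).sum()

belief_dim, theta_dim, action_dim, T = 4, 3, 2, 100
policy = ThetaPolicy(belief_dim, theta_dim, action_dim)  # pretrained in practice
for p in policy.parameters():
    p.requires_grad_(False)  # freeze the agent; only theta is inferred

beliefs = torch.randn(T, belief_dim)  # stand-in for beliefs from a Bayesian filter
actions = torch.randn(T, action_dim)  # stand-in for the animal's recorded actions

theta = torch.zeros(theta_dim, requires_grad=True)
optimizer = torch.optim.Adam([theta], lr=1e-2)
for step in range(200):  # gradient ascent = minimizing negative log-likelihood
    optimizer.zero_grad()
    loss = -action_log_likelihood(policy, beliefs, actions, theta)
    loss.backward()
    optimizer.step()
```

Conditioning the pretrained policy on theta is what makes this step cheap under the assumed setup: the action log-likelihood is differentiable in theta, so each candidate internal model can be evaluated without retraining the agent.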
Related papers
- Inverse decision-making using neural amortized Bayesian actors [19.128377007314317]
We amortize the Bayesian actor using a neural network trained on a wide range of parameter settings in an unsupervised fashion.
We show how our method allows for principled model comparison and how it can be used to disentangle factors that may lead to unidentifiabilities between priors and costs.
arXiv Detail & Related papers (2024-09-04T10:31:35Z)
- Ego-Foresight: Agent Visuomotor Prediction as Regularization for RL [34.6883445484835]
Ego-Foresight is a self-supervised method for disentangling agent and environment based on motion and prediction.
We show that visuomotor prediction of the agent provides regularization to the RL algorithm, by encouraging the actions to stay within predictable bounds.
We integrate Ego-Foresight with a model-free RL algorithm to solve simulated robotic manipulation tasks, showing an average improvement of 23% in efficiency and 8% in performance.
arXiv Detail & Related papers (2024-05-27T13:32:43Z)
- Towards Generalizable and Interpretable Motion Prediction: A Deep Variational Bayes Approach [54.429396802848224]
This paper proposes an interpretable generative model for motion prediction with robust generalizability to out-of-distribution cases.
For interpretability, the model achieves the target-driven motion prediction by estimating the spatial distribution of long-term destinations.
Experiments on motion prediction datasets validate that the fitted model can be interpretable and generalizable.
arXiv Detail & Related papers (2024-03-10T04:16:04Z)
- Representation Surgery: Theory and Practice of Affine Steering [72.61363182652853]
Language models often exhibit undesirable behavior, e.g., generating toxic or gender-biased text.
One natural (and common) approach to prevent the model from exhibiting undesirable behavior is to steer the model's representations.
This paper investigates the formal and empirical properties of steering functions.
arXiv Detail & Related papers (2024-02-15T00:20:30Z)
- Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases [76.9127853906115]
Bridging the gap between diffusion models and human preferences is crucial for their integration into practical generative applications.
We propose Temporal Diffusion Policy Optimization with critic active neuron Reset (TDPO-R), a policy gradient algorithm that exploits the temporal inductive bias of diffusion models.
Empirical results demonstrate the superior efficacy of our methods in mitigating reward overoptimization.
arXiv Detail & Related papers (2024-02-13T15:55:41Z)
- A General Neural Causal Model for Interactive Recommendation [24.98550634633534]
Survivor bias in observational data leads the optimization of recommender systems towards local optima.
We propose a neural causal model to achieve counterfactual inference.
arXiv Detail & Related papers (2023-10-30T13:21:04Z)
- Optimistic Active Exploration of Dynamical Systems [52.91573056896633]
We develop an algorithm for active exploration called OPAX.
We show how OPAX can be reduced to an optimal control problem that can be solved at each episode.
Our experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks.
arXiv Detail & Related papers (2023-06-21T16:26:59Z)
- Deep Grey-Box Modeling With Adaptive Data-Driven Models Toward Trustworthy Estimation of Theory-Driven Models [88.63781315038824]
We present a framework that enables us to analyze a regularizer's behavior empirically with a slight change in the neural net's architecture and the training objective.
arXiv Detail & Related papers (2022-10-24T10:42:26Z)
- Inference of Affordances and Active Motor Control in Simulated Agents [0.5161531917413706]
We introduce an output-probabilistic, temporally predictive, modular artificial neural network architecture.
We show that our architecture develops latent states that can be interpreted as affordance maps.
In combination with active inference, we show that flexible, goal-directed behavior can be invoked.
arXiv Detail & Related papers (2022-02-23T14:13:04Z)
- On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We show an excess risk bound for the gradient descent solution of the least squares objective.
We find that in the case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
arXiv Detail & Related papers (2021-07-27T09:13:11Z)
- Goal-Directed Planning by Reinforcement Learning and Active Inference [16.694117274961016]
We propose a novel computational framework of decision making with Bayesian inference.
Goal-directed behavior is determined by planning from the posterior distribution of the latent variable $z$.
We demonstrate the effectiveness of the proposed framework by experiments in a sensorimotor navigation task with camera observations and continuous motor actions.
arXiv Detail & Related papers (2021-06-18T06:41:01Z)
- Maximizing Information Gain in Partially Observable Environments via Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.