Deconstructing deep active inference
- URL: http://arxiv.org/abs/2303.01618v2
- Date: Mon, 8 May 2023 08:20:23 GMT
- Title: Deconstructing deep active inference
- Authors: Théophile Champion and Marek Grześ and Lisa Bonheme and Howard Bowman
- Abstract summary: Active inference is a theory of perception, learning and decision making.
The goal of this line of research is to solve more complicated tasks using deep active inference.
- Score: 2.236663830879273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active inference is a theory of perception, learning and decision making
that can be applied to neuroscience, robotics, and machine learning. Recently,
research has been carried out to scale up this framework using Monte-Carlo
tree search and deep learning, with the goal of solving more complicated tasks
with deep active inference. First, we review the existing literature; then, we
progressively build a deep active inference agent. For two agents, we
experiment with five definitions of the expected free energy and three
different action selection strategies. According to our experiments, the
models able to solve the dSprites environment are the ones that maximise
rewards. Finally, we compare the similarity of the representations learned by
the layers of various agents using centered kernel alignment. Importantly, the
agent maximising reward and the agent minimising expected free energy learn
very similar representations, except for the last layer of the critic network
(reflecting the difference in learning objective) and the variance layers of
the transition and encoder networks. We found that the reward-maximising agent
is far more certain than the agent minimising expected free energy. This is
because the agent minimising expected free energy always picks the action
"down" and therefore does not gather enough data for the other actions. In
contrast, the agent maximising reward keeps selecting the actions "left" and
"right", which enables it to solve the task successfully. The only difference
between these two agents is the epistemic value, which aims to make the
outputs of the transition and encoder networks as close as possible. Thus, the
agent minimising expected free energy picks a single action (down) and becomes
an expert at predicting the future when selecting this action, which makes the
KL divergence between the outputs of the transition and encoder networks small.
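To make this concrete, here is a minimal sketch (not the authors' code) of the
kind of expected free energy described above, assuming the encoder and
transition networks output diagonal Gaussians over the latent state. The
function names and the use_epistemic flag are hypothetical, and the paper
experiments with several alternative definitions of the expected free energy;
this only illustrates the variant where the epistemic term is the KL
divergence between the encoder and transition outputs, which the
reward-maximising agent simply drops.

```python
import torch

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, exp(logvar_q)) || N(mu_p, exp(logvar_p)) ), diagonal Gaussians."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * (logvar_p - logvar_q
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0).sum(dim=-1)

def efe_target(reward, mu_post, logvar_post, mu_prior, logvar_prior,
               use_epistemic=True):
    """Expected-free-energy score for one (observed or imagined) transition.

    reward                 -- reward obtained or predicted for the chosen action
    mu_post, logvar_post   -- encoder output for the resulting observation
    mu_prior, logvar_prior -- transition output for the chosen action
    """
    g = -reward  # extrinsic value enters with a negative sign
    if use_epistemic:
        # Epistemic term: pulls the transition and encoder outputs together.
        g = g + gaussian_kl(mu_post, logvar_post, mu_prior, logvar_prior)
    # The reward-maximising agent corresponds to use_epistemic=False.
    return g
```

Under this reading, an agent that always selects "down" can drive the KL term
very low for that single action while never gathering data for "left" and
"right", which is exactly the behaviour reported above. The representation
comparison relies on centered kernel alignment; the standard linear variant
can be computed from two activation matrices collected on the same batch of
inputs, for example matched layers of the reward-maximising and EFE-minimising
agents on a batch of dSprites images:

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (n_samples, n_features).

    Returns a similarity in [0, 1]; higher means more similar representations.
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(y.T @ x, ord="fro") ** 2
    return hsic / (np.linalg.norm(x.T @ x, ord="fro")
                   * np.linalg.norm(y.T @ y, ord="fro"))
```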
Related papers
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement
Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
- Decentralized scheduling through an adaptive, trading-based multi-agent
  system [1.7403133838762448]
In multi-agent reinforcement learning systems, the actions of one agent can have a negative impact on the rewards of other agents.
This work applies a trading approach to a simulated scheduling environment, where the agents are responsible for the assignment of incoming jobs to compute cores.
The agents can trade the usage right of computational cores to process high-priority, high-reward jobs faster than low-priority, low-reward jobs.
arXiv Detail & Related papers (2022-07-05T13:50:18Z)
- On the Expressivity of Markov Reward [89.96685777114456]
This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform.
We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories.
arXiv Detail & Related papers (2021-11-01T12:12:16Z)
- Contrastive Active Inference [12.361539023886161]
We propose a contrastive objective for active inference that reduces the computational burden in learning the agent's generative model and planning future actions.
Our method performs notably better than likelihood-based active inference in image-based tasks, while also being computationally cheaper and easier to train.
arXiv Detail & Related papers (2021-10-19T16:20:49Z)
- Multi-Agent Embodied Visual Semantic Navigation with Scene Prior
  Knowledge [42.37872230561632]
In visual semantic navigation, the robot navigates to a target object using egocentric visual observations, given the class label of the target.
Most of the existing models are only effective for single-agent navigation, and a single agent has low efficiency and poor fault tolerance when completing more complicated tasks.
We propose the multi-agent visual semantic navigation, in which multiple agents collaborate with others to find multiple target objects.
arXiv Detail & Related papers (2021-09-20T13:31:03Z)
- What is Going on Inside Recurrent Meta Reinforcement Learning Agents? [63.58053355357644]
Recurrent meta reinforcement learning (meta-RL) agents employ a recurrent neural network (RNN) for the purpose of "learning a learning algorithm".
We shed light on the internal working mechanisms of these agents by reformulating the meta-RL problem using the Partially Observable Markov Decision Process (POMDP) framework.
arXiv Detail & Related papers (2021-04-29T20:34:39Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Deep active inference agents using Monte-Carlo methods [3.8233569758620054]
We present a neural architecture for building deep active inference agents in continuous state-spaces using Monte-Carlo sampling.
Our approach enables agents to learn environmental dynamics efficiently, while maintaining task performance.
Results show that deep active inference provides a flexible framework to develop biologically-inspired intelligent agents.
arXiv Detail & Related papers (2020-06-07T15:10:42Z)
- Maximizing Information Gain in Partially Observable Environments via
  Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
- Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
arXiv Detail & Related papers (2020-02-12T19:34:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.