Object-based active inference
- URL: http://arxiv.org/abs/2209.01258v1
- Date: Fri, 2 Sep 2022 20:08:43 GMT
- Title: Object-based active inference
- Authors: Ruben S. van Bergen and Pablo L. Lanillos
- Abstract summary: We introduce 'object-based active inference' (OBAI), marrying active inference with recent deep object-based neural networks.
OBAI represents distinct objects with separate variational beliefs, and uses selective attention to route inputs to their corresponding object slots.
We show that OBAI learns to correctly segment the action-perturbed objects from video input, and to manipulate these objects towards arbitrary goals.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The world consists of objects: distinct entities possessing independent
properties and dynamics. For agents to interact with the world intelligently,
they must translate sensory inputs into the bound-together features that
describe each object. These object-based representations form a natural basis
for planning behavior. Active inference (AIF) is an influential unifying
account of perception and action, but existing AIF models have not leveraged
this important inductive bias. To remedy this, we introduce 'object-based
active inference' (OBAI), marrying AIF with recent deep object-based neural
networks. OBAI represents distinct objects with separate variational beliefs,
and uses selective attention to route inputs to their corresponding object
slots. Object representations are endowed with independent action-based
dynamics. The dynamics and generative model are learned from experience with a
simple environment (active multi-dSprites). We show that OBAI learns to
correctly segment the action-perturbed objects from video input, and to
manipulate these objects towards arbitrary goals.
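As a rough sketch of the ingredients named here, the toy code below keeps one variational belief (means only, for brevity) per object slot, routes inputs to slots with a softmax attention step, and rolls each slot forward under its own action-conditioned linear dynamics. Every shape, map, and update rule is an illustrative assumption, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, F, N = 3, 6, 4, 32   # object slots, latent dim, feature dim, num inputs

mu = rng.normal(size=(K, D)) * 0.1        # one belief mean per object slot
W_q = rng.normal(size=(D, F)) * 0.5       # stand-in learned latent->feature map

def route(x, mu):
    """Selective attention: slots compete (softmax over K) for each input."""
    logits = (mu @ W_q) @ x.T                                   # (K, N)
    return np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

def belief_update(x, mu, lr=0.5):
    """Nudge each slot's mean toward the evidence routed to it."""
    attn = route(x, mu)
    evidence = (attn @ x) / (attn.sum(axis=1, keepdims=True) + 1e-8)  # (K, F)
    return mu + lr * (evidence @ W_q.T - mu)

# Independent, action-conditioned dynamics per slot: mu_k' = A_k mu_k + B_k a_k.
A_dyn = np.stack([np.eye(D) + 0.01 * rng.normal(size=(D, D)) for _ in range(K)])
B_dyn = rng.normal(size=(K, D, 2)) * 0.1

def step_dynamics(mu, actions):
    return np.einsum('kij,kj->ki', A_dyn, mu) + np.einsum('kij,kj->ki', B_dyn, actions)

x = rng.normal(size=(N, F))                      # toy stand-in for sensory input
mu = belief_update(x, mu)                        # perception: route + update
mu = step_dynamics(mu, rng.normal(size=(K, 2)))  # prediction under actions
```

A full AIF agent would additionally score candidate actions by expected free energy against goal priors; this sketch only shows the slot-structured state.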
Related papers
- Clink! Chop! Thud! -- Learning Object Sounds from Real-World Interactions [17.352378821998304]
We introduce the sounding object detection task to evaluate a model's ability to link these sounds to the objects directly involved. Inspired by human perception, our multimodal object-aware framework learns from in-the-wild egocentric videos.
arXiv Detail & Related papers (2025-10-02T17:59:52Z)
- IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain an understanding of articulated objects in their environment through interaction.
We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images.
We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
- Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection [4.938957922033169]
Out-of-distribution (OOD) detection aims to detect and reject test samples with semantic shifts.
We propose a novel Uncertainty-Guided Appearance-Motion Association Network (UAAN).
We show that UAAN beats state-of-the-art methods by a significant margin, illustrating its effectiveness.
arXiv Detail & Related papers (2024-09-16T02:53:49Z)
- Which objects help me to act effectively? Reasoning about physically-grounded affordances [0.6291443816903801]
A key aspect of this understanding lies in detecting an object's affordances.
Our approach leverages a dialogue of large language models (LLMs) and vision-language models (VLMs) to achieve open-world affordance detection.
By grounding our system in the physical world, we account for the robot's embodiment and the intrinsic properties of the objects it encounters.
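A minimal sketch of such an LLM/VLM dialogue, with stub functions standing in for the real models; the function names, prompts, and returned strings are assumptions for illustration, not the paper's interface.

```python
def query_vlm(image, question: str) -> str:
    """Placeholder: a real system would call a vision-language model here."""
    return "a ceramic mug with a handle, resting on a table"

def query_llm(prompt: str) -> str:
    """Placeholder: a real system would call a large language model here."""
    return "graspable by the handle; can contain liquid; pourable"

def detect_affordances(image, task: str, embodiment: str) -> str:
    # 1. The VLM grounds the scene: what is the object and its physical state?
    description = query_vlm(image, "Describe the object and its physical state.")
    # 2. The LLM reasons over the description, the task, and the robot's
    #    embodiment to propose physically plausible affordances.
    prompt = (f"Object: {description}\nTask: {task}\nRobot: {embodiment}\n"
              "Which affordances of this object help the robot act effectively?")
    return query_llm(prompt)

print(detect_affordances(image=None, task="serve coffee",
                         embodiment="single-arm gripper"))
```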
arXiv Detail & Related papers (2024-07-18T11:08:57Z)
- Unsupervised Dynamics Prediction with Object-Centric Kinematics [22.119612406160073]
We propose Object-Centric Kinematics (OCK), a framework for dynamics prediction leveraging object-centric representations.
OCK consists of low-level structured states of objects' position, velocity, and acceleration.
Our model demonstrates superior performance when handling objects and backgrounds in complex scenes characterized by a wide range of object attributes and dynamic movements.
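One plausible reading of those structured states, with finite differences standing in for however OCK actually derives velocity and acceleration (an assumption for illustration):

```python
import numpy as np

# Toy trajectories: T timesteps, K objects, 2-D coordinates.
positions = np.cumsum(np.random.default_rng(1).normal(size=(10, 4, 2)), axis=0)

velocity = np.diff(positions, axis=0)         # (9, 4, 2)
acceleration = np.diff(velocity, axis=0)      # (8, 4, 2)

# Stack position, velocity, and acceleration into one kinematic state per
# object and timestep (aligned to the steps where all three are defined).
T = acceleration.shape[0]
kinematic_state = np.concatenate(
    [positions[2:2 + T], velocity[1:1 + T], acceleration], axis=-1)
print(kinematic_state.shape)   # (8, 4, 6)
```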
arXiv Detail & Related papers (2024-04-29T04:47:23Z)
- Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange [50.45953583802282]
We introduce a novel self-supervised learning (SSL) strategy for point cloud scene understanding.
Our approach leverages both object patterns and contextual cues to produce robust features.
Our experiments demonstrate the superiority of our method over existing SSL techniques.
arXiv Detail & Related papers (2024-04-11T06:39:53Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Leveraging Next-Active Objects for Context-Aware Anticipation in Egocentric Videos [31.620555223890626]
We study the problem of short-term object interaction anticipation (STA).
We propose NAOGAT, a multi-modal end-to-end transformer network, to guide the model to predict context-aware future actions.
Our model outperforms existing methods on two separate datasets.
arXiv Detail & Related papers (2023-08-16T12:07:02Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Object-Region Video Transformers [100.23380634952083]
We present Object-Region Video Transformers (ORViT), an object-centric approach that extends video transformer layers with object representations.
Our ORViT block consists of two object-level streams: appearance and dynamics.
We show strong improvements in performance across all tasks and datasets considered, demonstrating the value of a model that incorporates object representations into a transformer architecture.
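A schematic of the two-stream idea: object tokens built from a pooled appearance stream and a box-trajectory dynamics stream are appended to the patch tokens so self-attention can mix them. The pooling and encoders below are toy stand-ins, not ORViT's layers.

```python
import numpy as np

rng = np.random.default_rng(2)
T, P, D, O = 4, 16, 32, 3     # frames, patches per frame, channels, objects

patches = rng.normal(size=(T * P, D))     # spatio-temporal patch tokens
boxes = rng.uniform(size=(T, O, 4))       # per-frame object boxes (x, y, w, h)

# Appearance stream: pool patch features per object (random weights stand in
# for RoI-style region pooling).
pool = rng.uniform(size=(O, T * P))
pool /= pool.sum(axis=1, keepdims=True)
obj_appearance = pool @ patches                                     # (O, D)

# Dynamics stream: encode each object's box trajectory with a linear map.
W_traj = rng.normal(size=(T * 4, D)) * 0.1
obj_dynamics = boxes.transpose(1, 0, 2).reshape(O, T * 4) @ W_traj  # (O, D)

# Object tokens join the patch tokens before the next attention layer.
tokens = np.concatenate([patches, obj_appearance + obj_dynamics], axis=0)
print(tokens.shape)   # (T*P + O, D)
```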
arXiv Detail & Related papers (2021-10-13T17:51:46Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
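A toy sketch of how a POMDP-style belief can glue such modules together, trading off asking a question against grasping; the numbers, threshold, and likelihoods are illustrative assumptions, not INVIGORATE's model.

```python
import numpy as np

belief = np.full(4, 0.25)    # belief over which detected object the user means

def bayes_update(belief, likelihood):
    post = belief * likelihood
    return post / post.sum()

# The visual grounding module emits a likelihood over objects for the phrase.
belief = bayes_update(belief, np.array([0.6, 0.2, 0.1, 0.1]))

# Policy: grasp when confident enough; otherwise ask a clarifying question
# (the answer yields another likelihood to fold into the belief).
if belief.max() > 0.8:
    print("grasp object", belief.argmax())
else:
    print("ask a clarifying question")
    belief = bayes_update(belief, np.array([0.9, 0.05, 0.03, 0.02]))
    print("updated belief:", belief.round(2))
```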
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- Plug and Play, Model-Based Reinforcement Learning [60.813074750879615]
We introduce an object-based representation that allows zero-shot integration of new objects from known object classes.
This is achieved by representing the global transition dynamics as a union of local transition functions.
Experiments show that our representation achieves sample-efficient learning in a variety of setups.
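A toy rendering of global dynamics as a union of local, class-level transition functions, which is what makes zero-shot integration of objects from known classes possible; the classes and dynamics are assumptions for illustration.

```python
import numpy as np

def transition_ball(state, action):
    return state + np.array([1.0, 0.0])    # toy rule: balls drift right

def transition_wall(state, action):
    return state                           # toy rule: walls never move

CLASS_DYNAMICS = {"ball": transition_ball, "wall": transition_wall}

def global_transition(objects, action):
    """Global dynamics = each object steps under its class-local function."""
    return [(cls, CLASS_DYNAMICS[cls](s, action)) for cls, s in objects]

scene = [("ball", np.zeros(2)), ("wall", np.ones(2))]
scene = global_transition(scene, action=None)
# Zero-shot integration: a new object of a known class reuses that class's
# local dynamics, with no change to the global model.
scene.append(("ball", np.array([5.0, 5.0])))
scene = global_transition(scene, action=None)
print(scene)
```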
arXiv Detail & Related papers (2021-08-20T01:20:15Z)