Goal-directed Planning and Goal Understanding by Active Inference:
Evaluation Through Simulated and Physical Robot Experiments
- URL: http://arxiv.org/abs/2202.09976v1
- Date: Mon, 21 Feb 2022 03:48:35 GMT
- Authors: Takazumi Matsumoto, Wataru Ohata, Fabien C. Y. Benureau and Jun Tani
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We show that goal-directed action planning and generation in a teleological
framework can be formulated using the free energy principle. The proposed
model, which is built on a variational recurrent neural network model, is
characterized by three essential features: (1) goals can be specified both for
static sensory states, e.g., goal images to be reached, and for dynamic
processes, e.g., moving around an object; (2) the model can not only generate
goal-directed action plans, but can also understand goals from sensory
observation; and (3) the model generates future action plans for given
goals based on the best estimate of the current state, inferred using past
sensory observations. The proposed model is evaluated by conducting experiments
on a simulated mobile agent as well as on a real humanoid robot performing
object manipulation.
Related papers
- Human-level 3D shape perception emerges from multi-view learning [63.048728487674815]
We develop a modeling framework that predicts human 3D shape inferences for arbitrary objects.
We achieve this with a novel class of neural networks trained using a visual-spatial objective over naturalistic sensory data.
We find that human-level 3D perception can emerge from a simple, scalable learning objective over naturalistic visual-spatial data.
arXiv Detail & Related papers (2026-02-19T18:56:05Z)
- Causal World Modeling for Robot Control [56.31803788587547]
Video world models provide the ability to imagine the near future by understanding the causality between actions and visual dynamics.
We introduce LingBot-VA, an autoregressive diffusion framework that learns frame prediction and policy execution simultaneously.
We evaluate our model on both simulation benchmarks and real-world scenarios, where it shows significant promise in long-horizon manipulation, data efficiency in post-training, and strong generalizability to novel configurations.
arXiv Detail & Related papers (2026-01-29T17:07:43Z)
- Envision: Embodied Visual Planning via Goal-Imagery Video Diffusion [61.63215708592008]
Embodied visual planning aims to enable manipulation tasks by imagining how a scene evolves toward a desired goal.
Video diffusion models provide a promising foundation for such visual imagination.
We propose Envision, a diffusion-based framework that performs visual planning for embodied agents.
arXiv Detail & Related papers (2025-12-27T15:46:41Z)
- Decoupled Generative Modeling for Human-Object Interaction Synthesis [35.78156236836254]
Existing approaches often require manually specified intermediate waypoints and place all optimization objectives on a single network.
We propose Decoupled Generative Modeling for Human-Object Interaction Synthesis (DecHOI).
A trajectory generator first produces human and object trajectories without prescribed waypoints, and an action generator conditions on these paths to synthesize detailed motions.
arXiv Detail & Related papers (2025-12-22T05:33:59Z)
- Do-Undo: Generating and Reversing Physical Actions in Vision-Language Models [57.71440995598757]
We introduce the Do-Undo task and benchmark to address a critical gap in vision-language models.
Do-Undo requires models to simulate the outcome of a physical action and then accurately reverse it, reflecting true cause-and-effect in the visual world.
arXiv Detail & Related papers (2025-12-15T18:03:42Z)
- Goal-Directedness is in the Eye of the Beholder [48.937781898861815]
Probing for goal-directed behavior comes in two flavors: behavioral and mechanistic.
We identify technical and conceptual problems that arise from formalizing goals in agent systems.
We outline new directions for modeling goal-directedness as an emergent property of dynamic, multi-agent systems.
arXiv Detail & Related papers (2025-08-18T11:04:18Z)
- Continuous-Time SO(3) Forecasting with Savitzky--Golay Neural Controlled Differential Equations [51.510040541600176]
This work proposes modeling continuous-time rotational object dynamics on $SO(3)$.
Unlike existing methods that rely on simplified motion assumptions, our method learns a general latent dynamical system of the underlying object trajectory.
Experimental results on real-world data demonstrate compelling forecasting capabilities compared to existing approaches.
arXiv Detail & Related papers (2025-06-07T12:41:50Z)
- Learning Coordinated Bimanual Manipulation Policies using State Diffusion and Inverse Dynamics Models [22.826115023573205]
We infuse the predictive nature of human manipulation strategies into robot imitation learning.
We train a diffusion model to predict future states and compute robot actions that achieve the predicted states.
Our framework consistently outperforms state-of-the-art state-to-action mapping policies.
arXiv Detail & Related papers (2025-03-30T01:25:35Z)
- WANDR: Intention-guided Human Motion Generation [67.07028110459787]
We introduce WANDR, a data-driven model that takes an avatar's initial pose and a goal's 3D position and generates natural human motions that place the end effector (wrist) on the goal location.
Intention guides the agent to the goal, and interactively adapts the generation to novel situations without needing to define sub-goals or the entire motion path.
We evaluate our method extensively and demonstrate its ability to generate natural, long-term motions that reach 3D goals and generalize to unseen goal locations.
arXiv Detail & Related papers (2024-04-23T10:20:17Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
arXiv Detail & Related papers (2022-10-22T18:39:33Z)
- FlowBot3D: Learning 3D Articulation Flow to Manipulate Articulated Objects [14.034256001448574]
We propose a vision-based system that learns to predict the potential motions of the parts of a variety of articulated objects.
We deploy an analytical motion planner based on this vector field to achieve a policy that yields maximum articulation.
Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments.
arXiv Detail & Related papers (2022-05-09T15:35:33Z)
- Modeling human intention inference in continuous 3D domains by inverse planning and body kinematics [31.421686048250827]
We describe a computational framework for evaluating models of goal inference in the domain of 3D motor actions.
We evaluate our framework in three behavioural experiments using a novel Target Reaching Task, in which human observers infer intentions of actors reaching for targets among distractors.
We show that human observers indeed rely on inverse body kinematics in such scenarios, suggesting that modeling body kinematics can improve performance of inference algorithms.
arXiv Detail & Related papers (2021-12-02T00:55:58Z)
- Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning [114.1830997893756]
This work focuses on learning a model to plan goal-directed actions in real-life videos.
We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
arXiv Detail & Related papers (2021-10-05T01:06:53Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- Hierarchical Object-to-Zone Graph for Object Navigation [43.558927774552295]
In the unseen environment, when the target object is not in egocentric view, the agent may not be able to make wise decisions.
We propose a hierarchical object-to-zone (HOZ) graph to guide the agent in a coarse-to-fine manner.
An online-learning mechanism is also proposed to update the HOZ graph according to real-time observations in new environments.
arXiv Detail & Related papers (2021-09-05T13:02:17Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
- Enhancing a Neurocognitive Shared Visuomotor Model for Object Identification, Localization, and Grasping With Learning From Auxiliary Tasks [0.0]
We present a follow-up study on our unified visuomotor neural model for the robotic tasks of identifying, localizing, and grasping a target object in a scene with multiple objects.
Our Retinanet-based model enables end-to-end training of visuomotor abilities in a biologically inspired developmental approach.
arXiv Detail & Related papers (2020-09-26T19:45:15Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Goal-Directed Planning for Habituated Agents by Active Inference Using a Variational Recurrent Neural Network [5.000272778136268]
This study shows that the predictive coding (PC) and active inference (AIF) frameworks can develop better generalization by learning a prior distribution in a low dimensional latent state space.
In our proposed model, learning is carried out by inferring optimal latent variables as well as synaptic weights for maximizing the evidence lower bound.
Our proposed model was evaluated with both simple and complex robotic tasks in simulation, which demonstrated sufficient generalization in learning with limited training data.
arXiv Detail & Related papers (2020-05-27T06:43:59Z)
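The last entry above describes learning by maximizing the evidence lower bound (ELBO). As a minimal illustration of that objective, not the paper's actual network, the ELBO for a one-dimensional Gaussian posterior and prior splits into a reconstruction term minus a closed-form KL divergence; the function names here are illustrative assumptions.

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """Closed-form KL(q || p) between two 1-D Gaussians q and p."""
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)

def elbo(recon_log_lik, mu_q, var_q, mu_p=0.0, var_p=1.0):
    """Evidence lower bound: expected reconstruction log-likelihood
    minus the KL regularizer pulling the posterior toward the prior."""
    return recon_log_lik - gaussian_kl(mu_q, var_q, mu_p, var_p)
```

When the posterior matches the prior the KL term vanishes and the ELBO reduces to the reconstruction term alone, which is why maximizing it trades off data fit against a compact latent representation.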
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.