Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
- URL: http://arxiv.org/abs/2110.01770v1
- Date: Tue, 5 Oct 2021 01:06:53 GMT
- Title: Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
- Authors: Jing Bi, Jiebo Luo, Chenliang Xu
- Abstract summary: This work focuses on learning a model to plan goal-directed actions in real-life videos.
We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
- Score: 114.1830997893756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning new skills by observing humans' behaviors is an essential capability
of AI. In this work, we leverage instructional videos to study humans'
decision-making processes, focusing on learning a model to plan goal-directed
actions in real-life videos. In contrast to conventional action recognition,
goal-directed actions are based on expectations of their outcomes, which
requires causal knowledge of the potential consequences of actions. Integrating
the environment structure with goals is therefore critical for solving this
task. Previous works that learn a single world model fail to distinguish
between tasks, resulting in an ambiguous latent space; planning through such a
space gradually neglects the desired outcome, since global information about
the future goal degrades quickly as the procedure evolves. We address these
limitations with a
new formulation of procedure planning and propose novel algorithms to model
human behaviors through Bayesian Inference and model-based Imitation Learning.
Experiments conducted on real-world instructional videos show that our method
can achieve state-of-the-art performance in reaching the indicated goals.
Furthermore, the learned contextual information presents interesting features
for planning in a latent space.
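To make the formulation concrete, here is a minimal, illustrative sketch of the two ingredients the abstract names: inferring a task context from the start and goal observations, then planning through a context-conditioned world model and policy. The module shapes, the deterministic context encoder (standing in for the paper's Bayesian inference), and all names below are assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class ContextualPlanner(nn.Module):
    def __init__(self, obs_dim=512, ctx_dim=32, act_dim=48, hid=256):
        super().__init__()
        # Infer a task context from the (start, goal) pair; a deterministic
        # encoder stands in here for the paper's Bayesian inference.
        self.context_enc = nn.Sequential(
            nn.Linear(2 * obs_dim, hid), nn.ReLU(), nn.Linear(hid, ctx_dim))
        # Policy: score the next action from (state, goal, context).
        self.policy = nn.Sequential(
            nn.Linear(2 * obs_dim + ctx_dim, hid), nn.ReLU(),
            nn.Linear(hid, act_dim))
        # World model: predict the next state from (state, action, context).
        self.model = nn.Sequential(
            nn.Linear(obs_dim + act_dim + ctx_dim, hid), nn.ReLU(),
            nn.Linear(hid, obs_dim))

    def plan(self, start, goal, horizon=3):
        ctx = self.context_enc(torch.cat([start, goal], dim=-1))
        state, actions = start, []
        for _ in range(horizon):
            logits = self.policy(torch.cat([state, goal, ctx], dim=-1))
            action = logits.argmax(dim=-1)
            actions.append(action)
            one_hot = nn.functional.one_hot(action, logits.shape[-1]).float()
            # Roll the world model forward so later steps keep the goal in view.
            state = self.model(torch.cat([state, one_hot, ctx], dim=-1))
        return actions

planner = ContextualPlanner()
start, goal = torch.randn(1, 512), torch.randn(1, 512)
print(planner.plan(start, goal))  # a 3-step action index sequence
```

Conditioning every step on the fixed context is what keeps the goal information from degrading as the rollout grows, which is the failure mode the abstract attributes to single-world-model approaches.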
Related papers
- Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show improved task performance as well as improved grounded decision-making capability.
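A hypothetical pipeline sketch of the summarized idea: extract an ACT-R decision trace, embed it, and condition a language model on that latent. Every function below is an illustrative stub, not the paper's code.

```python
def run_actr_model(task):    # stub for a symbolic ACT-R rollout
    return ["retrieve(constraint)", "evaluate(option_a)", "select(option_a)"]

def embed_trace(trace):      # stub: map the decision trace to a latent vector
    return [hash(step) % 97 / 97.0 for step in trace]

def llm_decide(task, latent):  # stub for a latent-conditioned language model
    return f"choose option_a for '{task}' (conditioned on {len(latent)}-d latent)"

task = "select a machining process"
print(llm_decide(task, embed_trace(run_actr_model(task))))
```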
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
- Dynamic planning in hierarchical active inference [0.0]
This study focuses on dynamic planning in active inference, by which we refer to the ability of the human brain to infer and impose motor trajectories related to cognitive decisions.
arXiv Detail & Related papers (2024-02-18T17:32:53Z)
- Discovering Temporally-Aware Reinforcement Learning Algorithms [42.016150906831776]
We propose a simple augmentation to two existing objective discovery approaches.
We find that commonly used meta-gradient approaches fail to discover adaptive objective functions.
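The augmentation described above amounts to letting a meta-learned objective see how far the agent is through its training lifetime. A toy sketch under that reading (the feature set and network are assumptions):

```python
import torch
import torch.nn as nn

# Meta-learned objective network; in the paper's setting its parameters would
# be discovered by meta-optimization rather than hand-designed.
objective_net = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))

def discovered_loss(log_prob, advantage, step, total_steps):
    # The extra input: the fraction of the training lifetime already elapsed,
    # which lets the objective behave differently early vs. late in training.
    lifetime_frac = torch.full_like(advantage, step / total_steps)
    feats = torch.stack([log_prob, advantage, lifetime_frac], dim=-1)
    return objective_net(feats).mean()

lp, adv = torch.randn(32), torch.randn(32)
print(discovered_loss(lp, adv, step=5_000, total_steps=100_000).item())
```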
arXiv Detail & Related papers (2024-02-08T17:07:42Z)
- Robotic Imitation of Human Actions [16.26334759935617]
We introduce a new approach to imitation learning that tackles the challenges of a robot imitating a human.
Our approach can use a single human demonstration to abstract information about the demonstrated task, and use that information to generalise and replicate it.
arXiv Detail & Related papers (2024-01-16T14:11:54Z)
- PALM: Predicting Actions through Language Models [74.10147822693791]
We introduce PALM, an approach that tackles the task of long-term action anticipation.
Our method incorporates an action recognition model to track previous action sequences and a vision-language model to articulate relevant environmental details.
Our experimental results demonstrate that PALM surpasses the state-of-the-art methods in the task of long-term action anticipation.
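A pipeline sketch of the summarized approach: recognized past actions and a scene caption form a prompt that a language model completes with future actions. All three model calls below are hypothetical stubs, not PALM's actual components.

```python
def recognize_actions(frames):    # stub for an action recognition model
    return ["crack egg", "whisk egg"]

def caption_environment(frames):  # stub for a vision-language model
    return "a pan heating on a stove next to a bowl of eggs"

def llm_complete(prompt):         # stub for a language model
    return "pour egg into pan, stir, season, plate"

def anticipate(frames, n_future=4):
    past = ", ".join(recognize_actions(frames))
    scene = caption_environment(frames)
    prompt = (f"Observed actions so far: {past}. Scene: {scene}. "
              f"Predict the next {n_future} actions:")
    return llm_complete(prompt)

print(anticipate(frames=None))
```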
arXiv Detail & Related papers (2023-11-29T02:17:27Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
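The mixture world model described above can be sketched as a mixture-of-Gaussians dynamics head, where each component can specialize to one task's dynamics prior; the sizes and component count below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MixtureDynamics(nn.Module):
    def __init__(self, state_dim=64, act_dim=8, k=5, hid=128):
        super().__init__()
        self.k, self.state_dim = k, state_dim
        self.trunk = nn.Sequential(nn.Linear(state_dim + act_dim, hid), nn.ReLU())
        self.logits = nn.Linear(hid, k)               # mixture weights
        self.mu = nn.Linear(hid, k * state_dim)       # per-component means
        self.log_std = nn.Linear(hid, k * state_dim)  # per-component scales

    def forward(self, s, a):
        h = self.trunk(torch.cat([s, a], dim=-1))
        mix = torch.distributions.Categorical(logits=self.logits(h))
        comp = torch.distributions.Independent(
            torch.distributions.Normal(
                self.mu(h).view(-1, self.k, self.state_dim),
                self.log_std(h).view(-1, self.k, self.state_dim).exp()),
            1)
        # Each mixture component can capture one task's dynamics prior.
        return torch.distributions.MixtureSameFamily(mix, comp)

dyn = MixtureDynamics()
s, a, s_next = torch.randn(4, 64), torch.randn(4, 8), torch.randn(4, 64)
loss = -dyn(s, a).log_prob(s_next).mean()  # train by maximum likelihood
```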
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning [27.841725567976315]
We propose a novel framework utilizing Adversarial Inverse Reinforcement Learning.
This framework provides global explanations for decisions made by a Reinforcement Learning model.
We capture intuitive tendencies that the model follows by summarizing the model's decision-making process.
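A sketch of the adversarial-IRL reward recovery this framework builds on: the discriminator's logit f(s, a) acts as a learned reward whose structure can then be summarized into global explanations of the policy. Network sizes below are assumptions.

```python
import torch
import torch.nn as nn

# f(s, a): learned reward/logit network (dimensions are illustrative).
f = nn.Sequential(nn.Linear(64 + 8, 128), nn.ReLU(), nn.Linear(128, 1))

def airl_discriminator(s, a, log_pi):
    # AIRL form: D = exp(f) / (exp(f) + pi) = sigmoid(f - log pi).
    return torch.sigmoid(f(torch.cat([s, a], dim=-1)).squeeze(-1) - log_pi)

s, a, log_pi = torch.randn(8, 64), torch.randn(8, 8), torch.randn(8)
print(airl_discriminator(s, a, log_pi))  # f alone serves as the recovered reward
```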
arXiv Detail & Related papers (2022-03-30T17:01:59Z)
- Behavior Priors for Efficient Reinforcement Learning [97.81587970962232]
We consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors.
We discuss how such latent variable formulations connect to related work on hierarchical reinforcement learning (HRL) and mutual information and curiosity based objectives.
We demonstrate the effectiveness of our framework by applying it to a range of simulated continuous control domains.
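At its core, a behavior prior enters the objective as a KL penalty that keeps the policy close to a learned default behavior; a minimal sketch of that objective follows (the distributions and the trade-off weight alpha are illustrative assumptions).

```python
import torch
import torch.distributions as D

def kl_regularized_return(rewards, pi, pi0, alpha=0.1):
    # E[ sum_t ( r_t - alpha * KL( pi(.|s_t) || pi_0(.|s_t) ) ) ]
    return (rewards - alpha * D.kl_divergence(pi, pi0)).sum(-1)

T = 10                                               # episode length
pi  = D.Normal(torch.zeros(T), torch.ones(T))        # current policy heads
pi0 = D.Normal(0.1 * torch.ones(T), torch.ones(T))   # learned behavior prior
print(kl_regularized_return(torch.randn(T), pi, pi0).item())
```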
arXiv Detail & Related papers (2020-10-27T13:17:18Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
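A sketch of the goal-conditioned prediction idea: the latent is a function of (state, goal), so prediction error concentrates on task-relevant features. The architecture below is an illustrative assumption, not the paper's model.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(2 * 64, 128), nn.ReLU(), nn.Linear(128, 32))
dyn = nn.Sequential(nn.Linear(32 + 8, 128), nn.ReLU(), nn.Linear(128, 32))

def gap_loss(s, a, s_next, g):
    z = enc(torch.cat([s, g], dim=-1))            # goal-relative latent
    z_next = enc(torch.cat([s_next, g], dim=-1))  # target latent
    pred = dyn(torch.cat([z, a], dim=-1))
    # Error is measured in the goal-conditioned space, so parts of the state
    # irrelevant to reaching g contribute little gradient.
    return ((pred - z_next.detach()) ** 2).mean()

s, a, s2, g = (torch.randn(16, 64), torch.randn(16, 8),
               torch.randn(16, 64), torch.randn(16, 64))
print(gap_loss(s, a, s2, g).item())
```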
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.