Related papers: Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

URL: http://arxiv.org/abs/2403.02782v2
Date: Sat, 15 Jun 2024 17:55:58 GMT
Title: Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Authors: Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan
Abstract summary: We explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from an initial visual observation to a target visual outcome, as depicted in real-life instructional videos. We coin our approach KEPP, a novel Knowledge-Enhanced Procedure Planning system, which harnesses a probabilistic procedural knowledge graph extracted from training data.
Score: 16.333295670635557
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we explore the capability of an agent to construct a logical sequence of action steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from an initial visual observation to a target visual outcome, as depicted in real-life instructional videos. Existing works have attained partial success by extensively leveraging various sources of information available in the datasets, such as heavy intermediate visual observations, procedural names, or natural language step-by-step instructions, for features or supervision signals. However, the task remains formidable due to the implicit causal constraints in the sequencing of steps and the variability inherent in multiple feasible plans. To tackle these intricacies that previous efforts have overlooked, we propose to enhance the capabilities of the agent by infusing it with procedural knowledge. This knowledge, sourced from training procedure plans and structured as a directed weighted graph, equips the agent to better navigate the complexities of step sequencing and its potential variations. We coin our approach KEPP, a novel Knowledge-Enhanced Procedure Planning system, which harnesses a probabilistic procedural knowledge graph extracted from training data, effectively acting as a comprehensive textbook for the training domain. Experimental evaluations across three widely-used datasets under settings of varying complexity reveal that KEPP attains superior, state-of-the-art results while requiring only minimal supervision.
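Illustrative sketch (not the authors' implementation): the abstract describes procedural knowledge "sourced from training procedure plans and structured as a directed weighted graph". One simple way to realize such a graph is to count step-to-step transitions in the training plans and normalize each node's outgoing counts into probabilities. The normalization scheme, the fixed-horizon path query, the function names `build_graph` and `best_path`, and the toy coffee-making plans below are all assumptions made for illustration; none of them is taken from the paper.

```python
from collections import defaultdict


def build_graph(plans):
    """Build a directed weighted graph from training plans.

    Assumption: an edge weight is the fraction of times `dst` directly
    follows `src` across all training plans.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for plan in plans:
        for src, dst in zip(plan, plan[1:]):
            counts[src][dst] += 1
    graph = {}
    for src, nexts in counts.items():
        total = sum(nexts.values())
        graph[src] = {dst: n / total for dst, n in nexts.items()}
    return graph


def best_path(graph, start, goal, horizon):
    """Return (probability, path) for the most probable path that has
    exactly `horizon` steps, begins at `start`, and ends at `goal`.

    Returns (0.0, None) if no such path exists. Uses dynamic programming
    over the path length, keeping the best partial path per end node.
    """
    best = {start: (1.0, [start])}
    for _ in range(horizon - 1):
        nxt = {}
        for node, (prob, path) in best.items():
            for dst, weight in graph.get(node, {}).items():
                cand = prob * weight
                if dst not in nxt or cand > nxt[dst][0]:
                    nxt[dst] = (cand, path + [dst])
        best = nxt
    return best.get(goal, (0.0, None))


# Hypothetical training plans (toy data, not from any dataset).
plans = [
    ["pour water", "add coffee", "stir"],
    ["pour water", "add coffee", "stir"],
    ["pour water", "add coffee", "add sugar", "stir"],
    ["pour water", "add sugar", "stir"],
]
graph = build_graph(plans)
prob, path = best_path(graph, "pour water", "stir", horizon=3)
```

In this toy setting, a query for a three-step plan from "pour water" to "stir" retrieves the path through "add coffee", because that transition dominates the training plans. A planner could use such retrieved paths as a prior over step sequences; how KEPP actually consumes its graph is not specified in this listing.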

Related papers

On Sequential Fault-Intolerant Process Planning [60.66853798340345]
We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP). SFIPP captures a reward structure common in many sequential multi-stage decision problems, where the planning is deemed successful only if all stages succeed. We design provably tight online algorithms for settings in which we need to pick between different actions with unknown success chances at each stage.
arXiv Detail & Related papers (2025-02-07T15:20:35Z)
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos [48.15438373870542]
VidAssist is an integrated framework designed for zero/few-shot goal-oriented planning in instructional videos. It employs a breadth-first search algorithm for optimal plan generation. Experiments demonstrate that VidAssist offers a unified framework for different goal-oriented planning setups.
arXiv Detail & Related papers (2024-09-30T17:57:28Z)
REVEAL-IT: REinforcement learning with Visibility of Evolving Agent poLicy for InTerpretability [23.81322529587759]
REVEAL-IT is a novel framework for explaining the learning process of an agent in complex environments. We visualize the policy structure and the agent's learning process for various training tasks. A GNN-based explainer learns to highlight the most important section of the policy, providing a clearer and more robust explanation of the agent's learning process.
arXiv Detail & Related papers (2024-06-20T11:29:26Z)
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training. Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos [13.99137623722021]
Procedural activities are sequences of key-steps aimed at achieving specific goals. Task graphs have emerged as a human-understandable representation of procedural activities.
arXiv Detail & Related papers (2024-06-03T16:11:39Z)
Procedure-Aware Pretraining for Instructional Video Understanding [58.214549181779006]
A key challenge in procedure understanding is extracting procedural knowledge from unlabeled videos. Our main insight is that instructional videos depict sequences of steps that repeat between instances of the same or different tasks. This graph can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form.
arXiv Detail & Related papers (2023-03-31T17:41:31Z)
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision [31.73732506824829]
We study the problem of procedure planning in instructional videos. Here, an agent must produce a plausible sequence of actions that can transform the environment from a given start to a desired goal state. We propose a weakly supervised approach by learning from natural language instructions.
arXiv Detail & Related papers (2022-05-04T19:37:32Z)
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning [114.1830997893756]
This work focuses on learning a model to plan goal-directed actions in real-life videos. We propose novel algorithms to model human behaviors through Bayesian Inference and model-based Imitation Learning.
arXiv Detail & Related papers (2021-10-05T01:06:53Z)
Crop-Transform-Paste: Self-Supervised Learning for Visual Tracking [137.26381337333552]
In this work, we develop the Crop-Transform-Paste operation, which is able to synthesize sufficient training data. Since the object state is known in all synthesized data, existing deep trackers can be trained in routine ways without human annotation.
arXiv Detail & Related papers (2021-06-21T07:40:34Z)
Self-Imitation Learning by Planning [3.996275177789895]
Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge. In long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data. We propose self-imitation learning by planning (SILP), where demonstration data are collected automatically by planning on the visited states from the current policy. SILP is inspired by the observation that successfully visited states in the early reinforcement learning stage are collision-free nodes in the graph-search based motion planner.
arXiv Detail & Related papers (2021-03-25T13:28:38Z)
Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process. Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved. We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.