Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
- URL: http://arxiv.org/abs/2406.01486v2
- Date: Mon, 28 Oct 2024 18:56:49 GMT
- Title: Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
- Authors: Luigi Seminara, Giovanni Maria Farinella, Antonino Furnari,
- Abstract summary: Procedural activities are sequences of key-steps aimed at achieving specific goals.
Task graphs have emerged as a human-understandable representation of procedural activities.
- Score: 13.99137623722021
- License:
- Abstract: Procedural activities are sequences of key-steps aimed at achieving specific goals. They are crucial to build intelligent agents able to assist users effectively. In this context, task graphs have emerged as a human-understandable representation of procedural activities, encoding a partial ordering over the key-steps. While previous works generally relied on hand-crafted procedures to extract task graphs from videos, in this paper, we propose an approach based on direct maximum likelihood optimization of edges' weights, which allows gradient-based learning of task graphs and can be naturally plugged into neural network architectures. Experiments on the CaptainCook4D dataset demonstrate the ability of our approach to predict accurate task graphs from the observation of action sequences, with an improvement of +16.7% over previous approaches. Owing to the differentiability of the proposed framework, we also introduce a feature-based approach, aiming to predict task graphs from key-step textual or video embeddings, for which we observe emerging video understanding abilities. Task graphs learned with our approach are also shown to significantly enhance online mistake detection in procedural egocentric videos, achieving notable gains of +19.8% and +7.5% on the Assembly101-O and EPIC-Tent-O datasets. Code for replicating experiments is available at https://github.com/fpv-iplab/Differentiable-Task-Graph-Learning.
Related papers
- Instance-Aware Graph Prompt Learning [71.26108600288308]
We introduce Instance-Aware Graph Prompt Learning (IA-GPL) in this paper.
The process involves generating intermediate prompts for each instance using a lightweight architecture.
Experiments conducted on multiple datasets and settings showcase the superior performance of IA-GPL compared to state-of-the-art baselines.
arXiv Detail & Related papers (2024-11-26T18:38:38Z) - Can Graph Learning Improve Planning in LLM-based Agents? [61.47027387839096]
Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs)
In this paper, we explore graph learning-based methods for task planning, a direction that is to the prevalent focus on prompt design.
Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs.
arXiv Detail & Related papers (2024-05-29T14:26:24Z) - Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention [10.149523817328921]
We introduce the Gaze-guided Action Anticipation algorithm, which establishes a visual-semantic graph from the video input.
Our method utilizes a Graph Neural Network to recognize the agent's intention and predict the action sequence to fulfill this intention.
Our method outperforms state-of-the-art techniques, achieving a 7% improvement in accuracy for 18-class intention recognition.
arXiv Detail & Related papers (2024-04-10T21:03:23Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - Video-Mined Task Graphs for Keystep Recognition in Instructional Videos [71.16703750980143]
Procedural activity understanding requires perceiving human actions in terms of a broader task.
We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps.
We show the impact: more reliable zero-shot keystep localization and improved video representation learning.
arXiv Detail & Related papers (2023-07-17T18:19:36Z) - Unsupervised Task Graph Generation from Instructional Video Transcripts [53.54435048879365]
We consider a setting where text transcripts of instructional videos performing a real-world activity are provided.
The goal is to identify the key steps relevant to the task as well as the dependency relationship between these key steps.
We propose a novel task graph generation approach that combines the reasoning capabilities of instruction-tuned language models along with clustering and ranking components.
arXiv Detail & Related papers (2023-02-17T22:50:08Z) - A Comprehensive Analytical Survey on Unsupervised and Semi-Supervised
Graph Representation Learning Methods [4.486285347896372]
This survey aims to evaluate all major classes of graph embedding methods.
We organized graph embedding techniques using a taxonomy that includes methods from manual feature engineering, matrix factorization, shallow neural networks, and deep graph convolutional networks.
We designed experiments on top of PyTorch Geometric and DGL libraries and run experiments on different multicore CPU and GPU platforms.
arXiv Detail & Related papers (2021-12-20T07:50:26Z) - Self-supervised Auxiliary Learning for Graph Neural Networks via
Meta-Learning [16.847149163314462]
We propose a novel self-supervised auxiliary learning framework to effectively learn graph neural networks.
Our method is learning to learn a primary task with various auxiliary tasks to improve generalization performance.
Our methods can be applied to any graph neural networks in a plug-in manner without manual labeling or additional data.
arXiv Detail & Related papers (2021-03-01T05:52:57Z) - Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous
Graphs [21.617020380894488]
We propose a novel self-supervised auxiliary learning method to learn graph neural networks on heterogeneous graphs.
Our method can be applied to any graph neural networks in a plug-in manner without manual labeling or additional data.
arXiv Detail & Related papers (2020-07-16T12:32:11Z) - GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training [62.73470368851127]
Graph representation learning has emerged as a powerful technique for addressing real-world problems.
We design Graph Contrastive Coding -- a self-supervised graph neural network pre-training framework.
We conduct experiments on three graph learning tasks and ten graph datasets.
arXiv Detail & Related papers (2020-06-17T16:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.