United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure
Learning from Videos
- URL: http://arxiv.org/abs/2311.03550v1
- Date: Mon, 6 Nov 2023 21:33:56 GMT
- Title: United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure
Learning from Videos
- Authors: Siddhant Bansal, Chetan Arora, C.V. Jawahar
- Abstract summary: Given multiple videos of the same task, procedure learning addresses identifying the key-steps and determining their order to perform the task.
Existing approaches use the signal generated from a pair of videos, which makes key-step discovery challenging as the algorithms lack an inter-video perspective.
We propose an unsupervised Graph-based Procedure Learning framework that represents all the videos of a task as a graph to obtain both intra-video and inter-video context.
- Score: 37.53372462270059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given multiple videos of the same task, procedure learning addresses
identifying the key-steps and determining their order to perform the task. For
this purpose, existing approaches use the signal generated from a pair of
videos. This makes key-step discovery challenging as the algorithms lack an
inter-video perspective. Instead, we propose an unsupervised Graph-based
Procedure Learning (GPL) framework. GPL consists of the novel UnityGraph that
represents all the videos of a task as a graph to obtain both intra-video and
inter-video context. Further, to obtain similar embeddings for the same
key-steps, the embeddings of UnityGraph are updated in an unsupervised manner
using the Node2Vec algorithm. Finally, to identify the key-steps, we cluster
the embeddings using KMeans. We test GPL on the benchmark ProceL, CrossTask, and
EgoProceL datasets and achieve an average improvement of 2% on third-person
datasets and 3.6% on EgoProceL over the state-of-the-art.
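The pipeline the abstract describes can be sketched end to end: connect clips within a video by temporal adjacency and clips across videos by feature similarity, smooth embeddings over that joint graph, and cluster the result into key-steps. The sketch below is a minimal toy version, not the paper's implementation: all function names are hypothetical, the toy features stand in for real clip embeddings, neighborhood diffusion is used as a simple stand-in for the Node2Vec random-walk update, and KMeans is implemented by hand to keep the example self-contained.

```python
import numpy as np

def build_unitygraph(videos, k_inter=2, sigma=1.0):
    """Build a joint graph over the clip embeddings of all videos.

    videos: list of (T_i, D) arrays of per-clip features (toy stand-ins here).
    Intra-video edges link temporally adjacent clips; inter-video edges link
    each clip to its k_inter nearest clips in every other video.
    Returns the stacked features X and a row-normalized adjacency matrix W.
    """
    X = np.vstack(videos)
    offsets = np.cumsum([0] + [len(v) for v in videos])
    W = np.zeros((len(X), len(X)))
    for vi, v in enumerate(videos):
        s = offsets[vi]
        for t in range(len(v) - 1):            # intra-video temporal edges
            W[s + t, s + t + 1] = W[s + t + 1, s + t] = 1.0
        for vj, u in enumerate(videos):        # inter-video similarity edges
            if vi == vj:
                continue
            so = offsets[vj]
            d = np.linalg.norm(v[:, None, :] - u[None, :, :], axis=-1)
            for t in range(len(v)):
                for j in np.argsort(d[t])[:k_inter]:
                    w = np.exp(-d[t, j] ** 2 / (2 * sigma ** 2))
                    W[s + t, so + j] = W[so + j, s + t] = w
    return X, W / W.sum(axis=1, keepdims=True)

def smooth_embeddings(X, W, steps=10, alpha=0.5):
    """Diffuse features over the graph (a simple stand-in for the Node2Vec
    update): graph neighbors, which should belong to the same key-step,
    pull each other's embeddings together."""
    Z = X.copy()
    for _ in range(steps):
        Z = (1 - alpha) * Z + alpha * (W @ Z)
    return Z

def kmeans(Z, k, iters=50):
    """Minimal KMeans with farthest-point initialization: assigns each
    clip to one of k key-step clusters."""
    C = [Z[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in C], axis=0)
        C.append(Z[np.argmax(d)])
    C = np.array(C)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(Z[:, None] - C[None], axis=-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                C[c] = Z[labels == c].mean(axis=0)
    return labels
```

On two toy "videos" whose first three clips share one key-step and whose last three share another, the inter-video edges let clustering group the same step across videos, which is the inter-video context the pairwise baselines lack.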
Related papers
- Local Structure-aware Graph Contrastive Representation Learning [12.554113138406688]
We propose a Local Structure-aware Graph Contrastive representation Learning method (LS-GCL) to model the structural information of nodes from multiple views.
For the local view, the semantic subgraph of each target node is input into a shared GNN encoder to obtain the target node embeddings at the subgraph-level.
For the global view, considering the original graph preserves indispensable semantic information of nodes, we leverage the shared GNN encoder to learn the target node embeddings at the global graph-level.
arXiv Detail & Related papers (2023-08-07T03:23:46Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
- SGAligner: 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z)
- Collaborative Propagation on Multiple Instance Graphs for 3D Instance Segmentation with Single-point Supervision [63.429704654271475]
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point.
With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information.
Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv Detail & Related papers (2022-08-10T02:14:39Z)
- GraphCoCo: Graph Complementary Contrastive Learning [65.89743197355722]
Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations.
This paper proposes an effective graph complementary contrastive learning approach named GraphCoCo to tackle the above issue.
arXiv Detail & Related papers (2022-03-24T02:58:36Z)
- End-to-end Video Instance Segmentation via Spatial-Temporal Graph Neural Networks [30.748756362692184]
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain.
Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step.
We propose a novel graph-neural-network (GNN) based method to handle the aforementioned limitation.
arXiv Detail & Related papers (2022-03-07T05:38:08Z)
- Cross-Domain Few-Shot Graph Classification [7.23389716633927]
We study the problem of few-shot graph classification across domains with nonequivalent feature spaces.
We propose an attention-based graph encoder that uses three congruent views of graphs, one contextual and two topological views.
We show that when coupled with metric-based meta-learning frameworks, the proposed encoder achieves the best average meta-test classification accuracy.
arXiv Detail & Related papers (2022-01-20T16:16:30Z)
- Representing Videos as Discriminative Sub-graphs for Action Recognition [165.54738402505194]
We introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in the videos.
We present the MUlti-scale Sub-graph LEarning (MUSLE) framework that novelly builds space-time graphs and clusters them into compact sub-graphs at each scale.
arXiv Detail & Related papers (2022-01-11T16:15:25Z)
- End-To-End Graph-based Deep Semi-Supervised Learning [7.151859287072378]
The quality of a graph is determined jointly by three key factors: its nodes, its edges, and the similarity measure (or edge weights).
We propose a novel graph-based semi-supervised learning approach that optimizes all three factors simultaneously in an end-to-end learning fashion.
arXiv Detail & Related papers (2020-02-23T12:32:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.