United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure
Learning from Videos
- URL: http://arxiv.org/abs/2311.03550v1
- Date: Mon, 6 Nov 2023 21:33:56 GMT
- Title: United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure
Learning from Videos
- Authors: Siddhant Bansal, Chetan Arora, C.V. Jawahar
- Abstract summary: Given multiple videos of the same task, procedure learning addresses identifying the key-steps and determining their order to perform the task.
This makes key-steps discovery challenging as the algorithms lack inter-videos perspective.
We propose an unsupervised Graph-based Procedure Learning framework that represents all the videos of a task as a graph to obtain both intra-video and inter-videos context.
- Score: 37.53372462270059
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given multiple videos of the same task, procedure learning addresses
identifying the key-steps and determining their order to perform the task. For
this purpose, existing approaches use the signal generated from a pair of
videos. This makes key-steps discovery challenging as the algorithms lack
inter-videos perspective. Instead, we propose an unsupervised Graph-based
Procedure Learning (GPL) framework. GPL consists of the novel UnityGraph that
represents all the videos of a task as a graph to obtain both intra-video and
inter-videos context. Further, to obtain similar embeddings for the same
key-steps, the embeddings of UnityGraph are updated in an unsupervised manner
using the Node2Vec algorithm. Finally, to identify the key-steps, we cluster
the embeddings using KMeans. We test GPL on benchmark ProceL, CrossTask, and
EgoProceL datasets and achieve an average improvement of 2% on third-person
datasets and 3.6% on EgoProceL over the state-of-the-art.
Related papers
- Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting [86.15347226865826]
We design a new end-to-end object-aware lifting approach, named Unified-Lift.
We augment each Gaussian point with an additional Gaussian-level feature learned using a contrastive loss to encode instance information.
We conduct experiments on three benchmarks: LERF-Masked, Replica, and Messy Rooms.
arXiv Detail & Related papers (2025-03-18T08:42:23Z) - Scalable Deep Metric Learning on Attributed Graphs [10.092560681589578]
We develop a graph embedding method, which is based on extending deep metric and unbiased contrastive learning techniques.
Based on a multi-classt loss function, we present two algorithms -- DMT for semi-supervised learning and DMAT-i for the unsupervised case.
arXiv Detail & Related papers (2024-11-20T03:34:31Z) - A Structure-Aware Framework for Learning Device Placements on Computation Graphs [15.282882425920064]
We propose a novel framework for the task of device placement, relying on smaller computation graphs extracted from the OpenVINO toolkit.
The framework consists of five steps, including graph coarsening, node representation learning and policy optimization.
To train the entire framework, we use reinforcement learning using the execution time of the placement as a reward.
arXiv Detail & Related papers (2024-05-23T05:29:29Z) - SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z) - SGAligner : 3D Scene Alignment with Scene Graphs [84.01002998166145]
Building 3D scene graphs has emerged as a topic in scene representation for several embodied AI applications.
We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial.
We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios.
arXiv Detail & Related papers (2023-04-28T14:39:22Z) - GraphLearner: Graph Node Clustering with Fully Learnable Augmentation [76.63963385662426]
Contrastive deep graph clustering (CDGC) leverages the power of contrastive learning to group nodes into different clusters.
We propose a Graph Node Clustering with Fully Learnable Augmentation, termed GraphLearner.
It introduces learnable augmentors to generate high-quality and task-specific augmented samples for CDGC.
arXiv Detail & Related papers (2022-12-07T10:19:39Z) - Collaborative Propagation on Multiple Instance Graphs for 3D Instance
Segmentation with Single-point Supervision [63.429704654271475]
We propose a novel weakly supervised method RWSeg that only requires labeling one object with one point.
With these sparse weak labels, we introduce a unified framework with two branches to propagate semantic and instance information.
Specifically, we propose a Cross-graph Competing Random Walks (CRW) algorithm that encourages competition among different instance graphs.
arXiv Detail & Related papers (2022-08-10T02:14:39Z) - GraphCoCo: Graph Complementary Contrastive Learning [65.89743197355722]
Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations.
This paper proposes an effective graph complementary contrastive learning approach named GraphCoCo to tackle the above issue.
arXiv Detail & Related papers (2022-03-24T02:58:36Z) - End-to-end video instance segmentation via spatial-temporal graph neural
networks [30.748756362692184]
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain.
Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step.
We propose a novel graph-neural-network (GNN) based method to handle the aforementioned limitation.
arXiv Detail & Related papers (2022-03-07T05:38:08Z) - Cross-Domain Few-Shot Graph Classification [7.23389716633927]
We study the problem of few-shot graph classification across domains with nonequivalent feature spaces.
We propose an attention-based graph encoder that uses three congruent views of graphs, one contextual and two topological views.
We show that when coupled with metric-based meta-learning frameworks, the proposed encoder achieves the best average meta-test classification accuracy.
arXiv Detail & Related papers (2022-01-20T16:16:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.