Timestamp-Supervised Action Segmentation with Graph Convolutional
Networks
- URL: http://arxiv.org/abs/2206.15031v1
- Date: Thu, 30 Jun 2022 05:56:24 GMT
- Title: Timestamp-Supervised Action Segmentation with Graph Convolutional
Networks
- Authors: Hamza Khan, Sanjay Haresh, Awais Ahmed, Shakeeb Siddiqui, Andrey
Konin, M. Zeeshan Zia, Quoc-Huy Tran
- Abstract summary: A graph convolutional network is learned to generate dense framewise labels from sparse timestamp labels.
The generated dense framewise labels can then be used to train the segmentation model.
Detailed experiments on four public datasets, including 50 Salads, GTEA, Breakfast, and Desktop Assembly, show that our method is superior to the multi-layer perceptron baseline.
- Score: 7.696728525672148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a novel approach for temporal activity segmentation with
timestamp supervision. Our main contribution is a graph convolutional network,
which is learned in an end-to-end manner to exploit both frame features and
connections between neighboring frames to generate dense framewise labels from
sparse timestamp labels. The generated dense framewise labels can then be used
to train the segmentation model. In addition, we propose a framework for
alternating learning of both the segmentation model and the graph convolutional
model, which first initializes and then iteratively refines the learned models.
Detailed experiments on four public datasets, including 50 Salads, GTEA,
Breakfast, and Desktop Assembly, show that our method is superior to the
multi-layer perceptron baseline, while performing on par with or better than
the state of the art in temporal activity segmentation with timestamp
supervision.
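The abstract's central idea, a graph convolution over frames that turns a few timestamp labels into one label per frame, can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a simple chain graph that links each frame to its temporal neighbors, a two-layer graph convolution with symmetric normalization, and random untrained weights. All names (`temporal_adjacency`, `gcn_forward`, `dense_labels`, the `window` parameter) are hypothetical, and the end-to-end training and the alternating refinement described in the abstract are omitted.

```python
import numpy as np


def temporal_adjacency(num_frames, window=1):
    """Link each frame to itself and to frames at most `window` steps away."""
    idx = np.arange(num_frames)
    return (np.abs(idx[:, None] - idx[None, :]) <= window).astype(float)


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; self-loops keep degrees > 0."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(features, adj_norm, w1, w2):
    """Two graph-convolution layers: mix neighboring frames, then project."""
    hidden = np.maximum(adj_norm @ features @ w1, 0.0)  # ReLU
    return adj_norm @ hidden @ w2  # framewise class scores, shape (T, C)


def dense_labels(logits, timestamps):
    """Take the best-scoring class per frame, keeping annotated frames fixed."""
    labels = logits.argmax(axis=1)
    for frame, cls in timestamps.items():
        labels[frame] = cls
    return labels


# Toy setup (assumed sizes): 12 frames, 8-dim features, 3 action classes.
rng = np.random.default_rng(0)
T, D, H, C = 12, 8, 16, 3
features = rng.normal(size=(T, D))
w1 = rng.normal(size=(D, H))  # untrained weights, for illustration only
w2 = rng.normal(size=(H, C))
timestamps = {1: 0, 6: 2, 10: 1}  # sparse labels: frame index -> class

adj = temporal_adjacency(T, window=1)
adj_norm = normalize(adj)
logits = gcn_forward(features, adj_norm, w1, w2)
labels = dense_labels(logits, timestamps)
```

With trained weights, `labels` would play the role of the generated dense framewise labels that are then used to train the segmentation model. Here the weights are random, so only the shapes and the preserved timestamp labels are meaningful.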
Related papers
- Unified and Dynamic Graph for Temporal Character Grouping in Long Videos [31.192044026127032]
Video temporal character grouping locates appearing moments of major characters within a video according to their identities.
Recent works have evolved from unsupervised clustering to graph-based supervised clustering.
We present a unified and dynamic graph (UniDG) framework for temporal character grouping.
arXiv Detail & Related papers (2023-08-27T13:22:55Z)
- RefineVIS: Video Instance Segmentation with Temporal Attention Refinement [23.720986152136785]
RefineVIS learns two separate representations on top of an off-the-shelf frame-level image instance segmentation model.
A Temporal Attention Refinement (TAR) module learns discriminative segmentation representations by exploiting temporal relationships.
It achieves state-of-the-art video instance segmentation accuracy on the YouTube-VIS 2019 (64.4 AP), YouTube-VIS 2021 (61.4 AP), and OVIS (46.1 AP) datasets.
arXiv Detail & Related papers (2023-06-07T20:45:15Z)
- TIGER: Temporal Interaction Graph Embedding with Restarts [12.685645074210562]
Temporal interaction graphs (TIGs) are prevalent in fields like e-commerce and social networks.
TIGs consist of sequences of timestamped interaction events that vary over time.
Previous methods have to process the sequence of events chronologically and consecutively to ensure node representations are up-to-date.
This prevents existing models from parallelization and reduces their flexibility in industrial applications.
arXiv Detail & Related papers (2023-02-13T02:29:11Z)
- Contrastive Learning for Time Series on Dynamic Graphs [17.46524362769774]
We propose a framework called GraphTNC for unsupervised learning of joint representations of the graph and the time-series.
We show that it can prove beneficial for the classification task with real-world datasets.
arXiv Detail & Related papers (2022-09-21T21:14:28Z)
- Time-aware Dynamic Graph Embedding for Asynchronous Structural Evolution [60.695162101159134]
Existing works merely view a dynamic graph as a sequence of changes.
We formulate dynamic graphs as temporal edge sequences associated with the joining time of vertices and the timespan of edges (ToEs).
A time-aware Transformer is proposed to embed vertices' dynamic connections and ToEs into the learned vertex representations.
arXiv Detail & Related papers (2022-07-01T15:32:56Z)
- Self-Supervised Dynamic Graph Representation Learning via Temporal Subgraph Contrast [0.8379286663107846]
This paper proposes a self-supervised dynamic graph representation learning framework (DySubC).
DySubC defines a temporal subgraph contrastive learning task to simultaneously learn the structural and evolutional features of a dynamic graph.
Experiments on five real-world datasets demonstrate that DySubC performs better than the related baselines.
arXiv Detail & Related papers (2021-12-16T09:35:34Z)
- Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z)
- Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z)
- Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos.
We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training.
Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv Detail & Related papers (2021-03-20T23:30:01Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- From Static to Dynamic Node Embeddings [61.58641072424504]
We introduce a general framework for leveraging graph stream data for temporal prediction-based applications.
Our proposed framework includes novel methods for learning an appropriate graph time-series representation.
We find that the top-3 temporal models are always those that leverage the new $\epsilon$-graph time-series representation.
arXiv Detail & Related papers (2020-09-21T16:48:29Z)