Unsupervised Action Segmentation by Joint Representation Learning and
Online Clustering
- URL: http://arxiv.org/abs/2105.13353v7
- Date: Thu, 17 Aug 2023 07:21:53 GMT
- Title: Unsupervised Action Segmentation by Joint Representation Learning and
Online Clustering
- Authors: Sateesh Kumar, Sanjay Haresh, Awais Ahmed, Andrey Konin, M. Zeeshan
Zia, Quoc-Huy Tran
- Abstract summary: We present a novel approach for unsupervised activity segmentation which uses video frame clustering as a pretext task.
We leverage temporal information in videos by employing temporal optimal transport.
Our approach performs on par with or better than previous methods, despite having significantly less memory constraints.
- Score: 10.057155889852174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel approach for unsupervised activity segmentation which uses
video frame clustering as a pretext task and simultaneously performs
representation learning and online clustering. This is in contrast with prior
works where representation learning and clustering are often performed
sequentially. We leverage temporal information in videos by employing temporal
optimal transport. In particular, we incorporate a temporal regularization term
which preserves the temporal order of the activity into the standard optimal
transport module for computing pseudo-label cluster assignments. The temporal
optimal transport module enables our approach to learn effective
representations for unsupervised activity segmentation. Furthermore, previous
methods require storing learned features for the entire dataset before
clustering them in an offline manner, whereas our approach processes one
mini-batch at a time in an online manner. Extensive evaluations on three public
datasets, i.e. 50-Salads, YouTube Instructions, and Breakfast, and our dataset,
i.e., Desktop Assembly, show that our approach performs on par with or better
than previous methods, despite having significantly less memory constraints.
Our code and dataset are available on our research website:
https://retrocausal.ai/research/
Related papers
- Cluster-based Video Summarization with Temporal Context Awareness [9.861215740353247]
TAC-SUM is a novel and efficient training-free approach for video summarization.
Our method partitions the input video into temporally consecutive segments with clustering information.
The resulting temporal-aware clusters are then utilized to compute the final summary.
arXiv Detail & Related papers (2024-04-06T05:55:14Z) - Timestamp-supervised Wearable-based Activity Segmentation and
Recognition with Contrastive Learning and Order-Preserving Optimal Transport [11.837401473598288]
We propose a novel method for joint activity segmentation and recognition with timestamp supervision.
The prototypes are estimated by class-activation maps to form a sample-to-prototype contrast module.
Comprehensive experiments on four public HAR datasets demonstrate that our model trained with timestamp supervision is superior to the state-of-the-art weakly-supervised methods.
arXiv Detail & Related papers (2023-10-13T14:00:49Z) - Contrastive Continual Multi-view Clustering with Filtered Structural
Fusion [57.193645780552565]
Multi-view clustering thrives in applications where views are collected in advance.
It overlooks scenarios where data views are collected sequentially, i.e., real-time data.
Some methods are proposed to handle it but are trapped in a stability-plasticity dilemma.
We propose Contrastive Continual Multi-view Clustering with Filtered Structural Fusion.
arXiv Detail & Related papers (2023-09-26T14:18:29Z) - TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and
Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences.
In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning.
Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
arXiv Detail & Related papers (2023-03-09T10:46:23Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z) - Unsupervised Visual Representation Learning by Online Constrained
K-Means [44.38989920488318]
Cluster discrimination is an effective pretext task for unsupervised representation learning.
We propose a novel clustering-based pretext task with online textbfConstrained textbfK-mtextbfeans (textbfCoKe)
Our online assignment method has a theoretical guarantee to approach the global optimum.
arXiv Detail & Related papers (2021-05-24T20:38:32Z) - Temporally-Weighted Hierarchical Clustering for Unsupervised Action
Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos.
We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training.
Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv Detail & Related papers (2021-03-20T23:30:01Z) - A Trainable Optimal Transport Embedding for Feature Aggregation and its
Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z) - Online Deep Clustering for Unsupervised Representation Learning [108.33534231219464]
Online Deep Clustering (ODC) performs clustering and network update simultaneously rather than alternatingly.
We design and maintain two dynamic memory modules, i.e., samples memory to store samples labels and features, and centroids memory for centroids evolution.
In this way, labels and the network evolve shoulder-to-shoulder rather than alternatingly.
arXiv Detail & Related papers (2020-06-18T16:15:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.