Related papers: Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering

URL: http://arxiv.org/abs/2105.13353v7
Date: Thu, 17 Aug 2023 07:21:53 GMT
Title: Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Authors: Sateesh Kumar, Sanjay Haresh, Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran
Abstract summary: We present a novel approach for unsupervised activity segmentation which uses video frame clustering as a pretext task. We leverage temporal information in videos by employing temporal optimal transport. Our approach performs on par with or better than previous methods, despite having significantly less memory constraints.
Score: 10.057155889852174
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a novel approach for unsupervised activity segmentation which uses video frame clustering as a pretext task and simultaneously performs representation learning and online clustering. This is in contrast with prior works where representation learning and clustering are often performed sequentially. We leverage temporal information in videos by employing temporal optimal transport. In particular, we incorporate a temporal regularization term which preserves the temporal order of the activity into the standard optimal transport module for computing pseudo-label cluster assignments. The temporal optimal transport module enables our approach to learn effective representations for unsupervised activity segmentation. Furthermore, previous methods require storing learned features for the entire dataset before clustering them in an offline manner, whereas our approach processes one mini-batch at a time in an online manner. Extensive evaluations on three public datasets, i.e. 50-Salads, YouTube Instructions, and Breakfast, and our dataset, i.e., Desktop Assembly, show that our approach performs on par with or better than previous methods, despite having significantly less memory constraints. Our code and dataset are available on our research website: https://retrocausal.ai/research/
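
The abstract describes adding a temporal regularization term to a standard optimal transport module when computing pseudo-label cluster assignments, but it does not give the formulation. The following is only a minimal sketch of that general idea, assuming an entropy-regularized Sinkhorn-Knopp solver and a Gaussian prior that favors assignments near the diagonal (early frames to early clusters, late frames to late clusters). The function names and the parameters `sigma`, `rho`, `eps`, and `n_iters` are hypothetical and are not taken from the paper or its released code.

```python
import numpy as np


def temporal_prior(num_frames, num_clusters, sigma=0.2):
    """Hypothetical prior that peaks where a frame's normalized time
    matches a cluster's normalized position (the 'diagonal')."""
    t = (np.arange(num_frames) + 0.5) / num_frames      # frame positions in (0, 1)
    k = (np.arange(num_clusters) + 0.5) / num_clusters  # cluster positions in (0, 1)
    dist = np.abs(t[:, None] - k[None, :])              # (N, K) distance to diagonal
    return np.exp(-dist ** 2 / (2.0 * sigma ** 2))


def sinkhorn_assignments(scores, prior, rho=0.5, eps=0.1, n_iters=50):
    """Soft pseudo-labels from frame-to-prototype scores of shape (N, K).

    Assumption: the temporal term enters as rho * log(prior) added to the
    scores before entropy-regularized optimal transport with uniform
    marginals, solved by Sinkhorn-Knopp iterations.
    """
    logits = (scores + rho * np.log(prior)) / eps
    logits = logits - logits.max()          # shift for numerical stability
    q = np.exp(logits)
    n, k = q.shape
    for _ in range(n_iters):
        q = q / (q.sum(axis=0, keepdims=True) * k)  # each cluster gets mass 1/K
        q = q / (q.sum(axis=1, keepdims=True) * n)  # each frame gets mass 1/N
    return q * n                            # rescale so every row sums to 1


# Usage on one mini-batch of 12 frames and 3 clusters with random scores.
rng = np.random.default_rng(0)
scores = rng.uniform(-1.0, 1.0, size=(12, 3))
pseudo_labels = sinkhorn_assignments(scores, temporal_prior(12, 3))
hard_labels = pseudo_labels.argmax(axis=1)
```

Because the solver only needs the scores of the current mini-batch, a sketch like this runs one batch at a time, which is consistent with the online processing the abstract describes. How the paper actually weights the temporal term may differ.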

Related papers

Cluster-based Video Summarization with Temporal Context Awareness [9.861215740353247]
TAC-SUM is a novel and efficient training-free approach for video summarization. Our method partitions the input video into temporally consecutive segments with clustering information. The resulting temporal-aware clusters are then utilized to compute the final summary.
arXiv Detail & Related papers (2024-04-06T05:55:14Z)
Timestamp-supervised Wearable-based Activity Segmentation and Recognition with Contrastive Learning and Order-Preserving Optimal Transport [11.837401473598288]
We propose a novel method for joint activity segmentation and recognition with timestamp supervision. The prototypes are estimated by class-activation maps to form a sample-to-prototype contrast module. Comprehensive experiments on four public HAR datasets demonstrate that our model trained with timestamp supervision is superior to the state-of-the-art weakly-supervised methods.
arXiv Detail & Related papers (2023-10-13T14:00:49Z)
Contrastive Continual Multi-view Clustering with Filtered Structural Fusion [57.193645780552565]
Multi-view clustering thrives in applications where views are collected in advance. It overlooks scenarios where data views are collected sequentially, i.e., real-time data. Some methods are proposed to handle it but are trapped in a stability-plasticity dilemma. We propose Contrastive Continual Multi-view Clustering with Filtered Structural Fusion.
arXiv Detail & Related papers (2023-09-26T14:18:29Z)
TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering [27.52568444236988]
We propose an unsupervised approach for learning action classes from untrimmed video sequences. In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning. Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes.
arXiv Detail & Related papers (2023-03-09T10:46:23Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets. We propose a novel general and efficient active learning (GEAL) method in this paper. Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z)
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. In this paper, we first study how biases in the dataset affect existing methods. We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
Unsupervised Visual Representation Learning by Online Constrained K-Means [44.38989920488318]
Cluster discrimination is an effective pretext task for unsupervised representation learning. We propose a novel clustering-based pretext task with online Constrained K-means (CoKe). Our online assignment method has a theoretical guarantee to approach the global optimum.
arXiv Detail & Related papers (2021-05-24T20:38:32Z)
Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos. We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training. Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv Detail & Related papers (2021-03-20T23:30:01Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
Online Deep Clustering for Unsupervised Representation Learning [108.33534231219464]
Online Deep Clustering (ODC) performs clustering and network update simultaneously rather than alternatingly. We design and maintain two dynamic memory modules, i.e., samples memory to store sample labels and features, and centroids memory for centroids evolution. In this way, labels and the network evolve shoulder-to-shoulder rather than alternatingly.
arXiv Detail & Related papers (2020-06-18T16:15:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.