Leveraging triplet loss for unsupervised action segmentation
- URL: http://arxiv.org/abs/2304.06403v2
- Date: Wed, 19 Jul 2023 10:12:58 GMT
- Title: Leveraging triplet loss for unsupervised action segmentation
- Authors: E. Bueno-Benito, B. Tura, M. Dimiccoli
- Abstract summary: We propose a fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself.
Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions.
Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data. Our method is a deep metric learning approach rooted in a shallow network with a triplet loss operating on similarity distributions and a novel triplet selection strategy that effectively models temporal and semantic priors to discover actions in the new representational space. Under these circumstances, we successfully recover temporal boundaries in the learned action representations with higher quality compared with existing unsupervised approaches. The proposed method is evaluated on two widely used benchmark datasets for the action segmentation task and it achieves competitive performance by applying a generic clustering algorithm on the learned representations.
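The abstract outlines the key ingredients: a shallow embedding network, a triplet loss computed on per-frame similarity distributions, triplet mining driven by temporal priors, and a generic clustering step on the learned representations. The sketch below is a minimal, hypothetical illustration of these ingredients in PyTorch, not the authors' implementation: the network depth, the symmetrised-KL distance between similarity distributions, the mining window, and the k-means post-processing are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): a shallow network embeds
# per-frame features, each frame is described by its similarity distribution over
# all frames of the same video, and a triplet margin loss pulls temporally close
# frames together while pushing distant frames apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowNet(nn.Module):
    """Two-layer MLP mapping pre-extracted frame features to an embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU(),
                                nn.Linear(emb_dim, emb_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (T, in_dim)
        return F.normalize(self.fc(x), dim=-1)            # unit-norm embeddings

def similarity_distributions(emb: torch.Tensor) -> torch.Tensor:
    """Row-wise softmax over pairwise similarities: one distribution per frame."""
    return F.softmax(emb @ emb.t(), dim=-1)               # (T, T)

def sym_kl(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Symmetrised KL divergence between rows of two distribution matrices
    (an illustrative choice of distance on similarity distributions)."""
    kl_pq = (p * (p.log() - q.log())).sum(-1)
    kl_qp = (q * (q.log() - p.log())).sum(-1)
    return 0.5 * (kl_pq + kl_qp)

def triplet_loss(dist, anchors, positives, negatives, margin: float = 0.2):
    """Margin loss on distances between similarity distributions."""
    pos = sym_kl(dist[anchors], dist[positives])
    neg = sym_kl(dist[anchors], dist[negatives])
    return F.relu(pos - neg + margin).mean()

def mine_triplets(num_frames: int, window: int = 15):
    """Toy temporal prior: positives are nearby frames, negatives distant frames."""
    anchors = torch.arange(num_frames)
    positives = (anchors + torch.randint(1, window, (num_frames,))).clamp(max=num_frames - 1)
    negatives = (anchors + torch.randint(2 * window, 4 * window, (num_frames,))) % num_frames
    return anchors, positives, negatives

# Usage sketch on one video of 300 frames with 2048-d pre-extracted features.
feats = torch.randn(300, 2048)
net = ShallowNet(in_dim=2048)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):                        # trained on the single input video itself
    dist = similarity_distributions(net(feats))
    a, p, n = mine_triplets(len(feats))
    loss = triplet_loss(dist, a, p, n)
    opt.zero_grad(); loss.backward(); opt.step()

# A generic clustering pass over the learned embeddings then yields the segmentation:
# contiguous runs of the same cluster label are read as action segments.
from sklearn.cluster import KMeans
labels = KMeans(n_clusters=5, n_init=10).fit_predict(net(feats).detach().numpy())
```

Note that the actual triplet selection strategy models semantic priors in addition to the temporal ones; the toy `mine_triplets` function above makes no attempt to capture that part of the method.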
Related papers
- Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization [98.66318678030491]
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training.
We propose a novel Proposal-based Multiple Instance Learning (P-MIL) framework that directly classifies the candidate proposals in both the training and testing stages.
arXiv Detail & Related papers (2023-05-29T02:48:04Z)
- Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the same inherent spirit of iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
arXiv Detail & Related papers (2023-03-31T10:53:24Z)
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z)
- Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention [56.46649994444616]
We present a novel approach to semi-supervised new event type induction using a masked contrastive loss.
We extend our approach to two new tasks: predicting the type name of the discovered clusters and linking them to FrameNet frames.
arXiv Detail & Related papers (2022-02-12T00:32:22Z)
- Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos [6.187780920448871]
This paper focuses on task recognition and action segmentation in weakly-labeled instructional videos.
We propose a two-stream framework, which exploits semantic and temporal hierarchies to recognize top-level tasks in instructional videos.
We present a novel top-down weakly-supervised action segmentation approach, where the predicted task is used to constrain the inference of fine-grained action sequences.
arXiv Detail & Related papers (2021-10-12T02:32:15Z)
- Flip Learning: Erase to Segment [65.84901344260277]
Weakly-supervised segmentation (WSS) can help reduce time-consuming and cumbersome manual annotation.
We propose a novel and general WSS framework called Flip Learning, which only needs the box annotation.
Our proposed approach achieves competitive performance and shows great potential to narrow the gap between fully-supervised and weakly-supervised learning.
arXiv Detail & Related papers (2021-08-02T09:56:10Z)
- Action Shuffling for Weakly Supervised Temporal Localization [22.43209053892713]
This paper analyzes the order-sensitive and location-insensitive properties of actions.
It embodies them into a self-augmented learning framework to improve the weakly supervised action localization performance.
arXiv Detail & Related papers (2021-05-10T09:05:58Z)
- Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning [20.748083855677816]
This paper tackles the problem of human action recognition, defined as classifying which action is displayed in a trimmed sequence, from skeletal data.
We propose a novel subspace clustering method, which exploits covariance matrices to enhance the discriminability of actions, and a timestamp pruning approach that allows us to better handle the temporal dimension of the data.
arXiv Detail & Related papers (2020-06-21T14:44:03Z)
- On Evaluating Weakly Supervised Action Segmentation Methods [79.42955857919497]
We focus on two aspects of the use and evaluation of weakly supervised action segmentation approaches.
We train each method on the Breakfast dataset 5 times and provide average and standard deviation of the results.
Our experiments show that the standard deviation over these repetitions is between 1 and 2.5% and significantly affects the comparison between different approaches.
arXiv Detail & Related papers (2020-05-19T20:30:31Z)
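The point made by this last paper translates into simple arithmetic: with a run-to-run standard deviation of 1-2.5%, two methods whose mean scores differ by a similar amount cannot be reliably ranked from a single run. The snippet below is a generic illustration of reporting the mean and standard deviation over five training runs; the scores are made up for the example, not results from any of the papers above.

```python
import statistics

# Hypothetical frame-wise accuracy (%) of two methods, each trained 5 times.
runs = {"method_A": [43.1, 45.8, 44.0, 46.2, 43.9],
        "method_B": [45.0, 43.2, 46.1, 44.4, 45.5]}

for name, scores in runs.items():
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)          # sample standard deviation across runs
    print(f"{name}: {mean:.1f} +/- {std:.1f}")

# With a standard deviation of roughly 1-2%, the two mean scores overlap within
# one standard deviation, so a single training run could easily reverse the ranking.
```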