Related papers: Improving action segmentation via explicit similarity measurement

URL: http://arxiv.org/abs/2502.10713v1
Date: Sat, 15 Feb 2025 08:02:38 GMT
Title: Improving action segmentation via explicit similarity measurement
Authors: Kamel Aouaidjia, Wenhao Zhang, Aofan Li, Chongsheng Zhang
Abstract summary: We propose an explicit similarity evaluation across frames and predictions to enhance segmentation accuracy. Our supervised learning architecture uses frame-level multi-resolution features as input to Transformer encoders. We apply a newly proposed boundary correction algorithm that operates on the feature similarity between consecutive frames. We also propose a fully unsupervised boundary detection-correction algorithm that identifies segment boundaries based solely on feature similarity, without any training.
Score: 5.303583360581161
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing supervised action segmentation methods depend on the quality of frame-wise classification, using attention mechanisms or temporal convolutions to capture temporal dependencies. Even boundary detection-based methods depend primarily on the accuracy of an initial frame-wise classification, which can lead to imprecise identification of segments and boundaries when the prediction quality is low. To address this problem, this paper proposes ASESM (Action Segmentation via Explicit Similarity Measurement), which improves segmentation accuracy by incorporating explicit similarity evaluation across frames and predictions. Our supervised learning architecture uses frame-level multi-resolution features as input to multiple Transformer encoders. The resulting frame-wise predictions are used for similarity voting to obtain a high-quality initial prediction. We apply a newly proposed boundary correction algorithm that operates on the feature similarity between consecutive frames to adjust boundary locations iteratively during learning. The corrected prediction is then further refined through multiple stages of temporal convolutions. As post-processing, we optionally apply boundary correction again, followed by a segment smoothing method that removes outlier classes within segments using similarity measurement between consecutive predictions. Additionally, we propose a fully unsupervised boundary detection-correction algorithm that identifies segment boundaries based solely on feature similarity, without any training. Experiments on the 50Salads, GTEA, and Breakfast datasets show the effectiveness of both the supervised and unsupervised algorithms. Code and models are available on GitHub.
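The abstract describes boundary detection and correction driven by feature similarity between consecutive frames, followed by smoothing of outlier labels inside segments. The NumPy sketch below illustrates that general idea under stated assumptions: cosine similarity is the similarity measure, the training-free detector uses a fixed threshold, correction searches a small window for the least similar transition, and a plain short-run filter stands in for the paper's similarity-based segment smoothing. The function names and the `threshold`, `window`, and `min_len` parameters are hypothetical and are not taken from the ASESM code.

```python
import numpy as np


def consecutive_cosine_similarity(features: np.ndarray) -> np.ndarray:
    """Cosine similarity between frames t and t+1 of a (T, D) feature matrix.

    Returns an array of length T - 1 where entry t compares frame t with frame t + 1.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-8)  # guard against zero-norm frames
    return np.sum(unit[:-1] * unit[1:], axis=1)


def detect_boundaries(features: np.ndarray, threshold: float = 0.5) -> list[int]:
    """Hypothetical training-free detector: a new segment starts at frame t + 1
    whenever the similarity between frames t and t + 1 falls below `threshold`."""
    sims = consecutive_cosine_similarity(features)
    return [int(t) + 1 for t in np.flatnonzero(sims < threshold)]


def correct_boundaries(boundaries: list[int], sims: np.ndarray, window: int = 2) -> list[int]:
    """Hypothetical correction step: move each boundary (index of a segment's first
    frame) to the position within +/- `window` frames where the consecutive-frame
    similarity is lowest."""
    num_frames = len(sims) + 1
    corrected = set()
    for b in boundaries:
        lo = max(1, b - window)
        hi = min(num_frames - 1, b + window)
        candidates = np.arange(lo, hi + 1)
        # sims[c - 1] measures the transition between frames c - 1 and c
        corrected.add(int(candidates[np.argmin(sims[candidates - 1])]))
    return sorted(corrected)


def smooth_short_runs(labels: np.ndarray, min_len: int = 3) -> np.ndarray:
    """Stand-in smoothing (not the paper's method): relabel runs shorter than
    `min_len` frames with the label of the preceding frame, or with the label
    of the following run if the short run is at the start of the sequence."""
    out = labels.copy()
    starts = [0] + [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]
    ends = starts[1:] + [len(labels)]
    for i, (s, e) in enumerate(zip(starts, ends)):
        if e - s >= min_len:
            continue  # long enough to be treated as a real segment
        if i > 0:
            out[s:e] = out[s - 1]
        elif i + 1 < len(starts):
            out[s:e] = labels[e]
    return out
```

For example, six frames whose features switch from one direction to an orthogonal one produce a single detected boundary at frame 3, and a boundary initially placed at frame 2 is moved to frame 3 by the correction step. Cosine similarity is used here because it ignores feature magnitude; other measures would fit the same structure.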

Related papers

A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions. We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm. Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv (2024-07-19T08:29:12Z)
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation [84.78383981697377]
Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue. We develop a novel local-global attention mechanism with temporal pyramid dilation and temporal pyramid pooling for efficient multi-scale attention. We achieve state-of-the-art accuracy, e.g., 82.8% (+2.6%) on GTEA and 74.7% (+1.2%) on Breakfast, which demonstrates the effectiveness of our proposed method.
arXiv (2023-04-04T20:27:18Z)
RankSEG: A Consistent Ranking-based Framework for Segmentation [5.166970737490847]
We establish a theoretical foundation of segmentation with respect to the Dice/IoU metrics, including the Bayes rule and Dice-/IoU-calibration. We propose a novel consistent ranking-based framework, namely RankDice/RankIoU, inspired by plug-in rules of the Bayes segmentation rule.
arXiv (2022-06-27T07:12:31Z)
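For reference, the Dice and IoU metrics for binary masks can be computed as in the sketch below. The `topk_plugin_mask` function is a hypothetical, simplified illustration of a ranking-based plug-in rule: it sorts pixels by predicted probability and keeps the top k, choosing k to maximize the ratio-of-expectations approximation 2 * (sum of the top-k probabilities) / (k + sum of all probabilities) of expected Dice. That approximation is an assumption made for this sketch; it is not the RankDice estimator from the paper.

```python
import numpy as np


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient 2|A and B| / (|A| + |B|) for boolean masks (1.0 if both are empty)."""
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else float(2.0 * inter / denom)


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union |A and B| / |A or B| for boolean masks (1.0 if both are empty)."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return 1.0 if union == 0 else float(inter / union)


def topk_plugin_mask(probs: np.ndarray) -> np.ndarray:
    """Hypothetical ranking-based plug-in rule (not the paper's exact estimator).

    Keeps the k most probable pixels, with k chosen to maximize an approximate
    expected Dice score computed from the probabilities alone.
    """
    flat = probs.ravel()
    order = np.argsort(-flat)  # pixel indices, most probable first
    cum = np.concatenate([[0.0], np.cumsum(flat[order])])  # top-k probability sums, k = 0..n
    ks = np.arange(flat.size + 1)
    approx_dice = 2.0 * cum / np.maximum(ks + flat.sum(), 1e-12)
    best_k = int(np.argmax(approx_dice))
    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:best_k]] = True
    return mask.reshape(probs.shape)
```

Unlike a fixed 0.5 threshold, the number of selected pixels here depends on the whole probability map, which is the general motivation for ranking-based rules.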
Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions. For the former, we propose a novel algorithm called SAROS that takes into account both kinds of feedback for learning over the sequence of interactions. The proposed idea of taking into account the neighbouring lines shows statistically significant results in comparison with the initial approach for fault detection in power grids.
arXiv (2022-05-13T21:09:41Z)
Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field. We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network. An efficient Adaptive Scale Fusion (ASF) module is proposed to improve scale robustness by fusing features of different scales adaptively.
arXiv (2022-02-21T15:30:14Z)
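The DB module replaces the hard step B = 1[P >= T] with an approximate step function B_hat = 1 / (1 + exp(-k (P - T))), where P is the probability map, T is the learned threshold map, and k is an amplifying factor (set to 50 in the original DB paper). A minimal NumPy sketch of that formula follows, with hard binarization for comparison and the analytic gradient showing why the soft version can be trained end to end. The function names are illustrative only.

```python
import numpy as np


def hard_binarization(prob_map: np.ndarray, threshold_map: np.ndarray) -> np.ndarray:
    """Standard (non-differentiable) binarization: 1 where P >= T, else 0."""
    return (prob_map >= threshold_map).astype(np.float64)


def differentiable_binarization(prob_map: np.ndarray, threshold_map: np.ndarray,
                                k: float = 50.0) -> np.ndarray:
    """Approximate step function 1 / (1 + exp(-k * (P - T)))."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - threshold_map)))


def db_gradient_wrt_prob(prob_map: np.ndarray, threshold_map: np.ndarray,
                         k: float = 50.0) -> np.ndarray:
    """dB_hat/dP = k * B_hat * (1 - B_hat): largest near the threshold,
    whereas the hard step has zero gradient almost everywhere."""
    b = differentiable_binarization(prob_map, threshold_map, k)
    return k * b * (1.0 - b)
```

With P = [0.9, 0.3, 0.1] and a constant threshold of 0.3, the soft map is close to 1, exactly 0.5, and close to 0, while the gradient peaks (12.5 for k = 50) at the pixel sitting on the threshold.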
Parameter Decoupling Strategy for Semi-supervised 3D Left Atrium Segmentation [0.0]
We present a novel semi-supervised segmentation model based on a parameter decoupling strategy to encourage consistent predictions from diverse views. Our method achieves competitive results compared with state-of-the-art semi-supervised methods on the Atrial Challenge dataset.
arXiv (2021-09-20T14:51:42Z)
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation [33.35220574193796]
We propose a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level e.g. at the phoneme level. A differentiable boundary detector finds variable-length segments, which are then used to optimize a segment encoder via NCE. We show that our single model outperforms existing phoneme and word segmentation methods on TIMIT and Buckeye datasets.
arXiv (2021-06-03T23:12:05Z)
Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps [55.94785248905853]
We propose a novel guided interactive segmentation (GIS) algorithm for video objects to improve the segmentation accuracy and reduce the interaction time. We develop the intersection-aware propagation module to propagate segmentation results to neighboring frames. Experimental results demonstrate that the proposed algorithm provides more accurate segmentation results at a faster speed than conventional algorithms.
arXiv (2021-04-21T07:08:57Z)
Learning structure-aware semantic segmentation with image-level supervision [36.40302533324508]
We argue that the structure information lost in class activation maps (CAM) limits their application to downstream semantic segmentation. We introduce an auxiliary semantic boundary detection module, which penalizes deteriorated predictions. Experimental results on the PASCAL-VOC dataset illustrate the effectiveness of the proposed solution.
arXiv (2021-04-15T03:33:20Z)
Unsupervised Learning of Image Segmentation Based on Differentiable Feature Clustering [14.074732867392008]
This study investigates the use of convolutional neural networks (CNNs) for unsupervised image segmentation. We present a novel end-to-end network for unsupervised image segmentation that consists of normalization and an argmax function for differentiable clustering. We also present an extension of the proposed method to segmentation with scribbles as user input, which showed better accuracy than existing methods.
arXiv (2020-07-20T10:28:36Z)
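The clustering step in the summary (normalization followed by an argmax) can be illustrated on a given per-pixel response map. The sketch below assumes a batch-normalization-style per-channel standardization without learned scale or shift, and it omits the CNN that would produce the responses as well as any training loop. The function names and the `eps` parameter are hypothetical.

```python
import numpy as np


def normalize_responses(resp: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Standardize each channel of an (H, W, C) response map across all pixels
    (assumed batch-norm-like normalization, no affine parameters)."""
    mean = resp.mean(axis=(0, 1), keepdims=True)
    var = resp.var(axis=(0, 1), keepdims=True)
    return (resp - mean) / np.sqrt(var + eps)


def pseudo_labels(resp: np.ndarray) -> np.ndarray:
    """Assign each pixel to the channel with the largest normalized response,
    which acts as its cluster (segment) label."""
    return normalize_responses(resp).argmax(axis=-1)
```

Normalizing before the argmax keeps one channel with uniformly large responses from absorbing every pixel, so several clusters can survive.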
Fast Template Matching and Update for Video Object Tracking and Segmentation [56.465510428878]
The main task we aim to tackle is multi-instance semi-supervised video object segmentation across a sequence of frames. The challenges lie in selecting the matching method used to predict the result and in deciding whether to update the target template. We propose a novel approach that uses reinforcement learning to make these two decisions at the same time.
arXiv (2020-04-16T08:58:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.