Related papers: Loss Knows Best: Detecting Annotation Errors in Videos via Loss Trajectories

Loss Knows Best: Detecting Annotation Errors in Videos via Loss Trajectories

URL: http://arxiv.org/abs/2602.15154v1
Date: Mon, 16 Feb 2026 19:53:58 GMT
Title: Loss Knows Best: Detecting Annotation Errors in Videos via Loss Trajectories
Authors: Praditha Alwis, Soumyadeep Chandra, Deepak Ravikumar, Kaushik Roy,
Abstract summary: We propose a model-agnostic method for detecting annotation errors in video datasets.<n>Our method does not require ground truth on annotation errors and is generalizable across datasets.<n>EgoPER and Cholec80 demonstrate strong detection performance, effectively identifying subtle inconsistencies such as mislabeling and frame disordering.
Score: 11.597228102492672
License: http://creativecommons.org/licenses/by/4.0/
Abstract: High-quality video datasets are foundational for training robust models in tasks like action recognition, phase detection, and event segmentation. However, many real-world video datasets suffer from annotation errors such as *mislabeling*, where segments are assigned incorrect class labels, and *disordering*, where the temporal sequence does not follow the correct progression. These errors are particularly harmful in phase-annotated tasks, where temporal consistency is critical. We propose a novel, model-agnostic method for detecting annotation errors by analyzing the Cumulative Sample Loss (CSL)--defined as the average loss a frame incurs when passing through model checkpoints saved across training epochs. This per-frame loss trajectory acts as a dynamic fingerprint of frame-level learnability. Mislabeled or disordered frames tend to show consistently high or irregular loss patterns, as they remain difficult for the model to learn throughout training, while correctly labeled frames typically converge to low loss early. To compute CSL, we train a video segmentation model and store its weights at each epoch. These checkpoints are then used to evaluate the loss of each frame in a test video. Frames with persistently high CSL are flagged as likely candidates for annotation errors, including mislabeling or temporal misalignment. Our method does not require ground truth on annotation errors and is generalizable across datasets. Experiments on EgoPER and Cholec80 demonstrate strong detection performance, effectively identifying subtle inconsistencies such as mislabeling and frame disordering. The proposed approach provides a powerful tool for dataset auditing and improving training reliability in video-based machine learning.

Related papers

Hard Samples, Bad Labels: Robust Loss Functions That Know When to Back Off [1.1582652820340928]
Incorrectly labelled training data adversely affects the performance and generalizability of models trained through supervised learning.<n>We evaluate two novel loss functions, Blurry Loss and Piecewise-zero Loss, that enhance robustness to label errors by de-weighting or disregarding difficult-to-classify samples.
arXiv Detail & Related papers (2025-11-20T16:30:55Z)
SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch token they extract. We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning. We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD) on various ResNet18 and Vision Transformer unlearning tasks. The application of this approach is particularly compelling in industrial settings, such as supply chain management.
arXiv Detail & Related papers (2024-02-06T14:04:31Z)
Semi-Supervised End-To-End Contrastive Learning For Time Series Classification [9.46287383389573]
Time series classification is a critical task in various domains, such as finance, healthcare, and sensor data analysis.<n>We propose an end-to-end model called SLOTS (Semi-supervised Learning fOr Time clasSification)
arXiv Detail & Related papers (2023-10-13T04:22:21Z)
Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images in low-light scenes is a challenging but widely concerned task in the computer vision. Main obstacle lies in the modeling conundrum from distribution discrepancy across different scenes. We introduce the bilevel paradigm to model the above latent correspondence. A bilevel learning framework is constructed to endow the scene-irrelevant generality of the encoder towards diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions. Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
PULL: Reactive Log Anomaly Detection Based On Iterative PU Learning [58.85063149619348]
We propose PULL, an iterative log analysis method for reactive anomaly detection based on estimated failure time windows. Our evaluation shows that PULL consistently outperforms ten benchmark baselines across three different datasets.
arXiv Detail & Related papers (2023-01-25T16:34:43Z)
Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
Anomaly detection in surveillance videos using transformer based attention model [3.2968779106235586]
This research suggests using a weakly supervised strategy to avoid annotating anomalous segments in training videos. The proposed framework is validated on real-world dataset i.e. ShanghaiTech Campus dataset.
arXiv Detail & Related papers (2022-06-03T12:19:39Z)
Semi-Supervised Learning for Sparsely-Labeled Sequential Data: Application to Healthcare Video Processing [0.8312466807725921]
We propose a semi-supervised machine learning training strategy to improve event detection performance on sequential data. Our method uses noisy guesses of the events' end times to train event detection models. We show that our strategy outperforms conservative estimates by 12 points of mean average precision for MNIST, and 3.5 points for CIFAR.
arXiv Detail & Related papers (2020-11-28T09:54:44Z)
A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video [120.18562044084678]
Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. We propose a background-agnostic framework that learns from training videos containing only normal events.
arXiv Detail & Related papers (2020-08-27T18:39:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.