PREGO: online mistake detection in PRocedural EGOcentric videos
- URL: http://arxiv.org/abs/2404.01933v2
- Date: Fri, 17 May 2024 16:03:35 GMT
- Title: PREGO: online mistake detection in PRocedural EGOcentric videos
- Authors: Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso
- Abstract summary: We propose PREGO, the first online one-class classification model for mistake detection in egocentric videos.
PREGO is based on an online action recognition component to model the current action, and a symbolic reasoning module to predict the next actions.
We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection.
- Score: 49.72812518471056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Promptly identifying procedural errors from egocentric videos in an online setting is highly challenging and valuable for detecting mistakes as soon as they happen. This capability has a wide range of applications across various fields, such as manufacturing and healthcare. The nature of procedural mistakes is open-set since novel types of failures might occur, which calls for one-class classifiers trained on correctly executed procedures. However, no technique can currently detect open-set procedural mistakes online. We propose PREGO, the first online one-class classification model for mistake detection in PRocedural EGOcentric videos. PREGO is based on an online action recognition component to model the current action, and a symbolic reasoning module to predict the next actions. Mistake detection is performed by comparing the recognized current action with the expected future one. We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection to establish suitable benchmarks, thus defining the Assembly101-O and Epic-tent-O datasets, respectively.
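The comparison step described in the abstract (flag a mistake when the recognized current action does not match what was expected next) can be illustrated with a small sketch. This is not PREGO's implementation: the abstract does not specify how the symbolic reasoning module works, so a first-order transition table built from correctly executed procedures stands in for it here. The function names, the `START` token, and the example action labels are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set

START = "<start>"  # hypothetical marker for "no action observed yet"


def fit_transitions(correct_procedures: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """One-class 'training': record which action followed which in correct runs only."""
    allowed: Dict[str, Set[str]] = defaultdict(set)
    for steps in correct_procedures:
        prev = START
        for step in steps:
            allowed[prev].add(step)
            prev = step
    return dict(allowed)


def detect_first_mistake(
    recognized: Iterable[str], allowed: Dict[str, Set[str]]
) -> Optional[int]:
    """Consume recognized actions one at a time (online).

    Returns the index of the first action that was not expected after the
    previous one, or None if every action matched an expected continuation.
    """
    prev = START
    for i, action in enumerate(recognized):
        expected = allowed.get(prev, set())  # stand-in for the next-action predictor
        if action not in expected:
            return i
        prev = action
    return None


# Toy data: only correct procedures are used to build the table.
correct = [
    ["attach wheel", "attach cabin", "attach roof"],
    ["attach wheel", "attach cabin", "attach light", "attach roof"],
]
allowed = fit_transitions(correct)

# The cabin step is skipped, so the second recognized action is flagged.
print(detect_first_mistake(["attach wheel", "attach roof"], allowed))
```

Because the table is built only from correct executions, any unseen transition is treated as a mistake, which mirrors the open-set, one-class framing in the abstract. A real system would also have to cope with noisy recognizer output, which this sketch ignores.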
Related papers
- TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos [48.126793563151715]
No technique effectively detects open-set procedural mistakes online.
One branch continuously performs step recognition from the input egocentric video.
The other anticipates future steps based on the recognition module's output.
arXiv Detail & Related papers (2024-11-04T20:03:06Z) - Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting [7.561148568365396]
We present the novel task of procedure step recognition (PSR).
PSR focuses on recognizing the correct completion and order of procedural steps.
We also present the multi-modal IndustReal dataset.
arXiv Detail & Related papers (2023-10-26T11:44:29Z) - Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - HCL-TAT: A Hybrid Contrastive Learning Method for Few-shot Event Detection with Task-Adaptive Threshold [18.165302114575212]
We propose a novel Hybrid Contrastive Learning method with a Task-Adaptive Threshold (abbreviated as HCLTAT), which enables discriminative representation learning with a two-view contrastive loss.
Experiments on the benchmark dataset FewEvent show that our method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-17T07:37:38Z) - Online Dictionary Learning Based Fault and Cyber Attack Detection for Power Systems [4.657875410615595]
This paper deals with the event and intrusion detection problem by leveraging a stream data mining classifier.
We first build a dictionary by learning higher-level features from unlabeled data.
Then, the labeled data are represented as sparse linear combinations of learned dictionary atoms.
We capitalize on those sparse codes to train the online classifier along with efficient change detectors.
arXiv Detail & Related papers (2021-08-24T23:17:58Z) - The Instantaneous Accuracy: a Novel Metric for the Problem of Online Human Behaviour Recognition in Untrimmed Videos [9.3576825415122]
We introduce a novel online metric, the Instantaneous Accuracy ($IA$), that exhibits an online nature.
Our results confirm the problems of previous evaluation protocols, and suggest that an IA-based protocol is better suited to the online scenario for human behaviour understanding.
arXiv Detail & Related papers (2020-03-22T19:04:05Z) - Self-trained Deep Ordinal Regression for End-to-End Video Anomaly Detection [114.9714355807607]
We show that applying self-trained deep ordinal regression to video anomaly detection overcomes two key limitations of existing methods.
We devise an end-to-end trainable video anomaly detection approach that enables joint representation learning and anomaly scoring without manually labeled normal/abnormal data.
arXiv Detail & Related papers (2020-03-15T08:44:55Z)