Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
- URL: http://arxiv.org/abs/2503.22405v2
- Date: Wed, 02 Apr 2025 04:50:12 GMT
- Title: Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
- Authors: Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia, Yu-Ming Tang, Kun-Yu Lin, Jian-Fang Hu, Wei-Shi Zheng
- Abstract summary: We propose an Adaptive Multiple Normal Action Representation (AMNAR) framework for error detection in procedural activities. AMNAR predicts all valid next actions and reconstructs their corresponding normal action representations, which are compared against the ongoing action to detect errors.
- Score: 31.6874866836856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Error detection in procedural activities is essential for consistent and correct outcomes in AR-assisted and robotic systems. Existing methods often focus on temporal ordering errors or rely on static prototypes to represent normal actions. However, these approaches typically overlook the common scenario where multiple, distinct actions are valid following a given sequence of executed actions. This leads to two issues: (1) the model cannot effectively detect errors using static prototypes when the inference environment or action execution distribution differs from training; and (2) the model may also use the wrong prototypes to detect errors if the ongoing action label is not the same as the predicted one. To address this problem, we propose an Adaptive Multiple Normal Action Representation (AMNAR) framework. AMNAR predicts all valid next actions and reconstructs their corresponding normal action representations, which are compared against the ongoing action to detect errors. Extensive experiments demonstrate that AMNAR achieves state-of-the-art performance, highlighting the effectiveness of AMNAR and the importance of modeling multiple valid next actions in error detection. The code is available at https://github.com/iSEE-Laboratory/AMNAR.
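The detection rule described in the abstract (compare the ongoing action against the reconstructed normal representation of every valid next action, and flag an error only if none of them matches) can be illustrated with a minimal sketch. This is not the released AMNAR implementation; the function name, tensor shapes, and threshold below are illustrative assumptions only.

```python
# Toy sketch of multi-hypothesis error detection, NOT the authors' AMNAR code.
# All names (ongoing_repr, valid_next_reprs, threshold) are hypothetical.
import torch
import torch.nn.functional as F

def detect_error(ongoing_repr: torch.Tensor,
                 valid_next_reprs: torch.Tensor,
                 threshold: float = 0.5) -> bool:
    """Flag an error when the ongoing action matches none of the valid next actions.

    ongoing_repr:     (D,) feature of the action currently being executed.
    valid_next_reprs: (K, D) reconstructed normal representations, one per
                      valid next action given the executed action history.
    threshold:        maximum allowed distance to the closest valid representation.
    """
    # Cosine distance to every hypothesized normal action representation.
    dists = 1.0 - F.cosine_similarity(ongoing_repr.unsqueeze(0), valid_next_reprs, dim=-1)
    # The ongoing action only needs to agree with ONE valid continuation;
    # an error is raised when even the best match is too far away.
    return bool(dists.min() > threshold)

# Example: two valid continuations; the ongoing action is close to the second one.
reprs = torch.randn(2, 128)
print(detect_error(reprs[1] + 0.01 * torch.randn(128), reprs, threshold=0.5))  # False
```

The point of the sketch is the "multiple valid next actions" aspect the abstract emphasizes: using a single static prototype instead of the per-hypothesis minimum would wrongly flag any valid but unpredicted continuation as an error.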
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations. We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- PREGO: online mistake detection in PRocedural EGOcentric videos [49.72812518471056]
We propose PREGO, the first online one-class classification model for mistake detection in egocentric videos.
PREGO is based on an online action recognition component to model the current action, and a symbolic reasoning module to predict the next actions.
We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection.
arXiv Detail & Related papers (2024-04-02T13:27:28Z)
- DCdetector: Dual Attention Contrastive Representation Learning for Time Series Anomaly Detection [26.042898544127503]
Time series anomaly detection is critical for a wide range of applications.
It aims to identify deviant samples from the normal sample distribution in time series.
We propose DCdetector, a multi-scale dual attention contrastive representation learning model.
arXiv Detail & Related papers (2023-06-17T13:40:15Z)
- Abnormal Event Detection via Hypergraph Contrastive Learning [54.80429341415227]
Abnormal event detection plays an important role in many real applications.
In this paper, we study the unsupervised abnormal event detection problem in Attributed Heterogeneous Information Network.
A novel hypergraph contrastive learning method, named AEHCL, is proposed to fully capture abnormal event patterns.
arXiv Detail & Related papers (2023-04-02T08:23:20Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Multivariate Time Series Anomaly Detection with Few Positive Samples [12.256288627540536]
We introduce two methodologies to address the needs of this practical situation.
Our proposed methods anchor on representation learning of normal operation with an autoregressive (AR) model.
We demonstrate effective performance in comparison with approaches from the literature.
arXiv Detail & Related papers (2022-07-02T00:58:52Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
- Spatio-Temporal Action Detection with Multi-Object Interaction [127.85524354900494]
In this paper, we study the spatio-temporal action detection problem with multi-object interaction.
We introduce a new dataset that is spatially annotated with action tubes containing multi-object interactions.
We propose an end-to-end spatio-temporal action detection model that performs both spatial and temporal regression simultaneously.
arXiv Detail & Related papers (2020-04-01T00:54:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.