Related papers: Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge


URL: http://arxiv.org/abs/2506.06174v1
Date: Fri, 06 Jun 2025 15:39:09 GMT
Title: Technical Report for Egocentric Mistake Detection for the HoloAssist Challenge
Authors: Constantin Patsch, Marsil Zakour, Yuankai Wu, Eckehard Steinbach
Abstract summary: We introduce an online mistake detection framework that handles both procedural and execution errors. Upon detecting an error, we use a large language model (LLM) to generate explanatory feedback. Experiments on the HoloAssist benchmark confirm the effectiveness of our approach.
Score: 5.257305312436567
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this report, we address the task of online mistake detection, which is vital in domains such as industrial automation and education, where real-time video analysis allows human operators to correct errors as they occur. While previous work focuses on procedural errors involving action order, real-world use requires handling a broader range of error types. We introduce an online mistake detection framework that handles both procedural and execution errors (e.g., motor slips or tool misuse). Upon detecting an error, we use a large language model (LLM) to generate explanatory feedback. Experiments on the HoloAssist benchmark confirm the effectiveness of our approach, which placed second on the mistake detection task.

Related papers

Probing for Arithmetic Errors in Language Models
Internal activations in language models can be used to detect arithmetic errors. We show that simple probes can accurately decode both the model's predicted output and the correct answer from hidden states. We train lightweight error detectors that predict model correctness with over 90% accuracy.
arXiv (2025-07-16T16:27:50Z)
A Coin Has Two Sides: A Novel Detector-Corrector Framework for Chinese Spelling Correction
Chinese Spelling Correction (CSC) is a foundational Natural Language Processing (NLP) task. We introduce a novel approach based on an error detector-corrector framework. Our detector is designed to yield two error detection results, each characterized by high precision and recall.
arXiv (2024-09-06T09:26:45Z)
I2EDL: Interactive Instruction Error Detection and Localization
We propose a novel task of Interactive VLN in Continuous Environments (IVLN-CE). It allows the agent to interact with the user during VLN-CE navigation to resolve doubts about errors in the instruction. We leverage a pre-trained module to detect instruction errors and pinpoint them in the instruction by cross-referencing the textual input and past observations.
arXiv (2024-06-07T16:52:57Z)
PREGO: online mistake detection in PRocedural EGOcentric videos
We propose PREGO, the first online one-class classification model for mistake detection in egocentric videos. PREGO is based on an online action recognition component to model the current action, and a symbolic reasoning module to predict the next actions. We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection.
arXiv (2024-04-02T13:27:28Z)
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv (2022-05-25T15:26:48Z)
Reference-based Defect Detection Network
The first issue is texture shift, meaning that a trained defect detector is easily affected by unseen textures. The second issue is partial visual confusion, meaning that a partial defect box is visually similar to a complete box. We propose a Reference-based Defect Detection Network (RDDN) to tackle these two problems.
arXiv (2021-08-10T05:44:23Z)
A Bayesian Approach to Identifying Representational Errors
We present a generative model for inferring representational errors based on observations of an actor's behavior. We show that our approach can recover blind spots of both reinforcement learning agents as well as human users.
arXiv (2021-03-28T16:43:01Z)
On the Robustness of Language Encoders against Grammatical Errors
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. Results confirm that the performance of all tested models is affected, but the degree of impact varies.
arXiv (2020-05-12T11:01:44Z)
Correcting the Autocorrect: Context-Aware Typographical Error Correction via Training Data Augmentation
We first draw on a small set of annotated data to compute spelling error statistics. These statistics are then used to introduce errors into substantially larger corpora. We use this approach to create a set of English-language error detection and correction datasets.
arXiv (2020-05-03T18:08:17Z)
