Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning
- URL: http://arxiv.org/abs/2510.23635v1
- Date: Fri, 24 Oct 2025 10:01:24 GMT
- Title: Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning
- Authors: Andrea Bontempelli, Matteo Busso, Leonardo Javier Malcotti, Fausto Giunchiglia
- Abstract summary: The study involves university students using the iLog mobile application on their devices over a period of four weeks. The results highlight the challenges of finding the right balance between user effort and data quality.
- Score: 16.786302712026153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Any digital personal assistant, whether used to support task performance, answer questions, or manage work and daily life, including fitness schedules, requires high-quality annotations to function properly. However, user annotations, whether actively produced or inferred from context (e.g., data from smartphone sensors), are often subject to errors and noise. Previous research on Skeptical Learning (SKEL) addressed the issue of noisy labels by comparing offline active annotations with passive data, allowing for an evaluation of annotation accuracy. However, this evaluation did not include confirmation from end-users, the best judges of their own context. In this study, we evaluate SKEL's performance in real-world conditions with actual users who can refine the input labels based on their current perspectives and needs. The study involves university students using the iLog mobile application on their devices over a period of four weeks. The results highlight the challenges of finding the right balance between user effort and data quality, as well as the potential benefits of using SKEL, which include reduced annotation effort and improved quality of collected data.
Related papers
- Improving annotator selection in Active Learning using a mood and fatigue-aware Recommender System [0.0]
This study centers on overcoming the challenge of selecting the best annotators for each query in Active Learning (AL). AL recognizes the challenges related to cost and time when acquiring labeled data, and decreases the number of labeled data needed. Most strategies for query-annotator pairs do not consider internal factors that affect productivity, such as mood, attention, motivation, and fatigue levels.
arXiv Detail & Related papers (2025-07-31T17:41:30Z) - Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - Ambiguous Annotations: When is a Pedestrian not a Pedestrian? [6.974741712647656]
It is not always possible to objectively determine whether an assigned label is correct or not.
Our experiments show that excluding highly ambiguous data from the training improves model performance.
In order to safely remove ambiguous instances and ensure the retained representativeness of the training data, an understanding of the properties of the dataset and class under investigation is crucial.
arXiv Detail & Related papers (2024-05-14T17:44:34Z) - Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances [55.37242480995541]
We propose to denoise noisy NER data with guidance from a small set of clean instances.
Along with the main NER model we train a discriminator model and use its outputs to recalibrate the sample weights.
Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
arXiv Detail & Related papers (2023-10-25T17:23:37Z) - Deep Active Learning with Noisy Oracle in Object Detection [5.5165579223151795]
We propose a composite active learning framework including a label review module for deep object detection.
We show that utilizing part of the annotation budget to correct the noisy annotations partially in the active dataset leads to early improvements in model performance.
In our experiments we achieve improvements of up to 4.5 mAP points of object detection performance by incorporating label reviews at equal annotation budget.
arXiv Detail & Related papers (2023-09-30T13:28:35Z) - A Matter of Annotation: An Empirical Study on In Situ and Self-Recall Activity Annotations from Wearable Sensors [56.554277096170246]
We present an empirical study that evaluates and contrasts four commonly employed annotation methods in user studies focused on in-the-wild data collection.
For both the user-driven, in situ annotations, where participants annotate their activities during the actual recording process, and the recall methods, where participants retrospectively annotate their data at the end of each day, the participants had the flexibility to select their own set of activity classes and corresponding labels.
arXiv Detail & Related papers (2023-05-15T16:02:56Z) - ALLSH: Active Learning Guided by Local Sensitivity and Hardness [98.61023158378407]
We propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function.
Our method achieves consistent gains over the commonly used active learning strategies in various classification tasks.
arXiv Detail & Related papers (2022-05-10T15:39:11Z) - Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z) - Toward Effective Automated Content Analysis via Crowdsourcing [6.89765603922453]
We propose a quality-aware semantic data annotation system for online workers.
With timely feedback on workers' performance quantified by quality scores, better informed online workers can maintain the quality of their labeling.
Our results suggest that researchers can collect high-quality answers of subjective semantic features at a large scale.
arXiv Detail & Related papers (2021-01-12T17:14:18Z) - How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)