Label Curation Using Agentic AI
- URL: http://arxiv.org/abs/2602.02564v1
- Date: Fri, 30 Jan 2026 18:58:52 GMT
- Title: Label Curation Using Agentic AI
- Authors: Subhodeep Ghosh, Bayan Divaaniaazar, Md Ishat-E-Rabban, Spencer Clarke, Senjuti Basu Roy
- Abstract summary: We present AURA, an agentic AI framework for large-scale, multi-modal data annotation. AURA coordinates multiple AI agents to generate and validate labels without requiring ground truth. AURA achieves accuracy improvements of up to 5.8% over baseline. In more challenging settings with poor-quality annotators, the improvement is up to 50% over baseline.
- Score: 3.500372926575144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data annotation is essential for supervised learning, yet producing accurate, unbiased, and scalable labels remains challenging as datasets grow in size and modality. Traditional human-centric pipelines are costly, slow, and prone to annotator variability, motivating reliability-aware automated annotation. We present AURA (Agentic AI for Unified Reliability Modeling and Annotation Aggregation), an agentic AI framework for large-scale, multi-modal data annotation. AURA coordinates multiple AI agents to generate and validate labels without requiring ground truth. At its core, AURA adapts a classical probabilistic model that jointly infers latent true labels and annotator reliability via confusion matrices, using Expectation-Maximization to reconcile conflicting annotations and aggregate noisy predictions. Across the four benchmark datasets evaluated, AURA achieves accuracy improvements of up to 5.8% over baseline. In more challenging settings with poor-quality annotators, the improvement is up to 50% over baseline. AURA also accurately estimates the reliability of annotators, allowing assessment of annotator quality even without any pre-validation steps.
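The classical probabilistic model the abstract refers to is in the style of Dawid-Skene: latent true labels and per-annotator confusion matrices are inferred jointly with Expectation-Maximization. A minimal NumPy sketch of that general aggregation technique (an illustration only, not AURA's actual implementation; the function name and parameters are hypothetical) might look like:

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50, smoothing=1e-6):
    """EM for jointly inferring true labels and annotator confusion matrices.

    labels: int array of shape (n_items, n_annotators); -1 marks a missing vote.
    Returns (posteriors over true labels, per-annotator confusion matrices).
    """
    n_items, n_annot = labels.shape
    observed = labels >= 0

    # Initialize posteriors over true labels from per-item vote counts.
    T = np.full((n_items, n_classes), smoothing)
    for j in range(n_annot):
        rows = np.where(observed[:, j])[0]
        np.add.at(T, (rows, labels[rows, j]), 1.0)
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and confusion matrices pi[j, true, reported].
        prior = T.mean(axis=0)
        pi = np.full((n_annot, n_classes, n_classes), smoothing)
        for j in range(n_annot):
            for k in range(n_classes):
                sel = observed[:, j] & (labels[:, j] == k)
                pi[j, :, k] += T[sel].sum(axis=0)
        pi /= pi.sum(axis=2, keepdims=True)

        # E-step: recompute label posteriors given prior and confusions.
        logT = np.tile(np.log(prior), (n_items, 1))
        for j in range(n_annot):
            idx = observed[:, j]
            logT[idx] += np.log(pi[j][:, labels[idx, j]]).T
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T, pi
```

The learned confusion matrices `pi` are what allow annotator reliability to be assessed without any pre-validation: a reliable annotator's matrix is close to the identity, while a noisy one spreads mass off the diagonal.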
Related papers
- Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification [71.98473277917962]
Recent advances in Deep Research Agents (DRAs) are transforming automated knowledge discovery and problem-solving. We propose an alternative paradigm: self-evolving the agent's ability by iteratively verifying the policy model's outputs, guided by meticulously crafted rubrics. We present DeepVerifier, a rubrics-based outcome reward verifier that leverages the asymmetry of verification.
arXiv Detail & Related papers (2026-01-22T09:47:31Z)
- Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z)
- The Achilles Heel of AI: Fundamentals of Risk-Aware Training Data for High-Consequence Models [0.0]
AI systems in high-consequence domains must detect rare, high-impact events while operating under tight resource constraints. Traditional annotation strategies that prioritize label volume over informational value introduce redundancy and noise. This paper introduces smart-sizing, a training data strategy that emphasizes label diversity, model-guided selection, and marginal utility-based stopping.
arXiv Detail & Related papers (2025-05-20T22:57:35Z)
- Financial Fraud Detection Using Explainable AI and Stacking Ensemble Methods [0.6642919568083927]
We propose a fraud detection framework that combines a stacking ensemble of gradient boosting models: XGBoost, LightGBM, and CatBoost. XAI techniques are used to enhance the transparency and interpretability of the model's decisions.
arXiv Detail & Related papers (2025-05-15T07:53:02Z)
- Label Convergence: Defining an Upper Performance Bound in Object Recognition through Contradictory Annotations [0.0]
We introduce the notion of "label convergence" to describe the highest achievable performance under the constraint of contradictory test annotations. We approximate that label convergence is between 62.63 and 67.52 mAP@[0.5:0.95:0.05] for LVIS with 95% confidence, attributing these bounds to the presence of real annotation errors. With current state-of-the-art (SOTA) models at the upper end of the label convergence interval for the well-studied LVIS dataset, we conclude that model capacity is sufficient to solve current object detection problems.
arXiv Detail & Related papers (2024-09-14T10:59:25Z)
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- How Does Beam Search Improve Span-Level Confidence Estimation in Generative Sequence Labeling? [11.481435098152893]
This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling.
As verified over six public datasets, we show that our proposed approach significantly reduces calibration errors of the predictions of a generative sequence labeling model.
arXiv Detail & Related papers (2022-12-21T05:01:01Z)
- Delving into Probabilistic Uncertainty for Unsupervised Domain Adaptive Person Re-Identification [54.174146346387204]
We propose an approach named probabilistic uncertainty guided progressive label refinery (P$^2$LR) for domain adaptive person re-identification.
A quantitative criterion is established to measure the uncertainty of pseudo labels and facilitate the network training.
Our method outperforms the baseline by 6.5% mAP on the Duke2Market task, while surpassing the state-of-the-art method by 2.5% mAP on the Market2MSMT task.
arXiv Detail & Related papers (2021-12-28T07:40:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.