PEARL: Prototype-Enhanced Alignment for Label-Efficient Representation Learning with Deployment-Driven Insights from Digital Governance Communication Systems
- URL: http://arxiv.org/abs/2601.17495v1
- Date: Sat, 24 Jan 2026 15:46:02 GMT
- Title: PEARL: Prototype-Enhanced Alignment for Label-Efficient Representation Learning with Deployment-Driven Insights from Digital Governance Communication Systems
- Authors: Ruiyu Zhang, Lin Nie, Wai-Fung Lam, Qihao Wang, Xin Zhao,
- Abstract summary: We propose PEARL, a label-efficient approach that uses limited supervision to softly align embeddings toward class prototypes. We evaluate PEARL under controlled label regimes ranging from extreme label scarcity to higher-label settings. In the label-scarce condition, PEARL substantially improves local neighborhood quality, yielding 25.7% gains over raw embeddings and more than 21.1% gains relative to strong unsupervised post-processing.
- Score: 7.027521313133687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many deployed systems, new text inputs are handled by retrieving similar past cases, for example when routing and responding to citizen messages in digital governance platforms. When these systems fail, the problem is often not the language model itself, but that the nearest neighbors in the embedding space correspond to the wrong cases. Modern machine learning systems increasingly rely on fixed, high-dimensional embeddings produced by large pretrained models and sentence encoders. In real-world deployments, labels are scarce, domains shift over time, and retraining the base encoder is expensive or infeasible. As a result, downstream performance depends heavily on embedding geometry. Yet raw embeddings are often poorly aligned with the local neighborhood structure required by nearest-neighbor retrieval, similarity search, and lightweight classifiers that operate directly on embeddings. We propose PEARL (Prototype-Enhanced Aligned Representation Learning), a label-efficient approach that uses limited supervision to softly align embeddings toward class prototypes. The method reshapes local neighborhood geometry while preserving dimensionality and avoiding aggressive projection or collapse. Its aim is to bridge the gap between purely unsupervised post-processing, which offers limited and inconsistent gains, and fully supervised projections that require substantial labeled data. We evaluate PEARL under controlled label regimes ranging from extreme label scarcity to higher-label settings. In the label-scarce condition, PEARL substantially improves local neighborhood quality, yielding 25.7% gains over raw embeddings and more than 21.1% gains relative to strong unsupervised post-processing, precisely in the regime where similarity-based systems are most brittle.
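The abstract describes softly aligning fixed embeddings toward class prototypes without changing their dimensionality. A minimal sketch of that general idea follows, in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the objective or update rule, so the softmax weighting, the interpolation strength `alpha`, the `temperature`, and all function names here are hypothetical choices.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for all-zero vectors


def _l2_normalize(A):
    """Scale each row to unit length so dot products become cosine similarities."""
    return A / (np.linalg.norm(A, axis=1, keepdims=True) + EPS)


def class_prototypes(X, y):
    """Mean embedding per class, computed from the few labeled points."""
    classes = np.unique(y)
    protos = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, protos


def soft_align(X, protos, alpha=0.5, temperature=0.1):
    """Shift each embedding part of the way toward a softmax-weighted
    mix of prototypes (assumed update rule, not taken from the paper).

    alpha=0 leaves X unchanged; alpha=1 replaces each row with its
    prototype mix. The output keeps the input dimensionality.
    """
    sims = _l2_normalize(X) @ _l2_normalize(protos).T  # cosine similarity
    logits = sims / temperature
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)      # soft assignment
    target = weights @ protos
    return (1.0 - alpha) * X + alpha * target


def knn_purity(X, y, k=5):
    """Fraction of each point's k cosine-nearest neighbors sharing its label."""
    Xn = _l2_normalize(X)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)                    # exclude self-matches
    nn = np.argsort(-sims, axis=1)[:, :k]
    return float((y[nn] == y[:, None]).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for encoder outputs: 3 noisy classes in 16 dimensions.
    centers = rng.normal(size=(3, 16))
    y = np.repeat(np.arange(3), 50)
    X = centers[y] + 2.0 * rng.normal(size=(150, 16))

    # Pretend only 5 labels per class are available.
    labeled = np.concatenate([np.where(y == c)[0][:5] for c in range(3)])
    _, protos = class_prototypes(X[labeled], y[labeled])

    X_aligned = soft_align(X, protos, alpha=0.5)
    print("purity raw:    ", knn_purity(X, y))
    print("purity aligned:", knn_purity(X_aligned, y))
```

In this sketch, `alpha` controls how strongly the neighborhoods are reshaped: small values keep the geometry close to the raw embeddings, while values near 1 collapse points onto prototype mixtures, which is the kind of aggressive collapse the abstract says the method avoids.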
Related papers
- Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping [61.459927600301654]
Multi-condition control is bottlenecked by the conventional "concatenate-and-attend" strategy. Our analysis reveals that much of this cross-modal interaction is spatially or semantically redundant. We propose Position-aligned and Keyword-scoped Attention (PKA), a highly efficient framework designed to eliminate these redundancies.
arXiv Detail & Related papers (2026-02-06T16:39:10Z)
- ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios [14.85600144047706]
We present ReCCur, a framework that converts noisy web imagery into auditable fine-grained labels. On realistic corner-case scenarios, ReCCur runs on consumer-grade GPUs and steadily improves purity and separability.
arXiv Detail & Related papers (2026-01-06T13:36:43Z)
- EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels [85.78886153628663]
Open-Set Domain Generalization aims to enable deep learning models to recognize unseen categories in new domains. Label noise hinders open-set domain generalization by corrupting source-domain knowledge. We propose Evidential Reliability-Aware Residual Flow Meta-Learning (EReLiFM) to bridge domain gaps.
arXiv Detail & Related papers (2025-10-14T16:23:11Z)
- Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry [5.1511135538176]
Active Learning (AL) promises to reduce annotation cost by prioritizing informative samples, yet its reliability is undermined when labels are noisy or when the data distribution shifts. We propose Active Learning via Neural Collapse Geometry (NCAL-R), a framework that leverages the emergent geometric regularities of deep networks to counteract unreliable supervision.
arXiv Detail & Related papers (2025-10-10T17:50:31Z)
- Robust and Label-Efficient Deep Waste Detection [29.019461511410515]
Effective waste sorting is critical for sustainable recycling, yet AI research in this domain continues to lag behind commercial systems. In this work, we advance AI-driven waste detection by establishing strong baselines and introducing an ensemble-based semi-supervised learning framework.
arXiv Detail & Related papers (2025-08-26T08:34:04Z)
- GLiClass: Generalist Lightweight Model for Sequence Classification Tasks [49.2639069781367]
We propose GLiClass, a novel method that adapts the GLiNER architecture for sequence classification tasks. Our approach achieves strong accuracy and efficiency comparable to embedding-based methods, while maintaining the flexibility needed for zero-shot and few-shot learning scenarios.
arXiv Detail & Related papers (2025-08-11T06:22:25Z)
- An Embedding is Worth a Thousand Noisy Labels [0.11999555634662634]
We propose WANN, a Weighted Adaptive Nearest Neighbor approach to address label noise. We show WANN outperforms reference methods on diverse datasets of varying size and under various noise types and severities. Our approach, emphasizing efficiency and explainability, emerges as a simple, robust solution to overcome inherent limitations of deep neural network training.
arXiv Detail & Related papers (2024-08-26T15:32:31Z)
- CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding [86.79903269137971]
Unsupervised visual grounding has been developed to locate regions using pseudo-labels.
We propose CLIP-VG, a novel method that can conduct self-paced curriculum adapting of CLIP with pseudo-language labels.
Our method outperforms the current state-of-the-art unsupervised method by a significant margin on RefCOCO/+/g datasets.
arXiv Detail & Related papers (2023-05-15T14:42:02Z)
- Semi-Supervised Temporal Action Detection with Proposal-Free Masking [134.26292288193298]
We propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT).
SPOT outperforms state-of-the-art alternatives, often by a large margin.
arXiv Detail & Related papers (2022-07-14T16:58:47Z)
- Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning [104.00026716576546]
We propose to learn saliency from synthetic but clean labels, which naturally has higher pixel-labeling quality without the effort of manual annotations.
We show that our proposed method outperforms the existing state-of-the-art deep unsupervised SOD methods on several benchmark datasets.
arXiv Detail & Related papers (2022-02-26T16:03:55Z)
- Deep Soft Procrustes for Markerless Volumetric Sensor Alignment [81.13055566952221]
In this work, we improve markerless data-driven correspondence estimation to achieve more robust multi-sensor spatial alignment.
We incorporate geometric constraints in an end-to-end manner into a typical segmentation based model and bridge the intermediate dense classification task with the targeted pose estimation one.
Our model is experimentally shown to achieve similar results with marker-based methods and outperform the markerless ones, while also being robust to the pose variations of the calibration structure.
arXiv Detail & Related papers (2020-03-23T10:51:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.