Dual-Encoders for Extreme Multi-Label Classification
- URL: http://arxiv.org/abs/2310.10636v2
- Date: Sun, 17 Mar 2024 22:22:08 GMT
- Title: Dual-Encoders for Extreme Multi-Label Classification
- Authors: Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit Dhillon
- Abstract summary: We show that Dual-encoder (DE) models fall significantly short on extreme multi-label classification (XMC) benchmarks.
We propose a simple modification to the InfoNCE loss that overcomes the limitations of existing contrastive losses.
When trained with our proposed loss functions, standard DE models alone can match or outperform SOTA methods by up to 2% at Precision@1.
- Score: 19.312120188406514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dual-encoder (DE) models are widely used in retrieval tasks, most commonly studied on open QA benchmarks that are often characterized by multi-class and limited training data. In contrast, their performance in multi-label and data-rich retrieval settings like extreme multi-label classification (XMC) remains under-explored. Current empirical evidence indicates that DE models fall significantly short on XMC benchmarks, where SOTA methods linearly scale the number of learnable parameters with the total number of classes (documents in the corpus) by employing per-class classification heads. To this end, we first study and highlight that existing multi-label contrastive training losses are not appropriate for training DE models on XMC tasks. We propose decoupled softmax loss - a simple modification to the InfoNCE loss - that overcomes the limitations of existing contrastive losses. We further extend our loss design to a soft top-k operator-based loss which is tailored to optimize top-k prediction performance. When trained with our proposed loss functions, standard DE models alone can match or outperform SOTA methods by up to 2% at Precision@1 even on the largest XMC datasets, while being 20x smaller in terms of the number of trainable parameters. This leads to more parameter-efficient and universally applicable solutions for retrieval tasks. Our code and models are publicly available at https://github.com/nilesh2797/dexml.
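The decoupled softmax idea described in the abstract can be illustrated with a small sketch. As we read it, the key mechanism is that each positive label's softmax denominator excludes the *other* positives, so relevant labels no longer suppress one another the way they do under a standard multi-label InfoNCE-style loss. The exact formulation lives in the paper and the linked repository; `infonce_multilabel` and `decoupled_softmax` below are illustrative NumPy stand-ins, not the reference implementation.

```python
import numpy as np

def infonce_multilabel(scores, pos_mask):
    """Softmax-style contrastive loss where each positive competes
    against ALL other labels, including the other positives."""
    log_denom = np.log(np.exp(scores).sum())
    losses = -(scores[pos_mask] - log_denom)
    return losses.mean()

def decoupled_softmax(scores, pos_mask):
    """Decoupled variant: each positive's denominator contains only
    that positive plus the negatives, so positives are not penalized
    for other positives also scoring highly."""
    neg_exp = np.exp(scores[~pos_mask]).sum()
    pos_scores = scores[pos_mask]
    losses = -(pos_scores - np.log(np.exp(pos_scores) + neg_exp))
    return losses.mean()

# Toy query with 5 labels; labels 0 and 1 are both relevant and both
# score highly. The coupled loss stays large because the two positives
# compete; the decoupled loss rewards this configuration.
scores = np.array([4.0, 3.9, 0.5, 0.2, -1.0])
pos = np.array([True, True, False, False, False])
print(decoupled_softmax(scores, pos) < infonce_multilabel(scores, pos))  # True
```

The comparison at the end shows why this matters for multi-label data: a model that ranks every relevant label at the top still incurs a large coupled loss, while the decoupled loss approaches zero.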
Related papers
- Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition [12.731093427395985]
ADAMAB is an efficient embedding calibration framework for few-shot pattern recognition.
Our experiments justify the superior performance of ADAMAB, with up to 40% accuracy improvement.
arXiv Detail & Related papers (2026-02-22T23:39:21Z) - MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval [16.654541753670348]
Memory-Retrieval Anomaly Detection method (MRAD) is a unified framework that replaces parametric fitting with direct memory retrieval.
Across 16 industrial and medical datasets, the MRAD framework consistently demonstrates superior performance.
arXiv Detail & Related papers (2026-01-31T05:30:57Z) - MERGETUNE: Continued fine-tuning of vision-language models [77.8627788911249]
Fine-tuning vision-language models (VLMs) often leads to catastrophic forgetting of pretrained knowledge.
We introduce a novel paradigm, continued fine-tuning (CFT), which seeks to recover pretrained knowledge after a zero-shot model has already been adapted.
arXiv Detail & Related papers (2026-01-15T15:15:53Z) - ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces [13.242009624334996]
We propose a low-precision training framework for extreme multi-label classification.
Low-precision training, combined with the proposed memory optimizations, enables significant reductions in GPU memory usage.
For example, we train a 3-million-label XMC model with only 6.6 GiB of GPU memory, compared to the 39.7 GiB required by the optimized SOTA method.
arXiv Detail & Related papers (2025-10-13T08:59:13Z) - Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection [85.0189917888094]
We propose a Dual-Stage Reweighted Mixture-of-Experts (DR-MoE) framework to handle the challenges posed by subtle and infrequent mistakes.
The proposed method achieves strong performance, particularly in identifying rare and ambiguous mistake instances.
arXiv Detail & Related papers (2025-09-16T12:00:42Z) - LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence [61.46575527504109]
LimiX-16M and LimiX-2M treat structured data as a joint distribution over variables and missingness.
We evaluate LimiX models across 11 large structured-data benchmarks with broad regimes of sample size, feature dimensionality, class number, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratios.
arXiv Detail & Related papers (2025-09-03T17:39:08Z) - Ultra-Resolution Adaptation with Ease [62.56434979517156]
We propose a set of key guidelines for ultra-resolution adaptation, termed URAE.
We show that tuning minor components of the weight matrices outperforms widely-used low-rank adapters when synthetic data are unavailable.
Experiments validate that URAE achieves comparable 2K-generation performance to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations.
arXiv Detail & Related papers (2025-03-20T16:44:43Z) - Retrieval-augmented Encoders for Extreme Multi-label Text Classification [31.300502762878914]
Extreme multi-label classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input.
The one-versus-all (OVA) method uses learnable label embeddings for each label, excelling at memorization.
The dual-encoder (DE) model maps input and label text into a shared embedding space for better generalization.
arXiv Detail & Related papers (2025-02-15T00:30:28Z) - Ranked from Within: Ranking Large Multimodal Models Without Labels [73.96543593298426]
We show that uncertainty scores derived from softmax distributions provide a robust basis for ranking models across various tasks.
This facilitates the ranking of LMMs on unlabeled data, providing a practical approach for selecting models for diverse target domains without requiring manual annotation.
arXiv Detail & Related papers (2024-12-09T13:05:43Z) - Low-Resource Crop Classification from Multi-Spectral Time Series Using Lossless Compressors [6.379065975644869]
Deep learning has significantly improved the accuracy of crop classification using multispectral temporal data.
In low-resource situations with fewer labeled samples, deep learning models perform poorly due to insufficient data.
We propose a non-training alternative to deep learning models, aiming to address these situations.
arXiv Detail & Related papers (2024-05-28T12:28:12Z) - UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space.
This work proposes UniDEC, a novel end-to-end trainable framework which trains the dual encoder and classifier together.
arXiv Detail & Related papers (2024-05-04T17:27:51Z) - $\nabla\tau$: Gradient-based and Task-Agnostic machine Unlearning [7.04736023670375]
We introduce Gradient-based and Task-Agnostic machine Unlearning ($\nabla\tau$).
$\nabla\tau$ applies adaptive gradient ascent to the data to be forgotten while using standard gradient descent for the remaining data.
We evaluate our framework's effectiveness using a set of well-established Membership Inference Attack metrics.
arXiv Detail & Related papers (2024-03-21T12:11:26Z) - List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation [80.12531449946655]
We propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently.
GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture.
Our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.
arXiv Detail & Related papers (2024-02-05T06:52:53Z) - Zooming Out on Zooming In: Advancing Super-Resolution for Remote Sensing [31.409817016287704]
Super-Resolution for remote sensing has the potential for huge impact on planet monitoring.
Despite a lot of attention, several inconsistencies and challenges have prevented it from being deployed in practice.
This work presents a new metric for super-resolution, CLIPScore, that aligns far more closely with human judgments than previous metrics.
arXiv Detail & Related papers (2023-11-29T21:06:45Z) - Machine Learning Capability: A standardized metric using case difficulty with applications to individualized deployment of supervised machine learning [2.2060666847121864]
Model evaluation is a critical component in supervised machine learning classification analyses.
Item Response Theory (IRT) and Computer Adaptive Testing (CAT), combined with machine learning, can benchmark datasets independently of the end-classification results.
arXiv Detail & Related papers (2023-02-09T00:38:42Z) - Uncertainty in Extreme Multi-label Classification [81.14232824864787]
eXtreme Multi-label Classification (XMC) is an essential task in the era of big data for web-scale machine learning applications.
In this paper, we aim to investigate general uncertainty quantification approaches for tree-based XMC models with a probabilistic ensemble-based framework.
In particular, we analyze label-level and instance-level uncertainty in XMC, and propose a general approximation framework based on beam search to efficiently estimate the uncertainty with a theoretical guarantee under long-tail XMC predictions.
arXiv Detail & Related papers (2022-10-18T20:54:33Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV)
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on all three datasets on image classification in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z) - Dissecting Supervised Contrastive Learning [24.984074794337157]
Minimizing cross-entropy over the softmax scores of a linear map composed with a high-capacity encoder is arguably the most popular choice for training neural networks on supervised learning tasks.
We show that one can directly optimize the encoder instead, to obtain equally (or even more) discriminative representations via a supervised variant of a contrastive objective.
arXiv Detail & Related papers (2021-02-17T15:22:38Z) - Long-tailed Recognition by Routing Diverse Distribution-Aware Experts [64.71102030006422]
We propose a new long-tailed classifier called RoutIng Diverse Experts (RIDE)
It reduces the model variance with multiple experts, reduces the model bias with a distribution-aware diversity loss, and reduces the computational cost with a dynamic expert routing module.
RIDE outperforms the state-of-the-art by 5% to 7% on CIFAR100-LT, ImageNet-LT and iNaturalist 2018 benchmarks.
arXiv Detail & Related papers (2020-10-05T06:53:44Z) - Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of sorted values of a set of real numbers.
We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
arXiv Detail & Related papers (2020-10-05T01:58:32Z)
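The ranked-range construction defined above is concrete enough to sketch: sort the values, then sum a consecutive slice of the sorted sequence. The function name `sum_ranked_range` and the 1-indexed `(k, m)` convention below are illustrative choices for this listing, not the paper's notation; the AoRR and TKML losses are built on top of this operator.

```python
import numpy as np

def sum_ranked_range(values, k, m):
    """Sum of the k-th through m-th largest values (1-indexed),
    i.e. a consecutive range of the descending-sorted sequence.
    k = 1, m = len(values) recovers the plain sum; k = m picks
    a single order statistic."""
    sorted_desc = np.sort(np.asarray(values))[::-1]
    return float(sorted_desc[k - 1:m].sum())

vals = [3.0, 1.0, 5.0, 2.0, 4.0]
print(sum_ranked_range(vals, 1, 2))  # top-2 sum: 5 + 4 = 9.0
print(sum_ranked_range(vals, 2, 4))  # ranks 2..4: 4 + 3 + 2 = 9.0
```

Dropping the top k-1 values before summing is what makes SoRR-style objectives robust to outliers at the head of the ranking while still focusing on the hardest remaining examples.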
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.