Inducing Diversity in Differentiable Search Indexing
- URL: http://arxiv.org/abs/2502.02788v1
- Date: Wed, 05 Feb 2025 00:21:17 GMT
- Title: Inducing Diversity in Differentiable Search Indexing
- Authors: Abhijeet Phatak, Jayant Sachdev, Sean D Rosario, Swati Kirti, Chittaranjan Tripathy
- Abstract summary: We explore balancing relevance and novel information content (diversity) for training DSI systems, inspired by Maximal Marginal Relevance (MMR). We present quantitative and qualitative evaluations of relevance and diversity measures obtained using our method on the NQ320K and MSMARCO datasets.
- Score: 1.747623282473278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differentiable Search Indexing (DSI) is a recent paradigm for information retrieval which uses a transformer-based neural network architecture as the document index to simplify the retrieval process. A differentiable index has many advantages, enabling modifications, updates, or extensions to the index. In this work, we explore balancing relevance and novel information content (diversity) for training DSI systems, inspired by Maximal Marginal Relevance (MMR), and show the benefits of our approach over naive DSI training. We present quantitative and qualitative evaluations of relevance and diversity measures obtained using our method on the NQ320K and MSMARCO datasets in comparison to naive DSI. With our approach, it is possible to achieve diversity without any significant impact on relevance. Since we induce diversity while training DSI, the trained model has learned to diversify while remaining relevant. This obviates the need for a post-processing step to induce diversity in the recall set, as is typically performed using MMR. Our approach will be useful for information retrieval problems where both relevance and diversity are important, such as sub-topic retrieval. Our work can also be easily extended to incremental DSI settings, which would enable fast updates to the index while retrieving a diverse recall set.
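For reference, below is a minimal sketch of the classical post-hoc MMR re-ranking that this work aims to make unnecessary by inducing diversity during DSI training. The cosine similarity, embedding inputs, and lambda trade-off are standard MMR ingredients, not details taken from the paper.

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=10, lam=0.7):
    """Greedy Maximal Marginal Relevance (MMR) selection over a recall set.

    query_vec: (dim,) query embedding.
    doc_vecs:  (n_docs, dim) candidate document embeddings.
    lam:       trade-off between relevance (lam) and novelty (1 - lam).
    Returns the indices of the k selected documents.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    relevance = np.array([cos(query_vec, d) for d in doc_vecs])
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        best_i, best_score = None, -np.inf
        for i in remaining:
            # Redundancy = highest similarity to any already-selected document.
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lam * relevance[i] - (1.0 - lam) * redundancy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
        remaining.remove(best_i)
    return selected
```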
Related papers
- Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity--diversity among outputs that meet quality thresholds.
Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models.
These findings have important implications for applications that require diverse yet high-quality outputs.
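A rough sketch of the "effective semantic diversity" idea summarized above: keep only outputs that clear a quality threshold, then measure how semantically spread out the survivors are. The embedding function, quality scorer, and averaging scheme here are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

def effective_semantic_diversity(outputs, embed, quality, threshold=0.5):
    """Diversity among generated outputs that meet a quality threshold.

    outputs:   list of generated texts.
    embed:     callable mapping text -> (dim,) vector (e.g. a sentence encoder).
    quality:   callable mapping text -> quality score in [0, 1].
    threshold: minimum quality for an output to count at all.
    Returns the mean pairwise cosine distance over the surviving outputs.
    """
    kept = [o for o in outputs if quality(o) >= threshold]
    if len(kept) < 2:
        return 0.0  # fewer than two usable outputs -> no measurable diversity
    vecs = np.stack([embed(o) for o in kept])
    vecs = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)
    sims = vecs @ vecs.T
    n = len(kept)
    mean_pairwise_sim = (sims.sum() - np.trace(sims)) / (n * (n - 1))
    return 1.0 - mean_pairwise_sim
```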
arXiv Detail & Related papers (2025-04-16T23:02:23Z) - Knowledge-Aware Iterative Retrieval for Multi-Agent Systems [0.0]
We introduce a novel large language model (LLM)-driven agent framework.
It iteratively refines queries and filters contextual evidence by leveraging dynamically evolving knowledge.
The proposed system supports both competitive and collaborative sharing of updated context.
arXiv Detail & Related papers (2025-03-17T15:27:02Z) - Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric [48.81957145701228]
We propose NovelSum, a new diversity metric based on sample-level "novelty".
We show that NovelSum accurately captures diversity variations and achieves a 0.97 correlation with instruction-tuned model performance.
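A hypothetical illustration of a sample-level novelty score summed over a dataset, in the spirit of the summary above; the nearest-neighbour definition of novelty and the final summation are assumptions, not the published NovelSum formulation.

```python
import numpy as np

def novelty_sum(embeddings, k=5):
    """Sum of per-sample novelty scores over an instruction-tuning dataset.

    embeddings: (n, dim) array of sample embeddings.
    k:          number of nearest neighbours used to judge novelty.
    A sample that sits far from its nearest neighbours is "novel" and adds
    more to the total; redundant near-duplicates add little.
    """
    n = len(embeddings)
    k = min(k, n - 1)
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nearest = np.sort(dists, axis=1)[:, :k]   # distances to k nearest neighbours
    novelty = nearest.mean(axis=1)            # per-sample novelty
    return float(novelty.sum())
```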
arXiv Detail & Related papers (2025-02-24T14:20:22Z) - SC-Rec: Enhancing Generative Retrieval with Self-Consistent Reranking for Sequential Recommendation [18.519480704213017]
We propose SC-Rec, a unified recommender system that learns diverse preference knowledge from two distinct item indices and multiple prompt templates.
SC-Rec considerably outperforms the state-of-the-art methods for sequential recommendation, effectively incorporating complementary knowledge from varied outputs of the model.
arXiv Detail & Related papers (2024-08-16T11:59:01Z) - PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval [47.13111745988827]
Differentiable Search Index (DSI) utilizes Pre-trained Language Models (PLMs) for efficient document retrieval without relying on external indexes.
We introduce PromptDSI, a prompt-based, rehearsal-free approach for instance-wise incremental learning in document retrieval.
arXiv Detail & Related papers (2024-06-18T13:25:18Z) - De-DSI: Decentralised Differentiable Search Index [0.0]
De-DSI is a framework that fuses large language models with genuine decentralization for information retrieval.
It uses the differentiable search index (DSI) concept in a decentralized setting to efficiently connect novel user queries with document identifiers.
arXiv Detail & Related papers (2024-04-18T14:51:55Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances in self-supervised learning (SSL) for pre-training strong multimodal encoders.
We propose a different perspective on the problem and investigate how multimodal DFER performance can be advanced by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z) - Representation Online Matters: Practical End-to-End Diversification in Search and Recommender Systems [8.296711988456762]
We introduce end-to-end diversification to improve representation in search results and recommendations.
We develop, experiment with, and deploy scalable diversification mechanisms on the Pinterest platform.
Our approaches significantly improve diversity metrics, with a neutral-to-positive impact on utility metrics and improved user satisfaction.
arXiv Detail & Related papers (2023-05-24T19:43:26Z) - Transformer Memory as a Differentiable Search Index [102.41278496436948]
We introduce the Differentiable Search Index (DSI), a new paradigm that learns a text-to-text model that maps string queries directly to relevant docids.
We study variations in how documents and their identifiers are represented, variations in training procedures, and the interplay between models and corpus sizes.
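A minimal sketch of the DSI inference idea described above: a sequence-to-sequence model generates document-identifier strings directly from a query via beam search. The checkpoint name is a placeholder and the docid format is assumed; the original work studies several docid representations and training variants.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholder checkpoint: a real DSI model is a T5 fine-tuned so that the target
# sequence for each indexed document (and each training query) is its docid string.
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

def retrieve(query, num_candidates=10):
    """Map a query string directly to candidate docid strings via beam search."""
    inputs = tokenizer(query, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        num_beams=num_candidates,
        num_return_sequences=num_candidates,
        max_new_tokens=16,   # docid strings are short
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

print(retrieve("who wrote on the origin of species"))
```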
arXiv Detail & Related papers (2022-02-14T19:12:43Z) - Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose Channel-Exchanging-Network (CEN) which is self-adaptive, parameter-free, and more importantly, applicable for both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between subnetworks of different modalities.
For the application of dense image prediction, the validity of CEN is tested by four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z)
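A simplified sketch of the channel-exchange idea behind CEN described in the entry above: channels whose batch-normalization scaling factor is close to zero in one modality's subnetwork are overwritten with the corresponding channels from the other modality. The two-modality setup and threshold are simplifying assumptions based on this summary, not the full CEN implementation.

```python
import torch

def exchange_channels(feat_a, feat_b, gamma_a, gamma_b, threshold=1e-2):
    """Exchange feature channels between two modality subnetworks.

    feat_a, feat_b:   (batch, channels, H, W) features from modalities A and B.
    gamma_a, gamma_b: (channels,) batch-norm scaling factors of each subnetwork;
                      a near-zero gamma marks a channel as uninformative.
    Uninformative channels in one modality are overwritten with the
    corresponding channels from the other modality.
    """
    swap_a = (gamma_a.abs() < threshold).view(1, -1, 1, 1)  # where A borrows from B
    swap_b = (gamma_b.abs() < threshold).view(1, -1, 1, 1)  # where B borrows from A
    new_a = torch.where(swap_a, feat_b, feat_a)
    new_b = torch.where(swap_b, feat_a, feat_b)
    return new_a, new_b
```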
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.