Curriculum Learning for Dense Retrieval Distillation
- URL: http://arxiv.org/abs/2204.13679v1
- Date: Thu, 28 Apr 2022 17:42:21 GMT
- Title: Curriculum Learning for Dense Retrieval Distillation
- Authors: Hansi Zeng, Hamed Zamani, Vishwa Vinay
- Abstract summary: We propose a generic curriculum learning based optimization framework called CL-DRD.
CL-DRD controls the difficulty level of training data produced by the re-ranking (teacher) model.
Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
- Score: 20.25741148622744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that more effective dense retrieval models can be
obtained by distilling ranking knowledge from an existing base re-ranking
model. In this paper, we propose a generic curriculum learning based
optimization framework called CL-DRD that controls the difficulty level of
training data produced by the re-ranking (teacher) model. CL-DRD iteratively
optimizes the dense retrieval (student) model by increasing the difficulty of
the knowledge distillation data made available to it. In more detail, we
initially provide the student model with coarse-grained preference pairs between
documents in the teacher's ranking and progressively move towards finer-grained
pairwise document ordering requirements. In our experiments, we apply a simple
implementation of the CL-DRD framework to enhance two state-of-the-art dense
retrieval models. Experiments on three public passage retrieval datasets
demonstrate the effectiveness of our proposed framework.
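To make the coarse-to-fine curriculum concrete, below is a minimal Python sketch of how preference pairs could be built from a teacher's ranked list with a difficulty level that grows over training stages. The group size, stage schedule, and helper names are illustrative assumptions, not the exact configuration used in the paper.

```python
from itertools import product
from typing import List, Tuple

def curriculum_pairs(teacher_ranking: List[str], stage: int,
                     group_size: int = 5) -> List[Tuple[str, str]]:
    """Build (preferred, non-preferred) document pairs from a teacher ranking.

    Early stages only ask the student to prefer one coarse block of top-ranked
    documents over everything else; later stages split the top of the ranking
    into more groups, adding finer-grained ordering constraints.
    (Illustrative sketch; sizes and schedule are assumptions.)
    """
    num_groups = stage + 1
    groups = [teacher_ranking[i * group_size:(i + 1) * group_size]
              for i in range(num_groups)]
    groups.append(teacher_ranking[num_groups * group_size:])  # tail group
    pairs: List[Tuple[str, str]] = []
    # Every document in a higher-ranked group should beat every document in a
    # lower-ranked group; more groups => finer-grained supervision.
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            pairs.extend(product(groups[i], groups[j]))
    return pairs

# Example: a ranking of 20 documents produced by a cross-encoder teacher.
ranking = [f"doc{i}" for i in range(20)]
easy_pairs = curriculum_pairs(ranking, stage=0)  # coarse: top-5 vs. the rest
hard_pairs = curriculum_pairs(ranking, stage=2)  # adds ordering among top groups
```

The student would then be optimized with a pairwise or listwise loss over these pairs, with the stage advanced as training progresses.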
Related papers
- An Active Learning Framework for Inclusive Generation by Large Language Models [32.16984263644299]
Ensuring that Large Language Models (LLMs) generate text representative of diverse sub-populations is challenging.
We propose a novel clustering-based active learning framework, enhanced with knowledge distillation.
We construct two new datasets in tandem with model training, showing a performance improvement of 2%-10% over baseline models.
arXiv Detail & Related papers (2024-10-17T15:09:35Z)
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding [2.0257616108612373]
This paper introduces a model-agnostic doc-level embedding framework through large language model augmentation.
We have been able to significantly improve the effectiveness of widely-used retriever models.
arXiv Detail & Related papers (2024-04-08T19:29:07Z)
- Distillation Enhanced Generative Retrieval [96.69326099136289]
Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target.
In this work, we identify a viable direction to further enhance generative retrieval via distillation and propose a feasible framework, named DGR.
We conduct experiments on four public datasets, and the results indicate that DGR achieves state-of-the-art performance among the generative retrieval methods.
arXiv Detail & Related papers (2024-02-16T15:48:24Z)
- Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval [57.17075479691486]
We propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval.
Our source code is released at https://github.com/Maryeon/whiten_mtd.
arXiv Detail & Related papers (2023-12-15T11:43:56Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
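As a rough illustration of what geometry-aware distillation can look like, the following PyTorch sketch combines a score-matching term with an embedding-alignment term. The loss weighting and the assumption that student and teacher embeddings share the same dimension (a 1/10th-size student would in practice need a learned projection) are illustrative, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def geometric_distillation_loss(student_q: torch.Tensor, student_d: torch.Tensor,
                                teacher_q: torch.Tensor, teacher_d: torch.Tensor,
                                alpha: float = 0.5) -> torch.Tensor:
    """Distill a dual-encoder student by matching both the teacher's
    query-document scores and its embedding geometry (illustrative sketch)."""
    # Score matching: align the student's query-document similarity matrix
    # with the teacher's.
    score_loss = F.mse_loss(student_q @ student_d.T, teacher_q @ teacher_d.T)
    # Embedding alignment: pull student query/document vectors toward the
    # teacher's embedding space directly.
    embed_loss = F.mse_loss(student_q, teacher_q) + F.mse_loss(student_d, teacher_d)
    return score_loss + alpha * embed_loss

# Example with random embeddings: 4 queries, 16 candidate documents, dim 128.
q_s, d_s = torch.randn(4, 128), torch.randn(16, 128)
q_t, d_t = torch.randn(4, 128), torch.randn(16, 128)
loss = geometric_distillation_loss(q_s, d_s, q_t, d_t)
```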
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Debias the Black-box: A Fair Ranking Framework via Knowledge Distillation [26.60241524303918]
We propose a fair information retrieval framework based on knowledge distillation.
This framework can improve the exposure-based fairness of models while considerably decreasing model size.
It also improves fairness performance by 15%-46% while keeping a high level of recommendation effectiveness.
arXiv Detail & Related papers (2022-08-24T15:59:58Z)
- SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved with more than 9% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
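For context, here is a minimal PyTorch sketch of SPLADE-style term weighting: log-saturated masked-LM logits max-pooled over the sequence into a vocabulary-sized sparse vector. The distilbert-base-uncased backbone and untrained weights are assumptions for illustration only; the released SPLADE models are additionally trained with ranking and sparsity objectives that this sketch omits.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Any masked LM backbone works for this illustration.
name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
model.eval()

def splade_representation(text: str) -> torch.Tensor:
    """Sparse, vocabulary-sized representation via max-pooled log saturation."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits              # (1, seq_len, vocab_size)
    # Log-saturate per-token term importances, mask padding, then max-pool
    # over the sequence dimension (the SPLADE v2 pooling).
    weights = torch.log1p(torch.relu(logits))
    weights = weights * enc["attention_mask"].unsqueeze(-1)
    return weights.max(dim=1).values.squeeze(0)   # (vocab_size,)

vec = splade_representation("curriculum learning for dense retrieval")
print(int((vec > 0).sum()), "non-zero terms out of", vec.numel())
```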
arXiv Detail & Related papers (2021-09-21T10:43:42Z)
- Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval [51.823187647843945]
In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model.
Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones.
arXiv Detail & Related papers (2021-05-27T11:29:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.