D2LLM: Decomposed and Distilled Large Language Models for Semantic Search
- URL: http://arxiv.org/abs/2406.17262v1
- Date: Tue, 25 Jun 2024 04:03:04 GMT
- Title: D2LLM: Decomposed and Distilled Large Language Models for Semantic Search
- Authors: Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, Wei Zhang
- Abstract summary: We present D2LLM (Decomposed and Distilled LLMs) for semantic search.
We decompose a cross-encoder into an efficient bi-encoder integrated with Pooling by Multihead Attention and an Interaction Emulation Module.
Our experiments show that D2LLM surpasses five leading baselines in terms of all metrics across three tasks.
- Score: 18.63768158439252
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time applications. In this paper, we present D2LLM (Decomposed and Distilled LLMs for semantic search), which combines the best of both worlds. We decompose a cross-encoder into an efficient bi-encoder integrated with Pooling by Multihead Attention and an Interaction Emulation Module, achieving nuanced understanding and pre-computability. Knowledge from the LLM is distilled into this model using contrastive, rank, and feature imitation techniques. Our experiments show that D2LLM surpasses five leading baselines in terms of all metrics across three tasks, particularly improving NLI task performance by at least 6.45%. The source code is available at https://github.com/codefuse-ai/D2LLM.
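The abstract names the two student components without detailing them; the PyTorch sketch below illustrates, under our own assumptions, how a bi-encoder with Pooling by Multihead Attention (PMA) and an Interaction Emulation Module (IEM) could be wired together. The class names, layer sizes, and the small MLP standing in for the IEM are illustrative choices and are not taken from the authors' released code at https://github.com/codefuse-ai/D2LLM.

```python
import torch
import torch.nn as nn

class PMAPooling(nn.Module):
    """Pooling by Multihead Attention (illustrative): a learned seed query
    attends over the LLM's token states to produce one sentence embedding."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # learned seed query
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, token_states: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, dim); pad_mask: True at padding positions
        q = self.query.expand(token_states.size(0), -1, -1)
        pooled, _ = self.attn(q, token_states, token_states, key_padding_mask=pad_mask)
        return pooled.squeeze(1)  # (batch, dim), pre-computable per sentence

class InteractionEmulationModule(nn.Module):
    """Emulates cross-encoder query-passage interaction on top of the two
    pooled embeddings; a small MLP is used here purely as an assumption."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, 1))

    def forward(self, query_emb: torch.Tensor, passage_emb: torch.Tensor) -> torch.Tensor:
        # Returns one relevance logit per query-passage pair.
        return self.mlp(torch.cat([query_emb, passage_emb], dim=-1)).squeeze(-1)

# Minimal usage example with random stand-ins for LLM token states.
dim = 1024
pma, iem = PMAPooling(dim), InteractionEmulationModule(dim)
states = torch.randn(2, 16, dim)                       # hidden states from the backbone
mask = torch.zeros(2, 16, dtype=torch.bool)            # no padding in this toy batch
scores = iem(pma(states, mask), pma(states, mask))     # (batch,) relevance logits
```

During training, the student's scores and pooled embeddings would be aligned with the LLM teacher through the contrastive, rank, and feature imitation losses mentioned in the abstract; those losses are omitted from this sketch.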
Related papers
- RRADistill: Distilling LLMs' Passage Ranking Ability for Long-Tail Queries Document Re-Ranking on a Search Engine [2.0379810233726126]
Large Language Models (LLMs) excel at understanding the semantic relationships between queries and documents.
Long-tail queries are challenging for feedback-based rankings due to sparse user engagement and limited feedback.
We propose an efficient label generation pipeline and novel sLLM training methods for both encoder and decoder models.
arXiv Detail & Related papers (2024-10-08T11:28:06Z) - Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation [1.8876415010297893]
Data-to-text (D2T) generation aims to generate human-readable text from semi-structured data, such as tables and graphs.
No prior research has examined the impact of model size on the performance of fine-tuned LLMs for D2T tasks.
We aim to elucidate both the advantages and limitations of scaling model sizes across five widely used D2T datasets.
arXiv Detail & Related papers (2024-07-19T07:54:30Z) - A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking [79.35822270532948]
Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data.
We construct and release a new distillation dataset: Rank-DistiLLM.
arXiv Detail & Related papers (2024-05-13T16:51:53Z) - Self-Selected Attention Span for Accelerating Large Language Model Inference [10.305434265471938]
Large language models (LLMs) can solve challenging tasks.
LLMs' inference computation is highly inefficient due to the increasing number of tokens they must attend to as they generate new ones.
We capitalize on LLMs' problem-solving capabilities to optimize their own inference-time efficiency.
arXiv Detail & Related papers (2024-04-14T19:36:04Z) - LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders [34.421335513040795]
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks.
We introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder.
arXiv Detail & Related papers (2024-04-09T02:51:05Z) - MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning [9.271196993624944]
Large language models (LLMs) make predictions for a given test input together with a few input-output pairs (demonstrations).
Existing solutions attempt to distill lengthy demonstrations into compact vectors.
We present Meta dEmonstratioN Distillation (MEND), where a language model learns to distill any lengthy demonstrations into vectors without retraining for a new downstream task.
arXiv Detail & Related papers (2024-03-11T17:03:04Z) - BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation.
Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score.
We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z) - LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models [83.98062659664785]
Large language models (LLMs) typically train on short text segments (e.g., 4K tokens) due to the quadratic complexity of their Transformer architectures.
This work identifies three major factors contributing to this length generalization failure.
We propose LM-Infinite, a simple and effective method for enhancing LLMs' capabilities of handling long contexts.
arXiv Detail & Related papers (2023-08-30T16:47:51Z) - CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z) - MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers [140.0479479231558]
In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
arXiv Detail & Related papers (2022-12-15T13:57:07Z) - Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework.
It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates.
It then uses fine-grained cross-encoders for finer re-ranking; a generic sketch of this recall-then-re-rank pattern follows this list.
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
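The TOSS entry above describes a two-stage pipeline: cheap recall with pre-computable embeddings, then expensive cross-encoder re-ranking on a short candidate list. The sketch below shows a generic version of that pattern, not the TOSS implementation; the callables encode_fn and cross_score_fn are hypothetical stand-ins for a bi-encoder (or IR model) and a cross-encoder.

```python
import numpy as np

def two_stage_search(query, corpus, encode_fn, cross_score_fn, k=50, n=10):
    """Generic recall-then-re-rank sketch (not the TOSS code itself).
    encode_fn: text -> 1-D dense vector (bi-encoder, pre-computable offline).
    cross_score_fn: (query_text, candidate_text) -> relevance score (cross-encoder)."""
    # Stage 1: cheap dense recall over pre-computable embeddings.
    corpus_embs = np.stack([encode_fn(doc) for doc in corpus])  # cache these offline
    q = encode_fn(query)
    sims = corpus_embs @ q / (np.linalg.norm(corpus_embs, axis=1) * np.linalg.norm(q) + 1e-9)
    candidate_ids = np.argsort(-sims)[:k]
    # Stage 2: expensive cross-encoder scoring on the small candidate set only.
    scores = np.array([cross_score_fn(query, corpus[i]) for i in candidate_ids])
    return [int(candidate_ids[j]) for j in np.argsort(-scores)[:n]]
```

The design point is that the cross-encoder cost scales with k (the recalled candidates), not with the corpus size, which is what makes the finer-grained second stage affordable.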