Related papers: Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models

URL: http://arxiv.org/abs/2406.14848v1
Date: Fri, 21 Jun 2024 03:33:51 GMT
Title: Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
Authors: Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao
Abstract summary: We propose PE-Rank, which uses a single embedding per passage as a compressed context for efficient listwise passage reranking. Each passage is treated as a special token, and we introduce an inference method that dynamically constrains the decoding space to these special tokens, accelerating the decoding process. Results on multiple benchmarks demonstrate that PE-Rank significantly improves efficiency in both prefilling and decoding while maintaining competitive ranking effectiveness.
Score: 17.420756201557957
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recent studies have demonstrated the effectiveness of large language models (LLMs) in passage ranking. Listwise approaches, such as RankGPT, have become the new state of the art in this task. However, the efficiency of RankGPT models is limited by the maximum context length and the relatively high latency of LLM inference. To address these issues, we propose PE-Rank, which leverages a single passage embedding as a compressed context for efficient listwise passage reranking. By treating each passage as a special token, we can directly input passage embeddings into LLMs, thereby reducing input length. Additionally, we introduce an inference method that dynamically constrains the decoding space to these special tokens, accelerating the decoding process. To adapt the model to reranking, we train it with a listwise learning-to-rank loss. Evaluation results on multiple benchmarks demonstrate that PE-Rank significantly improves efficiency in both prefilling and decoding while maintaining competitive ranking effectiveness. The code is available at https://github.com/liuqi6777/pe_rank.
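
The constrained decoding idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `constrained_rank` and `step_hidden` are hypothetical, the passage embeddings are assumed to be already projected into the model's hidden space, and a fixed vector stands in for the LLM's hidden state at each step. The sketch only shows the core mechanic: at every step, scores are computed over the passages that have not yet been emitted rather than over the full vocabulary.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def constrained_rank(
    passage_embeddings: List[Vector],
    step_hidden: Callable[[List[int]], Vector],
) -> List[int]:
    """Greedily decode a ranking, scoring only passages not yet emitted.

    `step_hidden` is a hypothetical stand-in for one LLM forward step: it
    receives the passage indices emitted so far and returns a hidden state.
    """
    remaining = set(range(len(passage_embeddings)))
    ranking: List[int] = []
    while remaining:
        hidden = step_hidden(ranking)
        # The decoding space is restricted to the remaining passage tokens,
        # so each step costs one score per remaining passage.
        best = max(
            sorted(remaining),  # sorted so ties resolve to the lowest index
            key=lambda i: dot(hidden, passage_embeddings[i]),
        )
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Toy data (assumed, for illustration only): three 2-d passage embeddings
# and a constant hidden state in place of a real model.
passages = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_state = [0.9, 0.6]
ranking = constrained_rank(passages, lambda emitted: query_state)
print(ranking)
```

Because every emitted passage is removed from the candidate set, the output is always a permutation of the input passages, and the number of decoding steps equals the number of passages.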

Related papers

CoRanking: Collaborative Ranking with Small and Large Ranking Agents [39.98101653077503]
Large Language Models (LLMs) have demonstrated superior listwise ranking performance. CoRanking combines small and large ranking models for efficient and effective ranking.
arXiv Detail & Related papers (2025-03-30T13:00:52Z)
ListConRanker: A Contrastive Text Reranker with Listwise Encoding [27.017035527335402]
We propose a novel Listwise-encoded Contrastive text reRanker (ListConRanker). It can help the passage to be compared with other passages during the encoding process. It achieves state-of-the-art performance on the reranking benchmark of the Chinese Massive Text Embedding Benchmark.
arXiv Detail & Related papers (2025-01-13T07:51:46Z)
Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models [40.21540137079309]
Long-context Language Models (LLMs) enable the full ranking of all passages within a single inference. We show that full ranking with long-context LLMs can deliver superior performance in the supervised fine-tuning setting. We propose a new complete listwise label construction approach and a novel importance-aware learning objective for full ranking.
arXiv Detail & Related papers (2024-12-19T06:44:59Z)
Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach. This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets. We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
FIRST: Faster Improved Listwise Reranking with Single Token Decoding [56.727761901751194]
First, we introduce FIRST, a novel listwise LLM reranking approach leveraging the output logits of the first generated identifier to directly obtain a ranked ordering of the candidates. Empirical results demonstrate that FIRST accelerates inference by 50% while maintaining a robust ranking performance with gains across the BEIR benchmark. Our results show that LLM rerankers can provide a stronger distillation signal compared to cross-encoders, yielding substantial improvements in retriever recall after relevance feedback.
arXiv Detail & Related papers (2024-06-21T21:27:50Z)
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections [35.133698935322634]
Large language models (LLMs) have recently emerged as powerful tools for tackling many language-processing tasks. We identify and characterise the important components needed for effective model convergence using gradient descent. This result leads us to a cheap and memory-efficient algorithm for both fine-tuning and pre-training LLMs.
arXiv Detail & Related papers (2024-05-28T09:23:14Z)
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [105.11844150736536]
Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. We propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.
arXiv Detail & Related papers (2024-05-20T15:48:32Z)
Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-Ranking [79.35822270532948]
Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data. To close this gap, we create a new dataset, Rank-DistiLLM. Cross-encoders trained on Rank-DistiLLM achieve the effectiveness of LLMs while being up to 173 times faster and 24 times more memory efficient.
arXiv Detail & Related papers (2024-05-13T16:51:53Z)
PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z)
Instruction Distillation Makes Large Language Models Efficient Zero-shot Rankers [56.12593882838412]
We introduce a novel instruction distillation method to rank documents. We first rank documents using the effective pairwise approach with complex instructions, and then distill the teacher predictions to the pointwise approach with simpler instructions. Our approach surpasses the performance of existing supervised methods like monoT5 and is on par with the state-of-the-art zero-shot methods.
arXiv Detail & Related papers (2023-11-02T19:16:21Z)
A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models [35.17291316942284]
We propose a novel zero-shot document ranking approach based on Large Language Models (LLMs): the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise.
arXiv Detail & Related papers (2023-10-14T05:20:02Z)
Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples. We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment. We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.