Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders
- URL: http://arxiv.org/abs/2404.06912v5
- Date: Sat, 05 Apr 2025 09:09:18 GMT
- Title: Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders
- Authors: Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen
- Abstract summary: Cross-encoder models can be categorized as pointwise, pairwise, or listwise. We propose a new cross-encoder architecture with inter-passage attention: the Set-Encoder.
- Score: 79.35822270532948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing cross-encoder models can be categorized as pointwise, pairwise, or listwise. Pairwise and listwise models allow passage interactions, which typically makes them more effective than pointwise models but less efficient and less robust to input passage order permutations. To enable efficient permutation-invariant passage interactions during re-ranking, we propose a new cross-encoder architecture with inter-passage attention: the Set-Encoder. In experiments on TREC Deep Learning and TIREx, the Set-Encoder is as effective as state-of-the-art listwise models while being more efficient and invariant to input passage order permutations. Compared to pointwise models, the Set-Encoder is particularly effective when inter-passage information, such as novelty, matters, and it retains its advantages over other listwise models. Our code is publicly available at https://github.com/webis-de/ECIR-25.
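To make the core idea concrete, below is a minimal sketch of permutation-invariant inter-passage attention in PyTorch. It is not the paper's implementation (see the linked repository for that); the module name, shapes, and the scoring head are illustrative assumptions. The sketch only demonstrates the property the abstract claims: because attention over the set of per-passage embeddings uses no positional information, permuting the input passages merely permutes the per-passage scores, so each passage's score is independent of its position in the input list.

```python
# Minimal, illustrative sketch (not the authors' implementation) of
# permutation-invariant inter-passage attention. Each candidate passage is
# represented by one [CLS]-style embedding; self-attention over this set
# uses no positional encoding across passages, so re-ordering the inputs
# only re-orders the outputs (permutation equivariance).
import torch
import torch.nn as nn


class InterPassageAttention(nn.Module):
    """Self-attention over one embedding per passage, with no positions."""

    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)  # per-passage relevance head (assumed)

    def forward(self, cls_embeddings: torch.Tensor) -> torch.Tensor:
        # cls_embeddings: (batch, num_passages, dim), e.g. the [CLS] output
        # of a query-passage cross-encoder for each candidate passage.
        out, _ = self.attn(cls_embeddings, cls_embeddings, cls_embeddings)
        return self.score(out).squeeze(-1)  # (batch, num_passages)


# Check the key property: permuting the input passages permutes the scores
# identically, so the resulting ranking does not depend on input order.
model = InterPassageAttention().eval()
cls = torch.randn(1, 4, 768)  # 4 candidate passages (dummy embeddings)
perm = torch.randperm(4)
scores = model(cls)
scores_perm = model(cls[:, perm])
assert torch.allclose(scores[:, perm], scores_perm, atol=1e-5)
```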
Related papers
- Exploring the Effectiveness of Multi-stage Fine-tuning for Cross-encoder Re-rankers [23.013617933109526]
State-of-the-art cross-encoders can be fine-tuned to be highly effective in passage re-ranking.
An alternative fine-tuning approach involves teaching the model to mimic the rankings of a highly effective large language model.
We show that the effectiveness of point-wise cross-encoders fine-tuned using contrastive learning is indeed on par with that of models fine-tuned with multi-stage approaches.
arXiv Detail & Related papers (2025-03-28T17:58:31Z) - ListConRanker: A Contrastive Text Reranker with Listwise Encoding [27.017035527335402]
We propose a novel Listwise-encoded Contrastive text reRanker (ListConRanker). It enables each passage to be compared with the other passages during encoding. It achieves state-of-the-art performance on the reranking benchmark of the Chinese Massive Text Embedding Benchmark.
arXiv Detail & Related papers (2025-01-13T07:51:46Z) - CrossMPT: Cross-attention Message-Passing Transformer for Error Correcting Codes [14.631435001491514]
We propose a novel Cross-attention Message-Passing Transformer (CrossMPT).
We show that CrossMPT significantly outperforms existing neural network-based decoders for various code classes.
Notably, CrossMPT achieves this decoding performance improvement while significantly reducing memory usage, complexity, inference time, and training time.
arXiv Detail & Related papers (2024-05-02T06:30:52Z) - GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoding network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z) - Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
arXiv Detail & Related papers (2023-07-27T16:30:27Z) - Efficient k-NN Search with Cross-Encoders using Adaptive Multi-Round CUR Decomposition [77.4863142882136]
Cross-encoder models are prohibitively expensive for direct k-nearest neighbor (k-NN) search.
We propose ADACUR, a method that adaptively, iteratively, and efficiently minimizes the approximation error for the practically important top-k neighbors.
arXiv Detail & Related papers (2023-05-04T17:01:17Z) - CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval [72.90850213615427]
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers.
These methods are orders of magnitude slower than their single-vector counterparts and require much more space to store their indices.
We propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
arXiv Detail & Related papers (2022-11-18T18:27:35Z) - Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time trade-offs between recall and computational cost that are superior to current widely used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z) - Predicting Attention Sparsity in Transformers [0.9786690381850356]
We propose Sparsefinder, a model trained to identify the sparsity pattern of entmax attention before computing it.
Our work provides a new angle for studying model efficiency through an extensive analysis of the trade-off between the sparsity and recall of the predicted attention graph.
arXiv Detail & Related papers (2021-09-24T20:51:21Z) - Demystifying the Better Performance of Position Encoding Variants for Transformer [12.503079503907989]
We show how to encode position and segment into Transformer models.
The proposed method performs on par with the state of the art on the GLUE, XTREME, and WMT benchmarks while reducing computational cost.
arXiv Detail & Related papers (2021-04-18T03:44:57Z) - On the Encoder-Decoder Incompatibility in Variational Text Modeling and Beyond [82.18770740564642]
Variational autoencoders (VAEs) combine latent variables with amortized variational inference.
We observe an encoder-decoder incompatibility that leads to poor parameterizations of the data manifold.
We propose Coupled-VAE, which couples a VAE model with a deterministic autoencoder of the same structure.
arXiv Detail & Related papers (2020-04-20T10:34:10Z)