Long Document Ranking with Query-Directed Sparse Transformer
- URL: http://arxiv.org/abs/2010.12683v1
- Date: Fri, 23 Oct 2020 21:57:56 GMT
- Title: Long Document Ranking with Query-Directed Sparse Transformer
- Authors: Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee and Wei Wang
- Abstract summary: We design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention.
Our model, QDS-Transformer, enforces the principal properties desired in ranking.
Experiments on one fully supervised and three few-shot TREC document ranking benchmarks demonstrate the consistent and robust advantage of QDS-Transformer.
- Score: 30.997237454078526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The computing cost of transformer self-attention often necessitates breaking up
long documents so that they fit into pretrained models for document ranking tasks. In this
paper, we design Query-Directed Sparse attention that induces IR-axiomatic
structures in transformer self-attention. Our model, QDS-Transformer, enforces
the principal properties desired in ranking: local contextualization,
hierarchical representation, and query-oriented proximity matching, while it
also enjoys efficiency from sparsity. Experiments on one fully supervised and
three few-shot TREC document ranking benchmarks demonstrate the consistent and
robust advantage of QDS-Transformer over previous approaches, which either
retrofit long documents into BERT or use sparse attention without emphasizing
IR principles. We further quantify the computing complexity and demonstrate
that our sparse attention with its TVM implementation is twice as efficient as
fully-connected self-attention. All source code, trained models, and
predictions of this work are available at
https://github.com/hallogameboy/QDS-Transformer.
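For intuition, here is a minimal sketch of the sparsity pattern the abstract describes: query tokens attend globally, sentence-level marker tokens act as hierarchical hubs, and ordinary document tokens attend only within a local window, to the sentence markers, and to the query. The token-role encoding and the names build_qds_mask, QUERY, SENT, and DOC are illustrative assumptions; the paper realizes this connectivity with sparse TVM kernels rather than a dense 0/1 mask.

```python
import numpy as np

# Illustrative token roles (an assumption for this sketch, not the paper's encoding).
QUERY, SENT, DOC = 0, 1, 2

def build_qds_mask(roles, window=2):
    """Build a 0/1 attention mask in the spirit of query-directed sparsity.

    roles  : sequence of token roles (QUERY / SENT / DOC).
    window : half-width of the local attention window.
    mask[i, j] = 1 means token i may attend to token j.
    """
    roles = np.asarray(roles)
    n = len(roles)
    mask = np.zeros((n, n), dtype=np.int8)
    is_query = roles == QUERY
    is_sent = roles == SENT

    for i in range(n):
        # Local contextualization: a small window around each token.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = 1
        # Query-oriented proximity matching: every token sees the query,
        # and query tokens see the whole sequence.
        mask[i, is_query] = 1
        if is_query[i]:
            mask[i, :] = 1
        # Hierarchical representation: sentence-level tokens act as global hubs.
        mask[i, is_sent] = 1
        if is_sent[i]:
            mask[i, :] = 1
    return mask

# Toy example: a 2-token query followed by two sentences, each led by a SENT token.
roles = [QUERY, QUERY, SENT, DOC, DOC, DOC, SENT, DOC, DOC]
print(build_qds_mask(roles))
```

Because most rows of the mask are sparse, attention can be computed in blocks that skip the zero entries, which is what the TVM implementation mentioned above exploits.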
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time-series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Long-Range Transformer Architectures for Document Understanding [1.9331361036118608]
Document Understanding (DU) was not left behind, with the first Transformer-based models for DU dating from late 2019.
We introduce two new multi-modal (text + layout) long-range models for DU based on efficient Transformer implementations for long sequences.
Relative 2D attention proved effective on dense text for both normal and long-range models.
arXiv Detail & Related papers (2023-09-11T14:45:24Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
Our experiments focus on oil & gas data, namely well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer [80.50327229467993]
We show that a single model trained end-to-end can achieve both competitive retrieval and QA performance.
We show that end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings.
arXiv Detail & Related papers (2022-12-05T04:51:21Z)
- Resource-Efficient Separation Transformer [14.666016177212837]
This paper explores Transformer-based speech separation with a reduced computational cost.
Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-based architecture.
The RE-SepFormer achieves competitive performance on the popular WSJ0-2Mix and WHAM! datasets in both causal and non-causal settings.
arXiv Detail & Related papers (2022-06-19T23:37:24Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document, where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model, called TKL, incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z)
- Random Feature Attention [69.4671822971207]
We propose RFA, a linear-time and linear-space attention mechanism that uses random feature methods to approximate the softmax function (a brief illustrative sketch of this kernel trick appears after this list).
RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism.
Experiments on language modeling and machine translation demonstrate that RFA achieves similar or better performance compared to strong transformer baselines.
arXiv Detail & Related papers (2021-03-03T02:48:56Z)
- Conformer-Kernel with Query Term Independence for Document Retrieval [32.36908635150144]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
We extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption.
We show that the Conformer's GPU memory requirement scales linearly with input sequence length, making it a more viable option when ranking long documents.
arXiv Detail & Related papers (2020-07-20T19:47:28Z)
- Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem.
Given a query (e.g., a question), return the set of relevant documents from a large document corpus.
We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks.
arXiv Detail & Related papers (2020-02-10T16:44:00Z)
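As a companion to the Random Feature Attention entry above, the sketch below shows the kernel trick that makes attention linear in sequence length: queries and keys are passed through a random feature map so that the attention weights factorize, letting the key/value summary be computed once and reused for every query. The positive exponential feature map and the name rfa_style_attention are simplifying assumptions for this illustration; RFA itself uses trigonometric random features and an optional gating mechanism, both omitted here.

```python
import numpy as np

def positive_random_features(x, W):
    """Positive random features for the exponential (softmax) kernel.

    x : (n, d) scaled queries or keys; W : (d, r) with columns drawn from N(0, I).
    Returns phi(x) such that E[phi(q) . phi(k)] = exp(q . k).
    """
    return np.exp(x @ W - 0.5 * np.sum(x ** 2, axis=-1, keepdims=True))

def rfa_style_attention(Q, K, V, num_features=256, seed=0):
    """Linear-time approximation of softmax attention via random features."""
    d = Q.shape[-1]
    W = np.random.default_rng(seed).normal(size=(d, num_features))
    # Scale so that phi(q) . phi(k) approximates exp(q . k / sqrt(d)).
    phi_q = positive_random_features(Q / d ** 0.25, W)   # (n, r)
    phi_k = positive_random_features(K / d ** 0.25, W)   # (m, r)
    # Summarize keys/values once, then reuse for every query:
    # cost is O((n + m) * r) instead of O(n * m).
    kv = phi_k.T @ V                                      # (r, d_v)
    z = phi_k.sum(axis=0)                                 # (r,)
    return (phi_q @ kv) / (phi_q @ z)[:, None]

# Toy comparison against exact softmax attention.
rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(4, 16)), rng.normal(size=(6, 16)), rng.normal(size=(6, 3))
logits = Q @ K.T / 16 ** 0.5
exact = (np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)) @ V
print(np.abs(exact - rfa_style_attention(Q, K, V)).max())  # Monte Carlo approximation error
```

Because the key/value summaries kv and z are plain sums, they can also be maintained recurrently over a stream of tokens, which is where RFA's optional gating for recency bias attaches.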