Conformer-Kernel with Query Term Independence for Document Retrieval
- URL: http://arxiv.org/abs/2007.10434v1
- Date: Mon, 20 Jul 2020 19:47:28 GMT
- Title: Conformer-Kernel with Query Term Independence for Document Retrieval
- Authors: Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani and Nick Craswell
- Abstract summary: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
We extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption.
We show that the Conformer's GPU memory requirement scales linearly with input sequence length, making it a more viable option when ranking long documents.
- Score: 32.36908635150144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking
performance on the TREC Deep Learning benchmark---and can be considered to be
an efficient (but slightly less effective) alternative to BERT-based ranking
models. In this work, we extend the TK architecture to the full retrieval
setting by incorporating the query term independence assumption. Furthermore,
to reduce the memory complexity of the Transformer layers with respect to the
input sequence length, we propose a new Conformer layer. We show that the
Conformer's GPU memory requirement scales linearly with input sequence length,
making it a more viable option when ranking long documents. Finally, we
demonstrate that incorporating explicit term matching signal into the model can
be particularly useful in the full retrieval setting. We present preliminary
results from our work in this paper.
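Under the query term independence (QTI) assumption, the document score decomposes into a sum of per-query-term scores, which is what allows term-document scores to be precomputed offline and served from an inverted index at retrieval time. The minimal Python sketch below illustrates only that decomposition; the scoring function, index, and scores are hypothetical and are not the paper's TK or Conformer-Kernel model.
```python
from typing import Callable, Dict, List

def qti_score(query_terms: List[str],
              doc_id: str,
              term_doc_score: Callable[[str, str], float]) -> float:
    """Under query term independence, the document score is a sum of
    independently computed per-term contributions."""
    return sum(term_doc_score(term, doc_id) for term in query_terms)

# Hypothetical precomputed index: term -> {doc_id: per-term document score}.
# In the full retrieval setting such scores could be computed offline by the
# ranking model and stored in an inverted index.
precomputed: Dict[str, Dict[str, float]] = {
    "conformer": {"d1": 2.3, "d2": 0.1},
    "retrieval": {"d1": 1.1, "d2": 1.9},
}

def lookup(term: str, doc_id: str) -> float:
    return precomputed.get(term, {}).get(doc_id, 0.0)

print(qti_score(["conformer", "retrieval"], "d1", lookup))  # 2.3 + 1.1
```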
Related papers
- Attention over pre-trained Sentence Embeddings for Long Document Classification [4.38566347001872]
Transformers are often limited to short sequences due to their quadratic attention complexity in the number of tokens.
We suggest taking advantage of pre-trained sentence transformers to start from semantically meaningful embeddings of the individual sentences.
We report the results obtained by this simple architecture on three standard document classification datasets.
arXiv Detail & Related papers (2023-07-18T09:06:35Z)
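A minimal sketch of the general idea described above, assuming a frozen sentence encoder: each sentence is mapped to an embedding (random vectors stand in for sentence-transformer outputs here), and a learned query vector attends over the sentence embeddings to pool them into a document representation for classification. This is an illustration, not the paper's exact architecture.
```python
import torch
import torch.nn as nn

class SentenceAttentionClassifier(nn.Module):
    def __init__(self, emb_dim: int, num_classes: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(emb_dim))  # learned attention query
        self.classifier = nn.Linear(emb_dim, num_classes)

    def forward(self, sent_embs: torch.Tensor) -> torch.Tensor:
        # sent_embs: (num_sentences, emb_dim), e.g. from a frozen sentence encoder
        attn = torch.softmax(sent_embs @ self.query, dim=0)    # (num_sentences,)
        doc_emb = (attn.unsqueeze(-1) * sent_embs).sum(dim=0)  # (emb_dim,)
        return self.classifier(doc_emb)                        # class logits

model = SentenceAttentionClassifier(emb_dim=384, num_classes=3)
logits = model(torch.randn(12, 384))  # 12 sentences, random stand-in embeddings
```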
- Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection [23.39962989492527]
Transformer-based language models such as BERT have achieved the state-of-the-art on various NLP tasks, but are computationally prohibitive.
We present Pyramid-BERT, where we replace the previously used heuristics with a core-set based token selection method justified by theoretical results.
The core-set based token selection technique allows us to avoid expensive pre-training, enables space-efficient fine-tuning, and thus makes it suitable for handling longer sequence lengths.
arXiv Detail & Related papers (2022-03-27T19:52:01Z)
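One common way to realize core-set style selection is greedy farthest-point selection over token embeddings, sketched below; this illustrates the flavour of the technique only and is not Pyramid-BERT's successive selection procedure or its theoretical analysis.
```python
import numpy as np

def coreset_select(token_embs: np.ndarray, k: int) -> list:
    """Greedily pick k token indices whose embeddings cover the rest
    (farthest-point selection)."""
    n = token_embs.shape[0]
    selected = [0]  # keep the first token (e.g. [CLS]) by convention
    dists = np.linalg.norm(token_embs - token_embs[0], axis=1)
    while len(selected) < min(k, n):
        nxt = int(dists.argmax())  # token farthest from everything kept so far
        selected.append(nxt)
        dists = np.minimum(dists,
                           np.linalg.norm(token_embs - token_embs[nxt], axis=1))
    return sorted(selected)

kept = coreset_select(np.random.randn(128, 768), k=32)  # keep 32 of 128 tokens
```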
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering [68.00631278030627]
We propose a novel method KG-FiD, which filters noisy passages by leveraging the structural relationship among the retrieved passages with a knowledge graph.
We show that KG-FiD can improve vanilla FiD by up to 1.5% in answer exact match score and achieve comparable performance to FiD with only 40% of the computation cost.
arXiv Detail & Related papers (2021-10-08T18:39:59Z)
- Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence [29.442579683405913]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
arXiv Detail & Related papers (2021-04-19T15:32:34Z)
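For intuition, the sketch below shows windowed (local) self-attention, the simplest construction whose memory grows linearly with sequence length because each position scores only a fixed-size neighbourhood. TKL uses local self-attention of this general kind; the Conformer layer proposed above achieves linear scaling with a different attention construction, which this sketch does not reproduce.
```python
import torch
import torch.nn.functional as F

def local_self_attention(x: torch.Tensor, window: int) -> torch.Tensor:
    """Each position attends only to a fixed-size neighbourhood, so the number
    of attention scores grows linearly with sequence length."""
    seq_len, dim = x.shape
    out = torch.empty_like(x)
    for i in range(seq_len):  # written as a loop for clarity, not speed
        lo, hi = max(0, i - window // 2), min(seq_len, i + window // 2 + 1)
        scores = (x[i] @ x[lo:hi].T) / dim ** 0.5      # at most `window` scores
        out[i] = F.softmax(scores, dim=0) @ x[lo:hi]   # weighted neighbour sum
    return out

y = local_self_attention(torch.randn(2048, 64), window=129)
```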
- IOT: Instance-wise Layer Reordering for Transformer Structures [173.39918590438245]
We break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure.
Our method can also be applied to other architectures beyond Transformer.
arXiv Detail & Related papers (2021-03-05T03:44:42Z)
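A hedged sketch of the reordering mechanism: the same pool of Transformer layers is applied in a different order per input. IOT predicts and trains the order end-to-end; the orders below are hard-coded purely for illustration.
```python
import torch
import torch.nn as nn

# A shared pool of standard Transformer encoder layers.
layers = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(4)
])

def forward_with_order(x: torch.Tensor, order: list) -> torch.Tensor:
    """Apply the same layers, but in an instance-specific order."""
    for idx in order:
        x = layers[idx](x)
    return x

x = torch.randn(1, 16, 64)                         # (batch, seq, dim)
y_default = forward_with_order(x, [0, 1, 2, 3])    # canonical fixed order
y_reordered = forward_with_order(x, [2, 0, 3, 1])  # order chosen per instance
```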
- Long Document Ranking with Query-Directed Sparse Transformer [30.997237454078526]
We design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention.
Our model, QDS-Transformer, enforces the principled properties desired in ranking.
Experiments on one fully supervised and three few-shot TREC document ranking benchmarks demonstrate the consistent and robust advantage of QDS-Transformer.
arXiv Detail & Related papers (2020-10-23T21:57:56Z)
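A toy sketch of the kind of query-directed sparsity pattern the summary describes: query tokens attend globally, every token attends to the query, and document tokens otherwise attend only within a local window. The real QDS-Transformer implements this on top of efficient sparse attention kernels, which this Boolean mask does not attempt.
```python
import torch

def query_directed_mask(num_query: int, num_doc: int, window: int) -> torch.Tensor:
    """Boolean attention mask (True = attention allowed)."""
    n = num_query + num_doc
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:num_query, :] = True            # query tokens attend to everything
    mask[:, :num_query] = True            # every token attends to the query
    for i in range(num_query, n):         # document tokens: local window only
        lo, hi = max(num_query, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = True
    return mask

mask = query_directed_mask(num_query=8, num_doc=512, window=64)
print(mask.shape, mask.float().mean())    # fraction of allowed attention pairs
```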
- Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search [84.94597821711808]
We extend PoWER-BERT (Goyal et al., 2020) and propose Length-Adaptive Transformer that can be used for various inference scenarios after one-shot training.
We conduct a multi-objective evolutionary search to find a length configuration that maximizes the accuracy and minimizes the efficiency metric under any given computational budget.
We empirically verify the utility of the proposed approach by demonstrating the superior accuracy-efficiency trade-off under various setups.
arXiv Detail & Related papers (2020-10-14T12:28:08Z)
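A hedged sketch of the "length drop" idea: after each layer only the top-k tokens survive, so deeper layers process shorter sequences. The keep-lengths and the norm-based saliency score below are placeholders; the actual method trains with randomized length reduction and then searches over length configurations, neither of which is shown here.
```python
import torch
import torch.nn as nn

layers = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(3)
])
keep_lengths = [96, 64, 32]  # hypothetical per-layer length configuration

def forward_with_length_drop(x: torch.Tensor) -> torch.Tensor:
    for layer, k in zip(layers, keep_lengths):
        x = layer(x)
        saliency = x.norm(dim=-1)  # (batch, seq) placeholder token score
        idx = saliency.topk(min(k, x.size(1)), dim=1).indices.sort(dim=1).values
        x = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
    return x

out = forward_with_length_drop(torch.randn(2, 128, 64))  # ends with 32 tokens
```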
- Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory [23.022723184325017]
Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition.
We propose a novel augmented-memory self-attention, which attends on a short segment of the input sequence and a bank of memories.
arXiv Detail & Related papers (2020-05-16T16:54:52Z)
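A minimal single-head sketch (no learned projections, shapes assumed) of augmented-memory attention: the current segment attends over its own frames plus a small bank of memory vectors carried over from earlier segments, keeping per-step attention cost bounded for streaming.
```python
import torch
import torch.nn.functional as F

def augmented_memory_attention(segment: torch.Tensor,
                               memory_bank: torch.Tensor) -> torch.Tensor:
    """The current segment attends over its own frames plus a memory bank."""
    dim = segment.size(-1)
    keys = torch.cat([memory_bank, segment], dim=0)   # (memories + frames, dim)
    scores = segment @ keys.T / dim ** 0.5            # (frames, memories + frames)
    return F.softmax(scores, dim=-1) @ keys           # (frames, dim)

out = augmented_memory_attention(torch.randn(64, 32),  # current 64-frame segment
                                 torch.randn(8, 32))   # 8 memory vectors
```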
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology for capturing users' dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed CpRec, in which two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4∼8 times compression rates on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z)
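A hedged usage sketch of applying a pretrained seq2seq model to conversational question reformulation with Hugging Face Transformers: the dialogue history and the follow-up question are concatenated into one input and the model generates a self-contained rewrite. The checkpoint name and the input format are stand-ins; the paper fine-tunes PLMs on CANARD, which is not done here.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")          # stand-in checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Conversation history concatenated with the follow-up question (format assumed).
history = "Who wrote Frankenstein? ||| Mary Shelley."
question = "When was it published?"
inputs = tokenizer(history + " ||| " + question, return_tensors="pt")

# A model fine-tuned for CQR would emit a self-contained rewrite such as
# "When was Frankenstein published?"; an off-the-shelf checkpoint will not.
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```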