Improving Transformer-Kernel Ranking Model Using Conformer and Query
Term Independence
- URL: http://arxiv.org/abs/2104.09393v1
- Date: Mon, 19 Apr 2021 15:32:34 GMT
- Title: Improving Transformer-Kernel Ranking Model Using Conformer and Query
Term Independence
- Authors: Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani and Nick Craswell
- Abstract summary: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
A variant of the TK model -- called TKL -- has been developed that incorporates local self-attention to efficiently process longer input sequences.
In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences.
- Score: 29.442579683405913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking
performance on the TREC Deep Learning benchmark -- and can be considered to be
an efficient (but slightly less effective) alternative to other
Transformer-based architectures that employ (i) large-scale pretraining (high
training cost), (ii) joint encoding of query and document (high inference
cost), and (iii) larger number of Transformer layers (both high training and
high inference costs). Since then, a variant of the TK model -- called TKL -- has
been developed that incorporates local self-attention to efficiently process
longer input sequences in the context of document ranking. In this work, we
propose a novel Conformer layer as an alternative approach to scale TK to
longer input sequences. Furthermore, we incorporate query term independence and
explicit term matching to extend the model to the full retrieval setting. We
benchmark our models under the strictly blind evaluation setting of the TREC
2020 Deep Learning track and find that our proposed architecture changes lead
to improved retrieval quality over TKL. Our best model also outperforms all
non-neural runs ("trad") and two-thirds of the pretrained Transformer-based
runs ("nnlm") on NDCG@10.
Related papers
- REP: Resource-Efficient Prompting for On-device Continual Learning [23.92661395403251]
On-device continual learning (CL) requires the co-optimization of model accuracy and resource efficiency to be practical.
It is commonly believed that CNN-based CL excels in resource efficiency, whereas ViT-based CL is superior in model performance.
We introduce REP, which improves resource efficiency specifically targeting prompt-based rehearsal-free methods.
arXiv Detail & Related papers (2024-06-07T09:17:33Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files.
We show that our URL transformer model requires a different training approach to reach high performance levels.
We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
arXiv Detail & Related papers (2022-12-05T23:50:46Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned in various depth on demand, improving real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
- SIT3: Code Summarization with Structure-Induced Transformer [48.000063280183376]
We propose a novel model based on structure-induced self-attention, which encodes sequential inputs with highly effective structure modeling.
Our newly-proposed model achieves new state-of-the-art results on popular benchmarks.
arXiv Detail & Related papers (2020-12-29T11:37:43Z)
- Long Document Ranking with Query-Directed Sparse Transformer [30.997237454078526]
We design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention.
Our model, QDS-Transformer, enforces the principled properties desired in ranking.
Experiments on one fully supervised and three few-shot TREC document ranking benchmarks demonstrate the consistent and robust advantage of QDS-Transformer.
arXiv Detail & Related papers (2020-10-23T21:57:56Z)
- Conformer-Kernel with Query Term Independence for Document Retrieval [32.36908635150144]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
We extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption.
We show that the Conformer's GPU memory requirement scales linearly with input sequence length, making it a more viable option when ranking long documents (see the linear-scaling sketch after this list).
arXiv Detail & Related papers (2020-07-20T19:47:28Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words" (see the scoring sketch after this list).
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
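The Conformer-Kernel entry above notes that GPU memory grows linearly with input sequence length. As promised there, here is a small sketch of one way such linear scaling can arise: restricting each position to a fixed-size local attention window keeps the score buffer at O(n * window) rather than the O(n^2) of full self-attention. This only illustrates the memory argument; it is not the paper's actual Conformer layer, whose internals differ.

```python
# Illustrative local (windowed) self-attention: memory for attention scores grows
# linearly with sequence length, unlike the quadratic cost of full self-attention.
# This is a didactic sketch, not the Conformer layer from the paper.

import numpy as np

def local_self_attention(x: np.ndarray, window: int = 16) -> np.ndarray:
    """x: [seq_len, dim]. Each position attends only to positions within
    window // 2 tokens on either side, so only O(window) attention scores
    exist per position (O(seq_len * window) in total)."""
    seq_len, dim = x.shape
    half = window // 2
    out = np.zeros_like(x)
    for i in range(seq_len):
        lo, hi = max(0, i - half), min(seq_len, i + half + 1)
        neighbours = x[lo:hi]                      # [w, dim]
        scores = neighbours @ x[i] / np.sqrt(dim)  # [w]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ neighbours              # weighted sum of local values
    return out

# Usage: a 4,000-token document needs roughly 4,000 * 17 scores here, versus
# 4,000 * 4,000 = 16,000,000 scores for full self-attention.
tokens = np.random.randn(4000, 64).astype(np.float32)
print(local_self_attention(tokens).shape)  # (4000, 64)
```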
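For the Document Ranking with a Pretrained Sequence-to-Sequence Model entry above, the promised scoring sketch: one way to turn "relevance labels as target words" into a ranking score is to take the model's probabilities for a positive and a negative label token and renormalise them. The log-probabilities here are invented for illustration; a real system would obtain them from the sequence-to-sequence model, and the exact prompt and label words used in the paper may differ.

```python
# Sketch: deriving a relevance score from a seq2seq model's predicted "target word".
# The log-probabilities below are invented; obtaining them from an actual model is
# out of scope for this sketch.

import math
from typing import Dict

def relevance_from_target_words(logprobs: Dict[str, float],
                                pos: str = "true", neg: str = "false") -> float:
    """Renormalise the positive and negative label probabilities so the result is
    P(pos | query, document) restricted to the two label words."""
    p_pos, p_neg = math.exp(logprobs[pos]), math.exp(logprobs[neg])
    return p_pos / (p_pos + p_neg)

# Usage with invented first-token log-probabilities for one query-document pair.
example = {"true": -0.3, "false": -1.7}
print(round(relevance_from_target_words(example), 3))  # 0.802
```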
This list is automatically generated from the titles and abstracts of the papers in this site.