Distilling Transformers for Neural Cross-Domain Search
- URL: http://arxiv.org/abs/2108.03322v1
- Date: Fri, 6 Aug 2021 22:30:19 GMT
- Title: Distilling Transformers for Neural Cross-Domain Search
- Authors: Colin B. Clement, Chen Wu, Dawn Drain, Neel Sundaresan
- Abstract summary: We argue that sequence-to-sequence models are a conceptually ideal (albeit highly impractical) retriever.
We derive a new distillation objective, implementing it as a data augmentation scheme.
Using natural language source code search as a case study for cross-domain search, we demonstrate the validity of this idea by significantly improving upon the current leader of the CodeSearchNet challenge, a recent natural language code search benchmark.
- Score: 9.865125804658991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained transformers have recently clinched top spots in the gamut of
natural language tasks and pioneered solutions to software engineering tasks.
Even information retrieval has not been immune to the charm of the transformer,
though their large size and cost are generally a barrier to deployment. While
there has been much work in streamlining, caching, and modifying transformer
architectures for production, here we explore a new direction: distilling a
large pre-trained translation model into a lightweight bi-encoder which can be
efficiently cached and queried. We argue from a probabilistic perspective that
sequence-to-sequence models are a conceptually ideal (albeit highly
impractical) retriever. We derive a new distillation objective, implementing
it as a data augmentation scheme. Using natural language source code search as
a case study for cross-domain search, we demonstrate the validity of this idea
by significantly improving upon the current leader of the CodeSearchNet
challenge, a recent natural language code search benchmark.
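
To make the retrieval setup concrete, below is a minimal sketch of the cached bi-encoder described in the abstract. It is an illustrative assumption, not the authors' implementation: the embed function is a hashed bag-of-words placeholder standing in for the distilled lightweight encoder, and the corpus, embedding dimension, and cosine scoring are arbitrary choices. The point it demonstrates is the asymmetry that motivates distillation: code embeddings are computed once offline and cached, so each query costs a single encoder pass plus a dot product against the index, whereas scoring with a sequence-to-sequence model would require a full forward pass over every (query, code) pair.

import numpy as np

DIM = 256  # embedding dimension (illustrative)

def embed(text: str) -> np.ndarray:
    """Placeholder encoder: hashed bag-of-tokens, L2-normalized.
    Stands in for the distilled bi-encoder; not the paper's model."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[hash(tok) % DIM] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Offline: encode and cache the entire code corpus once.
code_corpus = [
    "def add(a, b): return a + b",
    "def load_json(path): import json; return json.load(open(path))",
    "def l2_norm(xs): return sum(x * x for x in xs) ** 0.5",
]
code_index = np.stack([embed(c) for c in code_corpus])  # shape: (num_snippets, DIM)

# Online: embed the natural-language query and rank the cached code vectors
# by cosine similarity (a dot product, since the vectors are normalized).
def search(query: str, top_k: int = 2):
    scores = code_index @ embed(query)
    ranked = np.argsort(-scores)[:top_k]
    return [(code_corpus[i], float(scores[i])) for i in ranked]

print(search("read a json file from disk"))

In the paper's framing, the expensive pre-trained translation model supplies the training signal for such an encoder through the proposed distillation objective, realized as a data augmentation scheme; that training loop is outside the scope of this sketch.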
Related papers
- Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers [0.21756081703276003]
This thesis provides methods and analysis of models which make progress on this goal.
We introduce two new finetuning methods which add new capabilities to the models they are used on.
We provide theoretical and empirical insights on the divergence of model-likelihood and output quality.
arXiv Detail & Related papers (2024-08-29T03:50:24Z)
- Large Sequence Models for Sequential Decision-Making: A Survey [33.35835438923926]
The Transformer has attracted increasing interest in the RL community, spawning numerous approaches with notable effectiveness and generalizability.
This paper puts forth various potential avenues for future research intending to improve the effectiveness of large sequence models for sequential decision-making.
arXiv Detail & Related papers (2023-06-24T12:06:26Z)
- Quality and Cost Trade-offs in Passage Re-ranking Task [0.0]
The paper is devoted to the problem of how to choose the right architecture for the ranking step of the information retrieval pipeline.
We investigated several late-interaction models, such as the ColBERT and Poly-encoder architectures, along with their modifications.
We also considered the memory footprint of the search index and experimented with the learning-to-hash method to binarize the output vectors from the transformer encoders.
arXiv Detail & Related papers (2021-11-18T19:47:45Z)
- Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
arXiv Detail & Related papers (2021-06-13T13:04:46Z)
- A Practical Survey on Faster and Lighter Transformers [0.9176056742068811]
The Transformer is a model solely based on the attention mechanism that is able to relate any two positions of the input sequence.
It has improved the state-of-the-art across numerous sequence modelling tasks.
However, its effectiveness comes at the expense of a quadratic computational and memory complexity with respect to the sequence length.
arXiv Detail & Related papers (2021-03-26T17:54:47Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Efficient Transformers: A Survey [98.23264445730645]
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning.
This paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models.
arXiv Detail & Related papers (2020-09-14T20:38:14Z)
- AutoTrans: Automating Transformer Design via Reinforced Architecture Search [52.48985245743108]
This paper empirically explores how to set the layer norm, whether to scale, the number of layers, the number of heads, the activation function, etc., so that one can obtain a transformer architecture that better suits the tasks at hand.
Experiments on CoNLL03, Multi-30k, IWSLT14, and WMT-14 show that the searched transformer model can outperform standard transformers.
arXiv Detail & Related papers (2020-09-04T08:46:22Z)
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one.
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
arXiv Detail & Related papers (2020-06-05T05:16:23Z)
- The Cascade Transformer: an Application for Efficient Answer Sentence Selection [116.09532365093659]
We introduce the Cascade Transformer, a technique to adapt transformer-based models into a cascade of rankers.
When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy.
arXiv Detail & Related papers (2020-05-05T23:32:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.