Is Your Language Model Ready for Dense Representation Fine-tuning?
- URL: http://arxiv.org/abs/2104.08253v1
- Date: Fri, 16 Apr 2021 17:36:44 GMT
- Title: Is Your Language Model Ready for Dense Representation Fine-tuning?
- Authors: Luyu Gao, Jamie Callan
- Abstract summary: This paper shows that one cause of dense encoders' weakness in low-resource settings lies in the readiness of the LM to expose its knowledge through dense representations during fine-tuning.
We present Condenser, a general pre-training architecture based on Transformer LMs, to improve dense optimization readiness.
- Score: 15.238322226336232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language models (LMs) have become go-to text representation
encoders. Prior research used deep LMs to encode text sequences such as
sentences and passages into single dense vector representations. These dense
representations have been used in efficient text comparison and embedding-based
retrieval. However, dense encoders perform poorly in low-resource situations. Many
techniques have been developed to mitigate this problem, yet despite their
success, little is known about why dense encoders struggle in the first place.
This paper shows that one cause lies in the readiness of the LM to expose its
knowledge through dense representations during fine-tuning, which we term
Optimization Readiness. To validate the theory,
we present Condenser, a general pre-training architecture based on Transformer
LMs, to improve dense optimization readiness. We show that fine-tuning from
Condenser significantly improves performance for small and/or noisy training
sets.
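As a rough illustration of the idea, the sketch below shows the kind of auxiliary head Condenser adds during pre-training: a few extra Transformer layers read the late-layer CLS vector together with early-layer token states and are trained with a masked-language-modeling loss, so sequence-level information must pass through CLS. Layer counts, dimensions, and names here are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a Condenser-style pre-training head (hypothetical names,
# not the released code). A small stack of extra Transformer layers reads
# [late-layer CLS ; early-layer token states] and is trained with an MLM loss,
# forcing the CLS vector to carry sequence-level information.
import torch
import torch.nn as nn

class CondenserStyleHead(nn.Module):
    def __init__(self, hidden=768, n_heads=12, n_layers=2, vocab_size=30522):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads,
                                           batch_first=True)
        self.head = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mlm = nn.Linear(hidden, vocab_size)  # MLM prediction layer

    def forward(self, early_hidden, late_hidden):
        # early_hidden, late_hidden: [batch, seq_len, hidden] taken from an
        # early layer and the last layer of the backbone LM.
        cls_late = late_hidden[:, :1, :]       # late-layer [CLS] vector
        tokens_early = early_hidden[:, 1:, :]  # early-layer token states
        mixed = torch.cat([cls_late, tokens_early], dim=1)
        return self.mlm(self.head(mixed))      # logits for the MLM loss
```

In a setup like this the head is dropped after pre-training, leaving a backbone whose CLS representation is better prepared for dense fine-tuning.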
Related papers
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely "hidden transfer", which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Extracting Text Representations for Terms and Phrases in Technical Domains [9.27244202193623]
We propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices.
Models trained with this approach not only match the quality of sentence encoders in technical domains, but are also 5 times smaller and up to 10 times faster.
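A minimal sketch of the reconstruction objective summarized above, assuming a small character-level encoder regresses onto rows of the large pre-trained embedding matrix; the model and names are illustrative, not the paper's implementation.

```python
# Illustrative sketch: a small character-based encoder is trained to
# reproduce the pre-trained embedding of each term or phrase.
import torch
import torch.nn as nn

class CharEncoder(nn.Module):
    def __init__(self, n_chars=256, char_dim=64, out_dim=768):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)
        self.rnn = nn.GRU(char_dim, out_dim, batch_first=True)

    def forward(self, char_ids):             # [batch, max_chars]
        _, h = self.rnn(self.emb(char_ids))  # final hidden state
        return h.squeeze(0)                  # [batch, out_dim]

def reconstruction_loss(model, char_ids, target_embeddings):
    # target_embeddings: rows of the large pre-trained matrix for these terms
    return nn.functional.mse_loss(model(char_ids), target_embeddings)
```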
arXiv Detail & Related papers (2023-05-25T08:59:36Z) - GRACE: Discriminator-Guided Chain-of-Thought Reasoning [75.35436025709049]
We propose Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator (GRACE) to steer the decoding process towards producing correct reasoning steps.
GRACE employs a discriminator trained with a contrastive loss over correct and incorrect steps, which is used during decoding to score next-step candidates.
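A hedged sketch of the discriminator-guided step selection described above; the function names and the way the two scores are combined are assumptions for illustration, not GRACE's exact formulation.

```python
# Candidate next reasoning steps are scored by a correctness discriminator
# and combined with the LM's own likelihood; the best-scoring step is kept.
from typing import Callable, List

def select_next_step(candidates: List[str],
                     lm_logprob: Callable[[str], float],
                     discriminator_score: Callable[[str], float],
                     alpha: float = 1.0) -> str:
    # Higher discriminator_score means the step looks more likely to be correct.
    def combined(step: str) -> float:
        return lm_logprob(step) + alpha * discriminator_score(step)
    return max(candidates, key=combined)
```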
arXiv Detail & Related papers (2023-05-24T09:16:51Z) - KNN-LM Does Not Improve Open-ended Text Generation [34.86733697757264]
We study the generation quality of retrieval-augmented language models (LMs).
We find that interpolating with a retrieval distribution actually increases perplexity compared to a baseline Transformer LM.
We discover that the entropy of the retrieval distribution increases faster than that of the base LM as the generated sequence becomes longer.
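For reference, a minimal sketch of the kNN-LM interpolation and the entropy comparison the summary refers to, assuming p_lm and p_knn are already-normalized next-token distributions; the interpolation weight is illustrative.

```python
import numpy as np

def interpolate(p_lm: np.ndarray, p_knn: np.ndarray, lam: float = 0.25) -> np.ndarray:
    # Standard kNN-LM mixture of the base LM and the retrieval distribution.
    return lam * p_knn + (1.0 - lam) * p_lm

def entropy(p: np.ndarray, eps: float = 1e-12) -> float:
    return float(-(p * np.log(p + eps)).sum())

# The paper's observation: entropy(p_knn) grows faster than entropy(p_lm)
# as the generated sequence gets longer, which hurts open-ended generation.
```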
arXiv Detail & Related papers (2023-05-24T01:48:33Z) - LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LeTI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
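As a rough illustration of that concatenation, the helper below assembles one training sequence; the field labels and formatting are assumptions, not LeTI's actual prompt template.

```python
def build_training_text(instruction: str, program: str, feedback: str) -> str:
    # One LeTI-style training example: instruction, the LM-generated program,
    # and the textual feedback, concatenated into a single sequence.
    return (
        f"Instruction:\n{instruction}\n\n"
        f"Generated program:\n{program}\n\n"
        f"Feedback:\n{feedback}\n"
    )
```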
arXiv Detail & Related papers (2023-05-17T15:53:31Z) - Why do Nearest Neighbor Language Models Work? [93.71050438413121]
Language models (LMs) compute the probability of a text by sequentially computing a representation of the already-seen context and using it to predict the next token.
Retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore.
arXiv Detail & Related papers (2023-01-07T11:12:36Z) - What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw a connection between them and sparse retrieval.
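A hedged sketch of that projection: a dual-encoder vector is mapped into vocabulary space, for example through the backbone LM's (tied) token-embedding or MLM head matrix, and the top tokens of the resulting distribution are inspected. Names and the choice of projection matrix are illustrative.

```python
import torch

def project_to_vocab(dense_vec: torch.Tensor,      # [hidden]
                     vocab_matrix: torch.Tensor,   # [vocab_size, hidden]
                     top_k: int = 10):
    logits = vocab_matrix @ dense_vec              # [vocab_size]
    probs = torch.softmax(logits, dim=-1)
    return torch.topk(probs, k=top_k)              # top tokens "explaining" the vector
```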
arXiv Detail & Related papers (2022-12-20T16:03:25Z) - Contrastive Decoding: Open-ended Text Generation as Optimization [153.35961722855686]
We propose contrastive decoding (CD), a reliable decoding approach that favors tokens on which a larger and a smaller LM disagree (see the sketch after this entry).
It is inspired by the observation that the failure modes of larger LMs (e.g., repetition and incoherence) are even more prevalent in smaller LMs.
CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone.
arXiv Detail & Related papers (2022-10-27T00:58:21Z)
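A sketch of the contrastive decoding objective from the last entry above: tokens are scored by the gap between the larger ("expert") and smaller ("amateur") LM log-probabilities, restricted to tokens the expert itself finds plausible. The threshold value here is illustrative.

```python
import numpy as np

def contrastive_next_token(logp_expert: np.ndarray,
                           logp_amateur: np.ndarray,
                           alpha: float = 0.1) -> int:
    # Plausibility constraint: only consider tokens the expert assigns at
    # least alpha times its maximum probability.
    p_expert = np.exp(logp_expert)
    mask = p_expert >= alpha * p_expert.max()
    # Contrastive score: prefer tokens where the expert is much more
    # confident than the amateur.
    score = np.where(mask, logp_expert - logp_amateur, -np.inf)
    return int(score.argmax())
```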
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.