Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval
- URL: http://arxiv.org/abs/2404.04163v2
- Date: Sat, 26 Oct 2024 00:04:38 GMT
- Title: Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval
- Authors: João Coelho, Bruno Martins, João Magalhães, Jamie Callan, Chenyan Xiong
- Abstract summary: We build on previous research that demonstrated loss of information in the middle of input sequences for causal language models.
We examine positional biases at various stages of training for an encoder-decoder model, including language model pre-training, contrastive pre-training, and contrastive fine-tuning.
- Score: 31.9252824152673
- License:
- Abstract: This study investigates the existence of positional biases in Transformer-based models for text representation learning, particularly in the context of web document retrieval. We build on previous research that demonstrated loss of information in the middle of input sequences for causal language models, extending it to the domain of representation learning. We examine positional biases at various stages of training for an encoder-decoder model, including language model pre-training, contrastive pre-training, and contrastive fine-tuning. Experiments with the MS-MARCO document collection reveal that after contrastive pre-training the model already generates embeddings that better capture early contents of the input, with fine-tuning further aggravating this effect.
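As a concrete illustration of this kind of positional probe (a minimal sketch, not the authors' code; the encoder checkpoint and filler text below are illustrative placeholders), one can slide a query-relevant passage across positions in an otherwise irrelevant document, embed the query and document with an MS-MARCO-trained dense retriever, and compare the resulting cosine similarities:

```python
# Minimal positional-bias probe (illustrative sketch, not the paper's code).
# Assumptions: the sentence-transformers library is installed, and the
# checkpoint name below is a placeholder for any MS-MARCO-trained encoder.
from sentence_transformers import SentenceTransformer, util

MODEL_NAME = "sentence-transformers/msmarco-distilbert-base-v4"  # placeholder choice
model = SentenceTransformer(MODEL_NAME)

query = "what causes tides in the ocean"
relevant = ("Tides are caused by the gravitational pull of the moon "
            "and the sun on the Earth's oceans.")
filler = "This sentence is topically unrelated padding text."
n_filler = 20  # keep the document within the encoder's input length,
               # so late passages are down-weighted rather than truncated

query_emb = model.encode(query, convert_to_tensor=True)

# Place the relevant passage at the beginning, middle, and end of the document.
for label, position in [("beginning", 0), ("middle", n_filler // 2), ("end", n_filler)]:
    sentences = [filler] * n_filler
    sentences.insert(position, relevant)
    doc_emb = model.encode(" ".join(sentences), convert_to_tensor=True)
    score = util.cos_sim(query_emb, doc_emb).item()
    print(f"relevant passage at {label}: cosine similarity = {score:.4f}")

# A "dwell in the beginning" bias would show up as the highest similarity
# when the relevant passage appears at the start of the document.
```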
Related papers
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_\text{Gen}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Efficient Training of Language Models to Fill in the Middle [17.118891860985123]
We show that autoregressive language models can learn to infill text after we apply a straightforward transformation to the dataset.
We use these ablations to prescribe strong default settings and best practices to train FIM models.
We have released our best infilling model trained with best practices in our API, and release our infilling benchmarks to aid future research.
arXiv Detail & Related papers (2022-07-28T17:40:47Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) in average performance by a large margin in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Tracing Origins: Coref-aware Machine Reading Comprehension [43.352833140317486]
We imitate the human reading process of connecting anaphoric expressions and leverage coreference information to enhance the word embeddings from the pre-trained model.
We demonstrate that explicitly incorporating coreference information during fine-tuning performs better than incorporating it when training the pre-trained language model.
arXiv Detail & Related papers (2021-10-15T09:28:35Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed thereafter, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and the prototype encoding structure mutually benefit the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)