EigenNoise: A Contrastive Prior to Warm-Start Representations
- URL: http://arxiv.org/abs/2205.04376v1
- Date: Mon, 9 May 2022 15:30:50 GMT
- Title: EigenNoise: A Contrastive Prior to Warm-Start Representations
- Authors: Hunter Scott Heidenreich, Jake Ryland Williams
- Abstract summary: We present a naive initialization scheme for word vectors based on a dense, independent co-occurrence model.
We show that our model, EigenNoise, can approach the performance of empirically trained GloVe despite the lack of pre-training data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present a naive initialization scheme for word vectors based
on a dense, independent co-occurrence model and provide preliminary results
that suggest it is competitive and warrants further investigation.
Specifically, we demonstrate through information-theoretic minimum description
length (MDL) probing that our model, EigenNoise, can approach the performance
of empirically trained GloVe despite the lack of any pre-training data (in the
case of EigenNoise). We present these preliminary results with interest to set
the stage for further investigations into how this competitive initialization
works without pre-training data, as well as to invite the exploration of more
intelligent initialization schemes informed by the theory of harmonic
linguistic structure. Our application of this theory likewise contributes a
novel (and effective) interpretation of recent discoveries which have
elucidated the underlying distributional information that linguistic
representations capture from data and contrast distributions.
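The abstract does not spell out the probing setup, but prequential (online-code) MDL probing in the spirit of Voita and Titov (2020) is the standard instantiation. The sketch below, in which the probe model, portion schedule, and data handling are all illustrative assumptions, shows the core computation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def prequential_mdl(X, y, n_classes, portions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Online (prequential) code length in bits: train a probe on a
    growing prefix of the data, then pay the log-loss of its predictions
    on the next chunk. Lower code length means the labels are easier to
    extract from the representations."""
    n = len(y)
    first = int(portions[0] * n)
    codelength = first * np.log2(n_classes)  # first block: uniform code
    for lo, hi in zip(portions[:-1], portions[1:]):
        i, j = int(lo * n), int(hi * n)
        probe = LogisticRegression(max_iter=1000).fit(X[:i], y[:i])
        probs = probe.predict_proba(X[i:j])
        col = {c: k for k, c in enumerate(probe.classes_)}
        for row, label in enumerate(y[i:j]):
            # Labels unseen in the prefix fall back to a uniform code.
            p = probs[row, col[label]] if label in col else 1.0 / n_classes
            codelength += -np.log2(max(p, 1e-12))
    return codelength
```

Running this once with features built from EigenNoise vectors and once with GloVe vectors, on identical probing labels, yields the kind of comparison the abstract reports.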
Related papers
- Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy.
As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
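The summary does not describe the model itself; as a generic illustration of jointly learning a probabilistic graph structure alongside a downstream task (an assumed construction, not necessarily this paper's), one can parameterize edge probabilities and sample a differentiable, relaxed adjacency matrix:

```python
import torch
import torch.nn as nn

class LatentGraph(nn.Module):
    """Learnable edge probabilities with a relaxed-Bernoulli sample.
    Hypothetical sketch; the paper's actual formulation may differ."""
    def __init__(self, n_nodes, temperature=0.5):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(n_nodes, n_nodes))
        self.temperature = torch.tensor(temperature)

    def forward(self):
        # Relaxed Bernoulli keeps sampling differentiable, so the edge
        # logits receive gradients through the downstream prediction loss.
        dist = torch.distributions.RelaxedBernoulli(
            self.temperature, logits=self.edge_logits)
        adj = dist.rsample()                          # soft adjacency in (0, 1)
        return adj, torch.sigmoid(self.edge_logits)   # sample + edge probs
```

The sigmoid of the logits provides per-edge probabilities, loosely corresponding to the structural uncertainty the paper studies.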
arXiv Detail & Related papers (2024-05-30T10:49:22Z)
- A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series [15.218841180577135]
We introduce a novel pretraining procedure that leverages supervised contrastive learning to distinguish features within each pretraining dataset.
We then propose a fine-tuning procedure designed to enhance the accurate prediction of the target data by aligning it more closely with the learned dynamics of the pretraining datasets.
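The paper's exact loss is not given in this summary; a supervised contrastive objective in the spirit of Khosla et al. (2020), assumed here for illustration, can be sketched as:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, tau=0.1):
    """z: (N, D) embeddings, labels: (N,) ints. Pulls together samples
    sharing a label and pushes apart the rest. (Assumed formulation.)"""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                          # pairwise similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))      # drop self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # Average log-probability over each anchor's positive pairs.
    loss = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```

During pretraining, the "positive" label can be whatever supervised signal each pretraining dataset provides; fine-tuning then proceeds with the usual task loss.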
arXiv Detail & Related papers (2023-11-21T02:06:52Z)
- Probing via Prompting [71.7904179689271]
This paper introduces a novel model-free approach to probing by formulating it as a prompting task.
We conduct experiments on five probing tasks and show that our approach is comparable or better at extracting information than diagnostic probes.
We then examine the usefulness of a specific linguistic property for pre-training by removing the heads that are essential to that property and evaluating the resulting model's performance on language modeling.
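A minimal, model-agnostic sketch of the idea follows; the prompt wording and the helper score_continuation(prompt, text), assumed to return a language model's log-probability of text given prompt, are hypothetical and not taken from the paper:

```python
def prompt_probe(sentence, word, labels, score_continuation):
    """Probe a linguistic property (e.g. part of speech) by asking the
    model directly instead of training a diagnostic classifier."""
    prompt = f'In "{sentence}", the word "{word}" is a'
    scores = {lab: score_continuation(prompt, f" {lab}") for lab in labels}
    return max(scores, key=scores.get)

# e.g. prompt_probe("Dogs bark loudly.", "bark", ["noun", "verb"], lm_score)
```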
arXiv Detail & Related papers (2022-07-04T22:14:40Z)
- To Know by the Company Words Keep and What Else Lies in the Vicinity [0.0]
We introduce an analytic model of the statistics learned by seminal algorithms, including GloVe and Word2Vec.
We derive -- to the best of our knowledge -- the first known solution to Word2Vec's softmax-optimized, skip-gram algorithm.
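For reference, the objective whose solution is derived is the corpus-level softmax skip-gram likelihood, which in standard notation (with #(w,c) the co-occurrence count of center word w and context word c, v_w an input vector, and u_c an output vector) reads:

```latex
\max_{U,V} \; \sum_{w,c} \#(w,c) \,
  \log \frac{\exp(u_c^{\top} v_w)}{\sum_{c'} \exp(u_{c'}^{\top} v_w)}
```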
arXiv Detail & Related papers (2022-04-30T03:47:48Z)
- An Explanation of In-context Learning as Implicit Bayesian Inference [117.19809377740188]
We study the role of the pretraining distribution on the emergence of in-context learning.
We prove that in-context learning occurs implicitly via Bayesian inference of the latent concept.
We empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.
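Their claim can be compressed into one line: next-token prediction marginalizes over a latent concept inferred from the prompt,

```latex
p(\text{output} \mid \text{prompt})
  = \int p(\text{output} \mid \text{prompt}, \theta)\,
         p(\theta \mid \text{prompt})\, d\theta
```

so as the prompt grows, the posterior over the latent concept concentrates on whatever the in-context examples share, and the model effectively "locates" the task.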
arXiv Detail & Related papers (2021-11-03T09:12:33Z)
- Tracing Origins: Coref-aware Machine Reading Comprehension [43.352833140317486]
We imitate the human reading process of connecting anaphoric expressions, leveraging coreference information to enhance the word embeddings from the pre-trained model.
We demonstrate that explicitly incorporating coreference information at the fine-tuning stage performs better than incorporating it when training a pre-trained language model.
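The precise fusion mechanism is not described in this summary; one simple, hypothetical way to inject coreference information into pre-trained token representations is an additive cluster embedding:

```python
import torch
import torch.nn as nn

class CorefEnhancedEmbeddings(nn.Module):
    """Adds a learned embedding per coreference cluster to the token
    states of a pre-trained encoder. Illustrative sketch only; not the
    paper's actual architecture."""
    def __init__(self, hidden_size, max_clusters=64):
        super().__init__()
        # Index 0 is reserved for tokens outside any coreference cluster.
        self.cluster_emb = nn.Embedding(max_clusters + 1, hidden_size)

    def forward(self, token_states, cluster_ids):
        # token_states: (B, T, H) from the pre-trained model
        # cluster_ids:  (B, T) long tensor; 0 means "no cluster"
        return token_states + self.cluster_emb(cluster_ids)
```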
arXiv Detail & Related papers (2021-10-15T09:28:35Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study which specific traits of the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models.
Little has been studied about the type or extent of information encoded in the pre-trained representations themselves.
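A common recipe for such layer-wise analysis, used here as an assumed illustration rather than the paper's exact methodology, is to fit a lightweight probe on each layer's frozen features and compare scores across depth:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def layerwise_probe(features_by_layer, labels):
    """features_by_layer: list of (N, D) arrays, one per model layer.
    Returns per-layer accuracies showing where a property (e.g. phone
    or speaker identity) is most linearly decodable."""
    scores = []
    for layer, X in enumerate(features_by_layer):
        probe = LogisticRegression(max_iter=1000)
        acc = cross_val_score(probe, X, labels, cv=3).mean()
        scores.append((layer, acc))
    return scores
```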
arXiv Detail & Related papers (2021-07-10T02:13:25Z)
- Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
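The ablation behind this claim is easy to sketch: permute tokens within each sentence before pre-training, destroying syntax while roughly preserving co-occurrence statistics (a schematic reconstruction, not the authors' exact pipeline):

```python
import random

def shuffle_word_order(sentence, seed=None):
    """Permute tokens within a sentence: co-occurrence within the
    sentence is preserved, but syntactic order is destroyed."""
    rng = random.Random(seed)
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

# Pre-training an MLM on shuffled text, then comparing downstream scores
# against an MLM trained on natural text, isolates the contribution of
# word order to pre-training's success.
```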
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
- Explain and Predict, and then Predict Again [6.865156063241553]
We propose ExPred, which uses multi-task learning in the explanation-generation phase to effectively trade off explanation and prediction losses.
We conduct an extensive evaluation of our approach on three diverse language datasets.
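The trade-off described above amounts to a weighted multi-task objective; in a minimal sketch (the weight lam and the loss choices are assumptions, not values from the paper):

```python
import torch.nn.functional as F

def expred_style_loss(pred_logits, pred_targets,
                      expl_logits, expl_targets, lam=0.5):
    """Joint objective: task-prediction loss plus a token-level
    rationale (explanation) loss, traded off by lam."""
    prediction_loss = F.cross_entropy(pred_logits, pred_targets)
    # Explanations modeled as binary token-level rationale selection.
    explanation_loss = F.binary_cross_entropy_with_logits(
        expl_logits, expl_targets.float())
    return prediction_loss + lam * explanation_loss
```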
arXiv Detail & Related papers (2021-01-11T19:36:52Z)
- Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias [59.788358876316295]
We propose a pipeline solution to improve speaker verification on a small actual forensic field dataset.
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning.
We show that the proposed objective function can efficiently improve the performance of teacher-student learning on short utterances.
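The paper's exact objective is not reproduced in this summary, but a standard knowledge-distillation loss of the kind described, with temperature T and mixing weight alpha as assumed hyperparameters, looks like:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      T=4.0, alpha=0.9):
    """Soft-label KL term against the teacher plus a hard-label term.
    The T**2 factor keeps soft-target gradients on a comparable scale."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```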
arXiv Detail & Related papers (2020-09-21T00:58:40Z)