Measuring Memorization Effect in Word-Level Neural Networks Probing
- URL: http://arxiv.org/abs/2006.16082v1
- Date: Mon, 29 Jun 2020 14:35:42 GMT
- Title: Measuring Memorization Effect in Word-Level Neural Networks Probing
- Authors: Rudolf Rosa, Tomáš Musil, David Mareček
- Abstract summary: We propose a simple general method for measuring the memorization effect, based on a symmetric selection of test words seen versus unseen in training.
Our method can be used to explicitly quantify the amount of memorization happening in a probing setup, so that an adequate setup can be chosen and the results of the probing can be interpreted with a reliability estimate.
- Score: 0.9156064716689833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple studies have probed representations emerging in neural networks
trained for end-to-end NLP tasks and examined what word-level linguistic
information may be encoded in the representations. In classical probing, a
classifier is trained on the representations to extract the target linguistic
information. However, there is a threat of the classifier simply memorizing the
linguistic labels for individual words, instead of extracting the linguistic
abstractions from the representations, thus reporting false positive results.
While considerable efforts have been made to minimize the memorization problem,
the task of actually measuring the amount of memorization happening in the
classifier has been understudied so far. In our work, we propose a simple
general method for measuring the memorization effect, based on a symmetric
selection of comparable sets of test words seen versus unseen in training. Our
method can be used to explicitly quantify the amount of memorization happening
in a probing setup, so that an adequate setup can be chosen and the results of
the probing can be interpreted with a reliability estimate. We exemplify this
by showcasing our method on a case study of probing for part of speech in a
trained neural machine translation encoder.
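A minimal sketch of the seen-versus-unseen comparison in Python (the sklearn-style `probe` interface and all helper names are assumptions; the paper's selection of comparable word sets is more careful than the uniform sampling used here):

```python
import random

def memorization_gap(probe, test_reprs, test_labels, test_words,
                     train_words, seed=0):
    """Accuracy gap between test tokens whose word type occurred in
    probe training ('seen') and tokens whose type did not ('unseen').
    A large gap suggests the probe memorized word identities instead
    of reading the property off the representations."""
    rng = random.Random(seed)
    seen, unseen = [], []
    for x, y, w in zip(test_reprs, test_labels, test_words):
        (seen if w in train_words else unseen).append((x, y))
    # Symmetric selection: equal-sized random samples from each group,
    # a crude stand-in for the paper's matching of comparable sets.
    n = min(len(seen), len(unseen))
    seen, unseen = rng.sample(seen, n), rng.sample(unseen, n)

    def accuracy(pairs):
        xs, ys = zip(*pairs)
        preds = probe.predict(list(xs))
        return sum(p == y for p, y in zip(preds, ys)) / len(ys)

    return accuracy(seen) - accuracy(unseen)
```

A gap near zero suggests the probe generalizes across word types rather than memorizing them.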
Related papers
- Measures of Information Reflect Memorization Patterns [53.71420125627608]
We show that the diversity in the activation patterns of different neurons is reflective of model generalization and memorization.
Importantly, we discover that information organization points to the two forms of memorization, even for neural activations computed on unlabelled in-distribution examples.
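One rough way to operationalize "diversity of activation patterns" (an illustrative proxy, not the paper's measure) is the average entropy of each neuron's activation histogram over a batch of unlabelled examples:

```python
import numpy as np

def activation_diversity(acts, bins=10):
    """Average per-neuron entropy of activation histograms.
    acts: array of shape (n_examples, n_neurons), e.g. one layer's
    activations on unlabelled in-distribution examples."""
    acts = np.asarray(acts)
    entropies = []
    for neuron in acts.T:                     # iterate over neurons
        hist, _ = np.histogram(neuron, bins=bins)
        p = hist / hist.sum()                 # empirical distribution
        p = p[p > 0]
        entropies.append(-(p * np.log2(p)).sum())
    return float(np.mean(entropies))          # higher = more diverse
```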
arXiv Detail & Related papers (2022-10-17T20:15:24Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues [7.332652485849632]
Human infants acquire their verbal lexicon with minimal prior knowledge of language.
This study proposes a novel, fully unsupervised learning method for discovering speech units.
The proposed method acquires words and phonemes directly from speech signals.
arXiv Detail & Related papers (2022-01-18T07:31:59Z)
- Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance [55.10864476206503]
We investigate the use of quantized vectors to model the latent linguistic embedding.
By enforcing different policies over the latent spaces during training, we are able to obtain latent linguistic embeddings.
Our experiments show that the voice cloning system built with vector quantization suffers only a small degradation in perceptual evaluations.
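The core quantization step can be sketched as a nearest-neighbour lookup in a learned codebook (illustrative only; the training policies the paper enforces over the latent space, such as commitment losses, are omitted):

```python
import numpy as np

def quantize(latents, codebook):
    """Replace each continuous latent vector with its nearest
    codebook entry under Euclidean distance.
    latents: (n, d) array, codebook: (k, d) array."""
    latents, codebook = np.asarray(latents), np.asarray(codebook)
    # Pairwise squared distances between latents and codebook entries.
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)            # index of nearest code
    return codebook[idx], idx          # quantized vectors and codes
```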
arXiv Detail & Related papers (2021-06-25T07:51:35Z)
- A comprehensive empirical analysis on cross-domain semantic enrichment for detection of depressive language [0.9749560288448115]
We start with a rich word embedding pre-trained from a large general dataset, which is then augmented with embeddings learned from a much smaller and more specific domain dataset through a simple non-linear mapping mechanism.
We show that our augmented word embedding representations achieve a significantly better F1 score than the others, especially when applied to a high-quality dataset.
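A minimal sketch of such an augmentation, assuming dictionaries of word vectors and sklearn's MLPRegressor as the simple non-linear mapping (the paper's exact architecture and fusion may differ):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def augment_embeddings(general, domain):
    """Learn a non-linear map from general-domain vectors to
    domain-specific vectors on the shared vocabulary, then extend
    every general vector with its mapped (domain-flavoured) image."""
    shared = [w for w in general if w in domain]
    X = np.array([general[w] for w in shared])
    Y = np.array([domain[w] for w in shared])
    mapper = MLPRegressor(hidden_layer_sizes=(256,), max_iter=500)
    mapper.fit(X, Y)
    # Words absent from the domain corpus still get a predicted
    # domain component, which is where the enrichment comes from.
    return {w: np.concatenate([v, mapper.predict([v])[0]])
            for w, v in general.items()}
```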
arXiv Detail & Related papers (2021-06-24T07:15:09Z)
- Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
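In sketch form, the inner maximization can be approximated with a single gradient-sign step on the input embeddings (a generic FGSM-style stand-in in PyTorch, not the paper's full self-learning pipeline):

```python
import torch

def adversarial_loss(model, embeds, labels, loss_fn, eps=1e-2):
    """Loss at a worst-case, small (label-preserving) perturbation of
    the input embeddings, found by one gradient-sign step."""
    embeds = embeds.detach().requires_grad_(True)
    loss = loss_fn(model(embeds), labels)
    grad, = torch.autograd.grad(loss, embeds)
    # Worst-case perturbation within an L-infinity ball of radius eps.
    adv = (embeds + eps * grad.sign()).detach()
    return loss_fn(model(adv), labels)   # backprop trains the model
```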
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance? [27.64235687067883]
We show that models can learn to encode linguistic properties even when those properties are not needed for the task the models were trained on.
We demonstrate that models can encode these properties considerably above chance level even when the properties are distributed in the data as random noise.
arXiv Detail & Related papers (2020-05-02T06:19:20Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
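In this formulation, a probe with cross-entropy H_q(T | R) upper-bounds the true conditional entropy of the property T given the representations R, so any probe yields a lower bound on their mutual information:

```latex
% Probing as mutual-information estimation:
\[
  I(T; R) \;=\; H(T) - H(T \mid R) \;\ge\; H(T) - H_{q}(T \mid R),
\]
% where H_q(T | R) is the cross-entropy of probe q. A better probe
% tightens the lower bound on the information R carries about T.
```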
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.