Information-Theoretic Probing with Minimum Description Length
- URL: http://arxiv.org/abs/2003.12298v1
- Date: Fri, 27 Mar 2020 09:35:38 GMT
- Title: Information-Theoretic Probing with Minimum Description Length
- Authors: Elena Voita, Ivan Titov
- Abstract summary: We propose an alternative to the standard probes, information-theoretic probing with minimum description length (MDL).
With MDL probing, training a probe to predict labels is recast as teaching it to effectively transmit the data.
We show that these methods agree in results and are more informative and stable than the standard probes.
- Score: 74.29846942213445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To measure how well pretrained representations encode some linguistic
property, it is common to use accuracy of a probe, i.e. a classifier trained to
predict the property from the representations. Despite widespread adoption of
probes, differences in their accuracy fail to adequately reflect differences in
representations. For example, they do not substantially favour pretrained
representations over randomly initialized ones. Analogously, their accuracy can
be similar when probing for genuine linguistic labels and probing for random
synthetic tasks. To see reasonable differences in accuracy with respect to
these random baselines, previous work had to constrain either the amount of
probe training data or its model size. Instead, we propose an alternative to
the standard probes, information-theoretic probing with minimum description
length (MDL). With MDL probing, training a probe to predict labels is recast as
teaching it to effectively transmit the data. Therefore, the measure of
interest changes from probe accuracy to the description length of labels given
representations. In addition to probe quality, the description length evaluates
"the amount of effort" needed to achieve the quality. This amount of effort
characterizes either (i) size of a probing model, or (ii) the amount of data
needed to achieve the high quality. We consider two methods for estimating MDL
which can be easily implemented on top of the standard probing pipelines:
variational coding and online coding. We show that these methods agree in
results and are more informative and stable than the standard probes.
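The online coding estimate in particular is straightforward to sketch: the labels are transmitted in increasing portions, and each portion is encoded by a probe trained only on the portions sent before it, so the total codelength rewards representations from which the labels are learnable from little data. Below is a minimal NumPy-only sketch for binary labels; the gradient-descent logistic-regression probe, the portion schedule, and the function names are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np

def fit_probe(X, y, steps=2000, lr=0.5):
    """Train a simple logistic-regression probe by gradient descent.

    Illustrative stand-in for whatever probe architecture is used;
    assumes binary labels y in {0, 1}.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        z = np.clip(Xb @ w, -30.0, 30.0)       # clip logits to avoid overflow
        p = 1.0 / (1.0 + np.exp(-z))
        w -= lr * Xb.T @ (p - y) / len(y)      # gradient of mean cross-entropy
    return w

def online_codelength(X, y, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Online (prequential) description length of binary labels given
    representations, in bits.

    Each block of labels is encoded at a cost of -log2 p(label | repr)
    by a probe trained only on the blocks transmitted before it.
    The portion schedule `fractions` is an illustrative choice.
    """
    n = len(y)
    cuts = [int(f * n) for f in fractions]
    bits = float(cuts[0])  # first block: uniform code, 1 bit per binary label
    for start, end in zip(cuts[:-1], cuts[1:]):
        w = fit_probe(X[:start], y[:start])
        Xb = np.hstack([X[start:end], np.ones((end - start, 1))])
        z = np.clip(Xb @ w, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-z))
        yt = y[start:end]
        bits += -(yt * np.log2(p) + (1 - yt) * np.log2(1 - p)).sum()
    return bits
```

A uniform code gives a baseline of one bit per binary label; representations that make the labels easy to learn yield a much shorter total codelength, and that gap is the compression that MDL probing reports in place of raw probe accuracy.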
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification [20.85088711770188]
We show that it is possible to improve prompt-based learning without additional labeled data.
We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions.
We find that Embroid substantially improves performance over original prompts.
arXiv Detail & Related papers (2023-07-20T17:07:28Z)
- Learning Disentangled Textual Representations via Statistical Measures of Similarity [35.74568888409149]
We introduce a family of regularizers for learning disentangled representations that require no additional training, are faster, and involve no additional tuning.
arXiv Detail & Related papers (2022-05-07T08:06:22Z)
- Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs).
We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)
- Comparing Text Representations: A Theory-Driven Approach [2.893558866535708]
We adapt general tools from computational learning theory to fit the specific characteristics of text datasets.
We present a method to evaluate the compatibility between representations and tasks.
This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task.
arXiv Detail & Related papers (2021-09-15T17:48:19Z)
- Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing [45.834234634602566]
We show that semantic cues in the training data mean that syntactic probes do not properly isolate syntax.
We train the probes on several popular language models.
arXiv Detail & Related papers (2021-06-04T15:46:39Z)
- Evaluating representations by the complexity of learning low-loss predictors [55.94170724668857]
We consider the problem of evaluating representations of data for use in solving a downstream task.
We propose to measure the quality of a representation by the complexity of learning a predictor on top of the representation that achieves low loss on a task of interest.
arXiv Detail & Related papers (2020-09-15T22:06:58Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.