Probing BERT's priors with serial reproduction chains
- URL: http://arxiv.org/abs/2202.12226v1
- Date: Thu, 24 Feb 2022 17:42:28 GMT
- Title: Probing BERT's priors with serial reproduction chains
- Authors: Takateru Yamakoshi, Robert D. Hawkins, Thomas L. Griffiths
- Abstract summary: We use serial reproduction chains to probe BERT's priors.
A unique and consistent estimator of the ground-truth joint distribution can be obtained with a GSN (Generative Stochastic Network) sampler that randomly selects which word to mask and reconstruct at each step.
We compare the lexical and syntactic statistics of sentences from the resulting prior distribution against those of the ground-truth corpus distribution.
- Score: 8.250374560598493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We can learn as much about language models from what they say as we learn
from their performance on targeted benchmarks. Sampling is a promising
bottom-up method for probing, but generating samples from successful models
like BERT remains challenging. Taking inspiration from theories of iterated
learning in cognitive science, we explore the use of serial reproduction chains
to probe BERT's priors. Although the masked language modeling objective does
not guarantee a consistent joint distribution, we observe that a unique and
consistent estimator of the ground-truth joint distribution may be obtained by
a GSN sampler, which randomly selects which word to mask and reconstruct on
each step. We compare the lexical and syntactic statistics of sentences from
the resulting prior distribution against those of the ground-truth corpus
distribution and elicit a large empirical sample of naturalness judgments to
investigate how, exactly, the model deviates from human speakers. Our findings
suggest the need to move beyond top-down evaluation methods toward bottom-up
probing to capture the full richness of what has been learned about language.
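To make the sampling procedure concrete, here is a minimal sketch of the kind of serial reproduction (Gibbs-style) chain the abstract describes: randomly pick a position, mask it, and redraw it from BERT's conditional distribution. It uses Hugging Face transformers; the model name, chain length, and seed sentence are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the authors' exact code) of the masked-word resampling
# chain described above, using Hugging Face transformers. Model name, chain
# length, and seed sentence are illustrative assumptions.
import random
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def resample_one_word(ids: torch.Tensor) -> torch.Tensor:
    """One chain step: mask a random position and redraw it from BERT's conditional."""
    ids = ids.clone()
    pos = random.randrange(1, ids.shape[1] - 1)   # skip [CLS] and [SEP]
    ids[0, pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=ids).logits[0, pos]
    ids[0, pos] = torch.multinomial(torch.softmax(logits, dim=-1), 1).item()
    return ids

# Run a short chain from an arbitrary seed sentence; sentences read off the
# chain after a burn-in period approximate draws from the model's implied prior.
ids = tokenizer("the cat sat on the mat", return_tensors="pt")["input_ids"]
for _ in range(1000):
    ids = resample_one_word(ids)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```

Note that this fixed-length scheme only implements the step the abstract names (randomly selecting which word to mask and reconstruct); any further details of the paper's sampler are not specified in the abstract.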
Related papers
- Structured Voronoi Sampling [61.629198273926676]
In this paper, we take an important step toward building a principled approach for sampling from language models with gradient-based methods.
We name our gradient-based technique Structured Voronoi Sampling (SVS).
In a controlled generation task, SVS is able to generate fluent and diverse samples while following the control targets significantly better than other methods.
arXiv Detail & Related papers (2023-06-05T17:32:35Z)
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
- Self-Normalized Importance Sampling for Neural Language Modeling [97.96857871187052]
In this work, we propose self-normalized importance sampling. Compared to our previous work, the criteria considered here are self-normalized, so no separate correction step is needed.
We show that our proposed self-normalized importance sampling is competitive in both research-oriented and production-oriented automatic speech recognition tasks.
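For readers unfamiliar with the term, the sketch below illustrates the generic self-normalized importance sampling estimator and why no separate normalization or correction step is required; it is a textbook illustration, not that paper's training criterion, and the toy target and proposal are assumptions.

```python
# Generic self-normalized importance sampling: estimate E_p[f(x)] using samples
# from a proposal q, without knowing p's normalizing constant. Illustration only;
# not the paper's exact criterion.
import numpy as np

rng = np.random.default_rng(0)

def snis_estimate(f, log_p_unnorm, q_sample, log_q, n=10_000):
    """Self-normalized importance sampling estimate of E_p[f(x)]."""
    x = q_sample(n)                        # draw from the proposal q
    log_w = log_p_unnorm(x) - log_q(x)     # unnormalized log importance weights
    w = np.exp(log_w - log_w.max())        # stabilize; constants cancel below
    return np.sum(w * f(x)) / np.sum(w)    # the weights normalize themselves

# Toy check: target p = N(2, 1) known only up to a constant, proposal q = N(0, 2).
est = snis_estimate(
    f=lambda x: x,                                   # estimate the mean of p
    log_p_unnorm=lambda x: -0.5 * (x - 2.0) ** 2,    # log density up to a constant
    q_sample=lambda n: rng.normal(0.0, 2.0, size=n),
    log_q=lambda x: -0.5 * (x / 2.0) ** 2 - np.log(2.0),
)
print(est)  # should be close to 2
```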
arXiv Detail & Related papers (2021-11-11T16:57:53Z)
- Empowering Language Understanding with Counterfactual Reasoning [141.48592718583245]
We propose a Counterfactual Reasoning Model, which mimics counterfactual thinking by learning from a small number of counterfactual samples.
In particular, we devise a generation module to generate representative counterfactual samples for each factual sample, and a retrospective module to retrospect the model prediction by comparing the counterfactual and factual samples.
arXiv Detail & Related papers (2021-06-06T06:36:52Z)
- On Sampling-Based Training Criteria for Neural Language Modeling [97.35284042981675]
We consider Monte Carlo sampling, importance sampling, a novel method we call compensated partial summation, and noise contrastive estimation.
We show that all these sampling methods can perform equally well, as long as we correct for the intended class posterior probabilities.
Experimental results in language modeling and automatic speech recognition on Switchboard and LibriSpeech support our claim.
arXiv Detail & Related papers (2021-04-21T12:55:52Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth, anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- On Robustness and Bias Analysis of BERT-based Relation Extraction [40.64969232497321]
We analyze a fine-tuned BERT model from different perspectives using relation extraction.
We find that the fine-tuned BERT model suffers a robustness bottleneck, as revealed by randomization, adversarial, and counterfactual tests, and that it also exhibits biases.
arXiv Detail & Related papers (2020-09-14T05:24:28Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
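As a point of reference for how such a distillation objective can look, here is a hedged sketch of a word-level distillation loss in which a student masked language model is pushed toward a teacher's distribution over the masked slot; the tensor shapes, the KL direction, and the toy inputs are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a word-level distillation loss: the student's masked-word
# distribution is pushed toward a teacher's distribution over the same slot.
# Shapes and KL direction are assumptions, not the paper's exact loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs):
    """KL(teacher || student) averaged over masked positions.

    student_logits: (num_masked, vocab) raw scores from the student MLM.
    teacher_probs:  (num_masked, vocab) teacher's distribution over each slot.
    """
    log_q = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_q, teacher_probs, reduction="batchmean")

# Toy usage with random tensors standing in for real model outputs.
student_logits = torch.randn(4, 30522)
teacher_probs = F.softmax(torch.randn(4, 30522), dim=-1)
print(distillation_loss(student_logits, teacher_probs))
```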
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
- HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference [38.14399396661415]
We derive adversarial examples in terms of the hypothesis-only bias.
We investigate two debiasing approaches which exploit the artificial pattern modeling to mitigate such hypothesis-only bias.
arXiv Detail & Related papers (2020-03-05T16:46:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.