Exposing the Implicit Energy Networks behind Masked Language Models via
Metropolis--Hastings
- URL: http://arxiv.org/abs/2106.02736v1
- Date: Fri, 4 Jun 2021 22:04:30 GMT
- Title: Exposing the Implicit Energy Networks behind Masked Language Models via
Metropolis--Hastings
- Authors: Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
- Abstract summary: We interpret MLMs as energy-based sequence models and propose two energy parametrizations derivable from the trained MLMs.
We develop a tractable sampling scheme based on the Metropolis--Hastings Monte Carlo algorithm.
We validate the effectiveness of the proposed parametrizations by exploring the quality of samples drawn from these energy-based models.
- Score: 57.133639209759615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While recent work has shown that scores from models trained by the ubiquitous
masked language modeling (MLM) objective effectively discriminate probable and
improbable sequences, it is still an open question if these MLMs specify a
principled probability distribution over the space of possible sequences. In
this paper, we interpret MLMs as energy-based sequence models and propose two
energy parametrizations derivable from the trained MLMs. In order to draw
samples correctly from these models, we develop a tractable sampling
scheme based on the Metropolis--Hastings Monte Carlo algorithm. In our
approach, samples are proposed from the same masked conditionals used for
training the masked language models, and they are accepted or rejected based on
their energy values according to the target distribution. We validate the
effectiveness of the proposed parametrizations by exploring the quality of
samples drawn from these energy-based models on the conditional generation task
of machine translation. We theoretically and empirically justify our sampling
algorithm by showing that the masked conditionals on their own do not yield a
Markov chain whose stationary distribution is that of our target distribution,
and our approach generates higher quality samples than other recently proposed
undirected generation approaches (Wang et al., 2019, Ghazvininejad et al.,
2019).
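To make the sampling scheme concrete, below is a minimal sketch of one Metropolis--Hastings step driven by an MLM's masked conditionals. It assumes a Hugging Face bert-base-uncased checkpoint, and the energy used (negative sum of per-token log-scores from a single unmasked forward pass) is one simple illustrative parametrization, not necessarily either of the two proposed in the paper. Because the forward and reverse proposals mask the same position, one masked conditional scores both, so the acceptance ratio min(1, e^{-E(x')} q(x|x') / (e^{-E(x)} q(x'|x))) is cheap to evaluate.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def energy(ids):
    # One simple energy: negative sum of per-token log-scores from a
    # single unmasked forward pass (illustrative choice only).
    logp = mlm(input_ids=ids.unsqueeze(0)).logits[0].log_softmax(-1)
    return -logp.gather(-1, ids.unsqueeze(-1)).sum()

@torch.no_grad()
def masked_conditional(ids, pos):
    # The MLM conditional p(. | x_{-pos}), obtained by masking position pos.
    masked = ids.clone()
    masked[pos] = tok.mask_token_id
    return mlm(input_ids=masked.unsqueeze(0)).logits[0, pos].log_softmax(-1)

@torch.no_grad()
def mh_step(ids):
    pos = torch.randint(1, len(ids) - 1, (1,)).item()  # skip [CLS]/[SEP]
    log_q = masked_conditional(ids, pos)               # proposal distribution
    new_tok = torch.multinomial(log_q.exp(), 1).item()
    prop = ids.clone()
    prop[pos] = new_tok
    # Masking pos yields the same context for x and x', so the same
    # conditional scores both the forward and reverse proposals.
    log_alpha = (energy(ids) - energy(prop)            # exp(-E(x')) / exp(-E(x))
                 + log_q[ids[pos]] - log_q[new_tok])   # q(x|x') / q(x'|x)
    accept = torch.rand(()).log() < log_alpha.clamp(max=0.0)
    return prop if accept else ids

ids = tok("the cat sat on the mat", return_tensors="pt").input_ids[0]
for _ in range(200):
    ids = mh_step(ids)
print(tok.decode(ids, skip_special_tokens=True))
```

Dropping the accept/reject test reduces this chain to Gibbs-style resampling from the masked conditionals which, as the abstract notes, does not have the target distribution as its stationary distribution.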
Related papers
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z) - Towards Probabilistically-Sound Beam Search with Masked Language Models [0.0]
Beam search with masked language models (MLMs) is challenging in part because joint probability distributions over sequences are not available.
Estimating such distributions has important domain-specific applications such as ancient text restoration and protein engineering.
Here we present probabilistically-sound methods for beam search with MLMs.
arXiv Detail & Related papers (2024-02-22T23:36:26Z) - A Block Metropolis-Hastings Sampler for Controllable Energy-based Text
Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than being fixed in advance.
arXiv Detail & Related papers (2023-12-07T18:30:15Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Differentiating Metropolis-Hastings to Optimize Intractable Densities [51.16801956665228]
We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers.
We apply gradient-based optimization to objectives expressed as expectations over intractable target densities.
arXiv Detail & Related papers (2023-06-13T17:56:02Z) - Deriving Language Models from Masked Language Models [12.628196757545979]
Masked language models (MLMs) do not explicitly define a distribution over language.
Recent work has implicitly treated them as such for the purposes of generation and scoring.
arXiv Detail & Related papers (2023-05-24T18:42:45Z) - Inconsistencies in Masked Language Models [20.320583166619528]
Masked language models (MLMs) can provide distributions of tokens in the masked positions in a sequence.
Distributions corresponding to different masking patterns can demonstrate considerable inconsistencies.
We propose an inference-time strategy for MLMs called Ensemble of Conditionals.
arXiv Detail & Related papers (2022-12-30T22:53:25Z) - Sampling from Discrete Energy-Based Models with Quality/Efficiency
Trade-offs [3.491202838583993]
Energy-Based Models (EBMs) allow for extremely flexible specifications of probability distributions.
They do not provide a mechanism for obtaining exact samples from these distributions.
We propose a new approximate sampling technique, Quasi Rejection Sampling (QRS), that allows for a trade-off between sampling efficiency and sampling quality; a minimal sketch appears after this list.
arXiv Detail & Related papers (2021-12-10T17:51:37Z) - Oops I Took A Gradient: Scalable Sampling for Discrete Distributions [53.3142984019796]
We show that this gradient-based approach outperforms generic samplers in a number of difficult settings.
We also demonstrate the use of our improved sampler for training deep energy-based models on high dimensional discrete data.
arXiv Detail & Related papers (2021-02-08T20:08:50Z)
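As promised in the Quasi Rejection Sampling entry above, here is a minimal sketch of QRS. The callables propose and log_p are placeholders for a proposal sampler (returning a sample and its log-probability) and an unnormalized target log-score; neither name comes from the paper's codebase. The single knob beta trades acceptance rate for sample quality: raising beta improves quality, and once beta upper-bounds P(x)/q(x) everywhere, the accepted samples are exact draws from the target.

```python
import math
import random

def qrs_sample(propose, log_p, beta, n_draws):
    # Quasi Rejection Sampling: accept x ~ q with probability
    # min(1, P(x) / (beta * q(x))). Larger beta -> higher quality,
    # lower acceptance rate; exact once beta bounds P/q everywhere.
    accepted = []
    for _ in range(n_draws):
        x, log_q = propose()
        log_accept = min(0.0, log_p(x) - math.log(beta) - log_q)
        if random.random() < math.exp(log_accept):
            accepted.append(x)
    return accepted

# Toy usage: unnormalized Gaussian-shaped target on {0, ..., 9} with a
# uniform proposal; beta = 20 upper-bounds P/q, so samples are exact.
propose = lambda: (random.randrange(10), math.log(0.1))
log_p = lambda x: -((x - 3) ** 2)
samples = qrs_sample(propose, log_p, beta=20.0, n_draws=10_000)
print(f"acceptance rate {len(samples) / 10_000:.3f}")
```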