Mode recovery in neural autoregressive sequence modeling
- URL: http://arxiv.org/abs/2106.05459v1
- Date: Thu, 10 Jun 2021 02:17:28 GMT
- Title: Mode recovery in neural autoregressive sequence modeling
- Authors: Ilia Kulikov, Sean Welleck, Kyunghyun Cho
- Abstract summary: Recent studies have revealed unexpected and undesirable properties of neural autoregressive sequence models.
We investigate how the modes, or local maxima, of a distribution are maintained throughout the full learning chain.
We conclude that future research must consider the entire learning chain in order to fully understand the potentials and perils.
- Score: 55.05526174291747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite its wide use, recent studies have revealed unexpected and undesirable
properties of neural autoregressive sequence models trained with maximum
likelihood, such as an unreasonably high affinity to short sequences after
training and to infinitely long sequences at decoding time. We propose to study
these phenomena by investigating how the modes, or local maxima, of a
distribution are maintained throughout the full learning chain of the
ground-truth, empirical, learned and decoding-induced distributions, via the
newly proposed mode recovery cost. We design a tractable testbed where we build
three types of ground-truth distributions: (1) an LSTM based structured
distribution, (2) an unstructured distribution where the probability of a
sequence does not depend on its content, and (3) a product of these two which we call a
semi-structured distribution. Our study reveals both expected and unexpected
findings. First, starting with data collection, mode recovery cost strongly
relies on the ground-truth distribution and is most costly with the
semi-structured distribution. Second, after learning, mode recovery cost from
the ground-truth distribution may increase or decrease compared to data
collection, with the largest cost degradation occurring with the
semi-structured ground-truth distribution. Finally, the ability of the
decoding-induced distribution to recover modes from the learned distribution is
highly impacted by the choices made earlier in the learning chain. We conclude
that future research must consider the entire learning chain in order to fully
understand the potentials and perils and to further improve neural
autoregressive sequence models.
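To make the abstract's central quantity concrete, the following is a minimal toy sketch of one plausible reading of a mode recovery cost: given two distributions over the same finite sequence space (e.g. the ground-truth and the learned distribution in the learning chain), measure what fraction of the reference distribution's top-k modes fail to appear among the other distribution's top-k. The dictionary representation, the function name, and the exact cost definition are illustrative assumptions, not the paper's formal definition.

```python
def mode_recovery_cost(p, q, k):
    """Fraction of p's top-k modes missing from q's top-k set.

    p, q: dicts mapping sequences to probabilities (hypothetical toy
    representation of distributions over a finite sequence space).
    Returns 0.0 if all k modes are recovered, 1.0 if none are.
    """
    top_p = set(sorted(p, key=p.get, reverse=True)[:k])
    top_q = set(sorted(q, key=q.get, reverse=True)[:k])
    return 1.0 - len(top_p & top_q) / k

# Example: a four-sequence space where a hypothetical learned
# distribution q shifts mass toward a short sequence (mirroring the
# short-sequence affinity the abstract describes), losing one of the
# ground-truth distribution p's two modes.
p = {"ab": 0.4, "abc": 0.3, "a": 0.2, "b": 0.1}
q = {"a": 0.5, "ab": 0.3, "b": 0.15, "abc": 0.05}
print(mode_recovery_cost(p, q, k=2))  # -> 0.5: one of p's two modes lost
```

Chaining this quantity across ground-truth, empirical, learned, and decoding-induced distributions would correspond to the "full learning chain" perspective the abstract advocates.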
Related papers
- Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation [12.459901557580052]
We present SURE, a novel framework that extends the capabilities of pretrained multimodal models by introducing latent space reconstruction and uncertainty estimation.
We show that SURE consistently achieves state-of-the-art performance, ensuring robust predictions even in the presence of incomplete data.
arXiv Detail & Related papers (2025-04-18T05:07:20Z) - Parallelly Tempered Generative Adversarial Networks [7.94957965474334]
A generative adversarial network (GAN) has been a representative backbone model in generative artificial intelligence (AI)
This work analyzes the training instability and inefficiency in the presence of mode collapse by linking it to multimodality in the target distribution.
With our newly developed GAN objective function, the generator can learn all the tempered distributions simultaneously.
arXiv Detail & Related papers (2024-11-18T18:01:13Z) - Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - Unimodal Distributions for Ordinal Regression [2.642698101441705]
We propose two new approaches to incorporate the preference for unimodal distributions into the predictive model.
We analyse the set of unimodal distributions in the probability simplex and establish fundamental properties.
We then propose a new architecture that imposes unimodal distributions and a new loss term that relies on the notion of projection in a set to promote unimodality.
arXiv Detail & Related papers (2023-03-08T13:00:40Z) - JANA: Jointly Amortized Neural Approximation of Complex Bayesian Models [0.5872014229110214]
We propose "jointly amortized neural approximation" (JANA) of intractable likelihood functions and posterior densities.
We benchmark the fidelity of JANA on a variety of simulation models against state-of-the-art Bayesian methods.
arXiv Detail & Related papers (2023-02-17T20:17:21Z) - Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks.
Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target.
Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z) - MMCGAN: Generative Adversarial Network with Explicit Manifold Prior [78.58159882218378]
We propose to employ explicit manifold learning as prior to alleviate mode collapse and stabilize training of GAN.
Our experiments on both the toy data and real datasets show the effectiveness of MMCGAN in alleviating mode collapse, stabilizing training, and improving the quality of generated samples.
arXiv Detail & Related papers (2020-06-18T07:38:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.