DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding
- URL: http://arxiv.org/abs/2212.04205v2
- Date: Thu, 18 May 2023 04:06:50 GMT
- Title: DC-MBR: Distributional Cooling for Minimum Bayesian Risk Decoding
- Authors: Jianhao Yan, Jin Xu, Fandong Meng, Jie Zhou, Yue Zhang
- Abstract summary: Minimum Bayesian Risk Decoding (MBR) has emerged as a promising decoding algorithm in Neural Machine Translation.
MBR performs poorly with label smoothing, which is surprising since label smoothing yields decent improvements with beam search and improves generalization across various tasks.
We show that the issue arises from the inconsistency of label smoothing between the token-level and sequence-level distributions.
- Score: 53.33313271531839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Minimum Bayesian Risk Decoding (MBR) has emerged as a promising decoding
algorithm in Neural Machine Translation. However, MBR performs poorly with
label smoothing, which is surprising since label smoothing provides decent
improvements with beam search and improves generalization across various tasks. In this
work, we show that the issue arises from the inconsistency of label smoothing
between the token-level and sequence-level distributions. We demonstrate that even
though label smoothing causes only a slight change at the token level, the
sequence-level distribution becomes highly skewed. We term this issue
\emph{autoregressive over-smoothness}. To address this issue, we propose a
simple and effective method, Distributional Cooling MBR (DC-MBR), which
manipulates the entropy of output distributions by tuning down the Softmax
temperature. We theoretically prove the equivalence between pre-tuning the label
smoothing factor and distributional cooling. Extensive experiments on NMT
benchmarks validate that distributional cooling improves MBR in various
settings.
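For concreteness, the abstract combines two standard ingredients: sample-based MBR, which selects the candidate with the highest expected utility against the other candidates, and distributional cooling, a temperature T < 1 in the softmax, p_T(y) ∝ exp(z_y / T), which lowers the entropy of the output distribution. The Python sketch below is a minimal illustration under those definitions, not the paper's implementation; the toy unigram-F1 utility stands in for BLEU/COMET, and the temperature value 0.5 is arbitrary.

```python
import math
from collections import Counter

def cool_distribution(logits, temperature=0.5):
    """Temperature-scaled softmax; T < 1 lowers entropy ("cooling")."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def unigram_f1(hyp, ref):
    """Toy utility standing in for BLEU/COMET in real MBR decoding."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    prec, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)

def mbr_decode(candidates, utility=unigram_f1):
    """Return the candidate with the highest mean utility against the rest."""
    best_idx, best_score = 0, float("-inf")
    for i, hyp in enumerate(candidates):
        score = sum(utility(hyp, ref) for j, ref in enumerate(candidates) if j != i)
        score /= max(len(candidates) - 1, 1)
        if score > best_score:
            best_idx, best_score = i, score
    return candidates[best_idx]

# Cooling sharpens a label-smoothed token distribution:
print(cool_distribution([2.0, 1.0, 0.5], temperature=1.0))  # ordinary softmax
print(cool_distribution([2.0, 1.0, 0.5], temperature=0.5))  # lower entropy

# MBR then picks the consensus hypothesis among sampled candidates:
print(mbr_decode(["the cat sat", "the cat sat down", "a dog ran"]))
```

Per the abstract, the cooling is applied to the model's output distributions; in sample-based MBR this shapes the candidate pool over which the expected-utility estimate is taken.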
Related papers
- Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation [30.323103270892734]
Maximum a posteriori decoding, a commonly used method for neural machine translation (NMT), aims to maximize the estimated posterior probability.
Minimum Bayes Risk (MBR) decoding offers an alternative by seeking hypotheses with the highest expected utility.
arXiv Detail & Related papers (2024-06-17T15:13:52Z)
- Inaccurate Label Distribution Learning with Dependency Noise [52.08553913094809]
We introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning.
We show that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.
arXiv Detail & Related papers (2024-05-26T07:58:07Z)
- Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling [49.215957313126324]
Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling.
However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details.
We propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results.
arXiv Detail & Related papers (2024-05-05T14:05:33Z)
- The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing [85.85160896547698]
Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks.
We show how to design an efficient classifier with a certified radius by relying on noise injection into the inputs.
Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
arXiv Detail & Related papers (2023-09-28T22:41:47Z)
- Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
- Stochastic Gradient Descent under Markovian Sampling Schemes [3.04585143845864]
We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme.
We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain.
arXiv Detail & Related papers (2023-02-28T09:18:00Z)
- Federated Learning with Label Distribution Skew via Logits Calibration [26.98248192651355]
In this paper, we investigate the label distribution skew in FL, where the distribution of labels varies across clients.
We propose FedLC, which calibrates the logits before softmax cross-entropy according to the probability of occurrence of each class (a generic logit-calibration sketch appears after this list).
Experiments on federated datasets and real-world datasets demonstrate that FedLC leads to a more accurate global model.
arXiv Detail & Related papers (2022-09-01T02:56:39Z)
- GMAC: A Distributional Perspective on Actor-Critic Framework [6.243642831536256]
We propose a new method that minimizes the Cramér distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm, SR($\lambda$).
We empirically show that GMAC captures the correct representation of value distributions and improves the performance of a conventional actor-critic method with low computational cost.
arXiv Detail & Related papers (2021-05-24T15:50:26Z)
- Distributionally Robust Bayesian Optimization [121.71766171427433]
We present a novel distributionally robust Bayesian optimization algorithm (DRBO) for zeroth-order, noisy optimization.
Our algorithm provably obtains sub-linear robust regret in various settings.
We demonstrate the robust performance of our method on both synthetic and real-world benchmarks.
arXiv Detail & Related papers (2020-02-20T22:04:30Z)
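The FedLC entry above describes calibrating logits according to the per-class occurrence probability before the softmax cross-entropy. The sketch below shows a generic prior-based logit adjustment in that spirit; the additive log-prior form and the tau knob follow the common logit-adjustment recipe and are assumptions here, not FedLC's published formula.

```python
import math

def calibrated_cross_entropy(logits, target, class_counts, tau=1.0):
    """Cross-entropy on prior-adjusted logits (logit-adjustment style).

    Frequent classes receive a larger additive offset, so the model must
    learn a bigger margin before a rare class can win the softmax.
    """
    total = sum(class_counts)
    adjusted = [
        z + tau * math.log(count / total + 1e-12)
        for z, count in zip(logits, class_counts)
    ]
    m = max(adjusted)  # log-sum-exp with max subtraction for stability
    log_norm = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_norm - adjusted[target]  # negative log-likelihood of target

# With mirrored logits, the rare class (index 1) incurs a larger loss than
# the frequent class (index 0), pushing training toward bigger rare-class
# margins on a client whose labels are skewed 900:100:
print(calibrated_cross_entropy([3.0, 1.0], target=0, class_counts=[900, 100]))
print(calibrated_cross_entropy([1.0, 3.0], target=1, class_counts=[900, 100]))
```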