Enabling arbitrary translation objectives with Adaptive Tree Search
- URL: http://arxiv.org/abs/2202.11444v1
- Date: Wed, 23 Feb 2022 11:48:26 GMT
- Title: Enabling arbitrary translation objectives with Adaptive Tree Search
- Authors: Wang Ling, Wojciech Stokowiec, Domenic Donato, Laurent Sartran, Lei
Yu, Austin Matthews and Chris Dyer
- Abstract summary: We introduce an adaptive tree search algorithm that can find high-scoring outputs under translation models while making no assumptions about the form or structure of the search objective.
Our algorithm has different biases than beam search, which enables a new analysis of the role of decoding bias in autoregressive models.
- Score: 23.40984370716434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce an adaptive tree search algorithm that can find
high-scoring outputs under translation models while making no assumptions
about the form or structure of the search objective. This algorithm -- a
deterministic variant of
Monte Carlo tree search -- enables the exploration of new kinds of models that
are unencumbered by constraints imposed to make decoding tractable, such as
autoregressivity or conditional independence assumptions. When applied to
autoregressive models, our algorithm has different biases than beam search,
which enables a new analysis of the role of decoding bias in autoregressive
models. Empirically, we show that our adaptive tree search algorithm finds
outputs with substantially better model scores compared to beam search in
autoregressive models, and compared to reranking techniques in models whose
scores do not decompose additively with respect to the words in the output. We
also characterise how several translation model objectives correlate with BLEU.
We find that while some standard models are poorly calibrated and benefit from
the beam search bias, other, often more robust models
(autoregressive models tuned to maximize expected automatic metric scores, the
noisy channel model and a newly proposed objective) benefit from increasing
amounts of search using our proposed decoder, whereas the beam search bias
limits the improvements obtained from such objectives. Thus, we argue that as
models improve, the improvements may be masked by over-reliance on beam search
or reranking-based methods.
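As a rough picture of what tree search under an arbitrary objective looks like, here is a minimal best-first sketch in Python. It is an illustration under simplifying assumptions, not the paper's algorithm: the actual method is a deterministic variant of Monte Carlo tree search with more careful value estimation, whereas this sketch uses a hypothetical black-box score_fn (which could be, say, a noisy-channel score log p(y) + log p(x|y)) directly as the search priority.

```python
import heapq

def tree_search_decode(score_fn, next_tokens, bos, eos, max_len=64, budget=1000):
    """Best-first search over partial translations under a black-box objective.

    score_fn(seq)    -> float score of a token tuple; it need not decompose
                        additively over tokens the way beam search requires.
    next_tokens(seq) -> iterable of candidate continuations for `seq`.
    """
    counter = 0                                    # tie-breaker for the heap
    frontier = [(-score_fn((bos,)), counter, (bos,))]
    best_score, best_seq = float("-inf"), None
    for _ in range(budget):                        # fixed exploration budget
        if not frontier:
            break
        neg_score, _, seq = heapq.heappop(frontier)
        if seq[-1] == eos or len(seq) >= max_len:  # complete hypothesis
            if -neg_score > best_score:
                best_score, best_seq = -neg_score, seq
            continue
        for tok in next_tokens(seq):               # expand the chosen node
            counter += 1
            child = seq + (tok,)
            heapq.heappush(frontier, (-score_fn(child), counter, child))
    return best_score, best_seq
```

Because the objective is queried only as a black box, nothing in this loop assumes autoregressivity or an additively decomposing score; those assumptions matter only for how informative the search priorities are.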
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745] (2024-11-02)
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm that incorporates score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625] (2024-05-29)
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that, when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, the Self-Exploring Language Models (SELM) approach significantly boosts performance on instruction-following benchmarks.
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806] (2023-06-08)
A maximum-likelihood estimation (MLE) objective does not match the downstream use case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
- Rethinking the Evaluation of Neural Machine Translation [25.036685025571927] (2021-06-29)
We propose a novel evaluation protocol, which avoids the effect of search errors and provides a system-level evaluation from the perspective of model ranking.
Our method is based on our newly proposed exact top-$k$ decoding instead of beam search.
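The exact top-$k$ decoding mentioned above exploits a property that beam search ignores: when the objective is a sum of non-positive per-token log-probabilities, every prefix's score upper-bounds all of its completions, so Dijkstra-style best-first search emits complete hypotheses in provably best-first order. A hedged sketch under that assumption (the logp_next interface is hypothetical, and exactness holds only up to the max_len cutoff):

```python
import heapq

def exact_topk(logp_next, bos, eos, k, max_len=64):
    """Provably exact top-k decoding for an autoregressive model.

    logp_next(prefix) -> dict mapping each token to log P(token | prefix).
    Log-probabilities are <= 0, so extending a prefix never raises its
    score; the first k completed hypotheses popped are the true top k.
    """
    counter = 0
    frontier = [(0.0, counter, (bos,))]  # (negated log-prob, tie, prefix)
    results = []
    while frontier and len(results) < k:
        neg, _, seq = heapq.heappop(frontier)
        if seq[-1] == eos:
            results.append((-neg, seq))  # next-best complete sequence
            continue
        if len(seq) >= max_len:
            continue                     # truncated; exact only up to max_len
        for tok, lp in logp_next(seq).items():
            counter += 1
            heapq.heappush(frontier, (neg - lp, counter, seq + (tok,)))
    return results
```

Unlike beam search, no hypothesis is ever pruned for capacity reasons, which is what removes search error from system-level model ranking.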
- Quantum-Assisted Feature Selection for Vehicle Price Prediction Modeling [0.0] (2021-04-08)
We study metrics for encoding the search as a binary model, such as the Generalized Mean Information Coefficient and the Pearson Correlation Coefficient.
We achieve accuracy scores of 0.9 for finding optimal subsets on synthetic data using a new metric that we define.
Our findings show that by leveraging quantum-assisted routines we find solutions that increase the quality of predictive model output.
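One plausible reading of "encoding the search as a binary model" is a quadratic binary objective whose diagonal terms reward features correlated with the target and whose off-diagonal terms penalize redundant feature pairs. The sketch below is a generic QUBO of that shape, offered as an assumption rather than the paper's exact formulation; it uses only the Pearson Correlation Coefficient and solves by brute force where a quantum-assisted routine would be used at scale.

```python
import itertools
import numpy as np

def feature_selection_qubo(X, y, redundancy_weight=0.5):
    """Encode feature selection as a QUBO over binary selection variables.

    Diagonal entries reward relevance |corr(x_i, y)|; upper-triangular
    entries penalize redundancy |corr(x_i, x_j)| between selected pairs.
    """
    n = X.shape[1]
    Q = np.zeros((n, n))
    for i in range(n):
        Q[i, i] = -abs(np.corrcoef(X[:, i], y)[0, 1])        # relevance
        for j in range(i + 1, n):
            Q[i, j] = redundancy_weight * abs(
                np.corrcoef(X[:, i], X[:, j])[0, 1])          # redundancy
    return Q

def brute_force_solve(Q):
    """Exhaustively minimize z^T Q z over binary vectors (small n only)."""
    n = Q.shape[0]
    best = (np.inf, None)
    for bits in itertools.product([0, 1], repeat=n):
        z = np.array(bits)
        val = z @ Q @ z
        if val < best[0]:
            best = (val, z)
    return best
```

For n features the brute-force loop is O(2^n), which is precisely the scaling argument for handing the QUBO to an annealer instead.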
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556] (2020-10-05)
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
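"Directly optimize a reinforcement learning objective, maximizing an expected reward" generally builds on a score-function (REINFORCE-style) gradient estimator. The snippet below is that generic estimator with a mean baseline, a hedged sketch rather than the paper's exact objective; the tensor shapes and the baseline choice are assumptions.

```python
import torch

def reinforce_loss(logits, actions, rewards):
    """Score-function estimator for maximizing E[R] over generated structures.

    logits:  (batch, steps, vocab) unnormalized token scores from the model.
    actions: (batch, steps) sampled discrete choices (e.g. molecule tokens).
    rewards: (batch,) user-defined property score for each sampled structure.
    """
    dist = torch.distributions.Categorical(logits=logits)
    logp = dist.log_prob(actions).sum(dim=-1)  # log-prob of each sequence
    advantage = rewards - rewards.mean()       # mean baseline cuts variance
    return -(advantage * logp).mean()          # minimizing ascends on E[R]
```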
- Learning Optimal Tree Models Under Beam Search [27.92120639502327] (2020-06-27)
Existing tree models suffer from a training-testing discrepancy.
We develop the concepts of Bayes optimality and calibration under beam search.
We propose a novel algorithm for learning optimal tree models under beam search.
- Unsupervised Anomaly Detection with Adversarial Mirrored AutoEncoders [51.691585766702744] (2020-03-24)
We propose a variant of Adversarial Autoencoder which uses a mirrored Wasserstein loss in the discriminator to enforce better semantic-level reconstruction.
We put forward an alternative measure of anomaly score to replace the reconstruction-based metric.
Our method outperforms the current state-of-the-art methods for anomaly detection on several OOD detection benchmarks.
- On the Discrepancy between Density Estimation and Sequence Generation [92.70116082182076] (2020-02-17)
Log-likelihood is highly correlated with BLEU when we consider models within the same family.
We observe no correlation between rankings of models across different families.
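That within-family versus cross-family contrast amounts to comparing rank correlations computed at two levels of pooling; a small sketch of the measurement, where every number is an illustrative placeholder rather than data from the paper:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical held-out log-likelihoods and BLEU scores per model; all
# values are made-up placeholders, not the paper's measurements.
families = {
    "family_a": {"loglik": [-1.9, -1.7, -1.5], "bleu": [24.0, 26.1, 27.3]},
    "family_b": {"loglik": [-2.4, -2.1, -1.8], "bleu": [20.5, 22.8, 23.9]},
}

# Within a family, rank models by log-likelihood and by BLEU and correlate.
for name, d in families.items():
    rho = spearmanr(d["loglik"], d["bleu"]).correlation
    print(f"{name}: within-family Spearman rho = {rho:+.2f}")

# Pooling models across families repeats the comparison the paper warns
# about: rankings need not transfer between families.
loglik = np.concatenate([d["loglik"] for d in families.values()])
bleu = np.concatenate([d["bleu"] for d in families.values()])
print(f"pooled: Spearman rho = {spearmanr(loglik, bleu).correlation:+.2f}")
```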
This list is automatically generated from the titles and abstracts of the papers on this site.