Regression Transformer: Concurrent Conditional Generation and Regression
by Blending Numerical and Textual Tokens
- URL: http://arxiv.org/abs/2202.01338v1
- Date: Tue, 1 Feb 2022 08:57:31 GMT
- Title: Regression Transformer: Concurrent Conditional Generation and Regression
by Blending Numerical and Textual Tokens
- Authors: Jannis Born, Matteo Manica
- Abstract summary: The Regression Transformer (RT) casts continuous properties as sequences of numerical tokens and encodes them jointly with conventional tokens.
We propose several extensions to the XLNet objective and adopt an alternating training scheme to concurrently optimize property prediction and conditional text generation.
This finds application particularly in property-driven, local exploration of the chemical or protein space.
- Score: 3.421506449201873
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We report the Regression Transformer (RT), a method that abstracts regression
as a conditional sequence modeling problem. The RT casts continuous properties
as sequences of numerical tokens and encodes them jointly with conventional
tokens. This yields a dichotomous model that can seamlessly transition between
solving regression tasks and conditional generation tasks, governed solely by
the mask location. We propose several extensions to the XLNet objective and
adopt an alternating training scheme to concurrently optimize property
prediction and conditional text generation based on a self-consistency loss.
Our experiments on both chemical and protein languages demonstrate that the
performance of traditional regression models can be surpassed despite training
with cross entropy loss. Importantly, priming the same model with continuous
properties yields a highly competitive conditional generative model that
outperforms specialized approaches in a constrained property optimization
benchmark. In sum, the Regression Transformer opens the door for "swiss army
knife" models that excel at both regression and conditional generation. This
finds application particularly in property-driven, local exploration of the
chemical or protein space.
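As a concrete illustration of the core mechanism, the sketch below shows how a continuous property can be spelled out as numerical tokens and how the mask location alone selects regression versus conditional generation. The token formats, the property name, and the masking scheme are simplified assumptions for illustration, not the paper's exact implementation.
```python
# Minimal sketch of the Regression Transformer idea (illustrative token formats,
# not the paper's exact vocabulary): a continuous property is spelled out as
# numerical tokens and concatenated with textual tokens; the mask location
# alone decides whether the model performs regression or conditional generation.

def tokenize_property(name, value, precision=3):
    """E.g. ('qed', 0.782) -> ['<qed>', '_0_0', '_._', '_7_-1', '_8_-2', '_2_-3']."""
    digits = f"{value:.{precision}f}"
    place = len(digits.split(".")[0]) - 1        # decimal place of the leading digit
    tokens = [f"<{name}>"]
    for ch in digits:
        if ch == ".":
            tokens.append("_._")
        else:
            tokens.append(f"_{ch}_{place}")      # digit token carries its decimal place
            place -= 1
    return tokens

def build_input(prop_tokens, text_tokens, task):
    """Mask the property for regression, mask the text for conditional generation."""
    if task == "regression":
        prop_tokens = ["[MASK]"] * len(prop_tokens)
    elif task == "generation":
        text_tokens = ["[MASK]"] * len(text_tokens)
    return prop_tokens + ["|"] + text_tokens

smiles_tokens = list("CCO")                      # toy SMILES for ethanol, one token per character
prop_tokens = tokenize_property("qed", 0.782)
print(build_input(prop_tokens, smiles_tokens, "regression"))
print(build_input(prop_tokens, smiles_tokens, "generation"))
```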
Related papers
- RS-Reg: Probabilistic and Robust Certified Regression Through Randomized Smoothing [19.03441416869426]
We show how to establish upper bounds on input data point perturbations using the $\ell$ norm.
We then derive a certified upper bound on the input perturbations when dealing with a family of regression models whose outputs are bounded.
Our simulations verify the validity of the theoretical results and reveal the advantages and limitations of simple smoothing functions.
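A minimal sketch of the randomized-smoothing idea this builds on (illustrative only; the noise level, sample count, and toy bounded-output model are assumptions, and the paper's actual certificate is not reproduced here):
```python
import numpy as np

# Sketch of randomized smoothing for a regression model with bounded outputs
# (illustrative; not the certificate derived in the RS-Reg paper).
def smoothed_predict(base_model, x, sigma=0.5, n_samples=1000, seed=0):
    """Monte-Carlo estimate of g(x) = E[ f(x + eps) ], eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples, x.shape[0]))
    return np.mean([base_model(x + n) for n in noise])

# Toy base model with outputs bounded in [0, 1].
def base_model(x):
    return 1.0 / (1.0 + np.exp(-x.sum()))

x = np.array([0.3, -1.2, 0.8])
print(smoothed_predict(base_model, x))
```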
arXiv Detail & Related papers (2024-05-14T18:10:46Z)
- Generalized Regression with Conditional GANs [2.4171019220503402]
We propose to learn a prediction function whose outputs, when paired with the corresponding inputs, are indistinguishable from feature-label pairs in the training dataset.
We show that this approach to regression makes fewer assumptions on the distribution of the data we are fitting to and, therefore, has better representation capabilities.
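A hedged sketch of that idea in PyTorch (the architecture, toy data, and hyperparameters are assumptions, not the cited paper's setup): a generator maps (input, noise) to a predicted label, and a discriminator is trained to distinguish real (input, label) pairs from generated ones.
```python
import torch
import torch.nn as nn

# Illustrative conditional-GAN regression sketch, not the cited paper's code.
x_dim, z_dim = 4, 2
G = nn.Sequential(nn.Linear(x_dim + z_dim, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(x_dim + 1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

# Toy data: y depends on x plus noise, so the conditional distribution matters.
x = torch.randn(256, x_dim)
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)

for step in range(200):
    z = torch.randn(x.size(0), z_dim)
    y_fake = G(torch.cat([x, z], dim=1))

    # Discriminator: real (x, y) pairs vs generated (x, y_fake) pairs.
    d_real = D(torch.cat([x, y], dim=1))
    d_fake = D(torch.cat([x, y_fake.detach()], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: make (x, y_fake) indistinguishable from real pairs.
    d_fake = D(torch.cat([x, y_fake], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```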
arXiv Detail & Related papers (2024-04-21T01:27:47Z)
- GenFormer: A Deep-Learning-Based Approach for Generating Multivariate
Stochastic Processes [5.679243827959339]
We propose a Transformer-based deep learning model that learns a mapping between a Markov state sequence and time series values.
The GenFormer model is applied to simulate synthetic wind speed data at various stations in Florida to calculate exceedance probabilities for risk management.
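A rough sketch of the first stage of such a pipeline (illustrative assumptions throughout; the real model replaces the crude state-mean decoder below with a Transformer that maps the Markov state sequence to values):
```python
import numpy as np

# Discretize a time series into Markov states, fit a transition matrix, and
# sample a new state path; a Transformer decoder would then map states to values.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.normal(size=500)

n_states = 8
edges = np.quantile(series, np.linspace(0, 1, n_states + 1)[1:-1])
states = np.digitize(series, edges)                  # state index per timestep

trans = np.ones((n_states, n_states))                # Laplace-smoothed transition counts
for a, b in zip(states[:-1], states[1:]):
    trans[a, b] += 1
trans /= trans.sum(axis=1, keepdims=True)

path = [states[0]]
for _ in range(499):
    path.append(rng.choice(n_states, p=trans[path[-1]]))

state_means = np.array([series[states == s].mean() for s in range(n_states)])
synthetic = state_means[np.array(path)]              # crude stand-in for the learned decoder
print(synthetic[:10])
```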
arXiv Detail & Related papers (2024-02-03T03:50:18Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
- Engression: Extrapolation through the Lens of Distributional Regression [2.519266955671697]
We propose a neural network-based distributional regression methodology called 'engression'.
An engression model is generative in the sense that we can sample from the fitted conditional distribution and is also suitable for high-dimensional outcomes.
We show that engression can successfully perform extrapolation under some assumptions such as monotonicity, whereas traditional regression approaches such as least-squares or quantile regression fall short under the same assumptions.
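A minimal sketch of the engression idea (not the authors' code; the network, the particular energy-score loss form, and the toy monotone data are assumptions): a generator g(x, noise) is fitted so that its samples match the conditional distribution of y given x, and can then be sampled beyond the training range.
```python
import torch
import torch.nn as nn

# Illustrative engression-style distributional regression sketch.
g = nn.Sequential(nn.Linear(1 + 1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(g.parameters(), lr=1e-3)

x = torch.rand(512, 1) * 2.0                      # training inputs in [0, 2]
y = torch.sqrt(x) + 0.1 * torch.randn_like(x)     # monotone ground truth plus noise

for step in range(500):
    eps1, eps2 = torch.randn_like(x), torch.randn_like(x)
    s1 = g(torch.cat([x, eps1], dim=1))
    s2 = g(torch.cat([x, eps2], dim=1))
    # Energy-score loss: fit accuracy minus a term rewarding sample diversity.
    loss = ((s1 - y).abs() + (s2 - y).abs()).mean() / 2 - 0.5 * (s1 - s2).abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Sample from the fitted conditional distribution outside the training range,
# which is where extrapolation behaviour becomes visible.
with torch.no_grad():
    x_new = torch.full((1000, 1), 3.0)
    samples = g(torch.cat([x_new, torch.randn_like(x_new)], dim=1))
    print(samples.mean().item(), samples.std().item())
```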
arXiv Detail & Related papers (2023-07-03T08:19:00Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood estimation (MLE) objective does not match the downstream use case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- SymbolicGPT: A Generative Transformer Model for Symbolic Regression [3.685455441300801]
We present SymbolicGPT, a novel transformer-based language model for symbolic regression.
We show that our model performs strongly compared to competing models with respect to the accuracy, running time, and data efficiency.
arXiv Detail & Related papers (2021-06-27T03:26:35Z)
- Decision Transformer: Reinforcement Learning via Sequence Modeling [102.86873656751489]
We present a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem.
We present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
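A small sketch of the input layout this implies (illustrative; the tokenization and toy trajectory are assumptions): each timestep contributes a (return-to-go, state, action) triple, and a causal Transformer is trained to predict the action given everything to its left.
```python
import numpy as np

# Build the Decision-Transformer-style token sequence for one trajectory.
def to_sequence(states, actions, rewards):
    returns_to_go = np.cumsum(rewards[::-1])[::-1]   # R_t = sum of rewards from t onward
    seq = []
    for R, s, a in zip(returns_to_go, states, actions):
        seq.extend([("rtg", float(R)), ("state", s), ("action", a)])
    return seq

# Toy trajectory; at test time the first "rtg" token is set to the desired
# return, and the model generates actions conditioned on it.
states = [[0.0, 1.0], [0.5, 0.9], [1.0, 0.7]]
actions = [0, 1, 1]
rewards = [0.0, 0.5, 1.0]
print(to_sequence(states, actions, np.array(rewards)))
```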
arXiv Detail & Related papers (2021-06-02T17:53:39Z)
- Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing
Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing and analyzing regression errors in NLP model updates.
We formulate the regression-free model updates into a constrained optimization problem.
We empirically analyze how model ensemble reduces regression.
arXiv Detail & Related papers (2021-05-07T03:33:00Z)
- Cascaded Text Generation with Markov Transformers [122.76100449018061]
Two dominant approaches to neural text generation are fully autoregressive models, using serial beam search decoding, and non-autoregressive models, using parallel decoding with no output dependencies.
This work proposes an autoregressive model with sub-linear parallel time generation. Noting that conditional random fields with bounded context can be decoded in parallel, we propose an efficient cascaded decoding approach for generating high-quality output.
This approach requires only a small modification from standard autoregressive training, while showing competitive accuracy/speed tradeoff compared to existing methods on five machine translation datasets.
arXiv Detail & Related papers (2020-06-01T17:52:15Z)
- Aligned Cross Entropy for Non-Autoregressive Machine Translation [120.15069387374717]
We propose aligned cross entropy (AXE) as an alternative loss function for training of non-autoregressive models.
AXE-based training of conditional masked language models (CMLMs) substantially improves performance on major WMT benchmarks.
arXiv Detail & Related papers (2020-04-03T16:24:47Z)