Zero-Shot Controlled Generation with Encoder-Decoder Transformers
- URL: http://arxiv.org/abs/2106.06411v2
- Date: Tue, 15 Jun 2021 16:25:11 GMT
- Title: Zero-Shot Controlled Generation with Encoder-Decoder Transformers
- Authors: Devamanyu Hazarika, Mahdi Namazifar, Dilek Hakkani-Tür
- Abstract summary: We propose novel approaches for controlling encoder-decoder transformer-based NLG models in a zero-shot manner.
We show that not only are these NLG models robust to such manipulations, but also that their behavior can be controlled without impacting their generation performance.
- Score: 8.506451605253517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controlling neural network-based models for natural language generation (NLG)
has broad applications in numerous areas such as machine translation, document
summarization, and dialog systems. Approaches that enable such control in a
zero-shot manner would be of great importance as, among other reasons, they
remove the need for additional annotated data and training. In this work, we
propose novel approaches for controlling encoder-decoder transformer-based NLG
models in a zero-shot manner. This is done by introducing three control knobs, namely,
attention biasing, decoder mixing, and context augmentation, that are applied
to these models at generation time. These knobs control the generation process
by directly manipulating trained NLG models (e.g., biasing cross-attention
layers) to realize the desired attributes in the generated outputs. We show
that not only are these NLG models robust to such manipulations, but also that
their behavior can be controlled without impacting their generation performance.
These results, to the best of our knowledge, are the first of their kind.
Through these control knobs, we also investigate the role of the transformer
decoder's self-attention module and show strong evidence that its primary role
is maintaining fluency of sentences generated by these models. Based on this
hypothesis, we show that alternative architectures for transformer decoders
could be viable options. We also study how this hypothesis could lead to more
efficient ways for training encoder-decoder transformer models.
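As an illustration of the attention-biasing knob described above, the sketch below adds a bias to the cross-attention logits at generation time so that selected encoder positions receive more attention mass. This is only a minimal sketch of the general idea under that reading of the abstract, not the authors' implementation; the function name, the single-head formulation, and the bias_strength parameter are hypothetical choices made for the example.

```python
# Minimal sketch of attention biasing on a cross-attention layer at generation
# time: an additive bias is applied to the attention logits so that chosen
# encoder positions (e.g., tokens carrying a desired attribute) are attended to
# more strongly. Names and shapes are illustrative, not the paper's code.
import torch
import torch.nn.functional as F


def biased_cross_attention(query, key, value, bias_positions, bias_strength=2.0):
    """Single-head cross-attention with an additive bias on chosen source positions.

    query:           (tgt_len, d) decoder states
    key, value:      (src_len, d) encoder states
    bias_positions:  list of encoder token indices to emphasize
    bias_strength:   scalar added to the attention logits of those positions
    """
    d = query.size(-1)
    logits = query @ key.transpose(0, 1) / d ** 0.5   # (tgt_len, src_len)

    bias = torch.zeros(key.size(0))
    bias[bias_positions] = bias_strength               # boost selected source tokens
    logits = logits + bias                             # broadcast over target positions

    attn = F.softmax(logits, dim=-1)
    return attn @ value                                # (tgt_len, d)


# Toy usage: bias generation toward the first two encoder tokens.
dec_states = torch.randn(5, 64)    # 5 decoding steps
enc_states = torch.randn(12, 64)   # 12 source tokens
out = biased_cross_attention(dec_states, enc_states, enc_states, bias_positions=[0, 1])
print(out.shape)  # torch.Size([5, 64])
```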
Related papers
- CAR: Controllable Autoregressive Modeling for Visual Generation [100.33455832783416]
Controllable AutoRegressive Modeling (CAR) is a novel, plug-and-play framework that integrates conditional control into multi-scale latent variable modeling.
CAR progressively refines and captures control representations, which are injected into each autoregressive step of the pre-trained model to guide the generation process.
Our approach demonstrates excellent controllability across various types of conditions and delivers higher image quality compared to previous methods.
arXiv Detail & Related papers (2024-10-07T00:55:42Z)
- Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers [0.21756081703276003]
This thesis provides methods and analysis of models which make progress on this goal.
We introduce two new finetuning methods which add new capabilities to the models they are used on.
We provide theoretical and empirical insights on the divergence of model-likelihood and output quality.
arXiv Detail & Related papers (2024-08-29T03:50:24Z)
- Unraveling the Control Engineer's Craft with Neural Networks [4.5088302622486935]
We present a data-driven controller tuning approach, where the digital twin is used to generate input-output data and suitable controllers for several perturbations in its parameters.
We learn the controller tuning rule that maps input-output data onto the controller parameters, based on artificially generated data from perturbed versions of the digital twin.
arXiv Detail & Related papers (2023-11-20T10:22:38Z)
- Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data.
Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably.
We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z)
- Transformer-based Conditional Variational Autoencoder for Controllable Story Generation [39.577220559911055]
We investigate large-scale latent variable models (LVMs) for neural story generation with objectives in two threads: generation effectiveness and controllability.
We advocate reviving latent variable modeling, essentially the power of representation learning, in the era of Transformers.
Specifically, we integrate latent representation vectors with a Transformer-based pre-trained architecture to build a conditional variational autoencoder (CVAE).
arXiv Detail & Related papers (2021-01-04T08:31:11Z)
- Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)
- Towards a Neural Graphics Pipeline for Controllable Image Generation [96.11791992084551]
We present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models.
NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation.
We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes.
arXiv Detail & Related papers (2020-06-18T14:22:54Z)
- Posterior Control of Blackbox Generation [126.33511630879713]
We consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach.
We find that this method improves over standard benchmarks, while also providing fine-grained control.
arXiv Detail & Related papers (2020-05-10T03:22:45Z)
- Populations of Spiking Neurons for Reservoir Computing: Closed Loop Control of a Compliant Quadruped [64.64924554743982]
We present a framework for implementing central pattern generators with spiking neural networks to obtain closed loop robot control.
We demonstrate the learning of predefined gait patterns, speed control and gait transition on a simulated model of a compliant quadrupedal robot.
arXiv Detail & Related papers (2020-04-09T14:32:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.