Zero-Shot Controlled Generation with Encoder-Decoder Transformers
- URL: http://arxiv.org/abs/2106.06411v2
- Date: Tue, 15 Jun 2021 16:25:11 GMT
- Title: Zero-Shot Controlled Generation with Encoder-Decoder Transformers
- Authors: Devamanyu Hazarika, Mahdi Namazifar, Dilek Hakkani-Tür
- Abstract summary: We propose novel approaches for controlling encoder-decoder transformer-based NLG models in a zero-shot manner.
We show that these NLG models are not only robust to such manipulations, but their behavior can also be controlled without degrading generation performance.
- Score: 8.506451605253517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controlling neural network-based models for natural language generation (NLG)
has broad applications in numerous areas such as machine translation, document
summarization, and dialog systems. Approaches that enable such control in a
zero-shot manner would be of great importance as, among other reasons, they
remove the need for additional annotated data and training. In this work, we
propose novel approaches for controlling encoder-decoder transformer-based NLG
models in a zero-shot manner. This is done by introducing three control knobs, namely,
attention biasing, decoder mixing, and context augmentation, that are applied
to these models at generation time. These knobs control the generation process
by directly manipulating trained NLG models (e.g., biasing cross-attention
layers) to realize the desired attributes in the generated outputs. We show
that these NLG models are not only robust to such manipulations, but their
behavior can also be controlled without degrading their generation performance.
These results, to the best of our knowledge, are the first of their kind.
Through these control knobs, we also investigate the role of the transformer
decoder's self-attention module and show strong evidence that its primary role
is maintaining fluency of sentences generated by these models. Based on this
hypothesis, we show that alternative architectures for transformer decoders
could be viable options. We also study how this hypothesis could lead to more
efficient ways for training encoder-decoder transformer models.
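The abstract only names the three control knobs without specifying their mechanics. A plausible reading of the attention-biasing knob is that a bias term is added to the cross-attention logits before the softmax at generation time, steering the decoder toward particular source tokens. The sketch below illustrates that idea on a single toy attention head; the function name, shapes, and the exact placement of the bias are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def biased_cross_attention(Q, K, V, bias):
    """Single-head cross-attention with an additive bias on the logits.

    Q: (tgt_len, d) decoder queries
    K, V: (src_len, d) encoder keys/values
    bias: (1, src_len) additive bias, applied before the softmax
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (tgt_len, src_len)
    weights = softmax(scores + bias)        # bias shifts attention mass
    return weights @ V, weights

# Toy example: push every decoder step's attention toward source position 2.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
bias = np.zeros((1, 6))
bias[0, 2] = 10.0                           # strong bias toward source token 2
out, w = biased_cross_attention(Q, K, V, bias)
```

Because the bias enters before the softmax, the manipulation rescales attention smoothly rather than hard-masking tokens, which is consistent with the paper's observation that trained models tolerate such interventions without retraining.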
Related papers
- Unraveling the Control Engineer's Craft with Neural Networks [4.5088302622486935]
We present a data-driven controller-tuning approach in which a digital twin generates input-output data and suitable controllers for several perturbations of its parameters.
From this artificially generated data, we learn a tuning rule that maps input-output data onto controller parameters.
arXiv Detail & Related papers (2023-11-20T10:22:38Z)
- STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [66.96931254510544]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z)
- Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data.
Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably.
We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z)
- A Spiking Central Pattern Generator for the control of a simulated lamprey robot running on SpiNNaker and Loihi neuromorphic boards [1.8139771201780368]
We propose a spiking neural network and its implementation on neuromorphic hardware as a means to control a simulated lamprey model.
We show that by modifying the input to the network, which can be provided by sensory information, the robot can be controlled dynamically in direction and pace.
This category of spiking algorithms shows a promising potential to exploit the theoretical advantages of neuromorphic hardware in terms of energy efficiency and computational speed.
arXiv Detail & Related papers (2021-01-18T11:04:16Z)
- Transformer-based Conditional Variational Autoencoder for Controllable Story Generation [39.577220559911055]
We investigate large-scale latent variable models (LVMs) for neural story generation with objectives in two threads: generation effectiveness and controllability.
We advocate to revive latent variable modeling, essentially the power of representation learning, in the era of Transformers.
Specifically, we integrate latent representation vectors with a Transformer-based pre-trained architecture to build a conditional variational autoencoder (CVAE).
arXiv Detail & Related papers (2021-01-04T08:31:11Z)
- Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)
- Towards a Neural Graphics Pipeline for Controllable Image Generation [96.11791992084551]
We present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models.
NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation.
We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes.
arXiv Detail & Related papers (2020-06-18T14:22:54Z)
- Posterior Control of Blackbox Generation [126.33511630879713]
We consider augmenting neural generation models with discrete control states learned through a structured latent-variable approach.
We find that this method improves over standard benchmarks, while also providing fine-grained control.
arXiv Detail & Related papers (2020-05-10T03:22:45Z)
- Populations of Spiking Neurons for Reservoir Computing: Closed Loop Control of a Compliant Quadruped [64.64924554743982]
We present a framework for implementing central pattern generators with spiking neural networks to obtain closed loop robot control.
We demonstrate the learning of predefined gait patterns, speed control and gait transition on a simulated model of a compliant quadrupedal robot.
arXiv Detail & Related papers (2020-04-09T14:32:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.