SideControl: Controlled Open-domain Dialogue Generation via Additive Side Networks
- URL: http://arxiv.org/abs/2109.01958v1
- Date: Sun, 5 Sep 2021 01:15:26 GMT
- Title: SideControl: Controlled Open-domain Dialogue Generation via Additive Side Networks
- Authors: Wanyu Du, Yangfeng Ji
- Abstract summary: We propose a novel approach to control the generation of Transformer-based pre-trained language models: the SideControl framework.
Results show that the SideControl framework has better controllability, higher generation quality and better sample-efficiency than existing gradient-based and weighted-decoding baselines.
- Score: 10.607177634432214
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transformer-based pre-trained language models boost the performance of
open-domain dialogue systems. Prior works leverage Transformer-based
pre-trained language models to generate texts with desired attributes in two
general approaches: (1) gradient-based methods: updating all latent
representations of pre-trained models with gradients from attribute models; (2)
weighted-decoding methods: re-ranking beam candidates from pre-trained models
with attribute functions. However, gradient-based methods incur high
computation cost and can easily overfit on small training sets, while
weighted-decoding methods are inherently constrained by the low-variance
high-bias pre-trained model. In this work, we propose a novel approach to
control the generation of Transformer-based pre-trained language models: the
SideControl framework, which leverages a novel control attributes loss to
incorporate useful control signals, and is shown to perform well with very
limited training samples. We evaluate our proposed method on two benchmark
open-domain dialogue datasets, and results show that the SideControl framework
has better controllability, higher generation quality and better
sample-efficiency than existing gradient-based and weighted-decoding baselines.
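To make the framework description above concrete, the sketch below illustrates the general additive side-network idea in PyTorch: the pre-trained Transformer is kept frozen, a small side network produces an additive correction to its hidden states, and training combines a language-modeling loss with a control-attribute loss. The abstract does not specify the architecture, so every module, attachment point, and loss detail here is an illustrative assumption rather than the paper's implementation.

```python
# Hedged sketch of an additive side network for controlled generation.
# Assumptions (not from the paper): the side network is a small Transformer
# encoder over the frozen base model's final hidden states, its output is
# added back to those hidden states, and the control-attribute loss is a
# cross-entropy against a per-example attribute label.
import torch
import torch.nn as nn

class AdditiveSideNetwork(nn.Module):
    def __init__(self, base_lm, hidden_size, num_attributes, vocab_size):
        super().__init__()
        self.base_lm = base_lm                      # frozen pre-trained Transformer
        for p in self.base_lm.parameters():
            p.requires_grad = False                 # only the side network is trained
        layer = nn.TransformerEncoderLayer(hidden_size, nhead=8, batch_first=True)
        self.side = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(hidden_size, vocab_size)
        self.attr_head = nn.Linear(hidden_size, num_attributes)

    def forward(self, input_ids, labels, attr_labels):
        # Frozen base hidden states; assumed to be (batch, seq, hidden).
        with torch.no_grad():
            base_hidden = self.base_lm(input_ids)
        side_hidden = self.side(base_hidden)        # side-network correction
        hidden = base_hidden + side_hidden          # additive combination

        # Standard next-token language-modeling loss.
        lm_logits = self.lm_head(hidden)
        lm_loss = nn.functional.cross_entropy(
            lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)),
            labels[:, 1:].reshape(-1))
        # Control-attribute loss on a pooled sequence representation.
        attr_logits = self.attr_head(hidden.mean(dim=1))
        attr_loss = nn.functional.cross_entropy(attr_logits, attr_labels)
        return lm_loss + attr_loss
```

Only the side network and the two heads receive gradient updates, which is consistent with the sample-efficiency argument in the abstract: far fewer parameters are tuned than in gradient-based approaches that update all latent representations of the pre-trained model.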
Related papers
- CAR: Controllable Autoregressive Modeling for Visual Generation [100.33455832783416]
Controllable AutoRegressive Modeling (CAR) is a novel, plug-and-play framework that integrates conditional control into multi-scale latent variable modeling.
CAR progressively refines and captures control representations, which are injected into each autoregressive step of the pre-trained model to guide the generation process.
Our approach demonstrates excellent controllability across various types of conditions and delivers higher image quality compared to previous methods.
arXiv Detail & Related papers (2024-10-07T00:55:42Z)
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model (a generic sketch of this idea appears after this list).
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Learning from Bootstrapping and Stepwise Reinforcement Reward: A Semi-Supervised Framework for Text Style Transfer [30.622772801446132]
We propose a semi-supervised framework for text style transfer.
First, the learning process is bootstrapped with supervision guided by automatically constructed pseudo-parallel pairs.
Then the model learns from unlabeled data via reinforcement rewards.
arXiv Detail & Related papers (2022-05-19T05:18:06Z)
- Learning Instance-Specific Adaptation for Cross-Domain Segmentation [79.61787982393238]
We propose a test-time adaptation method for cross-domain image segmentation.
Given a new, unseen instance at test time, we adapt a pre-trained model by conducting instance-specific BatchNorm calibration (a minimal sketch of this calibration step appears after this list).
arXiv Detail & Related papers (2022-03-30T17:59:45Z)
- Controllable Natural Language Generation with Contrastive Prefixes [120.12778570283956]
GPT2 generation utilizes a set of small attribute-specific vectors, called prefixes, to steer natural language generation.
We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control.
Experimental results on both single-aspect and multi-aspect control show that our methods can guide generation towards the desired attributes while keeping high linguistic quality.
arXiv Detail & Related papers (2022-02-27T00:31:03Z)
- Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding [13.65914588243695]
We propose an approach to bridge pre-trained models and code-related tasks.
We exploit semantic-preserving transformation to enrich downstream data diversity.
We introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models.
arXiv Detail & Related papers (2021-12-04T07:21:28Z)
- Controlled Text Generation as Continuous Optimization with Multiple Constraints [23.71027518888138]
We propose a flexible and modular algorithm for controllable inference from pretrained models.
We make use of Lagrangian multipliers and gradient-descent based techniques to generate the desired text.
We evaluate our approach on controllable machine translation and style transfer with multiple sentence-level attributes.
arXiv Detail & Related papers (2021-08-04T05:25:20Z)
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping [24.547833264405355]
The proposed method achieves a 24% time reduction on average per sample and allows the pre-training to be 2.5 times faster than the baseline.
While being faster, our pre-trained models are equipped with strong knowledge transferability, achieving comparable and sometimes higher GLUE score than the baseline.
arXiv Detail & Related papers (2020-10-26T06:50:07Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
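As noted in the Layerwise Noise Stability Regularization entry above, the idea of injecting Gaussian noise into hidden representations and penalizing the resulting output shift can be sketched generically as below. The injection point, noise scale, and L2 distance are assumptions made for illustration, not the settings used in that paper.

```python
# Generic noise-stability regularization sketch (assumptions: noise is added
# to one intermediate hidden state, and stability is measured as the mean
# squared distance between the clean and noisy outputs of the upper layers).
import torch

def noise_stability_penalty(upper_layers, hidden, sigma=0.01):
    """hidden: intermediate representation of shape (batch, seq, dim);
    upper_layers: the part of the network above the injection point."""
    clean_out = upper_layers(hidden)
    noisy_out = upper_layers(hidden + sigma * torch.randn_like(hidden))
    return ((clean_out - noisy_out) ** 2).mean()

# Illustrative training objective: task loss plus a weighted stability penalty.
# total_loss = task_loss + 0.1 * noise_stability_penalty(upper_layers, hidden)
```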
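The instance-specific BatchNorm calibration entry above can likewise be illustrated with a minimal sketch: at test time, BatchNorm layers are switched to use statistics computed from the current instance instead of the stored running averages. Adapting only the normalization statistics, and leaving the stored averages untouched, are assumptions for this illustration rather than that paper's exact procedure.

```python
# Minimal test-time BatchNorm calibration sketch (assumption: only the
# normalization statistics are adapted, by re-estimating them from the
# single test instance; the stored running averages are left unchanged).
import torch
import torch.nn as nn

@torch.no_grad()
def predict_with_instance_bn(model, x):
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()                       # normalize with this instance's batch statistics
            m.track_running_stats = False   # do not overwrite the stored running averages
    return model(x)
```

In practice the original modes and flags would be restored after each prediction; that bookkeeping is omitted for brevity.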
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.