Conditioned Text Generation with Transfer for Closed-Domain Dialogue
Systems
- URL: http://arxiv.org/abs/2011.02143v1
- Date: Tue, 3 Nov 2020 14:06:10 GMT
- Title: Conditioned Text Generation with Transfer for Closed-Domain Dialogue
Systems
- Authors: St\'ephane d'Ascoli, Alice Coucke, Francesco Caltagirone, Alexandre
Caulier, Marc Lelarge
- Abstract summary: We show how to optimally train and control the generation of intent-specific sentences using a conditional variational autoencoder.
We introduce a new protocol called query transfer that allows to leverage a large unlabelled dataset.
- Score: 65.48663492703557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scarcity of training data for task-oriented dialogue systems is a well known
problem that is usually tackled with costly and time-consuming manual data
annotation. An alternative solution is to rely on automatic text generation
which, although less accurate than human supervision, has the advantage of
being cheap and fast. Our contribution is twofold. First we show how to
optimally train and control the generation of intent-specific sentences using a
conditional variational autoencoder. Then we introduce a new protocol called
query transfer that allows to leverage a large unlabelled dataset, possibly
containing irrelevant queries, to extract relevant information. Comparison with
two different baselines shows that this method, in the appropriate regime,
consistently improves the diversity of the generated queries without
compromising their quality. We also demonstrate the effectiveness of our
generation method as a data augmentation technique for language modelling
tasks.
Related papers
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z) - The Whole Truth and Nothing But the Truth: Faithful and Controllable
Dialogue Response Generation with Dataflow Transduction and Constrained
Decoding [65.34601470417967]
We describe a hybrid architecture for dialogue response generation that combines the strengths of neural language modeling and rule-based generation.
Our experiments show that this system outperforms both rule-based and learned approaches in human evaluations of fluency, relevance, and truthfulness.
arXiv Detail & Related papers (2022-09-16T09:00:49Z) - Collocation2Text: Controllable Text Generation from Guide Phrases in
Russian [0.0]
Collocation2Text is a plug-and-play method for automatic controllable text generation in Russian.
The method is based on two interacting models: the autoregressive language ruGPT-3 model and the autoencoding language ruRoBERTa model.
Experiments on generating news articles using the proposed method showed its effectiveness for automatically generated fluent texts.
arXiv Detail & Related papers (2022-06-18T17:10:08Z) - Curriculum-Based Self-Training Makes Better Few-Shot Learners for
Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z) - Self-augmented Data Selection for Few-shot Dialogue Generation [18.794770678708637]
We adopt the self-training framework to deal with the few-shot MR-to-Text generation problem.
We propose a novel data selection strategy to select the data that our generation model is most uncertain about.
arXiv Detail & Related papers (2022-05-19T16:25:50Z) - AGGGEN: Ordering and Aggregating while Generating [12.845842212733695]
We present AGGGEN, a data-to-text model which re-introduces two explicit sentence planning stages into neural data-to-text systems.
AGGGEN performs sentence planning at the same time as generating text by learning latent alignments between input representation and target text.
arXiv Detail & Related papers (2021-06-10T08:14:59Z) - Data-Efficient Methods for Dialogue Systems [4.061135251278187]
Conversational User Interface (CUI) has become ubiquitous in everyday life, in consumer-focused products like Siri and Alexa.
Deep learning underlies many recent breakthroughs in dialogue systems but requires very large amounts of training data, often annotated by experts.
In this thesis, we introduce a series of methods for training robust dialogue systems from minimal data.
arXiv Detail & Related papers (2020-12-05T02:51:09Z) - A Simple and Efficient Multi-Task Learning Approach for Conditioned
Dialogue Generation [23.828348485513043]
Conditioned dialogue generation suffers from the scarcity of labeled responses.
We propose a multi-task learning approach to leverage both labeled dialogue and text data.
arXiv Detail & Related papers (2020-10-21T16:56:49Z) - Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG)
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models as well as verify the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.