Learning from Multiple Sources for Data-to-Text and Text-to-Data
- URL: http://arxiv.org/abs/2302.11269v1
- Date: Wed, 22 Feb 2023 10:39:33 GMT
- Title: Learning from Multiple Sources for Data-to-Text and Text-to-Data
- Authors: Song Duong, Alberto Lumbreras, Mike Gartrell, Patrick Gallinari
- Abstract summary: Data-to-text (D2T) and text-to-data (T2D) are dual tasks that convert structured data, such as graphs or tables, into fluent text, and vice versa.
Current systems leverage pre-trained language models fine-tuned on D2T or T2D tasks.
This approach has two main limitations: first, a separate system has to be tuned for each task and source; second, learning is limited by the scarcity of available corpora.
We introduce a variational auto-encoder model with disentangled style and content variables that allows us to represent the diversity that stems from multiple sources of text and data.
- Score: 16.080265665849527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-to-text (D2T) and text-to-data (T2D) are dual tasks that convert
structured data, such as graphs or tables, into fluent text, and vice versa.
These tasks are usually handled separately and use corpora extracted from a
single source. Current systems leverage pre-trained language models fine-tuned
on D2T or T2D tasks. This approach has two main limitations: first, a separate
system has to be tuned for each task and source; second, learning is limited by
the scarcity of available corpora. This paper considers a more general scenario
where data are available from multiple heterogeneous sources. Each source, with
its specific data format and semantic domain, provides a non-parallel corpus of
text and structured data. We introduce a variational auto-encoder model with
disentangled style and content variables that allows us to represent the
diversity that stems from multiple sources of text and data. Our model is
designed to handle the tasks of D2T and T2D jointly. We evaluate our model on
several datasets, and show that by learning from multiple sources, our model
closes the performance gap with its supervised single-source counterpart and
outperforms it in some cases.
Related papers
- Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource
Agglutinative Data-to-Text Generation [9.80836683456026]
We tackle data-to-text for isiXhosa, which is low-resource and agglutinative.
We introduce Triples-to-isiXhosa (T2X), a new dataset based on a subset of WebNLG.
We develop an evaluation framework for T2X that measures how accurately generated text describes the data.
arXiv Detail & Related papers (2024-03-12T11:53:27Z)
- SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with
Auto-Generated Data [73.23388142296535]
SELMA improves the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets.
We show that SELMA significantly improves the semantic alignment and text faithfulness of state-of-the-art T2I diffusion models on multiple benchmarks.
We also show that fine-tuning with image-text pairs auto-collected via SELMA shows comparable performance to fine-tuning with ground truth data.
arXiv Detail & Related papers (2024-03-11T17:35:33Z)
- Self-training from Self-memory in Data-to-text Generation [3.844398528249339]
This paper introduces a novel training model, self-training from self-memory (STSM), for data-to-text generation (DTG).
The quality of the self-memory is validated by two models, data-to-text (D2T) and text-to-data (T2D).
arXiv Detail & Related papers (2024-01-19T09:13:28Z)
- Contrastive Transformer Learning with Proximity Data Generation for
Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficient annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for
Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- What Makes Data-to-Text Generation Hard for Pretrained Language Models? [17.07349898176898]
Expressing natural language descriptions of structured facts or relations -- data-to-text generation (D2T) -- increases the accessibility of structured knowledge repositories.
Previous work shows that pre-trained language models (PLMs) perform remarkably well on this task after fine-tuning on a significant amount of task-specific training data.
We conduct an empirical study of both fine-tuned and auto-regressive PLMs on the DART multi-domain D2T dataset.
arXiv Detail & Related papers (2022-05-23T17:58:39Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- CycleGT: Unsupervised Graph-to-Text and Text-to-Graph Generation via
Cycle Training [63.11444020743543]
Deep learning models for graph-to-text (G2T) and text-to-graph (T2G) conversion suffer from scarce training data.
We present CycleGT, an unsupervised training method that can bootstrap from non-parallel graph and text data, and iteratively back translate between the two forms.
arXiv Detail & Related papers (2020-06-08T15:59:00Z)