Text2Data: Low-Resource Data Generation with Textual Control
- URL: http://arxiv.org/abs/2402.10941v1
- Date: Thu, 8 Feb 2024 03:41:39 GMT
- Title: Text2Data: Low-Resource Data Generation with Textual Control
- Authors: Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang,
Caiming Xiong, Silvio Savarese
- Abstract summary: Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
- Score: 104.38011760992637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language serves as a common and straightforward control signal for
humans to interact seamlessly with machines. Recognizing the importance of this
interface, the machine learning community is investing considerable effort in
generating data that is semantically coherent with textual instructions. While
strides have been made in text-to-data generation spanning image editing, audio
synthesis, video creation, and beyond, low-resource areas characterized by
expensive annotations or complex data structures, such as molecules, motion
dynamics, and time series, often lack textual labels. This deficiency impedes
supervised learning, thereby constraining the application of advanced
generative models for text-to-data tasks. In response to these challenges in
the low-resource scenario, we propose Text2Data, a novel approach that utilizes
unlabeled data to understand the underlying data distribution through an
unsupervised diffusion model. Subsequently, it undergoes controllable
finetuning via a novel constraint optimization-based learning objective that
ensures controllability and effectively counteracts catastrophic forgetting.
Comprehensive experiments demonstrate that Text2Data is able to achieve
enhanced performance regarding controllability across various modalities,
including molecules, motions, and time series, when compared to existing
baselines.
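The abstract sketches a two-phase recipe: unsupervised diffusion pretraining on unlabeled data, followed by controllable finetuning under a constraint that guards against catastrophic forgetting. Below is a minimal toy sketch of that structure; the network, the linear noise schedule, and the hinge-penalty relaxation of the constraint are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDenoiser(nn.Module):
    """Toy epsilon-prediction network with an optional text-embedding condition."""
    def __init__(self, dim=32, text_dim=16):
        super().__init__()
        self.text_dim = text_dim
        self.net = nn.Sequential(
            nn.Linear(dim + text_dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x_t, t, text_emb=None):
        if text_emb is None:  # unconditional branch used during pretraining
            text_emb = torch.zeros(x_t.size(0), self.text_dim)
        return self.net(torch.cat([x_t, text_emb, t.unsqueeze(-1)], dim=-1))

def diffusion_loss(model, x0, text_emb=None):
    """DDPM-style epsilon-matching loss at a random timestep (toy linear schedule)."""
    t = torch.rand(x0.size(0))
    noise = torch.randn_like(x0)
    alpha = (1.0 - t).unsqueeze(-1)
    x_t = alpha.sqrt() * x0 + (1.0 - alpha).sqrt() * noise
    return F.mse_loss(model(x_t, t, text_emb), noise)

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Phase 1: unsupervised pretraining on plentiful unlabeled data.
unlabeled = torch.randn(512, 32)
for _ in range(200):
    loss = diffusion_loss(model, unlabeled[torch.randint(0, 512, (64,))])
    opt.zero_grad(); loss.backward(); opt.step()
baseline = loss.item()  # reference unconditional loss after pretraining

# Phase 2: controllable finetuning on the scarce labeled subset. The paper's
# "don't forget the learned distribution" constraint is relaxed here into a
# hinge penalty with a fixed multiplier (one plausible reading, not its code).
labeled_x, labeled_text = torch.randn(64, 32), torch.randn(64, 16)
lam, slack = 10.0, 0.05
for _ in range(200):
    cond = diffusion_loss(model, labeled_x, labeled_text)
    uncond = diffusion_loss(model, unlabeled[torch.randint(0, 512, (64,))])
    loss = cond + lam * F.relu(uncond - baseline - slack)
    opt.zero_grad(); loss.backward(); opt.step()
```

The paper formulates this as a proper constrained optimization problem; the fixed multiplier `lam` above is only the simplest stand-in for however the constraint is actually enforced.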
Related papers
- Joint-Dataset Learning and Cross-Consistent Regularization for Text-to-Motion Retrieval [4.454835029368504]
We focus on the recently introduced text-motion retrieval task, which aims to search for the motion sequences most relevant to a natural-language motion description.
Despite recent efforts to explore these promising avenues, a primary challenge remains the insufficient data available to train robust text-motion models.
We propose to investigate joint-dataset learning, where we train on multiple text-motion datasets simultaneously.
We also introduce a transformer-based motion encoder, called MoT++, which employs spatio-temporal attention to process sequences of skeleton data.
arXiv Detail & Related papers (2024-07-02T09:43:47Z)
- Learning Generalizable Human Motion Generator with Reinforcement Learning [95.62084727984808]
Text-driven human motion generation is one of the vital tasks in computer-aided content creation.
Existing methods often overfit specific motion expressions in the training data, hindering their ability to generalize.
We present InstructMotion, which incorporates the trial-and-error paradigm of reinforcement learning for generalizable human motion generation.
arXiv Detail & Related papers (2024-05-24T13:29:12Z)
- Boosting Event Extraction with Denoised Structure-to-Text Augmentation [52.21703002404442]
Event extraction aims to recognize pre-defined event triggers and arguments from texts.
Recent data augmentation methods often neglect the problem of grammatical incorrectness.
We propose DAEE, a denoised structure-to-text augmentation framework for event extraction.
arXiv Detail & Related papers (2023-05-16T16:52:07Z)
- STA: Self-controlled Text Augmentation for Improving Text Classifications [2.9669250132689164]
A number of text augmentation techniques have emerged in the field of Natural Language Processing (NLP).
We introduce a state-of-the-art approach for Self-Controlled Text Augmentation (STA).
Our approach tightly controls the generation process by introducing a self-checking procedure to ensure that generated examples retain the semantic content of the original text.
arXiv Detail & Related papers (2023-02-24T17:54:12Z)
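The self-checking step in the STA entry above can be illustrated with a simple semantic filter that keeps a generated example only if it stays close to its source. The TF-IDF cosine similarity and the 0.6 threshold below are placeholder choices, not the paper's actual procedure.

```python
# Illustrative stand-in for a "self-checking" augmentation filter: keep an
# augmented sentence only if it stays semantically close to its source.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def self_check(original: str, candidates: list[str], threshold: float = 0.6) -> list[str]:
    """Return only the candidate augmentations similar enough to the original."""
    vec = TfidfVectorizer().fit([original] + candidates)
    ref = vec.transform([original])
    sims = cosine_similarity(ref, vec.transform(candidates))[0]
    return [c for c, s in zip(candidates, sims) if s >= threshold]

kept = self_check(
    "The movie was surprisingly good.",
    ["The film was surprisingly good.", "Bananas are yellow."],
)
print(kept)  # the paraphrase passes; the unrelated sentence is filtered out
```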
- Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning [12.443476695459553]
We propose a novel duality fine-tuning method by formally defining the probabilistic duality constraints between key information prediction and headline generation tasks.
The proposed method captures more information from limited data, builds connections between the two tasks, and suits less-data constrained generation tasks.
We conduct extensive experiments demonstrating that our method effectively and efficiently improves performance on both a language-modeling metric and an informativeness-correctness metric on two public datasets.
arXiv Detail & Related papers (2022-10-10T07:59:36Z)
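The probabilistic duality constraint in the headline-generation entry above is typically encoded, in the general dual supervised learning recipe, as a soft penalty forcing the joint probability factored through either task direction to agree. Here is a hedged sketch; the stub log-probabilities stand in for real model scores, and the paper's exact formulation over key-info prediction and headline generation may differ.

```python
# A generic dual-supervised-learning regularizer: one standard way to encode a
# probabilistic duality constraint between the two directions of a task pair.
import torch

def duality_loss(log_p_x, log_p_y, log_p_y_given_x, log_p_x_given_y, lam=1.0):
    """MLE terms for both directions plus a soft penalty pushing
    log P(x) + log P(y|x) toward log P(y) + log P(x|y)."""
    gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    return -(log_p_y_given_x + log_p_x_given_y).mean() + lam * (gap ** 2).mean()

# Stand-in log-probabilities; real values would come from marginal language
# models and the two conditional generators being fine-tuned.
lp_x = torch.tensor([-42.0])                        # log P(x)
lp_y = torch.tensor([-11.0])                        # log P(y)
lp_y_x = torch.tensor([-9.5], requires_grad=True)   # log P(y|x)
lp_x_y = torch.tensor([-40.0], requires_grad=True)  # log P(x|y)
duality_loss(lp_x, lp_y, lp_y_x, lp_x_y).backward() # grads reach both directions
```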
- Leveraging Natural Supervision for Language Representation Learning and Generation [8.083109555490475]
We describe three lines of work that seek to improve the training and evaluation of neural models using naturally-occurring supervision.
We first investigate self-supervised training losses to help enhance the performance of pretrained language models for various NLP tasks.
We propose a framework that uses paraphrase pairs to disentangle semantics and syntax in sentence representations.
arXiv Detail & Related papers (2022-07-21T17:26:03Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
- Data-to-text Generation with Macro Planning [61.265321323312286]
We propose a neural model with a macro planning stage followed by a generation stage reminiscent of traditional methods.
Our approach outperforms competitive baselines in terms of automatic and human evaluation.
arXiv Detail & Related papers (2021-02-04T16:32:57Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
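To make the self-imitation idea in the last entry concrete, here is a toy sketch in which ordinary MLE steps alternate with steps that imitate the model's own most likely generations. The tiny model, greedy sampler, likelihood-based selection, and 50/50 alternation are all illustrative assumptions rather than the paper's algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, T = 100, 16, 8
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Flatten(0, 1), nn.Linear(dim, vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def seq_nll(seqs, per_seq=False):
    """Next-token NLL over integer sequences of shape (B, T)."""
    logits = model(seqs[:, :-1]).view(seqs.size(0), T - 1, vocab)
    nll = F.cross_entropy(logits.transpose(1, 2), seqs[:, 1:], reduction="none").mean(dim=1)
    return nll if per_seq else nll.mean()

@torch.no_grad()
def sample(n):
    """Greedy roll-out from random start tokens with the current model."""
    seqs = torch.randint(0, vocab, (n, 1))
    for _ in range(T - 1):
        logits = model(seqs[:, -1:])
        seqs = torch.cat([seqs, logits.argmax(-1, keepdim=True)], dim=1)
    return seqs

real = torch.randint(0, vocab, (64, T))
for step in range(40):
    if step % 2 == 0:                          # ordinary MLE phase on real data
        loss = seq_nll(real)
    else:                                      # self-imitation phase: score the
        gen = sample(64)                       # model's own generations, keep the
        with torch.no_grad():                  # most likely quarter, imitate them
            keep = seq_nll(gen, per_seq=True).argsort()[:16]
        loss = seq_nll(gen[keep])
    opt.zero_grad(); loss.backward(); opt.step()
```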
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.