Leveraging Key Information Modeling to Improve Less-Data Constrained
News Headline Generation via Duality Fine-Tuning
- URL: http://arxiv.org/abs/2210.04473v1
- Date: Mon, 10 Oct 2022 07:59:36 GMT
- Title: Leveraging Key Information Modeling to Improve Less-Data Constrained
News Headline Generation via Duality Fine-Tuning
- Authors: Zhuoxuan Jiang, Lingfeng Qiao, Di Yin, Shanshan Feng and Bo Ren
- Abstract summary: We propose a novel duality fine-tuning method by formally defining the probabilistic duality constraints between key information prediction and headline generation tasks.
The method captures more information from limited data, builds connections between the two separate tasks, and is well suited to less-data constrained generation.
We conduct extensive experiments demonstrating that our method is effective and efficient, achieving improved performance on both language modeling metrics and informativeness correctness metrics on two public datasets.
- Score: 12.443476695459553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent generative language models are mostly trained on large-scale
datasets, while in many real scenarios the training data are expensive to
obtain and therefore small-scale. In this paper we investigate the challenging
task of less-data constrained generation, especially when the generated news
headlines are short yet expected to remain both readable and informative. We
highlight the key information modeling task and propose a novel duality
fine-tuning method by formally defining the probabilistic duality constraints
between the key information prediction and headline generation tasks. The
proposed method captures more information from limited data, builds
connections between the two separate tasks, and is well suited to less-data
constrained generation. Furthermore, the method can leverage various
pre-trained generative regimes, e.g., autoregressive and encoder-decoder
models. We conduct extensive experiments demonstrating that our method is
effective and efficient, achieving improved performance on both language
modeling metrics and informativeness correctness metrics on two public
datasets.
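The abstract names a probabilistic duality constraint between the two tasks but does not spell out its form. Below is a minimal sketch, assuming the constraint equates the two factorizations of the joint probability of key information k and headline h given document d, i.e. P(k|d)P(h|k,d) = P(h|d)P(k|h,d); all function and variable names are hypothetical, not the authors' code.

```python
import torch

def duality_regularizer(log_p_k_given_d, log_p_h_given_kd,
                        log_p_h_given_d, log_p_k_given_hd):
    """Penalize violations of the assumed duality constraint: both
    factorizations of log P(k, h | d) must agree. Inputs are
    per-example sequence log-likelihoods from the two dual models
    (key-information prediction and headline generation)."""
    lhs = log_p_k_given_d + log_p_h_given_kd
    rhs = log_p_h_given_d + log_p_k_given_hd
    return (lhs - rhs).pow(2).mean()

# Hypothetical training objective: standard MLE losses for both dual
# tasks plus the duality penalty, weighted by a hyperparameter lam:
# loss = mle_key_info + mle_headline + lam * duality_regularizer(...)
```

Under this reading, the regularizer is what lets each task "capture more information from limited data": evidence fitted in one direction constrains the model trained in the other.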
Related papers
- Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages: Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
- TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction [7.3292387742640415]
We propose to incorporate richer training dynamics information into a prototypical contrastive learning framework.
We conduct empirical evaluations of our approach using two large-scale naturalistic datasets.
arXiv Detail & Related papers (2024-04-18T23:12:46Z)
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
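The exact constraint-optimization objective is not given in this summary; the following is a generic penalty-method sketch of the stated idea (fine-tune for controllability while keeping generation quality from degrading), not Text2Data's actual loss. The weight lam and the reference level are assumptions.

```python
import torch

def constrained_finetune_loss(control_loss, gen_loss,
                              gen_loss_ref, lam=1.0):
    """Generic soft-constraint sketch: minimize the controllability
    loss while penalizing any rise of the generation loss above a
    reference level (e.g., its value before fine-tuning), as a guard
    against catastrophic forgetting."""
    violation = torch.clamp(gen_loss - gen_loss_ref, min=0.0)
    return control_loss + lam * violation
```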
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Generative Deduplication For Social Media Data Selection [4.545354973721937]
We propose a novel Generative Deduplication framework for social media data selection.
Our model acts as an efficient pre-processing method to universally enhance social media NLP pipelines.
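The summary does not describe the generative mechanism, so the sketch below substitutes a plain embedding-similarity filter as a stand-in for the deduplication step, not the paper's method; `embed` is a hypothetical text-encoder callable and the threshold is an assumed hyperparameter.

```python
import numpy as np

def dedup(texts, embed, threshold=0.9):
    """Stand-in near-duplicate filter: keep a text only if its
    normalized embedding is not too similar (cosine) to any
    already-kept text."""
    kept, kept_vecs = [], []
    for t in texts:
        v = embed(t)
        v = v / np.linalg.norm(v)
        if all(float(v @ u) < threshold for u in kept_vecs):
            kept.append(t)
            kept_vecs.append(v)
    return kept
```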
arXiv Detail & Related papers (2024-01-11T12:43:26Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
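As a rough illustration of distribution matching (the baseline idea, not this paper's improved variant), the condensed set can be optimized so that its feature statistics match those of real data under a randomly sampled encoder; the usage names below are hypothetical.

```python
import torch

def distribution_matching_loss(real_feats, syn_feats):
    """Match the mean feature embedding of a synthetic batch to that
    of a real batch; minimizing this across many random encoders
    drives the condensed set toward the real feature distribution."""
    return (real_feats.mean(dim=0) - syn_feats.mean(dim=0)).pow(2).sum()

# Hypothetical usage: real_feats = encoder(real_batch) and
# syn_feats = encoder(synthetic_images), where synthetic_images are
# the learnable tensors being condensed and encoder is re-sampled
# each optimization step.
```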
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Curriculum-Based Self-Training Makes Better Few-Shot Learners for Data-to-Text Generation [56.98033565736974]
We propose Curriculum-Based Self-Training (CBST) to leverage unlabeled data in a rearranged order determined by the difficulty of text generation.
Our method can outperform fine-tuning and task-adaptive pre-training methods, and achieve state-of-the-art performance in the few-shot setting of data-to-text generation.
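A minimal self-training loop with a difficulty-ordered curriculum follows; it sketches only the general CBST idea, and `model.fit`, `model.pseudo_label`, and `difficulty` are hypothetical interfaces, not the paper's code.

```python
def curriculum_self_training(model, labeled, unlabeled, difficulty,
                             n_rounds=3):
    """Sketch: pseudo-label unlabeled examples in order of increasing
    generation difficulty, growing the training set each round."""
    model.fit(labeled)
    ordered = sorted(unlabeled, key=difficulty)  # easy examples first
    step = max(1, len(ordered) // n_rounds)
    for r in range(1, n_rounds + 1):
        pseudo = [model.pseudo_label(x) for x in ordered[: r * step]]
        model.fit(labeled + pseudo)
    return model
```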
arXiv Detail & Related papers (2022-06-06T16:11:58Z)
- Self-augmented Data Selection for Few-shot Dialogue Generation [18.794770678708637]
We adopt the self-training framework to deal with the few-shot MR-to-Text generation problem.
We propose a novel data selection strategy to select the data that our generation model is most uncertain about.
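One plausible reading of "data the generation model is most uncertain about" is a confidence score from the model's own decoding; the sketch below uses mean token log-probability as that proxy, with `model.generate_with_logprob` a hypothetical interface.

```python
def select_most_uncertain(model, mr_pool, k=100):
    """Sketch: generate from each meaning representation (MR), score
    each output by its average token log-probability, and return the
    k lowest-confidence (MR, text) pairs for self-augmentation."""
    scored = []
    for mr in mr_pool:
        text, avg_logprob = model.generate_with_logprob(mr)
        scored.append((avg_logprob, mr, text))
    scored.sort(key=lambda s: s[0])  # lowest confidence first
    return [(mr, text) for _, mr, text in scored[:k]]
```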
arXiv Detail & Related papers (2022-05-19T16:25:50Z)
- Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval [51.823187647843945]
In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model.
Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones.
arXiv Detail & Related papers (2021-05-27T11:29:03Z)
- Iterative Data Programming for Expanding Text Classification Corpora [9.152045698511506]
Real-world text classification tasks often require many labeled training examples that are expensive to obtain.
Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data sets quickly.
We present a fast, simple data programming method for augmenting text data sets by generating neighborhood-based weak models.
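A neighborhood-based weak model can be read as a labeler that propagates a seed example's label to nearby texts and abstains elsewhere; this is a sketch under that reading, with `embed` a hypothetical text encoder and `radius` an assumed hyperparameter.

```python
import numpy as np

def neighborhood_weak_model(seed_texts, seed_labels, embed, radius=0.15):
    """Build a weak labeler: an unlabeled text inherits the label of
    its nearest labeled seed if that seed lies within `radius` in
    embedding space; otherwise the labeler abstains (returns None)."""
    seeds = [(embed(t), y) for t, y in zip(seed_texts, seed_labels)]

    def weak_label(text):
        v = embed(text)
        best_y, best_d = None, radius
        for u, y in seeds:
            d = float(np.linalg.norm(v - u))
            if d < best_d:
                best_y, best_d = y, d
        return best_y

    return weak_label
```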
arXiv Detail & Related papers (2020-02-04T17:12:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.