SPEC: Summary Preference Decomposition for Low-Resource Abstractive
Summarization
- URL: http://arxiv.org/abs/2303.14011v1
- Date: Fri, 24 Mar 2023 14:07:03 GMT
- Title: SPEC: Summary Preference Decomposition for Low-Resource Abstractive
Summarization
- Authors: Yi-Syuan Chen, Yun-Zhu Song, Hong-Han Shuai
- Abstract summary: We present a framework to transfer few-shot learning processes from source corpora to the target corpus.
Our methods achieve state-of-the-art performance on six diverse corpora with 30.11%/33.95%/27.51% and 26.74%/31.14%/24.48% average improvements on ROUGE-1/2/L under 10- and 100-example settings.
- Score: 21.037841262371355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural abstractive summarization has been widely studied and achieved great
success with large-scale corpora. However, the considerable cost of annotating
data motivates the need for learning strategies under low-resource settings. In
this paper, we investigate the problem of learning summarizers from only a few
examples and propose corresponding methods for improvement. First, typical
transfer learning methods are sensitive to the data properties and
learning objectives of the pretext tasks. Therefore, based on pretrained
language models, we further present a meta learning framework to transfer
few-shot learning processes from source corpora to the target corpus. Second,
previous methods learn from training examples without decomposing the content
and preference. The generated summaries could therefore be constrained by the
preference bias in the training set, especially under low-resource settings. As
such, we propose decomposing the contents and preferences during learning
through parameter modulation, which enables control over preferences during
inference. Third, given a target application, specifying required preferences
could be non-trivial because the preferences may be difficult to derive through
observations. Therefore, we propose a novel decoding method to automatically
estimate suitable preferences and generate corresponding summary candidates
from the few training examples. Extensive experiments demonstrate that our
methods achieve state-of-the-art performance on six diverse corpora with
30.11%/33.95%/27.51% and 26.74%/31.14%/24.48% average improvements on
ROUGE-1/2/L under 10- and 100-example settings.
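The abstract describes three components; the second, decomposing content and preference through parameter modulation, can be illustrated with a minimal FiLM-style sketch in which a preference vector scales and shifts the hidden states of a decoder layer. The class, layer choice, and preference dimensionality below are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class PreferenceModulatedLayer(nn.Module):
        # Wraps a base decoder sub-layer and modulates its output with a
        # preference vector via learned scale (gamma) and shift (beta).
        # A generic FiLM-style sketch, not the SPEC architecture itself.
        def __init__(self, base_layer, hidden_dim, pref_dim):
            super().__init__()
            self.base_layer = base_layer
            self.to_gamma = nn.Linear(pref_dim, hidden_dim)
            self.to_beta = nn.Linear(pref_dim, hidden_dim)

        def forward(self, hidden, pref):
            hidden = self.base_layer(hidden)              # content pathway
            gamma = self.to_gamma(pref).unsqueeze(1)      # (batch, 1, hidden_dim)
            beta = self.to_beta(pref).unsqueeze(1)
            return (1 + gamma) * hidden + beta            # preference pathway

    # Toy usage: a 4-dimensional preference vector conditioning a linear block.
    layer = PreferenceModulatedLayer(nn.Linear(512, 512), hidden_dim=512, pref_dim=4)
    hidden = torch.randn(2, 16, 512)                      # (batch, seq_len, hidden_dim)
    pref = torch.tensor([[0.9, 0.1, 0.0, 0.0], [0.2, 0.8, 0.0, 0.0]])
    out = layer(hidden, pref)                             # same shape, preference-conditioned

Varying the preference vector at inference while the base parameters stay fixed is what would allow preference control; the paper's third component estimates suitable preference values from the few available training examples.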
Related papers
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by in-context learning (ICL), prompt tuning (PT), and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
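As a loose illustration of the update described in the CPT entry above (adjusting context tokens so as to minimize, rather than maximize, the loss), the sketch below takes one gradient-descent step on a designated slice of the input embeddings. The interface, step size, and choice of editable positions are assumptions, not the CPT method itself.

    import torch

    def refine_context_embeddings(embeds, ctx_slice, loss_fn, lr=1e-2):
        # embeds: (batch, seq_len, dim) leaf tensor with requires_grad=True.
        # ctx_slice: positions treated as editable context tokens (assumption).
        # loss_fn: maps embeddings to a scalar task loss.
        loss = loss_fn(embeds)
        grad, = torch.autograd.grad(loss, embeds)
        with torch.no_grad():
            # Descend the loss surface, touching only the context positions.
            embeds[:, ctx_slice, :] -= lr * grad[:, ctx_slice, :]
        return embeds

    # Toy usage with a stand-in loss over random embeddings.
    embeds = torch.randn(2, 10, 32, requires_grad=True)
    refine_context_embeddings(embeds, slice(0, 3), lambda e: (e.mean(dim=1) ** 2).sum())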
- Integrated Image-Text Based on Semi-supervised Learning for Small Sample Instance Segmentation [1.3157419797035321]
The article proposes a novel small sample instance segmentation solution from the perspective of maximizing the utilization of existing information.
First, it helps the model fully utilize unlabeled data by learning to generate pseudo labels, increasing the number of available samples.
Second, by integrating the features of text and image, more accurate classification results can be obtained.
arXiv Detail & Related papers (2024-10-21T14:44:08Z)
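A minimal sketch of the confidence-thresholded pseudo-labeling step mentioned in the entry above; the threshold value and the classifier interface are assumptions for illustration.

    import torch

    @torch.no_grad()
    def make_pseudo_labels(model, unlabeled_x, threshold=0.9):
        # Keep only predictions the model is already confident about; the
        # retained (input, pseudo-label) pairs are added to the labeled pool.
        probs = torch.softmax(model(unlabeled_x), dim=-1)
        confidence, pseudo = probs.max(dim=-1)
        keep = confidence >= threshold
        return unlabeled_x[keep], pseudo[keep]

    # Toy usage with a linear classifier over 5 classes.
    clf = torch.nn.Linear(16, 5)
    kept_x, kept_y = make_pseudo_labels(clf, torch.randn(8, 16))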
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
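A rough sketch of the zero-shot subset extraction described in the Data Adaptive Traceback entry above: pre-training images are ranked by cosine similarity between their embeddings and text embeddings of downstream class prompts, and the top-k are kept. The embedding source and the top-k criterion are assumptions, not the paper's exact procedure.

    import torch
    import torch.nn.functional as F

    def select_task_related_subset(image_embs, class_prompt_embs, k):
        # image_embs: (N, d) embeddings of pre-training images.
        # class_prompt_embs: (C, d) text embeddings of downstream class prompts.
        img = F.normalize(image_embs, dim=-1)
        txt = F.normalize(class_prompt_embs, dim=-1)
        relevance = (img @ txt.T).max(dim=-1).values   # best-matching class per image
        return relevance.topk(k).indices               # indices of the retained subset

    subset_idx = select_task_related_subset(torch.randn(1000, 64), torch.randn(10, 64), k=100)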
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
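The instance-reweighting idea above can be illustrated with a common closed form for KL-regularized distributionally robust optimization, in which per-example weights grow exponentially with the per-example loss; the temperature and the exact formulation in the paper may differ.

    import torch

    def instance_reweighted_loss(per_example_losses, temperature=1.0):
        # Up-weight hard examples: the softmax of the losses is the worst-case
        # distribution over instances under a KL constraint.
        weights = torch.softmax(per_example_losses.detach() / temperature, dim=0)
        return (weights * per_example_losses).sum()

    print(instance_reweighted_loss(torch.tensor([0.2, 1.5, 0.7])))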
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
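The selection step in the Nuggets entry above reduces to keeping the top percentile of examples under a quality score; the sketch below shows only that step, leaving the one-shot scoring itself abstract since that is the paper's contribution.

    import torch

    def top_percent(scores, percent=1.0):
        # Keep the indices of the highest-scoring examples (e.g. the top 1%).
        k = max(1, int(scores.numel() * percent / 100))
        return scores.topk(k).indices

    scores = torch.rand(10_000)            # stand-in for per-example quality scores
    selected = top_percent(scores, percent=1.0)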
- An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning [36.619804184427245]
Class-Incremental Learning (CIL) aims to build classification models from data streams.
Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored.
The use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum.
arXiv Detail & Related papers (2023-08-22T14:06:40Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both approaches and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
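A minimal sketch of a learned sample-weighting scheme in the spirit of the CMW-Net entry above: a small network maps each per-example loss to a weight in (0, 1). The class-aware conditioning and the meta-level training on clean data that the paper describes are omitted here.

    import torch
    import torch.nn as nn

    class WeightingNet(nn.Module):
        # Maps per-example losses to weights in (0, 1) and returns the
        # weighted training loss; a simplified, non-class-aware sketch.
        def __init__(self, hidden=32):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1), nn.Sigmoid())

        def forward(self, per_example_losses):
            weights = self.net(per_example_losses.unsqueeze(-1)).squeeze(-1)
            return (weights * per_example_losses).mean()

    weigher = WeightingNet()
    print(weigher(torch.tensor([0.3, 2.1, 0.9])))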
- Optimizing Active Learning for Low Annotation Budgets [6.753808772846254]
In deep learning, active learning is usually implemented as an iterative process in which successive deep models are updated via fine-tuning.
We tackle this issue by using an approach inspired by transfer learning.
We introduce a novel acquisition function which exploits the iterative nature of the AL process to select samples in a more robust fashion.
arXiv Detail & Related papers (2022-01-18T18:53:10Z)
- Exploiting All Samples in Low-Resource Sentence Classification: Early Stopping and Initialization Parameters [6.368871731116769]
In this study, we discuss how to exploit labeled samples without additional data or model redesigns.
We propose an integrated method that initializes the model with a weight-averaging method and uses a non-validation stopping criterion to train on all samples.
Our results highlight the importance of the training strategy and suggest that the integrated method can be the first step in the low-resource setting.
arXiv Detail & Related papers (2021-11-12T22:31:47Z)
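The weight-averaging initialization mentioned above can be sketched as an element-wise mean over several checkpoints' parameters; which checkpoints are averaged (e.g. runs with different seeds) is an assumption here, not the paper's exact recipe.

    import copy
    import torch

    def average_state_dicts(state_dicts):
        # Element-wise mean of matching parameters across checkpoints,
        # used as an initialization before training on all samples.
        avg = copy.deepcopy(state_dicts[0])
        for key in avg:
            avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        return avg

    # Usage sketch: model.load_state_dict(average_state_dicts([sd1, sd2, sd3]))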
- Few-shot Classification via Adaptive Attention [93.06105498633492]
We propose a novel few-shot learning method that optimizes and quickly adapts the query sample representation based on very few reference samples.
As demonstrated experimentally, the proposed model achieves state-of-the-art classification results on various benchmark few-shot classification and fine-grained recognition datasets.
arXiv Detail & Related papers (2020-08-06T05:52:59Z)
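A generic stand-in for the adaptive-attention idea above: the query representation is refined with an attention-weighted combination of the few reference (support) embeddings. The residual form and dot-product attention are assumptions, not the paper's exact module.

    import torch

    def adapt_query(query_emb, support_embs):
        # query_emb: (1, d); support_embs: (n_support, d).
        attn = torch.softmax(query_emb @ support_embs.T, dim=-1)   # (1, n_support)
        return query_emb + attn @ support_embs                     # refined query, (1, d)

    adapted = adapt_query(torch.randn(1, 64), torch.randn(5, 64))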
This list is automatically generated from the titles and abstracts of the papers on this site.