SPEC: Summary Preference Decomposition for Low-Resource Abstractive
Summarization
- URL: http://arxiv.org/abs/2303.14011v1
- Date: Fri, 24 Mar 2023 14:07:03 GMT
- Title: SPEC: Summary Preference Decomposition for Low-Resource Abstractive
Summarization
- Authors: Yi-Syuan Chen, Yun-Zhu Song, Hong-Han Shuai
- Abstract summary: We present a meta learning framework to transfer few-shot learning processes from source corpora to the target corpus.
Our methods achieve state-of-the-art performance on six diverse corpora with 30.11%/33.95%/27.51% and 26.74%/31.14%/24.48% average improvements on ROUGE-1/2/L under 10- and 100-example settings, respectively.
- Score: 21.037841262371355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural abstractive summarization has been widely studied and achieved great
success with large-scale corpora. However, the considerable cost of annotating
data motivates the need for learning strategies under low-resource settings. In
this paper, we investigate the problems of learning summarizers with only a few
examples and propose corresponding methods for improvements. First, typical
transfer learning methods are prone to be affected by data properties and
learning objectives in the pretext tasks. Therefore, based on pretrained
language models, we further present a meta learning framework to transfer
few-shot learning processes from source corpora to the target corpus. Second,
previous methods learn from training examples without decomposing the content
and preference. The generated summaries could therefore be constrained by the
preference bias in the training set, especially under low-resource settings. As
such, we propose decomposing the contents and preferences during learning
through parameter modulation, which enables control over preferences during
inference. Third, given a target application, specifying required preferences
could be non-trivial because the preferences may be difficult to derive through
observations. Therefore, we propose a novel decoding method to automatically
estimate suitable preferences and generate corresponding summary candidates
from the few training examples. Extensive experiments demonstrate that our
methods achieve state-of-the-art performance on six diverse corpora with
30.11%/33.95%/27.51% and 26.74%/31.14%/24.48% average improvements on
ROUGE-1/2/L under 10- and 100-example settings, respectively.
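
The improvements above are reported on ROUGE-1/2/L. As a rough illustration of what ROUGE-1 and ROUGE-2 measure, the following is a minimal sketch of ROUGE-N as clipped n-gram overlap. It assumes lowercased whitespace tokenization with no stemming, and the helper names `ngrams` and `rouge_n` are hypothetical. It is not the official ROUGE scorer, and the abstract does not say which implementation the authors used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: returns (precision, recall, F1).

    Assumption: lowercased whitespace tokens, no stemming or stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    for n in (1, 2):
        p, r, f = rouge_n(generated, gold, n)
        print(f"ROUGE-{n}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```

For this pair, five of the six unigrams and three of the five bigrams match, so ROUGE-1 F1 is 5/6 and ROUGE-2 F1 is 3/5. ROUGE-L, which scores the longest common subsequence, is not covered by this sketch.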
Related papers
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models [101.81127587760831]
Current parameter-efficient fine-tuning methods build adapters without considering the context of the downstream task to learn, or the context of important knowledge to maintain.
We propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable adapters from weight decomposition oriented by the context of the downstream task or of world knowledge.
Our knowledge-preserved adaptation not only achieves better performance than LoRA on finetuning tasks, but also mitigates the forgetting of world knowledge.
arXiv Detail & Related papers (2024-06-07T19:10:35Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- An Analysis of Initial Training Strategies for Exemplar-Free
Class-Incremental Learning [36.619804184427245]
Class-Incremental Learning (CIL) aims to build classification models from data streams.
Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored.
Use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum.
arXiv Detail & Related papers (2023-08-22T14:06:40Z)
- Revisiting Sample Size Determination in Natural Language Understanding [18.637079595450366]
Knowing exactly how many data points need to be labeled to achieve a certain model performance is a beneficial step towards reducing the overall budgets for annotation.
We derived a simple yet effective approach to predict the maximum achievable model performance based on a small number of training samples.
arXiv Detail & Related papers (2023-07-01T16:08:52Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep
Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Optimizing Active Learning for Low Annotation Budgets [6.753808772846254]
In deep learning, active learning is usually implemented as an iterative process in which successive deep models are updated via fine tuning.
We tackle this issue by using an approach inspired by transfer learning.
We introduce a novel acquisition function which exploits the iterative nature of the AL process to select samples in a more robust fashion.
arXiv Detail & Related papers (2022-01-18T18:53:10Z)
- Making Pre-trained Language Models Better Few-shot Learners [11.90626040104822]
The recent GPT-3 model achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context.
Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient.
We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples.
arXiv Detail & Related papers (2020-12-31T17:21:26Z)
- Few-shot Classification via Adaptive Attention [93.06105498633492]
We propose a novel few-shot learning method that optimizes and quickly adapts the query sample representation based on very few reference samples.
As demonstrated experimentally, the proposed model achieves state-of-the-art classification results on various benchmark few-shot classification and fine-grained recognition datasets.
arXiv Detail & Related papers (2020-08-06T05:52:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.