Socratic Pretraining: Question-Driven Pretraining for Controllable
Summarization
- URL: http://arxiv.org/abs/2212.10449v3
- Date: Thu, 8 Jun 2023 22:43:58 GMT
- Title: Socratic Pretraining: Question-Driven Pretraining for Controllable
Summarization
- Authors: Artidoro Pagnoni, Alexander R. Fabbri, Wojciech Kryściński,
Chien-Sheng Wu
- Abstract summary: Socratic pretraining is a question-driven, unsupervised pretraining objective designed to improve controllability in summarization tasks.
Our results show that Socratic pretraining cuts task-specific labeled data requirements in half.
- Score: 89.04537372465612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In long document controllable summarization, where labeled data is scarce,
pretrained models struggle to adapt to the task and effectively respond to user
queries. In this paper, we introduce Socratic pretraining, a question-driven,
unsupervised pretraining objective specifically designed to improve
controllability in summarization tasks. By training a model to generate and
answer relevant questions in a given context, Socratic pretraining enables the
model to more effectively adhere to user-provided queries and identify relevant
content to be summarized. We demonstrate the effectiveness of this approach
through extensive experimentation on two summarization domains, short stories
and dialogue, and multiple control strategies: keywords, questions, and factoid
QA pairs. Our pretraining method relies only on unlabeled documents and a
question generation system and outperforms pre-finetuning approaches that use
additional supervised data. Furthermore, our results show that Socratic
pretraining cuts task-specific labeled data requirements in half, is more
faithful to user-provided queries, and achieves state-of-the-art performance on
QMSum and SQuALITY.
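As a concrete picture of the objective described above, the following is a minimal sketch of how question-driven pretraining examples could be built from unlabeled documents: a question-generation model proposes a question whose answer is a masked sentence, and the pretraining target asks the model to produce both the question and the answering content. The model name, special tokens, and input/output layout are illustrative assumptions, not the paper's released recipe.

  from transformers import pipeline

  # Illustrative sketch only: the QG checkpoint name, special tokens, and
  # source/target layout are assumptions, not the authors' released code.
  qg = pipeline(
      "text2text-generation",
      model="your-org/question-generation-t5",  # placeholder QG checkpoint
  )

  MASK = "<mask-sent>"

  def make_socratic_example(sentences: list[str], answer_idx: int) -> dict:
      """Build one question-driven pretraining example from an unlabeled passage.

      One salient sentence is masked in the source; the target asks the model
      to first generate a relevant question and then the content answering it.
      """
      answer = sentences[answer_idx]
      context = sentences.copy()
      context[answer_idx] = MASK

      # The prompt format depends on the QG checkpoint; this one is assumed.
      prompt = f"answer: {answer} context: {' '.join(sentences)}"
      question = qg(prompt, max_length=64)[0]["generated_text"].strip()

      return {
          "source": "ask&answer: " + " ".join(context),
          "target": f"<q> {question} <a> {answer}",
      }

  # Example usage on a toy passage:
  example = make_socratic_example(
      ["The committee met on Tuesday.",
       "They approved the new budget.",
       "Funding starts next quarter."],
      answer_idx=1,
  )

Pairs of this form rely only on unlabeled text plus a question-generation system, which is the property the abstract highlights relative to pre-finetuning approaches that need additional supervised data.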
Related papers
- One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering [31.025439143093585]
Vision-Language Models (VLMs) have shown significant promise in Visual Question Answering (VQA) tasks by leveraging web-scale multimodal datasets.
These models often struggle with continual learning due to catastrophic forgetting when adapting to new tasks.
We propose the first data-free method that leverages the language generation capability of a VLM, instead of relying on external models.
arXiv Detail & Related papers (2024-11-04T16:04:59Z)
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
- Unified Pretraining for Recommendation via Task Hypergraphs [55.98773629788986]
We propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs.
To handle the diverse requirements and nuances of various pretext tasks within a unified learning pattern, we design task hypergraphs that generalize pretext tasks to hyperedge prediction.
A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation.
arXiv Detail & Related papers (2023-10-20T05:33:21Z)
- Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)