Socratic Pretraining: Question-Driven Pretraining for Controllable
Summarization
- URL: http://arxiv.org/abs/2212.10449v3
- Date: Thu, 8 Jun 2023 22:43:58 GMT
- Authors: Artidoro Pagnoni, Alexander R. Fabbri, Wojciech Kryściński,
Chien-Sheng Wu
- Abstract summary: Socratic pretraining is a question-driven, unsupervised pretraining objective designed to improve controllability in summarization tasks.
Our results show that Socratic pretraining cuts task-specific labeled data requirements in half.
- Score: 89.04537372465612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In long document controllable summarization, where labeled data is scarce,
pretrained models struggle to adapt to the task and effectively respond to user
queries. In this paper, we introduce Socratic pretraining, a question-driven,
unsupervised pretraining objective specifically designed to improve
controllability in summarization tasks. By training a model to generate and
answer relevant questions in a given context, Socratic pretraining enables the
model to more effectively adhere to user-provided queries and identify relevant
content to be summarized. We demonstrate the effectiveness of this approach
through extensive experimentation on two summarization domains, short stories
and dialogue, and multiple control strategies: keywords, questions, and factoid
QA pairs. Our pretraining method relies only on unlabeled documents and a
question generation system and outperforms pre-finetuning approaches that use
additional supervised data. Furthermore, our results show that Socratic
pretraining cuts task-specific labeled data requirements in half, is more
faithful to user-provided queries, and achieves state-of-the-art performance on
QMSum and SQuALITY.
Related papers
- One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
(2023-12-16T03:33:12Z)
- Unified Pretraining for Recommendation via Task Hypergraphs [55.98773629788986]
We propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs.
For a unified learning pattern to handle diverse requirements and nuances of various pretext tasks, we design task hypergraphs to generalize pretext tasks to hyperedge prediction.
A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation.
(2023-10-20T05:33:21Z)
- Weakly Supervised Pre-Training for Multi-Hop Retriever [23.79574380039197]
We propose a new method for weakly supervised multi-hop retriever pre-training that requires no human effort.
Our method includes 1) a pre-training task for generating vector representations of complex questions, 2) a scalable data generation method that produces the nested structure of question and sub-question as weak supervision for pre-training, and 3) a pre-training model structure based on dense encoders.
(2021-06-18T08:06:02Z)
- Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system.
We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries.
Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
(2020-12-29T14:39:35Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
(2020-10-10T14:03:20Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
(2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.