Related papers: Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization

Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization

URL: http://arxiv.org/abs/2212.10449v3
Date: Thu, 8 Jun 2023 22:43:58 GMT
Title: Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization
Authors: Artidoro Pagnoni, Alexander R. Fabbri, Wojciech Kry\'sci\'nski, Chien-Sheng Wu
Abstract summary: Socratic pretraining is a question-driven, unsupervised pretraining objective designed to improve controllability in summarization tasks. Our results show that Socratic pretraining cuts task-specific labeled data requirements in half.
Score: 89.04537372465612
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In long document controllable summarization, where labeled data is scarce, pretrained models struggle to adapt to the task and effectively respond to user queries. In this paper, we introduce Socratic pretraining, a question-driven, unsupervised pretraining objective specifically designed to improve controllability in summarization tasks. By training a model to generate and answer relevant questions in a given context, Socratic pretraining enables the model to more effectively adhere to user-provided queries and identify relevant content to be summarized. We demonstrate the effectiveness of this approach through extensive experimentation on two summarization domains, short stories and dialogue, and multiple control strategies: keywords, questions, and factoid QA pairs. Our pretraining method relies only on unlabeled documents and a question generation system and outperforms pre-finetuning approaches that use additional supervised data. Furthermore, our results show that Socratic pretraining cuts task-specific labeled data requirements in half, is more faithful to user-provided queries, and achieves state-of-the-art performance on QMSum and SQuALITY.

Related papers

Preference-based Learning with Retrieval Augmented Generation for Conversational Question Answering [20.969921246457414]
PRAISE is a pipeline-based approach for ConvQA that trains adapters for each of the three subtasks. We show that PRAISE shows improvements per subtask and achieves new state-of-the-art performance on a popular ConvQA benchmark.
arXiv Detail & Related papers (2025-03-28T10:26:49Z)
CAPrompt: Cyclic Prompt Aggregation for Pre-Trained Model Based Class Incremental Learning [12.249938312431993]
We propose a novel Cyclic Prompt Aggregation (CAPrompt) method to eliminate the dependency on task ID prediction. Under concave conditions, the aggregated prompt achieves lower error compared to selecting a single task-specific prompt. Our proposed CAPrompt outperforms state-of-the-art methods by 2%-3%.
arXiv Detail & Related papers (2024-12-12T04:34:28Z)
One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering [31.025439143093585]
Vision-Language Models (VLMs) have shown significant promise in Visual Question Answering (VQA) tasks by leveraging web-scale multimodal datasets. These models often struggle with continual learning due to catastrophic forgetting when adapting to new tasks. We propose the first data-free method that leverages the language generation capability of a VLM, instead of relying on external models.
arXiv Detail & Related papers (2024-11-04T16:04:59Z)
Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation. In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales. Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
textscNuggets uses one-shot learning to select high-quality instruction data from extensive datasets. We show that instruction tuning with the top 1% of examples curated by textscNuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
Unified Pretraining for Recommendation via Task Hypergraphs [55.98773629788986]
We propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs. For a unified learning pattern to handle diverse requirements and nuances of various pretext tasks, we design task hypergraphs to generalize pretext tasks to hyperedge prediction. A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation.
arXiv Detail & Related papers (2023-10-20T05:33:21Z)
Abstractive Query Focused Summarization with Query-Free Resources [60.468323530248945]
In this work, we consider the problem of leveraging only generic summarization resources to build an abstractive QFS system. We propose Marge, a Masked ROUGE Regression framework composed of a novel unified representation for summaries and queries. Despite learning from minimal supervision, our system achieves state-of-the-art results in the distantly supervised setting.
arXiv Detail & Related papers (2020-12-29T14:39:35Z)
Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals. We conduct extensive experiments in two public datasets and obtain significant improvement in both datasets.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.