Related papers: Controllable Multi-document Summarization: Coverage & Coherence Intuitive Policy with Large Language Model Based Rewards

URL: http://arxiv.org/abs/2310.03473v1
Date: Thu, 5 Oct 2023 11:29:09 GMT
Title: Controllable Multi-document Summarization: Coverage & Coherence Intuitive Policy with Large Language Model Based Rewards
Authors: Litton J Kurisinkel, Nancy F Chen
Abstract summary: Controllability is a matter of concern when it comes to text generation tasks with long inputs, such as multi-document summarization. We train a controllable content extraction scheme to extract the text that will be refined by an LLM. Our approach yields competitive results in the evaluation using ROUGE metrics and outperforms potential baselines in coherence.
Score: 42.171703872560286
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Memory-efficient large language models are good at refining text input for better readability. However, controllability is a matter of concern when it comes to text generation tasks with long inputs, such as multi-document summarization. In this work, we investigate a generic controllable approach for multi-document summarization that leverages the capabilities of LLMs to refine the text. In particular, we train a controllable content extraction scheme to extract the text that will be refined by an LLM. The scheme is designed with a novel coverage and coherence intuitive policy, which is duly rewarded by a passively trained LLM. Our approach yields competitive results in the evaluation using ROUGE metrics and outperforms potential baselines in coherence, as per human evaluation.

Related papers

IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z)
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing [35.686125031177234]
Multi-Document Summarization (MDS) is a challenging task that focuses on extracting and synthesizing useful information from multiple lengthy documents. We propose a novel framework that leverages inference-time scaling for this task. We also introduce two new evaluation metrics: Consistency-Aware Preference (CAP) score and LLM Atom-Content-Unit (ACU) score.
arXiv Detail & Related papers (2025-02-27T23:34:47Z)
Enhancing Annotated Bibliography Generation with LLM Ensembles [0.0]
Large Language Model ensembles are introduced and validated to enhance model performance in scholarly tasks. Results show a 38% improvement in annotation quality and a 51% reduction in content redundancy.
arXiv Detail & Related papers (2024-12-30T11:07:05Z)
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding [41.43688559565315]
We present a novel OCR-free document understanding framework based on pretrained Multimodal Large Language Models (MLLMs). Our approach employs multi-scale visual features to handle various font sizes within document images. We introduce a novel instruction tuning task, which facilitates the model's text-reading capability by learning to predict the relative positions of input text.
arXiv Detail & Related papers (2024-11-08T00:58:12Z)
Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach. This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets. We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
One Arrow, Many Targets: Probing LLMs for Multi-Attribute Controllable Text Summarization [7.734726150561089]
Multi-Attribute Controllable Summarization (MACS) is a well-established task within the natural language processing (NLP) community. This work addresses the gap by examining the MACS task through the lens of large language models. We propose and evaluate a novel hierarchical adapter fusion technique to integrate learnings from two distinct controllable attributes.
arXiv Detail & Related papers (2024-11-02T11:07:25Z)
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding [40.38251904765156]
Text-rich document understanding (TDU) requires comprehensive analysis of documents containing substantial textual content and complex layouts. We introduce DocLayLLM, an efficient multi-modal extension of Multimodal Large Language Models (MLLMs) specifically designed for TDU.
arXiv Detail & Related papers (2024-08-27T13:13:38Z)
RepEval: Effective Text Evaluation with LLM Representation [55.26340302485898]
RepEval is a metric that leverages the projection of Large Language Models (LLMs) representations for evaluation. Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
arXiv Detail & Related papers (2024-04-30T13:50:55Z)
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals? [19.814974042343028]
We investigate the controllability of large language models (LLMs) on scientific summarization tasks. We find that non-fine-tuned LLMs outperform humans in the MuP review generation task.
arXiv Detail & Related papers (2024-01-18T23:00:54Z)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs). Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization [132.25202059478065]
We benchmark large language models (LLMs) on instruction controllable text summarization. Our study reveals that instruction controllable text summarization remains a challenging task for LLMs.
arXiv Detail & Related papers (2023-11-15T18:25:26Z)
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation [94.59630161324013]
We propose CoAnnotating, a novel paradigm for Human-LLM co-annotation of unstructured texts at scale. Our empirical study shows CoAnnotating to be an effective means to allocate work from results on different datasets, with up to 21% performance improvement over random baseline.
arXiv Detail & Related papers (2023-10-24T08:56:49Z)
An Enhanced MeanSum Method For Generating Hotel Multi-Review Summarizations [0.06091702876917279]
This work uses a Multi-Aspect Masker (MAM) as a content selector to address the multi-aspect issue. We also propose a regularizer to control the length of the generated summaries. Our improved model achieves higher ROUGE and Sentiment Accuracy than the original MeanSum method.
arXiv Detail & Related papers (2020-12-07T13:16:01Z)
Interpretable Multi-Headed Attention for Abstractive Summarization at Controllable Lengths [14.762731718325002]
Multi-level Summarizer (MLS) is a supervised method to construct abstractive summaries of a text document at controllable lengths. MLS outperforms strong baselines by up to 14.70% in the METEOR score.
arXiv Detail & Related papers (2020-02-18T19:40:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.