MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections
- URL: http://arxiv.org/abs/2502.00322v1
- Date: Sat, 01 Feb 2025 05:08:14 GMT
- Title: MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections
- Authors: Nishant Balepur, Alexa Siu, Nedim Lipka, Franck Dernoncourt, Tong Sun, Jordan Boyd-Graber, Puneet Mathur
- Abstract summary: We introduce Debatable QFS, a task to create summaries that answer queries via documents with opposing perspectives.
We design MODS, a multi-LLM framework mirroring human panel discussions.
Experiments on ConflictingQA with controversial web queries and DebateQFS, our new dataset of debate queries from Debatepedia, show MODS beats SOTA by 38-59% in topic paragraph coverage and balance.
- Score: 57.588478932185005
- Abstract: Query-focused summarization (QFS) gives a summary of documents to answer a query. Past QFS work assumes queries have one answer, ignoring debatable ones (Is law school worth it?). We introduce Debatable QFS (DQFS), a task to create summaries that answer debatable queries via documents with opposing perspectives; summaries must comprehensively cover all sources and balance perspectives, favoring no side. These goals elude LLM QFS systems, which: 1) lack structured content plans, failing to guide LLMs to write balanced summaries, and 2) use the same query to retrieve contexts across documents, failing to cover all perspectives specific to each document's content. To overcome this, we design MODS, a multi-LLM framework mirroring human panel discussions. MODS treats documents as individual Speaker LLMs and has a Moderator LLM that picks speakers to respond to tailored queries for planned topics. Speakers use tailored queries to retrieve relevant contexts from their documents and supply perspectives, which are tracked in a rich outline, yielding a content plan to guide the final summary. Experiments on ConflictingQA with controversial web queries and DebateQFS, our new dataset of debate queries from Debatepedia, show MODS beats SOTA by 38-59% in topic paragraph coverage and balance, based on new citation metrics. Users also find MODS's summaries to be readable and more balanced.
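The abstract sketches a concrete control flow: a Moderator LLM plans topics, tailors a per-document query for each Speaker LLM, collects the speakers' retrieved perspectives into an outline, and drafts the summary from that outline. The Python sketch below mirrors that loop at a high level; the call_llm stub, the prompt wording, the retrieval heuristic, and the data structures are illustrative assumptions, not the authors' implementation.
```python
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (assumption; swap in your own client)."""
    return f"[LLM output for: {prompt[:60]}...]"


@dataclass
class Speaker:
    """One Speaker LLM per source document."""
    doc_id: str
    passages: list

    def retrieve(self, query: str, k: int = 2) -> list:
        # Naive keyword-overlap retrieval; the real system retrieves per-document contexts.
        terms = set(query.lower().split())
        ranked = sorted(self.passages,
                        key=lambda p: len(terms & set(p.lower().split())),
                        reverse=True)
        return ranked[:k]

    def respond(self, tailored_query: str) -> str:
        context = "\n".join(self.retrieve(tailored_query))
        return call_llm(f"Using only this context:\n{context}\n"
                        f"Answer from your document's perspective: {tailored_query}")


def moderate(query: str, speakers: list) -> str:
    """Moderator LLM: plan topics, tailor queries, track an outline, then summarize."""
    topics = call_llm(f"List discussion topics for the debatable query: {query}").split(";")
    outline = {}
    for topic in topics:
        for spk in speakers:
            tailored = call_llm(f"Rewrite '{query}' as a question about '{topic.strip()}' "
                                f"tailored to document {spk.doc_id}")
            outline.setdefault(topic, []).append(f"({spk.doc_id}) {spk.respond(tailored)}")
    plan = "\n".join(f"{t}:\n  " + "\n  ".join(views) for t, views in outline.items())
    return call_llm(f"Write a balanced summary answering '{query}' that follows this outline:\n{plan}")


if __name__ == "__main__":
    docs = [Speaker("pro.txt", ["Law school raises long-run earnings.", "Alumni networks matter."]),
            Speaker("con.txt", ["Tuition debt is high.", "The legal job market is uneven."])]
    print(moderate("Is law school worth it?", docs))
```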
Related papers
- Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA).
Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents.
We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z)
- Contri(e)ve: Context + Retrieve for Scholarly Question Answering [0.0]
We present a two-step solution using an open-source Large Language Model (LLM), Llama 3.1, for the Scholarly-QALD dataset.
Firstly, we extract the context pertaining to the question from different structured and unstructured data sources.
Secondly, we implement prompt engineering to improve the information retrieval performance of the LLM.
arXiv Detail & Related papers (2024-09-13T17:38:47Z)
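As a rough illustration of the two-step recipe in the entry above (gather context from structured and unstructured sources, then apply prompt engineering), a minimal sketch follows. The fetch functions, their return values, and the prompt template are assumptions for illustration, not the paper's actual pipeline.
```python
def fetch_structured_context(question: str) -> str:
    # Placeholder for a lookup in a structured source (e.g., a scholarly knowledge graph).
    return "AuthorX; 42 publications; h-index 17"


def fetch_unstructured_context(question: str) -> str:
    # Placeholder for retrieval over unstructured text such as abstracts or web pages.
    return "AuthorX's recent work focuses on scholarly question answering."


def build_prompt(question: str) -> str:
    # Step 2: prompt engineering, i.e., packaging both context types with explicit instructions.
    return ("Answer the scholarly question using only the context below.\n"
            f"Structured facts: {fetch_structured_context(question)}\n"
            f"Passages: {fetch_unstructured_context(question)}\n"
            f"Question: {question}\nAnswer concisely:")


if __name__ == "__main__":
    # The resulting prompt would then be sent to an instruction-tuned LLM such as Llama 3.1.
    print(build_prompt("How many publications does AuthorX have?"))
```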
- Integrating SPARQL and LLMs for Question Answering over Scholarly Data Sources [0.0]
This paper describes a methodology that combines SPARQL queries, divide and conquer algorithms, and a pre-trained extractive question answering model.
It starts with SPARQL queries to gather data, then applies divide and conquer to manage various question types and sources, and uses the model to handle personal author questions.
The approach, evaluated with Exact Match and F-score metrics, shows promise for improving QA accuracy and efficiency in scholarly contexts.
arXiv Detail & Related papers (2024-09-11T14:50:28Z)
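The entry above outlines a divide-and-conquer routing: SPARQL queries for knowledge-graph facts and an extractive QA model for personal author questions. The sketch below shows one way such routing could look; the run_sparql and extractive_qa stubs, the routing rule, and the example queries are assumptions, not the paper's implementation.
```python
def run_sparql(query: str) -> list:
    # Placeholder for a call to a scholarly SPARQL endpoint (e.g., via the SPARQLWrapper package).
    return [{"venue": "ACL", "year": "2024"}]


def extractive_qa(question: str, context: str) -> str:
    # Placeholder for a pre-trained extractive QA model that selects an answer span.
    return context.split(".")[0]


def answer(question: str) -> str:
    q = question.lower()
    if "author" in q or q.startswith("who"):
        # Personal/author questions: fetch a short profile, then extract the answer span.
        profile = "Jane Doe is a professor at University Y. She works on NLP."
        return extractive_qa(question, profile)
    # Other question types: build a SPARQL query (templated or generated) and hit the KG.
    sparql = ('SELECT ?venue ?year WHERE { ?paper <hasTitle> "Example" ; '
              '<publishedIn> ?venue ; <inYear> ?year }')
    return ", ".join(f"{row['venue']} ({row['year']})" for row in run_sparql(sparql))


if __name__ == "__main__":
    print(answer("Who wrote the Example paper?"))
    print(answer("Where was the Example paper published?"))
```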
- DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems [99.17123445211115]
We introduce DocBench, a benchmark to evaluate large language model (LLM)-based document reading systems.
Our benchmark involves the recruitment of human annotators and the generation of synthetic questions.
It includes 229 real documents and 1,102 questions, spanning across five different domains and four major types of questions.
arXiv Detail & Related papers (2024-07-15T13:17:42Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents as having rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
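The PDFTriage idea of retrieving context based on either structure or content can be pictured as a small dispatch step. Below is a toy sketch of that triage under simplifying assumptions; the Document fields, the keyword-based triage rule, and the lexical fallback are illustrative and not the paper's actual method.
```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    sections: dict   # section title -> text
    pages: dict      # page number -> text


def triage(doc: Document, question: str) -> str:
    q = question.lower()
    # Structural questions are served from layout metadata (pages, sections)...
    if "page" in q:
        match = re.search(r"\d+", q)
        return doc.pages.get(int(match.group()) if match else 1, "")
    if "section" in q:
        for title, text in doc.sections.items():
            if title.lower() in q:
                return text
    # ...while content questions fall back to simple lexical retrieval over sections.
    terms = set(q.split())
    return max(doc.sections.values(),
               key=lambda text: len(terms & set(text.lower().split())))


if __name__ == "__main__":
    doc = Document(sections={"Results": "Accuracy improved by 4 points.",
                             "Methods": "We fine-tune a dense retriever."},
                   pages={1: "Title page.", 2: "Results: accuracy improved by 4 points."})
    print(triage(doc, "What does the Results section say?"))
    print(triage(doc, "What is on page 2?"))
```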
- LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models [21.95962189710859]
We propose a lightweight, end-to-end framework to execute the Spoken Question Answering (SQA) task on the LibriSQA dataset.
By reframing ASR into the SQA format, we further substantiate our framework's capability in handling ASR tasks.
Our empirical findings bolster the LLMs' aptitude for aligning and comprehending multimodal information, paving the way for the development of universal multimodal LLMs.
arXiv Detail & Related papers (2023-08-20T23:47:23Z)
- DAPR: A Benchmark on Document-Aware Passage Retrieval [57.45793782107218]
We propose and name this task Document-Aware Passage Retrieval (DAPR).
While analyzing the errors of the State-of-The-Art (SoTA) passage retrievers, we find the major errors (53.5%) are due to missing document context.
Our created benchmark enables future research on developing and comparing retrieval systems for the new task.
arXiv Detail & Related papers (2023-05-23T10:39:57Z)
- LMGQS: A Large-scale Dataset for Query-focused Summarization [77.6179359525065]
We convert four generic summarization benchmarks into a new QFS benchmark dataset, LMGQS.
We establish baselines with state-of-the-art summarization models.
We achieve state-of-the-art zero-shot and supervised performance on multiple existing QFS benchmarks.
arXiv Detail & Related papers (2023-05-22T14:53:45Z)
- AnswerQuest: A System for Generating Question-Answer Items from Multi-Paragraph Documents [1.0896567381206712]
We demo a system that integrates the tasks of question answering (QA) and question generation (QG) in order to produce Q&A items that convey the content of multi-paragraph documents.
We report some experiments for QA and QG that yield improvements on both tasks, and assess how they interact to produce a list of Q&A items for a text.
arXiv Detail & Related papers (2021-03-05T17:36:04Z)