Expository Text Generation: Imitate, Retrieve, Paraphrase
- URL: http://arxiv.org/abs/2305.03276v2
- Date: Mon, 23 Oct 2023 01:32:10 GMT
- Title: Expository Text Generation: Imitate, Retrieve, Paraphrase
- Authors: Nishant Balepur, Jie Huang, Kevin Chen-Chuan Chang
- Abstract summary: We propose the task of expository text generation, which seeks to automatically generate an accurate and stylistically consistent text for a topic.
We develop IRP, a framework that overcomes the limitations of retrieval-augmented models and iteratively performs content planning, fact retrieval, and rephrasing.
We show that IRP produces factual and organized expository texts that accurately inform readers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Expository documents are vital resources for conveying complex information to
readers. Despite their usefulness, writing expository text by hand is a
challenging process that requires careful content planning, obtaining facts
from multiple sources, and the ability to clearly synthesize these facts. To
ease these burdens, we propose the task of expository text generation, which
seeks to automatically generate an accurate and stylistically consistent
expository text for a topic by intelligently searching a knowledge source. We
solve our task by developing IRP, a framework that overcomes the limitations of
retrieval-augmented models and iteratively performs content planning, fact
retrieval, and rephrasing. Through experiments on three diverse,
newly-collected datasets, we show that IRP produces factual and organized
expository texts that accurately inform readers.
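The abstract describes IRP as an iterative loop over three stages: content planning (imitating the structure of a style exemplar), fact retrieval from a knowledge source, and rephrasing the retrieved facts. The sketch below is a minimal, hypothetical rendering of that loop; the three helper functions are toy stand-ins for the paper's learned components, not the actual implementation.

```python
# Hypothetical sketch of the IRP loop: plan -> retrieve -> paraphrase,
# repeated sentence by sentence. All three helpers are simplistic
# placeholders for the framework's learned modules.

def plan_next_sentence(style_exemplar, written_so_far):
    """Content planning: pick the next stylistic sentence to imitate."""
    idx = len(written_so_far)
    return style_exemplar[idx] if idx < len(style_exemplar) else None

def retrieve_facts(query, knowledge_source, k=2):
    """Fact retrieval: top-k source sentences by word overlap with the query."""
    scored = [(len(set(query.lower().split()) & set(doc.lower().split())), doc)
              for doc in knowledge_source]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def paraphrase(facts, style_sentence):
    """Rephrasing: merge retrieved facts into one sentence (naive join here)."""
    return " ".join(facts) if facts else style_sentence

def irp(style_exemplar, knowledge_source):
    """Iterate planning, retrieval, and rephrasing until the plan is exhausted."""
    output = []
    while True:
        template = plan_next_sentence(style_exemplar, output)
        if template is None:
            break
        facts = retrieve_facts(template, knowledge_source)
        output.append(paraphrase(facts, template))
    return " ".join(output)
```

The key property this loop shares with the paper's framing is that retrieval is conditioned on each planned sentence rather than on the topic as a whole, which is what lets the output stay both stylistically consistent and grounded in the knowledge source.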
Related papers
- Map&Make: Schema Guided Text to Table Generation [41.52038779169547]
Text-to-Table generation is an essential task for information retrieval.
We introduce a versatile approach, Map&Make, which "dissects" text into propositional atomic statements.
Our approach is tested against two challenging datasets, Rotowire and Livesum.
arXiv Detail & Related papers (2025-05-29T07:12:46Z) - Writing Like the Best: Exemplar-Based Expository Text Generation [23.631195575124924]
We introduce the Exemplar-Based Expository Text Generation task, aiming to generate an expository text on a new topic using an exemplar on a similar topic.
Current methods fall short due to their reliance on extensive exemplar data, difficulty in adapting topic-specific content, and issues with long-text coherence.
We propose the concept of Adaptive Imitation and present a novel Recurrent Plan-then-Adapt framework.
arXiv Detail & Related papers (2025-05-24T20:40:39Z) - Generative Compositor for Few-Shot Visual Information Extraction [60.663887314625164]
We propose a novel generative model, named Generative Compositor, to address the challenge of few-shot VIE.
The Generative Compositor is a hybrid pointer-generator network that emulates the operations of a compositor by retrieving words from the source text.
The proposed method achieves highly competitive results in full-sample training, while notably outperforming the baseline in the 1-shot, 5-shot, and 10-shot settings.
arXiv Detail & Related papers (2025-03-21T04:56:24Z) - RAPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery [69.41989381702858]
Existing methods, such as direct generation and multi-agent discussion, often struggle with issues like hallucinations, topic incoherence, and significant latency.
We propose RAPID, an efficient retrieval-augmented long text generation framework.
Our work provides a robust and efficient solution to the challenges of automated long-text generation.
arXiv Detail & Related papers (2025-03-02T06:11:29Z) - ConTReGen: Context-driven Tree-structured Retrieval for Open-domain Long-form Text Generation [26.4086456393314]
Long-form text generation requires coherent, comprehensive responses that address complex queries with both breadth and depth.
Existing iterative retrieval-augmented generation approaches often struggle to delve deeply into each facet of complex queries.
This paper introduces ConTReGen, a novel framework that employs a context-driven, tree-structured retrieval approach.
arXiv Detail & Related papers (2024-10-20T21:17:05Z) - Contextual Knowledge Pursuit for Faithful Visual Synthesis [33.191847768674826]
In large language models (LLMs), a prevalent strategy to reduce hallucinations is to retrieve factual knowledge from an external database.
This paper proposes Contextual Knowledge Pursuit (CKPT), a framework that leverages the complementary strengths of external and parametric knowledge to help generators produce reliable visual content.
arXiv Detail & Related papers (2023-11-29T18:51:46Z) - Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs [96.54224331778195]
We present a text-grounding document understanding model, termed TGDoc, which enhances MLLMs with the ability to discern the spatial positioning of text within images.
We formulate instruction tuning tasks including text detection, recognition, and spotting to facilitate the cohesive alignment between the visual encoder and large language model.
Our method achieves state-of-the-art performance across multiple text-rich benchmarks, validating the effectiveness of our method.
arXiv Detail & Related papers (2023-11-22T06:46:37Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained in an end-to-end trainable manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z) - Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z) - TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features of text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z) - Natural language processing for word sense disambiguation and information extraction [0.0]
The thesis presents a new approach to Word Sense Disambiguation using a thesaurus.
A Document Retrieval method based on Fuzzy Logic is described and its application illustrated.
The thesis concludes with a novel strategy based on the Dempster-Shafer theory of evidential reasoning.
arXiv Detail & Related papers (2020-04-05T17:13:43Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset in the writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
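Several of the papers above (IRP, RAPID, ConTReGen, DIVKNOWQA) build on retrieval-augmented generation: retrieve supporting passages, then condition generation on them. The following is a minimal, generic sketch of that pattern, with a toy word-overlap retriever standing in for a real dense retriever and a prompt builder in place of an actual generator call; all names here are illustrative, not from any of the listed papers.

```python
# Generic retrieval-augmented generation sketch: score passages against the
# query, keep the top-k, and ground the generation prompt in that evidence.
# The overlap scorer is a toy stand-in for a real retriever.

def overlap_score(query, passage):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query, corpus, k=2):
    """Return the k passages with the highest overlap score."""
    ranked = sorted(corpus, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Ground the query in retrieved evidence before generation."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these facts:\n{context}\nQuestion: {query}"
```

The differences among the listed methods largely come down to how this loop is structured: IRP retrieves per planned sentence, ConTReGen organizes sub-queries into a tree, and RAPID interleaves retrieval with a writing plan for efficiency.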
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.