Expository Text Generation: Imitate, Retrieve, Paraphrase
- URL: http://arxiv.org/abs/2305.03276v2
- Date: Mon, 23 Oct 2023 01:32:10 GMT
- Title: Expository Text Generation: Imitate, Retrieve, Paraphrase
- Authors: Nishant Balepur, Jie Huang, Kevin Chen-Chuan Chang
- Abstract summary: We propose the task of expository text generation, which seeks to automatically generate an accurate and stylistically consistent text for a topic.
We develop IRP, a framework that overcomes the limitations of retrieval-augmented models and iteratively performs content planning, fact retrieval, and rephrasing.
We show that IRP produces factual and organized expository texts that accurately inform readers.
- Score: 26.43857184008374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Expository documents are vital resources for conveying complex information to
readers. Despite their usefulness, writing expository text by hand is a
challenging process that requires careful content planning, obtaining facts
from multiple sources, and the ability to clearly synthesize these facts. To
ease these burdens, we propose the task of expository text generation, which
seeks to automatically generate an accurate and stylistically consistent
expository text for a topic by intelligently searching a knowledge source. We
solve our task by developing IRP, a framework that overcomes the limitations of
retrieval-augmented models and iteratively performs content planning, fact
retrieval, and rephrasing. Through experiments on three diverse,
newly collected datasets, we show that IRP produces factual and organized
expository texts that accurately inform readers.
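The abstract describes IRP only as an iterative loop of content planning, fact retrieval, and rephrasing. The Python sketch below illustrates that control flow under stated assumptions. Every function name (`generate_expository_text`, `retrieve`, `template_planner`, `copy_top_fact`) is hypothetical. The word-overlap retriever and the template planner are toy stand-ins for whatever models the framework actually uses, and none of this is the authors' implementation.

```python
import re
from typing import Callable, List, Optional, Set

# Hypothetical signatures (not taken from the paper):
#   plan_next(topic, sentences_so_far) -> next content plan, or None to stop
#   paraphrase(plan, retrieved_facts)  -> output sentence in the plan's style
PlanFn = Callable[[str, List[str]], Optional[str]]
ParaphraseFn = Callable[[str, List[str]], str]


def tokenize(text: str) -> Set[str]:
    """Lowercased word set; a toy stand-in for a learned retriever's encoder."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank knowledge-source sentences by word overlap with the plan.

    sorted() stays stable with reverse=True, so ties keep their corpus order.
    """
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def generate_expository_text(
    topic: str,
    plan_next: PlanFn,
    paraphrase: ParaphraseFn,
    corpus: List[str],
    max_sentences: int = 10,
) -> str:
    """Repeat plan -> retrieve -> rephrase, producing one sentence per iteration."""
    sentences: List[str] = []
    for _ in range(max_sentences):
        plan = plan_next(topic, sentences)  # content planning (style imitation)
        if plan is None:  # the planner signals that the text is complete
            break
        facts = retrieve(plan, corpus)  # fact retrieval guided by the plan
        sentences.append(paraphrase(plan, facts))  # rephrase facts in the plan's style
    return " ".join(sentences)


# Usage with trivial hypothetical components: fixed style templates act as the
# "planner", and the "paraphraser" simply returns the top retrieved fact.
TEMPLATES = ["{topic} was founded in a year", "{topic} is located in a place"]


def template_planner(topic: str, so_far: List[str]) -> Optional[str]:
    if len(so_far) < len(TEMPLATES):
        return TEMPLATES[len(so_far)].format(topic=topic)
    return None


def copy_top_fact(plan: str, facts: List[str]) -> str:
    return facts[0] if facts else plan


corpus = [
    "Acme University was founded in 1901.",
    "Acme University is located in Springfield.",
    "Bananas are yellow.",
]
text = generate_expository_text("Acme University", template_planner, copy_top_fact, corpus)
print(text)  # Acme University was founded in 1901. Acme University is located in Springfield.
```

Passing the planner and paraphraser as callables keeps the loop independent of how each stage is realized, so stronger components could be swapped in without changing the control flow.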
Related papers
- Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs [96.54224331778195]
We present a text-grounding document understanding model, termed TGDoc, which enhances MLLMs with the ability to discern the spatial positioning of text within images.
We formulate instruction tuning tasks including text detection, recognition, and spotting to facilitate the cohesive alignment between the visual encoder and large language model.
Our method achieves state-of-the-art performance across multiple text-rich benchmarks, validating the effectiveness of our method.
Date: 2023-11-22T06:46:37Z
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
Date: 2023-10-31T04:37:57Z
- TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other through a well-designed multi-modal context block.
The framework can be trained end to end, achieving global optimization.
Date: 2022-07-14T08:52:07Z
- Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and the system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
Date: 2022-07-14T07:59:45Z
- StrucTexT: Structured Text Understanding with Multi-Modal Transformers [29.540122964399046]
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence.
This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both of its sub-tasks, entity labeling and entity linking.
We evaluate our method for structured text understanding at segment-level and token-level and show it outperforms the state-of-the-art counterparts.
Date: 2021-08-06T02:57:07Z
- TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features from text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
Date: 2020-05-27T01:47:26Z
- Natural language processing for word sense disambiguation and information extraction [0.0]
The thesis presents a new approach for Word Sense Disambiguation using a thesaurus.
A document retrieval method based on fuzzy logic is described, and its application is illustrated.
The thesis concludes with the presentation of a novel strategy based on Dempster-Shafer theory of evidential reasoning.
Date: 2020-04-05T17:13:43Z
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes part of the content in the source recordset, in the same writing style as the reference.
Date: 2020-02-24T12:52:10Z