LLM Based Multi-Agent Generation of Semi-structured Documents from
Semantic Templates in the Public Administration Domain
- URL: http://arxiv.org/abs/2402.14871v1
- Date: Wed, 21 Feb 2024 13:54:53 GMT
- Title: LLM Based Multi-Agent Generation of Semi-structured Documents from
Semantic Templates in the Public Administration Domain
- Authors: Emanuele Musumeci, Michele Brienza, Vincenzo Suriani, Daniele Nardi,
Domenico Daniele Bloisi
- Abstract summary: Large Language Models (LLMs) have enabled the creation of customized text output satisfying user requests.
We propose a novel approach that combines LLMs with prompt engineering and multi-agent systems for generating new documents compliant with a desired structure.
- Score: 2.3999111269325266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the digitalization process of recent years, the creation and management of
documents in various domains, particularly in Public Administration (PA), have
become increasingly complex and diverse. This complexity arises from the need
to handle a wide range of document types, often characterized by
semi-structured forms. Semi-structured documents present a fixed set of data
without a fixed format. As a consequence, a template-based solution cannot be
used, as understanding a document requires the extraction of the data
structure. The recent introduction of Large Language Models (LLMs) has enabled
the creation of customized text output satisfying user requests. In this work,
we propose a novel approach that combines LLMs with prompt engineering and
multi-agent systems for generating new documents compliant with a desired
structure. The main contribution of this work concerns replacing the commonly
used manual prompting with a task description generated by semantic retrieval
from an LLM. The potential of this approach is demonstrated through a series of
experiments and case studies, showcasing its effectiveness in real-world PA
scenarios.
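The abstract outlines a pipeline in which a semantic template fixes the required structure, a retrieval step generates the task description that would otherwise be written by hand, and agents fill each part of the template. A minimal sketch of that idea follows; all names and functions here are hypothetical stubs standing in for real LLM calls, not the authors' implementation.

```python
# Hypothetical sketch of the described pipeline: a semantic template lists
# the required sections, retrieve_task_description() stands in for semantic
# retrieval from an LLM (replacing a manually written prompt), and one
# "agent" per section generates its content. Stubs, not the paper's code.

from dataclasses import dataclass


@dataclass
class SemanticTemplate:
    """A desired document structure: an ordered list of section names."""
    sections: list


def retrieve_task_description(section: str) -> str:
    # Stand-in for semantic retrieval of a task description from an LLM.
    return f"Write the '{section}' section of a Public Administration document."


def section_agent(task_description: str) -> str:
    # Stand-in for an LLM agent that generates the text for one section.
    return f"[generated text for task: {task_description}]"


def generate_document(template: SemanticTemplate) -> dict:
    # One agent per section: the output necessarily follows the template,
    # which is the structural-compliance property the abstract emphasizes.
    return {s: section_agent(retrieve_task_description(s))
            for s in template.sections}


template = SemanticTemplate(sections=["Header", "Applicant Data", "Request Body"])
doc = generate_document(template)
print(list(doc.keys()))
```

Because the document is built by iterating over the template rather than by free-form generation, compliance with the desired structure is enforced by construction.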
Related papers
- DLAFormer: An End-to-End Transformer For Document Layout Analysis [7.057192434574117]
We propose an end-to-end transformer-based approach for document layout analysis, called DLAFormer.
We treat various DLA sub-tasks as relation prediction problems and consolidate these relation prediction labels into a unified label space.
We introduce a novel set of type-wise queries to enhance the physical meaning of content queries in DETR.
arXiv Detail & Related papers (2024-05-20T03:34:24Z)
- Meta-Task Prompting Elicits Embeddings from Large Language Models [54.757445048329735]
We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation.
We generate high-quality sentence embeddings from Large Language Models without the need for model fine-tuning.
Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.
arXiv Detail & Related papers (2024-02-28T16:35:52Z)
- LAPDoc: Layout-Aware Prompting for Documents [3.523208537466128]
We investigate the possibility to use purely text-based LLMs for document-specific tasks by using layout enrichment.
Our results indicate that layout enrichment can improve the performance of purely text-based LLMs for document understanding by up to 15%.
arXiv Detail & Related papers (2024-02-15T10:00:49Z)
- On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval [59.25292920967197]
Few-shot visually-rich document entity retrieval (VDER) is an important topic in industrial NLP applications.
FewVEX is a new dataset to boost future research in the field of entity-level few-shot VDER.
We present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization.
arXiv Detail & Related papers (2023-11-01T17:51:43Z)
- PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z)
- DocumentNet: Bridging the Data Gap in Document Pre-Training [78.01647768018485]
We propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models.
The collected dataset, named DocumentNet, does not depend on specific document types or entity sets.
Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training.
arXiv Detail & Related papers (2023-06-15T08:21:15Z)
- Absformer: Transformer-based Model for Unsupervised Multi-Document Abstractive Summarization [1.066048003460524]
Multi-document summarization (MDS) refers to the task of summarizing the text in multiple documents into a concise summary.
Abstractive MDS aims to generate a coherent and fluent summary for multiple documents using natural language generation techniques.
We propose Absformer, a new Transformer-based method for unsupervised abstractive summary generation.
arXiv Detail & Related papers (2023-06-07T21:18:23Z)
- A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization [66.08074487429477]
Pre-trained language models (PLMs) have achieved outstanding results in abstractive single-document summarization (SDS).
We propose a new method to better utilize a PLM to facilitate multi-document interactions for the multi-document summarization (MDS) task.
Our method outperforms its corresponding PLM backbone by up to 3 Rouge-L and is favored by humans.
arXiv Detail & Related papers (2023-05-15T10:03:31Z)
- PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream [33.68263291948121]
We propose a new summarization problem, Evolving Multi-Document sets stream Summarization (EMDS).
We introduce a novel unsupervised algorithm PDSum with the idea of prototype-driven continuous summarization.
PDSum builds a lightweight prototype of each multi-document set and exploits it to adapt to new documents.
arXiv Detail & Related papers (2023-02-10T23:43:46Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z) - WSL-DS: Weakly Supervised Learning with Distant Supervision for Query
Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given where the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach via utilizing distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.