Related papers: Document Automation Architectures: Updated Survey in Light of Large Language Models

URL: http://arxiv.org/abs/2308.09341v1
Date: Fri, 18 Aug 2023 06:59:55 GMT
Title: Document Automation Architectures: Updated Survey in Light of Large Language Models
Authors: Mohammad Ahmadi Achachlouei, Omkar Patil, Tarun Joshi, Vijayan N. Nair
Abstract summary: This paper surveys the current state of the art in document automation (DA). The objective of DA is to reduce the manual effort during the generation of documents by automatically creating and integrating input from different sources and assembling documents conforming to defined templates. There have been reviews of commercial solutions of DA, particularly in the legal domain, but to date there has been no comprehensive review of the academic research on DA architectures and technologies.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper surveys the current state of the art in document automation (DA). The objective of DA is to reduce the manual effort during the generation of documents by automatically creating and integrating input from different sources and assembling documents conforming to defined templates. There have been reviews of commercial solutions of DA, particularly in the legal domain, but to date there has been no comprehensive review of the academic research on DA architectures and technologies. The current survey of DA reviews the academic literature and provides a clearer definition and characterization of DA and its features, identifies state-of-the-art DA architectures and technologies in academic research, and provides ideas that can lead to new research opportunities within the DA field in light of recent advances in generative AI and large language models.
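
The abstract defines DA as integrating input from different sources and assembling documents that conform to defined templates. The following is a minimal Python sketch of that idea only, not the survey's architecture. It assumes a plain placeholder template (the standard library's `string.Template`) and two hypothetical in-memory records (`client_record`, `matter_record`) standing in for the input sources; the template text and the `assemble` helper are likewise illustrative.

```python
from string import Template

# Hypothetical input sources; a real DA system might pull these from a
# database, a web form, or a document store.
client_record = {"client_name": "Acme Ltd.", "address": "1 Main St."}
matter_record = {"matter_id": "M-042", "fee": "1,500"}

# Hypothetical defined template: each $placeholder is a slot filled automatically.
ENGAGEMENT_LETTER = Template(
    "Engagement letter for $client_name ($address)\n"
    "Matter: $matter_id\n"
    "Agreed fee: $fee USD\n"
)


def assemble(template: Template, *sources: dict) -> str:
    """Merge fields from several sources and render the template.

    Later sources override earlier ones when keys collide.
    Template.substitute() raises KeyError if any placeholder is left
    unfilled, so an incomplete document is not produced silently.
    """
    fields: dict = {}
    for source in sources:
        fields.update(source)
    return template.substitute(fields)


print(assemble(ENGAGEMENT_LETTER, client_record, matter_record))
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice in this sketch: a missing input source surfaces as an error instead of leaving a raw placeholder in the generated document.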

Related papers

DREAM: Document Reconstruction via End-to-end Autoregressive Model
We present an innovative autoregressive model specifically designed for document reconstruction, referred to as Document Reconstruction via End-to-end Autoregressive Model (DREAM). We establish a standardized definition of the document reconstruction task and introduce a novel Document Similarity Metric (DSM) and the DocRec1K dataset for assessing performance on the task.
arXiv (2025-07-08T09:24:07Z)
JARVIS: A Multi-Agent Code Assistant for High-Quality EDA Script Generation
JARVIS is a novel multi-agent framework that leverages Large Language Models (LLMs) and domain expertise to generate high-quality scripts for EDA tasks. By combining a domain-specific LLM trained with synthetically generated data, a custom compiler for structural verification, rule enforcement, code fixing capabilities, and advanced retrieval mechanisms, our approach achieves significant improvements over state-of-the-art domain-specific models.
arXiv (2025-05-20T23:40:57Z)
A Survey of Model Architectures in Information Retrieval
We focus on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation. We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs). We conclude by discussing emerging challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal and multilingual data, and adaptation to novel application domains beyond traditional search paradigms.
arXiv (2025-02-20T18:42:58Z)
A Survey of Research in Large Language Models for Electronic Design Automation
Large Language Models (LLMs) have emerged as transformative technologies. This survey focuses on advancements in model architectures, the implications of varying model sizes, and innovative customization techniques. It aims to offer valuable insights to professionals in the EDA industry, AI researchers, and anyone interested in the convergence of advanced AI technologies and electronic design.
arXiv (2025-01-16T16:51:59Z)
Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored. We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches. We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv (2024-10-02T20:48:28Z)
Large Language Models for Generative Information Extraction: A Survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques. We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv (2023-12-29T14:25:22Z)
Document Understanding Dataset and Evaluation (DUDE)
Document Understanding Dataset and Evaluation (DUDE) seeks to remedy the stalled research progress in understanding visually-rich documents (VRDs). We present a new dataset with novelties related to types of questions, answers, and document layouts based on multi-industry, multi-domain, and multi-page VRDs of various origins and dates.
arXiv (2023-05-15T08:54:32Z)
Deep learning for table detection and structure recognition: A survey
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection. We provide an analysis of both classic and new applications in the field. The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv (2022-11-15T19:42:27Z)
A Survey on Open Information Extraction from Rule-based Model to Large Language Model
Open Information Extraction (OpenIE) represents a crucial NLP task aimed at deriving structured information from unstructured text. This survey paper provides an overview of OpenIE technologies spanning from 2007 to 2024, emphasizing a chronological perspective. The paper categorizes OpenIE approaches into rule-based, neural, and pre-trained large language models, discussing each within a chronological framework.
arXiv (2022-08-18T08:03:45Z)
Document AI: Benchmarks, Models and Applications
Document AI refers to the techniques for automatically reading, understanding, and analyzing business documents. In recent years, the popularity of deep learning technology has greatly advanced the development of Document AI. This paper briefly reviews some of the representative models, tasks, and benchmark datasets.
arXiv (2021-11-16T16:43:07Z)
Document Automation Architectures and Technologies: A Survey
This paper surveys the current state of the art in document automation (DA). The objective of DA is to reduce the manual effort during the generation of documents by automatically integrating input from different sources and assembling documents conforming to defined templates. There have been reviews of commercial solutions of DA, particularly in the legal domain, but to date there has been no comprehensive review of the academic research on DA architectures and technologies.
arXiv (2021-09-23T19:12:26Z)
Data-Driven Design-by-Analogy: State of the Art and Future Directions
Design-by-Analogy (DbA) is a design methodology wherein new solutions, opportunities or designs are generated in a target domain based on inspiration drawn from a source domain. Recently, increasingly available design databases and rapidly advancing data science and artificial intelligence technologies have presented new opportunities for developing data-driven methods and tools for DbA support.
arXiv (2021-06-03T04:35:34Z)
A Survey of Deep Learning Approaches for OCR and Document Understanding
We review different techniques for document understanding for documents written in English. We consolidate methodologies present in literature to act as a jumping-off point for researchers exploring this area.
arXiv (2020-11-27T03:05:59Z)
A New Neural Search and Insights Platform for Navigating and Organizing AI Research
We introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature. We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
arXiv (2020-10-30T19:12:25Z)
Towards Inheritable Models for Open-Set Domain Adaptation
We introduce a practical Domain Adaptation paradigm where a source-trained model is used to facilitate adaptation in the absence of the source dataset in future. We present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data.
arXiv (2020-04-09T07:16:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.