Related papers: Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis

URL: http://arxiv.org/abs/2403.04080v1
Date: Wed, 6 Mar 2024 22:22:02 GMT
Title: Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Authors: Abdelrahman Abdallah and Daniel Eberharter and Zoe Pfister and Adam Jatowt
Abstract summary: We focus on the topic of form understanding in the context of scanned documents. Our research methodology involves an in-depth analysis of popular documents and forms of understanding of trends over the last decade. We showcase how transformers have propelled the field forward, revolutionizing form-understanding techniques.
Score: 16.86139440201837
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a comprehensive survey of research works on the topic of form understanding in the context of scanned documents. We delve into recent advancements and breakthroughs in the field, highlighting the significance of language models and transformers in solving this challenging task. Our research methodology involves an in-depth analysis of popular documents and forms of understanding of trends over the last decade, enabling us to offer valuable insights into the evolution of this domain. Focusing on cutting-edge models, we showcase how transformers have propelled the field forward, revolutionizing form-understanding techniques. Our exploration includes an extensive examination of state-of-the-art language models designed to effectively tackle the complexities of noisy scanned documents. Furthermore, we present an overview of the latest and most relevant datasets, which serve as essential benchmarks for evaluating the performance of selected models. By comparing and contrasting the capabilities of these models, we aim to provide researchers and practitioners with useful guidance in choosing the most suitable solutions for their specific form understanding tasks.

Related papers

Vision Generalist Model: A Survey [87.49797517847132]
We provide a comprehensive overview of the vision generalist models, delving into their characteristics and capabilities within the field. We take a brief excursion into related domains, shedding light on their interconnections and potential synergies.
arXiv Detail & Related papers (2025-06-11T17:23:41Z)
Exploring the Technology Landscape through Topic Modeling, Expert Involvement, and Reinforcement Learning [0.48342038441006807]
This study proposes a method that combines topic modeling, expert knowledge inputs, and reinforcement learning (RL) to enhance the detection of technological changes. Results demonstrate the method's effectiveness in identifying, ranking, and tracking trends that align with expert input.
arXiv Detail & Related papers (2025-01-22T22:18:50Z)
Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges. We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow. We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
Abstractive Text Summarization: State of the Art, Challenges, and Improvements [6.349503549199403]
This review takes a comprehensive approach encompassing state-of-the-art methods, challenges, solutions, comparisons, limitations and charts out future improvements. The paper highlights challenges such as inadequate meaning representation, factual consistency, controllable text summarization, cross-lingual summarization, and evaluation metrics.
arXiv Detail & Related papers (2024-09-04T03:39:23Z)
Deep Learning based Visually Rich Document Content Understanding: A Survey [10.746453741520826]
Visually Rich Documents (VRDs) play a vital role in domains such as academia, finance, healthcare, and marketing. Traditional approaches to extracting information from VRDs rely heavily on expert knowledge and manual annotation. Recent advances in deep learning have transformed this landscape by enabling multimodal models that integrate vision, language, and layout features through pretraining.
arXiv Detail & Related papers (2024-08-02T14:19:34Z)
Synthesizing Scientific Summaries: An Extractive and Abstractive Approach [0.5904095466127044]
We propose a hybrid methodology for research paper summarisation. We use two models based on unsupervised learning for the extraction stage and two transformer language models. We find that, with certain combinations of hyperparameters, automated summarisation systems can exceed the abstractiveness of summaries written by humans.
arXiv Detail & Related papers (2024-07-29T08:21:42Z)
A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing [8.171572460041823]
Talking head synthesis is an advanced method for generating portrait videos from a still image driven by specific content. This survey systematically reviews the technology, categorizing it into three pivotal domains: portrait generation, driving mechanisms, and editing techniques.
arXiv Detail & Related papers (2024-06-15T08:14:59Z)
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Large foundation models, such as large language models, have revolutionized various natural language processing tasks. This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z)
Visual Analytics for Generative Transformer Models [28.251218916955125]
We present a novel visual analytical framework to support the analysis of transformer-based generative networks. Our framework is one of the first dedicated to supporting the analysis of transformer-based encoder-decoder models.
arXiv Detail & Related papers (2023-11-21T08:15:01Z)
Learn From Model Beyond Fine-Tuning: A Survey [78.80920533793595]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FM) based on the model interface. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing. This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z)
Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world. Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time. The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
Deep Learning Schema-based Event Extraction: Literature Review and Current Trends [60.29289298349322]
Event extraction technology based on deep learning has become a research hotspot. This paper fills the gap by reviewing the state-of-the-art approaches, focusing on deep learning-based models.
arXiv Detail & Related papers (2021-07-05T16:32:45Z)
Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text. In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.