Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
- URL: http://arxiv.org/abs/2403.04080v1
- Date: Wed, 6 Mar 2024 22:22:02 GMT
- Title: Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
- Authors: Abdelrahman Abdallah and Daniel Eberharter and Zoe Pfister and Adam Jatowt
- Abstract summary: We focus on the topic of form understanding in the context of scanned documents.
Our research methodology involves an in-depth analysis of popular document and form understanding trends over the last decade.
We showcase how transformers have propelled the field forward, revolutionizing form-understanding techniques.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a comprehensive survey of research works on the topic of
form understanding in the context of scanned documents. We delve into recent
advancements and breakthroughs in the field, highlighting the significance of
language models and transformers in solving this challenging task. Our research
methodology involves an in-depth analysis of popular document and form
understanding trends over the last decade, enabling us to offer valuable
insights into the evolution of this domain. Focusing on cutting-edge models, we
showcase how transformers have propelled the field forward, revolutionizing
form-understanding techniques. Our exploration includes an extensive
examination of state-of-the-art language models designed to effectively tackle
the complexities of noisy scanned documents. Furthermore, we present an
overview of the latest and most relevant datasets, which serve as essential
benchmarks for evaluating the performance of selected models. By comparing and
contrasting the capabilities of these models, we aim to provide researchers and
practitioners with useful guidance in choosing the most suitable solutions for
their specific form understanding tasks.
Related papers
- Data Analysis in the Era of Generative AI
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of the data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
Date: 2024-09-27T06:31:03Z
- Abstractive Text Summarization: State of the Art, Challenges, and Improvements
This review takes a comprehensive approach, covering state-of-the-art methods, challenges, solutions, comparisons, and limitations, and charts out future improvements.
The paper highlights challenges such as inadequate meaning representation, factual consistency, controllable text summarization, cross-lingual summarization, and evaluation metrics.
Date: 2024-09-04T03:39:23Z
- Synthesizing Scientific Summaries: An Extractive and Abstractive Approach
We propose a hybrid methodology for research paper summarisation.
We use two models based on unsupervised learning for the extraction stage and two transformer language models for the abstraction stage.
We find that with certain combinations of hyperparameters, automated summarisation systems can exceed the abstractiveness of summaries written by humans.
Date: 2024-07-29T08:21:42Z
- A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing
Talking head synthesis is an advanced method for generating portrait videos from a still image driven by specific content.
This survey systematically reviews the technology, categorizing it into three pivotal domains: portrait generation, driving mechanisms, and editing techniques.
Date: 2024-06-15T08:14:59Z
- From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making.
Large foundation models, such as large language models, have revolutionized various natural language processing tasks.
This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
Date: 2024-03-18T17:57:09Z
- Visual Analytics for Generative Transformer Models
We present a novel visual analytical framework to support the analysis of transformer-based generative networks.
Our framework is one of the first dedicated to supporting the analysis of transformer-based encoder-decoder models.
Date: 2023-11-21T08:15:01Z
- Learn From Model Beyond Fine-Tuning: A Survey
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FM) based on the model interface.
The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing.
This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM.
Date: 2023-10-12T10:20:36Z
- Foundational Models Defining a New Era in Vision: A Survey and Outlook
Vision systems that can see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompting capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
Date: 2023-07-25T17:59:18Z
- Deep Learning Schema-based Event Extraction: Literature Review and Current Trends
Event extraction technology based on deep learning has become a research hotspot.
This paper fills a gap in the literature by reviewing state-of-the-art approaches, focusing on deep learning-based models.
Date: 2021-07-05T16:32:45Z
- Explaining Relationships Between Scientific Documents
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper, we establish a dataset of 622K examples drawn from 154K documents.
Date: 2020-02-02T03:54:47Z