Related papers: Form2Seq : A Framework for Higher-Order Form Structure Extraction

Form2Seq : A Framework for Higher-Order Form Structure Extraction

URL: http://arxiv.org/abs/2107.04419v1
Date: Fri, 9 Jul 2021 13:10:51 GMT
Title: Form2Seq : A Framework for Higher-Order Form Structure Extraction
Authors: Milan Aggarwal, Hiresh Gupta, Mausoom Sarkar, Balaji Krishnamurthy
Abstract summary: We propose a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text. We discuss two tasks; 1) Classification of low-level constituent elements into ten types such as field captions, list items, and others; 2) Grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields, and ChoiceGroups, used as information collection mechanism in forms. Experimental results show the effectiveness of our text-based approach achieving an accuracy of 90% on classification task and an F1 of 75.82, 86.01, 61.63 on groups discussed above
Score: 14.134131448981295
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Document structure extraction has been a widely researched area for decades with recent works performing it as a semantic segmentation task over document images using fully-convolution networks. Such methods are limited by image resolution due to which they fail to disambiguate structures in dense regions which appear commonly in forms. To mitigate this, we propose Form2Seq, a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text, with a specific focus on forms, which leverages relative spatial arrangement of structures. We discuss two tasks; 1) Classification of low-level constituent elements (TextBlock and empty fillable Widget) into ten types such as field captions, list items, and others; 2) Grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields and ChoiceGroups, used as information collection mechanism in forms. To achieve this, we arrange the constituent elements linearly in natural reading order, feed their spatial and textual representations to Seq2Seq framework, which sequentially outputs prediction of each element depending on the final task. We modify Seq2Seq for grouping task and discuss improvements obtained through cascaded end-to-end training of two tasks versus training in isolation. Experimental results show the effectiveness of our text-based approach achieving an accuracy of 90% on classification task and an F1 of 75.82, 86.01, 61.63 on groups discussed above respectively, outperforming segmentation baselines. Further we show our framework achieves state of the results for table structure recognition on ICDAR 2013 dataset.

Related papers

Enhancing Frame Detection with Retrieval Augmented Generation [2.5782420501870296]
We present the first RAG-based approach for frame detection called RCIF (Retrieve Candidates and Identify Frames) Our results show that our retrieval component significantly reduces the complexity of the task by narrowing the search space. Our approach achieves state-of-the-art performance on FrameNet 1.5 and 1.7, demonstrating its robustness in scenarios where only raw text is provided.
arXiv Detail & Related papers (2025-02-17T02:34:02Z)
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding [55.48936731641802]
We present the SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets. The dataset includes eight languages including English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese.
arXiv Detail & Related papers (2024-06-13T02:35:55Z)
From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions [63.11097464396147]
We introduce a novel benchmark YTSeg focusing on spoken content that is inherently more unstructured and both topically and structurally diverse. We also introduce an efficient hierarchical segmentation model MiniSeg, that outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-27T15:59:37Z)
TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture. TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling. It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation [71.40119152422295]
We propose a lightweight, scalable and generalizable approach to identify text reading order. The model is language-agnostic and runs effectively across multi-language datasets. It is small enough to be deployed on virtually any platform including mobile devices.
arXiv Detail & Related papers (2023-05-04T06:21:00Z)
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs [79.0426838808629]
We propose TAT-DQA, i.e. to answer the question over a visually-rich table-text document. Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability. We conduct extensive experiments on TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score respectively on the test set.
arXiv Detail & Related papers (2023-05-03T07:30:32Z)
StrucTexT: Structured Text Understanding with Multi-Modal Transformers [29.540122964399046]
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence. This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks. We evaluate our method for structured text understanding at segment-level and token-level and show it outperforms the state-of-the-art counterparts.
arXiv Detail & Related papers (2021-08-06T02:57:07Z)
Multi-Modal Association based Grouping for Form Structure Extraction [14.134131448981295]
We present a novel multi-modal approach for form structure extraction. We extract higher-order structures such as TextBlocks, Text Fields, Choice Fields, and Choice Groups. Our approach achieves a recall of 90.29%, 73.80%, 83.12%, and 52.72% for the above structures, respectively.
arXiv Detail & Related papers (2021-07-09T12:49:34Z)
Nested and Balanced Entity Recognition using Multi-Task Learning [0.0]
This paper introduces a partly-layered network architecture that deals with the complexity of overlapping and nested cases. We train and evaluate this architecture to recognise two kinds of entities - Concepts (CR) and Named Entities (NER) Our approach achieves state-of-the-art NER performances, while it outperforms previous CR approaches.
arXiv Detail & Related papers (2021-06-11T07:52:32Z)
Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks. We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks [3.5557219875516655]
We perform a sequence of inference steps to predict the category of a document from top to bottom of a given class taxonomy. With our efficient approaches, we outperform previous studies, using a drastically reduced number of parameters, in two well-known English datasets.
arXiv Detail & Related papers (2020-05-05T20:22:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.