Form2Seq : A Framework for Higher-Order Form Structure Extraction
- URL: http://arxiv.org/abs/2107.04419v1
- Date: Fri, 9 Jul 2021 13:10:51 GMT
- Title: Form2Seq : A Framework for Higher-Order Form Structure Extraction
- Authors: Milan Aggarwal, Hiresh Gupta, Mausoom Sarkar, Balaji Krishnamurthy
- Abstract summary: We propose a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text.
We discuss two tasks: 1) Classification of low-level constituent elements into ten types such as field captions, list items, and others; 2) Grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields, and ChoiceGroups, used as information collection mechanisms in forms.
Experimental results show the effectiveness of our text-based approach, which achieves an accuracy of 90% on the classification task and F1 scores of 75.82, 86.01, and 61.63 on the three groups above, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document structure extraction has been a widely researched area for decades,
with recent works performing it as a semantic segmentation task over document
images using fully convolutional networks. Such methods are limited by image
resolution, so they fail to disambiguate structures in the dense regions that
commonly appear in forms. To mitigate this, we propose Form2Seq, a novel
sequence-to-sequence (Seq2Seq) inspired framework for structure extraction
using text, with a specific focus on forms, which leverages the relative spatial
arrangement of structures. We discuss two tasks: 1) classification of low-level
constituent elements (TextBlocks and empty fillable Widgets) into ten types such
as field captions, list items, and others; 2) grouping of lower-level elements
into higher-order constructs, such as Text Fields, ChoiceFields, and
ChoiceGroups, used as information collection mechanisms in forms. To achieve
this, we arrange the constituent elements linearly in natural reading order and
feed their spatial and textual representations to the Seq2Seq framework, which
sequentially outputs a prediction for each element depending on the task. We
modify Seq2Seq for the grouping task and discuss the improvements obtained through
cascaded end-to-end training of the two tasks versus training them in isolation.
Experimental results show the effectiveness of our text-based approach, which
achieves an accuracy of 90% on the classification task and F1 scores of 75.82,
86.01, and 61.63 on the three groups above, respectively, outperforming
segmentation baselines. Further, we show that our framework achieves
state-of-the-art results for table structure recognition on the ICDAR 2013 dataset.
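The abstract describes the pipeline only at a high level: elements are linearized in natural reading order, the model emits one prediction per element, and grouping quality is reported as F1. The Python sketch below illustrates those three ideas under stated assumptions; it is not the authors' implementation. The `Element` schema, the `line_tol` top-to-bottom/left-to-right ordering heuristic, the B/I/O group-tag encoding in `decode_groups`, and the exact-match criterion in `group_f1` are all hypothetical stand-ins, since the abstract does not say how ordering, grouping outputs, or group matching are defined.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Element:
    """Hypothetical schema for a constituent element (TextBlock or empty Widget)."""
    text: str   # empty string for an empty fillable widget
    x0: float   # left edge
    y0: float   # top edge (y grows downward)
    x1: float   # right edge
    y1: float   # bottom edge


def reading_order(elements: Sequence[Element], line_tol: float = 5.0) -> List[Element]:
    """Linearize elements top-to-bottom, then left-to-right (assumed heuristic).

    Elements whose top edge lies within `line_tol` of the first element of the
    current visual line are treated as part of that line.
    """
    by_top = sorted(elements, key=lambda e: (e.y0, e.x0))
    ordered: List[Element] = []
    i = 0
    while i < len(by_top):
        line_top = by_top[i].y0
        j = i
        while j < len(by_top) and by_top[j].y0 - line_top <= line_tol:
            j += 1
        # Within one visual line, read left to right.
        ordered.extend(sorted(by_top[i:j], key=lambda e: e.x0))
        i = j
    return ordered


def decode_groups(tags: Sequence[str]) -> List[List[int]]:
    """Turn per-element tags into groups of element indices (hypothetical encoding).

    "B" starts a new group, "I" continues the open group, and "O" marks an
    element outside any group; a stray "I" with no open group is treated as "O".
    """
    groups: List[List[int]] = []
    current: List[int] = []
    for idx, tag in enumerate(tags):
        if tag == "B":
            if current:
                groups.append(current)
            current = [idx]
        elif tag == "I" and current:
            current.append(idx)
        else:
            if current:
                groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def group_f1(pred: Sequence[Sequence[int]], gold: Sequence[Sequence[int]]) -> float:
    """F1 where a predicted group counts only if it exactly matches a gold group (assumed criterion)."""
    pred_set = {frozenset(g) for g in pred}
    gold_set = {frozenset(g) for g in gold}
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    caption = Element("Name", 10, 100, 60, 112)
    widget = Element("", 70, 98, 200, 114)  # sits 2 units higher than its caption
    address = Element("Address", 10, 130, 80, 142)
    print([e.text for e in reading_order([address, widget, caption])])  # ['Name', '', 'Address']
    pred = decode_groups(["B", "I", "O", "B", "I", "I"])
    print(pred)                              # [[0, 1], [3, 4, 5]]
    print(group_f1(pred, [[0, 1], [3, 4]]))  # 0.5
```

The tolerance-based line merge is there because a widget and its caption on the same visual line rarely share an identical top coordinate; a plain (y, x) sort would place the slightly higher widget before its caption and break the caption-then-field reading order.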
Related papers
- SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding (2024-06-13)
We present SRFUND, a hierarchically structured multi-task form understanding benchmark.
SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets.
The dataset covers eight languages: English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese.
- From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions (2024-02-27)
We introduce a novel benchmark, YTSeg, focusing on spoken content that is inherently more unstructured and both topically and structurally diverse.
We also introduce an efficient hierarchical segmentation model, MiniSeg, that outperforms state-of-the-art baselines.
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision (2023-06-06)
We propose TextFormer, a query-based end-to-end text spotter with a Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows mutual training and optimization of the classification, segmentation, and recognition branches, resulting in deeper feature sharing.
- Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation (2023-05-04)
We propose a lightweight, scalable, and generalizable approach to identifying text reading order.
The model is language-agnostic and runs effectively across multilingual datasets.
It is small enough to be deployed on virtually any platform, including mobile devices.
- Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs (2023-05-03)
We address TAT-DQA, i.e., answering questions over a visually rich table-text document.
Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability.
We conduct extensive experiments on the TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score, respectively, on the test set.
- StrucTexT: Structured Text Understanding with Multi-Modal Transformers (2021-08-06)
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence.
This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks (entity labeling and entity linking).
We evaluate our method for structured text understanding at the segment level and token level and show that it outperforms state-of-the-art counterparts.
- Multi-Modal Association based Grouping for Form Structure Extraction (2021-07-09)
We present a novel multi-modal approach for form structure extraction.
We extract higher-order structures such as TextBlocks, Text Fields, Choice Fields, and Choice Groups.
Our approach achieves a recall of 90.29%, 73.80%, 83.12%, and 52.72% for the above structures, respectively.
- Nested and Balanced Entity Recognition using Multi-Task Learning (2021-06-11)
This paper introduces a partly layered network architecture that deals with the complexity of overlapping and nested cases.
We train and evaluate this architecture to recognise two kinds of entities: Concepts (CR) and Named Entities (NER).
Our approach achieves state-of-the-art NER performance and outperforms previous CR approaches.
- Automated Concatenation of Embeddings for Structured Prediction (2020-10-10)
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
- Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks (2020-05-05)
We perform a sequence of inference steps to predict the category of a document from the top to the bottom of a given class taxonomy.
With our efficient approaches, we outperform previous studies on two well-known English datasets while using a drastically reduced number of parameters.