GFTE: Graph-based Financial Table Extraction
- URL: http://arxiv.org/abs/2003.07560v1
- Date: Tue, 17 Mar 2020 07:10:05 GMT
- Title: GFTE: Graph-based Financial Table Extraction
- Authors: Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye and Xianhui Liu
- Abstract summary: In the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g. Portable Document Format (PDF) and images.
We publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds.
We propose a novel graph-based convolutional network model named GFTE as a baseline for future comparison.
- Score: 66.26206038522339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular data is a crucial form of information expression, which can organize
data in a standard structure for easy information retrieval and comparison.
However, in the financial industry and many other fields, tables are often disclosed
in unstructured digital files, e.g. Portable Document Format (PDF) and images,
from which they are difficult to extract directly. In this paper, to facilitate deep
learning based table extraction from unstructured digital files, we publish a
standard Chinese dataset named FinTab, which contains more than 1,600 financial
tables of diverse kinds and their corresponding structure representation in
JSON. In addition, we propose a novel graph-based convolutional neural network
model named GFTE as a baseline for future comparison. GFTE integrates image,
position and textual features for precise edge prediction and achieves good
overall results.
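The abstract frames table structure recognition as edge prediction over a graph of text cells. The following is a minimal sketch of that framing only, not the GFTE model: the `Cell` type, the feature names, the pixel tolerance `tol`, and the hand-written rule in `classify_edge` are all hypothetical stand-ins for what a trained graph network would learn, and image features are omitted entirely.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Cell:
    """A text fragment with the centre of its bounding box (hypothetical type)."""
    text: str
    x: float
    y: float


def is_number(text: str) -> bool:
    """Toy textual feature: does the fragment look like a plain number?"""
    return text.replace(".", "", 1).isdigit()


def edge_features(a: Cell, b: Cell) -> dict:
    """Combine position and textual features for one candidate edge."""
    return {
        "dx": abs(a.x - b.x),
        "dy": abs(a.y - b.y),
        "both_numeric": is_number(a.text) and is_number(b.text),
    }


def classify_edge(feat: dict, tol: float = 5.0) -> str:
    """Rule-based stand-in for a learned edge classifier.

    Assumption: it reads only the position features; a trained model
    would also use the textual (and image) features.
    """
    if feat["dy"] <= tol:
        return "same_row"
    if feat["dx"] <= tol:
        return "same_col"
    return "none"


def predict_relations(cells: list, tol: float = 5.0) -> dict:
    """Label every unordered pair of cells with a predicted relation."""
    return {
        (i, j): classify_edge(edge_features(cells[i], cells[j]), tol)
        for i, j in combinations(range(len(cells)), 2)
    }


# A 2x2 toy table: two label cells on the left, two values on the right.
cells = [
    Cell("Revenue", 10, 10),
    Cell("120.5", 100, 10),
    Cell("Cost", 10, 30),
    Cell("80", 100, 30),
]
relations = predict_relations(cells)
for pair, label in relations.items():
    print(pair, label)
```

In this toy layout, cells sharing a vertical position are labelled as the same row and cells sharing a horizontal position as the same column; replacing `classify_edge` with a learned classifier over richer features is where a graph-based model would come in.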
Related papers
- UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition [55.153629718464565]
We introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model.
UniTabNet employs a "divide-and-conquer" strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure.
arXiv Detail & Related papers (2024-09-20T01:26:32Z)
- Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs [79.0426838808629]
We propose TAT-DQA, i.e., answering questions over a visually-rich table-text document.
Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability.
We conduct extensive experiments on TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score respectively on the test set.
arXiv Detail & Related papers (2023-05-03T07:30:32Z)
- Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents [1.1859913430860336]
The main contribution of this work is to tackle the problem of table extraction, exploiting Graph Neural Networks.
We experimentally evaluated the proposed approach on a new dataset obtained by merging the information provided in the PubLayNet and PubTables-1M datasets.
arXiv Detail & Related papers (2022-08-23T21:36:01Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
arXiv Detail & Related papers (2021-06-20T01:57:05Z)
- Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition.
Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition.
We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
arXiv Detail & Related papers (2021-05-23T21:17:18Z)
- TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables [1.4115224153549193]
This paper presents TabLeX, a large-scale benchmark dataset comprising table images generated from scientific articles.
To facilitate the development of robust table IE tools, TabLeX contains images in different aspect ratios and in a variety of fonts.
Our analysis sheds light on the shortcomings of current state-of-the-art table extraction models and shows that they fail on even simple table images.
arXiv Detail & Related papers (2021-05-12T05:13:38Z)
- Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images [0.6299766708197884]
This research proposes automated table detection and tabular data extraction from financial PDF documents.
We propose a method that consists of three main processes, including detecting table areas with a Faster R-CNN (Region-based Convolutional Neural Network) model.
The excellent table detection performance of the detection model is obtained from our customized dataset.
arXiv Detail & Related papers (2021-02-20T08:21:17Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.