Rethinking Image-based Table Recognition Using Weakly Supervised Methods
- URL: http://arxiv.org/abs/2303.07641v1
- Date: Tue, 14 Mar 2023 06:03:57 GMT
- Title: Rethinking Image-based Table Recognition Using Weakly Supervised Methods
- Authors: Nam Tuan Ly, Atsuhiro Takasu, Phuc Nguyen, and Hideaki Takeda
- Abstract summary: We propose a weakly supervised model named WSTabNet for table recognition that relies only on HTML (or LaTeX) code-level annotations of table images.
To facilitate table recognition with deep learning, we create and release WikiTableSet, the largest publicly available image-based table recognition dataset built from Wikipedia.
- Score: 3.9993134366218857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of the previous methods for table recognition rely on training datasets
containing many richly annotated table images. Detailed table image annotation,
e.g., cell or text bounding box annotation, however, is costly and often
subjective. In this paper, we propose a weakly supervised model named WSTabNet
for table recognition that relies only on HTML (or LaTeX) code-level
annotations of table images. The proposed model consists of three main parts:
an encoder for feature extraction, a structure decoder for generating table
structure, and a cell decoder for predicting the content of each cell in the
table. Our system is trained end-to-end by stochastic gradient descent
algorithms, requiring only table images and their ground-truth HTML (or LaTeX)
representations. To facilitate table recognition with deep learning, we create
and release WikiTableSet, the largest publicly available image-based table
recognition dataset built from Wikipedia. WikiTableSet contains nearly 4
million English table images, 590K Japanese table images, and 640K French table
images with corresponding HTML representations and cell bounding boxes.
Extensive experiments on WikiTableSet and two other large-scale datasets,
FinTabNet and PubTabNet, demonstrate that the proposed weakly supervised model
achieves better or similar accuracy compared to the state-of-the-art models on
all benchmark datasets.
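The abstract says WSTabNet is trained from nothing more than table images and their HTML, using a structure decoder and a cell decoder. A minimal sketch of how one HTML annotation could be split into those two kinds of targets follows, assuming a PubTabNet-style tokenization: structure tokens with span attributes as separate tokens, and cell text collected separately. `TableTargetParser` and `html_to_targets` are hypothetical names, `<th>` is treated as `<td>` as an assumption, and the paper's actual vocabulary and LaTeX handling are not reproduced.

```python
from html.parser import HTMLParser


class TableTargetParser(HTMLParser):
    """Hypothetical splitter: table HTML -> structure tokens + cell texts."""

    # Tags kept in the structure sequence; anything else (<table>, <b>, ...)
    # is dropped from the structure, but text inside a cell is still kept.
    STRUCTURE_TAGS = {"thead", "tbody", "tr", "td"}

    def __init__(self):
        super().__init__()
        self.structure = []  # assumed target for the structure decoder
        self.cells = []      # one string per cell, assumed cell-decoder target
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        tag = "td" if tag == "th" else tag  # assumption: header cells as <td>
        if tag not in self.STRUCTURE_TAGS:
            return
        if tag != "td":
            self.structure.append(f"<{tag}>")
            return
        spans = [(k, v) for k, v in attrs if k in ("rowspan", "colspan")]
        if spans:
            # Spanning cell: '<td', one token per span attribute, then '>'.
            self.structure.append("<td")
            self.structure.extend(f' {k}="{v}"' for k, v in spans)
            self.structure.append(">")
        else:
            self.structure.append("<td>")
        self.cells.append("")
        self._in_cell = True

    def handle_endtag(self, tag):
        tag = "td" if tag == "th" else tag
        if tag not in self.STRUCTURE_TAGS:
            return
        self.structure.append(f"</{tag}>")
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data  # text may arrive in several chunks


def html_to_targets(table_html):
    """Return (structure_tokens, cell_texts) for one table annotation."""
    parser = TableTargetParser()
    parser.feed(table_html)
    parser.close()
    return parser.structure, [text.strip() for text in parser.cells]


if __name__ == "__main__":
    html = ('<table><tr><td>Name</td><td colspan="2">Score</td></tr>'
            '<tr><td>A</td><td>1</td><td><b>2</b></td></tr></table>')
    tokens, cells = html_to_targets(html)
    print(tokens)
    print(cells)
```

Splitting span attributes into their own tokens keeps the structure vocabulary small. Because every opening cell token lines up with exactly one entry in `cells`, this is one plausible way to pair each predicted cell with its content target.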
Related papers
- UniTabNet: Bridging Vision and Language Models for Enhanced Table Structure Recognition [55.153629718464565]
We introduce UniTabNet, a novel framework for table structure parsing based on the image-to-text model.
UniTabNet employs a "divide-and-conquer" strategy, utilizing an image-to-text model to decouple table cells and integrating both physical and logical decoders to reconstruct the complete table structure.
(arXiv, 2024-09-20T01:26:32Z)
- LaTable: Towards Large Tabular Models [63.995130144110156]
Tabular generative foundation models are hard to build due to the heterogeneous feature spaces of different datasets.
LaTable is a novel diffusion model that addresses these challenges and can be trained across different datasets.
We find that LaTable outperforms baselines on in-distribution generation, and that finetuning LaTable can generate out-of-distribution datasets better with fewer samples.
(arXiv, 2024-06-25T16:03:50Z)
- Multimodal Table Understanding [26.652797853893233]
How to directly understand tables using intuitive visual information is a crucial and urgent challenge for developing more practical applications.
We propose a new problem, multimodal table understanding, where the model needs to generate correct responses to various table-related requests.
We develop Table-LLaVA, a generalist multimodal large language model (MLLM), which significantly outperforms recent open-source MLLM baselines on 23 benchmarks.
(arXiv, 2024-06-12T11:27:03Z)
- TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition [76.06530816349763]
We propose an end-to-end trainable table graph reconstruction network (TGRNet) for table structure recognition.
Specifically, the proposed method has two main branches, a cell detection branch and a cell logical location branch, to jointly predict the spatial location and the logical location of different cells.
(arXiv, 2021-06-20T01:57:05Z)
- Multi-Type-TD-TSR -- Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations [63.98463053292982]
The recognition of tables consists of two main tasks, namely table detection and table structure recognition.
Recent work shows a clear trend towards deep learning approaches coupled with the use of transfer learning for the task of table structure recognition.
We present a multistage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for the problem of table recognition.
(arXiv, 2021-05-23T21:17:18Z)
- TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables [1.4115224153549193]
This paper presents TabLeX, a large-scale benchmark dataset comprising table images generated from scientific articles.
To facilitate the development of robust table IE tools, TabLeX contains images in different aspect ratios and in a variety of fonts.
Our analysis sheds light on the shortcomings of current state-of-the-art table extraction models and shows that they fail on even simple table images.
(arXiv, 2021-05-12T05:13:38Z)
- Web Table Classification based on Visual Features [1.52292571922932]
We propose an approach for web table classification by exploiting the full visual appearance of a table.
The evaluation of CNN image classification with a fine-tuned ResNet50 shows that this approach achieves results comparable to previous solutions.
(arXiv, 2021-02-25T07:39:19Z)
- A Graph Representation of Semi-structured Data for Web Question Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
(arXiv, 2020-10-14T04:01:54Z)
- GFTE: Graph-based Financial Table Extraction [66.26206038522339]
In the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g., Portable Document Format (PDF) files and images.
We publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds.
We propose a novel graph-based convolutional network model named GFTE as a baseline for future comparison.
(arXiv, 2020-03-17T07:10:05Z)
- Identifying Table Structure in Documents using Conditional Generative Adversarial Networks [0.0]
In many industries and in academic research, information is primarily transmitted in the form of unstructured documents.
We propose a top-down approach, first using a conditional generative adversarial network to map a table image into a standardised 'skeleton' table form.
We then derive the latent table structure using xy-cut projection and Genetic Algorithm optimisation.
(arXiv, 2020-01-13T20:42:40Z)
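The last entry derives table structure with xy-cut projection. As an illustration only, the sketch below performs the single projection step that XY-cut applies recursively: it sums ink along the rows and columns of a binarised image and reports the centre of each interior whitespace gap as a separator. The names `gaps` and `separators` and the `min_gap` parameter are hypothetical; the paper's recursion and Genetic Algorithm optimisation are not reproduced.

```python
def gaps(profile, min_gap=1):
    """Return (start, end) index runs, end exclusive, where profile == 0."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value == 0 and start is None:
            start = i  # a whitespace run begins
        elif value != 0 and start is not None:
            if i - start >= min_gap:
                runs.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_gap:
        runs.append((start, len(profile)))  # run reaching the border
    return runs


def separators(image, min_gap=1):
    """Row and column separator positions for a binarised table image.

    image: list of equal-length rows of 0/1 ints (1 = ink). Only interior gaps
    count; blank margins at the borders are ignored.
    """
    height, width = len(image), len(image[0])
    row_profile = [sum(row) for row in image]
    col_profile = [sum(image[r][c] for r in range(height)) for c in range(width)]

    def centres(profile):
        n = len(profile)
        return [(s + e - 1) // 2 for s, e in gaps(profile, min_gap)
                if s > 0 and e < n]

    return centres(row_profile), centres(col_profile)


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0, 1, 1, 1],
        [1, 1, 0, 0, 1, 1, 1],
        [0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 1, 1],
        [1, 0, 0, 0, 0, 1, 1],
    ]
    print(separators(image))  # ([2], [2])
```

A full XY-cut would recurse into each region bounded by these separators; `min_gap` is the knob that decides how wide a blank band must be before it counts as a row or column boundary.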
This list is automatically generated from the titles and abstracts of the papers in this site.