LGPMA: Complicated Table Structure Recognition with Local and Global
Pyramid Mask Alignment
- URL: http://arxiv.org/abs/2105.06224v1
- Date: Thu, 13 May 2021 12:24:12 GMT
- Title: LGPMA: Complicated Table Structure Recognition with Local and Global
Pyramid Mask Alignment
- Authors: Liang Qiao and Zaisheng Li and Zhanzhan Cheng and Peng Zhang and
Shiliang Pu and Yi Niu and Wenqi Ren and Wenming Tan and Fei Wu
- Abstract summary: Table structure recognition is a challenging task due to the various structures and complicated cell spanning relations.
We propose the framework of Local and Global Pyramid Mask Alignment, which adopts the soft pyramid mask learning mechanism in both the local and global feature maps.
A pyramid mask re-scoring module is then integrated to reconcile the local and global information and refine the predicted boundaries.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Table structure recognition is a challenging task due to the various
structures and complicated cell spanning relations. Previous methods handled
the problem starting from elements of different granularities (rows/columns,
text regions), which suffered from issues such as lossy heuristic rules or the
neglect of empty-cell division. Based on the characteristics of table
structures, we find that obtaining the aligned bounding boxes of text regions
can effectively preserve the entire relevant range of each cell. However, the
aligned bounding boxes are hard to predict accurately because of visual
ambiguities. In this paper, we aim to obtain more reliable aligned bounding
boxes by fully utilizing the visual information from both the text regions in
the local features of each proposal and the cell relations in the global
features. Specifically, we propose the framework of Local and Global Pyramid
Mask Alignment, which adopts the soft pyramid mask learning mechanism in both
the local and global feature maps. It allows the predicted boundaries of
bounding boxes to break through the limitation of the original proposals. A
pyramid mask re-scoring module is then integrated to reconcile the local and
global information and refine the predicted boundaries. Finally, we propose a
robust table structure recovery pipeline to obtain the final structure, in
which we also effectively solve the problems of locating and dividing empty
cells. Experimental results show that the proposed method achieves competitive
and even new state-of-the-art performance on several public benchmarks.
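The abstract does not state the exact form of the soft pyramid mask, so the following is only a one-dimensional sketch under stated assumptions, not the authors' implementation. It assumes the soft label is a linear ramp equal to 1 at the centre of an aligned box and 0 at its edges (`pyramid_labels`), and that the box centre `xm` is known. Because every pixel's value then encodes its position relative to the centre and the edge, any pixel with a value strictly between 0 and 1 can be inverted to vote for where the edge lies, so the recovered edges may fall outside the window a proposal actually sees (`refine_edges`). Both function names are hypothetical, and the re-scoring module that combines local and global masks is not modelled here.

```python
from typing import List, Optional, Tuple


def pyramid_labels(x1: float, x2: float, n: int) -> List[float]:
    """Hypothetical 1-D soft pyramid label sampled at pixel centres.

    Assumption: the label rises linearly from 0 at the left edge x1 to 1 at
    the box centre and falls back to 0 at the right edge x2 (requires x2 > x1).
    Pixels whose centre lies outside [x1, x2] get 0.
    """
    xm = (x1 + x2) / 2
    labels = []
    for i in range(n):
        p = i + 0.5  # centre of pixel i
        if p < x1 or p > x2:
            labels.append(0.0)
        elif p <= xm:
            labels.append((p - x1) / (xm - x1))
        else:
            labels.append((x2 - p) / (x2 - xm))
    return labels


def refine_edges(
    labels: List[float], xm: float, start: int = 0, eps: float = 1e-6
) -> Tuple[Optional[float], Optional[float]]:
    """Estimate left and right box edges from (possibly cropped) labels.

    labels[i] belongs to pixel start + i. On either side of the centre the
    assumed ramp is v = (p - e) / (xm - e), where e is that side's edge;
    solving for e gives e = (p - v * xm) / (1 - v). Each usable pixel casts
    one vote and the votes on each side are averaged. A side with no votes
    returns None.
    """
    left: List[float] = []
    right: List[float] = []
    for i, v in enumerate(labels):
        p = start + i + 0.5
        if v <= eps or v >= 1 - eps:
            continue  # background or apex pixels carry no edge information
        edge = (p - v * xm) / (1 - v)
        if p < xm:
            left.append(edge)
        elif p > xm:
            right.append(edge)

    def mean(votes: List[float]) -> Optional[float]:
        return sum(votes) / len(votes) if votes else None

    return mean(left), mean(right)


if __name__ == "__main__":
    true = pyramid_labels(2.0, 18.0, 20)  # full box spans x = 2..18
    window = true[5:15]  # a proposal that only sees pixels 5..14
    print(refine_edges(window, xm=10.0, start=5))  # close to (2.0, 18.0)
```

Here the slice `true[5:15]` stands in for a proposal covering only pixels 5 to 14, yet both edge estimates land on the full box, which is the behaviour the abstract describes as breaking through the limitation of the original proposals. With real, noisy predictions the votes would disagree, and plain averaging is only the simplest way to combine them.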
Related papers
- Mesh Denoising Transformer
Mesh denoising aims to remove noise from input meshes while preserving their feature structures.
SurfaceFormer is a pioneering Transformer-based mesh denoising framework.
A new representation, the Local Surface Descriptor, captures local geometric intricacies.
A Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation.
(arXiv, 2024-05-10T15:27:43Z)
- UGMAE: A Unified Framework for Graph Masked Autoencoders
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective jointly with feature reconstruction to capture holistic graph information.
(arXiv, 2024-02-12T19:39:26Z)
- BLADE: Box-Level Supervised Amodal Segmentation through Directed
Expansion
Box-level supervised amodal segmentation addresses the high cost of annotating pixel-level amodal masks by relying solely on ground-truth bounding boxes and instance classes as supervision.
We present a novel solution by introducing a directed expansion approach from visible masks to corresponding amodal masks.
Our approach involves a hybrid end-to-end network based on the overlapping region, the area where different instances intersect.
(arXiv, 2024-01-03T09:37:03Z)
- TRUST: An Accurate and End-to-End Table structure Recognizer Using
Splitting-based Transformers
We propose an accurate and end-to-end transformer-based table structure recognition method, referred to as TRUST.
Transformers are suitable for table structure recognition because of their global computations, perfect memory, and parallel computation.
We conduct experiments on several popular benchmarks, including PubTabNet and SynthTable, and our method achieves new state-of-the-art results.
(arXiv, 2022-08-31T08:33:36Z)
- Table Structure Recognition with Conditional Attention
The Table Structure Recognition (TSR) problem aims to recognize the structure of a table and transform unstructured tables into a structured, machine-readable format.
In this study, we hypothesize that a complicated table structure can be represented by a graph whose vertices and edges represent the cells and association between cells, respectively.
Experimental results show that the alignment of a cell bounding box can help improve the Micro-averaged F1 score from 0.915 to 0.963, and the Macro-averaged F1 score from 0.787 to 0.923.
(arXiv, 2022-03-08T02:44:58Z)
- Visual Understanding of Complex Table Structures from Document Images
We propose a novel object-detection-based deep model that captures the inherent alignments of cells within tables.
We also aim to improve structure recognition by deducing a novel rectilinear graph-based formulation.
Our framework improves on the previous state-of-the-art performance by 2.7% in average F1-score on benchmark datasets.
(arXiv, 2021-11-13T14:54:33Z)
- Global Aggregation then Local Distribution for Scene Parsing
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach sets a new state of the art on major semantic segmentation benchmarks, including Cityscapes, ADE20K, Pascal Context, CamVid and COCO-stuff.
(arXiv, 2021-07-28T03:46:57Z)
- Split, embed and merge: An accurate table structure recognizer
We introduce Split, Embed and Merge (SEM) as an accurate table structure recognizer.
SEM can achieve an average F-Measure of 96.9% on the SciTSR dataset.
(arXiv, 2021-07-12T06:26:19Z)
- An Integer Linear Programming Framework for Mining Constraints from Data
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
(arXiv, 2020-06-18T20:09:53Z)
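The Table Structure Recognition with Conditional Attention entry reports both micro- and macro-averaged F1. Assuming the common convention for table structure metrics, where true positives, false positives and false negatives are counted per table (for example over predicted cell adjacency relations), micro-averaging pools the counts over all tables before computing one F1, while macro-averaging computes F1 per table and then takes the mean, so a small or hard table weighs as much as a large one. The sketch below illustrates only that difference; the helper names are hypothetical and the counts are toy values, not results from any paper.

```python
from typing import List, Tuple

# (true positives, false positives, false negatives) for one table
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written as 2TP / (2TP + FP + FN); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(tables: List[Counts]) -> float:
    """Pool the counts over all tables, then compute a single F1."""
    tp = sum(t[0] for t in tables)
    fp = sum(t[1] for t in tables)
    fn = sum(t[2] for t in tables)
    return f1(tp, fp, fn)


def macro_f1(tables: List[Counts]) -> float:
    """Compute F1 per table, then average with equal weight per table."""
    return sum(f1(*t) for t in tables) / len(tables)


if __name__ == "__main__":
    # Toy counts: one large, well-recognized table and one small, poorly recognized one.
    tables = [(90, 10, 10), (1, 4, 4)]
    print(round(micro_f1(tables), 3))  # 0.867
    print(round(macro_f1(tables), 3))  # 0.55
```

With these toy counts the pooled micro score is dominated by the large table, while the macro score drops because the small table counts equally.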