ChartDETR: A Multi-shape Detection Network for Visual Chart Recognition
- URL: http://arxiv.org/abs/2308.07743v1
- Date: Tue, 15 Aug 2023 12:50:06 GMT
- Title: ChartDETR: A Multi-shape Detection Network for Visual Chart Recognition
- Authors: Wenyuan Xue, Dapeng Chen, Baosheng Yu, Yifei Chen, Sai Zhou, Wei Peng
- Abstract summary: Current methods rely on keypoint detection to estimate data element shapes in charts but suffer from grouping errors in post-processing.
We propose ChartDETR, a transformer-based multi-shape detector that localizes keypoints at the corners of regular shapes to reconstruct multiple data elements in a single chart image.
Our method predicts all data element shapes at once by introducing query groups in set prediction, eliminating the need for further postprocessing.
- Score: 33.89209291115389
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual chart recognition systems are gaining increasing attention due to the
growing demand for automatically identifying table headers and values from
chart images. Current methods rely on keypoint detection to estimate data
element shapes in charts but suffer from grouping errors in post-processing. To
address this issue, we propose ChartDETR, a transformer-based multi-shape
detector that localizes keypoints at the corners of regular shapes to
reconstruct multiple data elements in a single chart image. Our method predicts
all data element shapes at once by introducing query groups in set prediction,
eliminating the need for further postprocessing. This property allows ChartDETR
to serve as a unified framework capable of representing various chart types
without altering the network architecture, effectively detecting data elements
of diverse shapes. We evaluated ChartDETR on three datasets, achieving
competitive results across all chart types without any additional enhancements.
For example, ChartDETR achieved an F1 score of 0.98 on Adobe Synthetic,
significantly outperforming the previous best model with a 0.71 F1 score.
Additionally, we obtained a new state-of-the-art result of 0.97 on
ExcelChart400k. The code will be made publicly available.
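The abstract does not say how query groups are tied to ground-truth shapes during training. As a rough illustration only, the sketch below assumes DETR-style one-to-one matching, where each query group holds the keypoints of one shape and is assigned to at most one target. The names `shape_cost` and `match_groups`, the two-corner bar representation, and the distance-based cost are all hypothetical, and a brute-force search stands in for the Hungarian algorithm a real implementation would likely use.

```python
from itertools import permutations
from math import dist

# Assumption: one "query group" = the K keypoints of one data element.
# Here a bar is described by K = 2 corners in normalized image coordinates.
Shape = list[tuple[float, float]]


def shape_cost(pred: Shape, target: Shape) -> float:
    """Sum of Euclidean distances between corresponding keypoints."""
    return sum(dist(p, t) for p, t in zip(pred, target))


def match_groups(preds: list[Shape], targets: list[Shape]) -> list[tuple[int, int]]:
    """Find the one-to-one assignment of targets to query groups with minimal cost.

    Returns (group_index, target_index) pairs. Requires len(preds) >= len(targets).
    Brute force over permutations: exponential, suitable only for toy inputs.
    """
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(shape_cost(preds[g], targets[t]) for t, g in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(g, t) for t, g in enumerate(perm)]
    return best_pairs


# Three query groups, two ground-truth bars; one group stays unmatched.
preds = [
    [(0.50, 0.20), (0.60, 0.90)],  # group 0
    [(0.10, 0.40), (0.20, 0.90)],  # group 1
    [(0.00, 0.00), (0.00, 0.00)],  # group 2
]
targets = [
    [(0.10, 0.41), (0.20, 0.90)],  # bar A
    [(0.51, 0.20), (0.60, 0.90)],  # bar B
]
pairs = match_groups(preds, targets)  # group 1 -> bar A, group 0 -> bar B
```

Because each group already carries every keypoint of its shape, the matched output needs no separate step that groups loose keypoints into elements, which is the post-processing the abstract says is eliminated.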
Related papers
- ChartEye: A Deep Learning Framework for Chart Information Extraction [2.4936576553283287]
In this study, we propose a deep learning-based framework that provides a solution for key steps in the chart information extraction pipeline.
The proposed framework uses hierarchical vision transformers for chart-type and text-role classification, and YOLOv7 for text detection.
Our proposed framework achieves excellent performance at every stage with F1-scores of 0.97 for chart-type classification, 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
arXiv Detail & Related papers (2024-08-28T20:22:39Z)
- Advancing Chart Question Answering with Robust Chart Component Recognition [18.207819321127182]
We introduce a unified framework that enhances chart component recognition by accurately identifying and classifying components such as bars, lines, pies, titles, legends, and axes.
We also propose a novel Question-guided Deformable Co-Attention mechanism, which fuses chart features encoded by Chartformer with the given question, leveraging the question's guidance to ground the correct answer.
arXiv Detail & Related papers (2024-07-19T20:55:06Z)
- TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart overcomes two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) it shortens the lengthy vision feature sequences that the vision transformer produces for high-resolution images through a Vision Token Merging module.
arXiv Detail & Related papers (2024-04-25T14:23:24Z)
- ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
ChartAssistant is a vision-language model for universal chart comprehension and reasoning.
It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text.
Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and Chartllama methods.
arXiv Detail & Related papers (2024-01-04T17:51:48Z)
- An extensible point-based method for data chart value detection [7.9137747195666455]
We present a method for identifying semantic points to reverse engineer data charts.
Our method uses a point proposal network to directly predict the position of points of interest in a chart.
We focus on complex bar charts in the scientific literature, on which our model detects salient points with an F1 score of 0.8705.
arXiv Detail & Related papers (2023-08-22T21:03:58Z)
- GenPlot: Increasing the Scale and Diversity of Chart Derendering Data [0.0]
We propose GenPlot, a plot generator that can generate billions of additional plots for chart-derendering using synthetic data.
OCR-free chart-to-text translation has achieved state-of-the-art results on visual language tasks.
arXiv Detail & Related papers (2023-06-20T17:25:53Z)
- ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
- EGRC-Net: Embedding-induced Graph Refinement Clustering Network [66.44293190793294]
We propose a novel graph clustering network called Embedding-Induced Graph Refinement Clustering Network (EGRC-Net).
EGRC-Net effectively utilizes the learned embedding to adaptively refine the initial graph and enhance the clustering performance.
Our proposed methods consistently outperform several state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-19T09:08:43Z)
- Table2Charts: Recommending Charts by Learning Shared Table Representations [61.68711232246847]
Table2Charts learns common patterns from a large corpus of (table, charts) pairs.
On a large spreadsheet corpus with 165k tables and 266k charts, we show that Table2Charts could learn a shared representation of table fields.
arXiv Detail & Related papers (2020-08-24T15:06:26Z)