ChartEye: A Deep Learning Framework for Chart Information Extraction
- URL: http://arxiv.org/abs/2408.16123v1
- Date: Wed, 28 Aug 2024 20:22:39 GMT
- Title: ChartEye: A Deep Learning Framework for Chart Information Extraction
- Authors: Osama Mustafa, Muhammad Khizer Ali, Momina Moetesum, Imran Siddiqi,
- Abstract summary: In this study, we propose a deep learning-based framework that provides a solution for key steps in the chart information extraction pipeline.
The proposed framework utilizes hierarchical vision transformers for the tasks of chart-type and text-role classification, and YOLOv7 for text detection.
Our proposed framework achieves excellent performance at every stage with F1-scores of 0.97 for chart-type classification, 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread use of charts and infographics as a means of data visualization in various domains has inspired recent research in automated chart understanding. However, information extraction from chart images is a complex, multi-task process due to style variations, and it is consequently challenging to design an end-to-end system. In this study, we propose a deep learning-based framework that provides a solution for key steps in the chart information extraction pipeline. The proposed framework utilizes hierarchical vision transformers for the tasks of chart-type and text-role classification, and YOLOv7 for text detection. The detected text is then enhanced using Super Resolution Generative Adversarial Networks to improve the recognition output of the OCR. Experimental results on a benchmark dataset show that our proposed framework achieves excellent performance at every stage, with F1-scores of 0.97 for chart-type classification, 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
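The abstract describes a four-stage pipeline: classify the chart type, detect text regions, classify each region's role, and super-resolve each crop before OCR. The following is a minimal structural sketch of that flow; the function and class names (`extract_chart_info`, `TextRegion`, `crop_box`) and the callable interfaces are hypothetical placeholders, not the paper's implementation, which uses hierarchical vision transformers, YOLOv7, and SRGANs.

```python
# Hypothetical sketch of the ChartEye-style pipeline stages described in the
# abstract; all names and interfaces here are illustrative assumptions.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TextRegion:
    box: Tuple[int, int, int, int]  # (x, y, w, h) from the text detector
    role: str = ""                  # filled in by text-role classification
    text: str = ""                  # filled in by OCR

def crop_box(image, box):
    """Cut a rectangular crop out of a row-major image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def extract_chart_info(image, chart_classifier, text_detector,
                       role_classifier, super_resolver, ocr):
    """Run the pipeline stages in the order the abstract describes."""
    chart_type = chart_classifier(image)                # e.g. "bar", "line"
    regions = [TextRegion(box=b) for b in text_detector(image)]
    for r in regions:
        crop = crop_box(image, r.box)
        r.role = role_classifier(crop)                  # e.g. "title", "axis-label"
        r.text = ocr(super_resolver(crop))              # super-resolve before OCR
    return chart_type, regions
```

Each stage is passed in as a callable, so the detector or classifier backbones can be swapped without changing the pipeline shape.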
Related papers
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z) - Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to significant modality gap, fine-grained differences and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z) - StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding [58.38480335579541]
Current chart-related tasks focus on either chart perception which refers to extracting information from the visual charts, or performing reasoning given the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z) - An extensible point-based method for data chart value detection [7.9137747195666455]
We present a method for identifying semantic points to reverse engineer data charts.
Our method uses a point proposal network to directly predict the position of points of interest in a chart.
We focus on complex bar charts in the scientific literature, on which our model is able to detect salient points with an accuracy of 0.8705 F1.
arXiv Detail & Related papers (2023-08-22T21:03:58Z) - ChartDETR: A Multi-shape Detection Network for Visual Chart Recognition [33.89209291115389]
Current methods rely on keypoint detection to estimate data element shapes in charts but suffer from grouping errors in post-processing.
We propose ChartDETR, a transformer-based multi-shape detector that localizes keypoints at the corners of regular shapes to reconstruct multiple data elements in a single chart image.
Our method predicts all data element shapes at once by introducing query groups in set prediction, eliminating the need for further postprocessing.
arXiv Detail & Related papers (2023-08-15T12:50:06Z) - Context-Aware Chart Element Detection [0.22559617939136503]
We propose a novel method CACHED, which stands for Context-Aware Chart Element Detection.
We refine the existing chart element categorization and standardize 18 classes for basic chart elements, excluding plot elements.
Our method achieves state-of-the-art performance in our experiments, underscoring the importance of context in chart element detection.
arXiv Detail & Related papers (2023-05-07T00:08:39Z) - ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z) - Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z) - Towards an efficient framework for Data Extraction from Chart Images [27.114170963444074]
We adopt state-of-the-art computer vision techniques for the data extraction stage in a data mining system.
For building a robust point detector, a fully convolutional network with a feature fusion module is adopted.
For data conversion, we translate the detected element into data with semantic value.
arXiv Detail & Related papers (2021-05-05T13:18:53Z) - Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution [30.438041837029875]
We propose a robust visual information extraction system (VIES) towards real-world scenarios.
VIES is a unified end-to-end trainable framework for simultaneous text detection, recognition and information extraction.
We construct a fully-annotated dataset called EPHOIE, which is the first Chinese benchmark for both text spotting and visual information extraction.
arXiv Detail & Related papers (2021-01-24T11:05:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.