Context-Aware Chart Element Detection
- URL: http://arxiv.org/abs/2305.04151v2
- Date: Fri, 8 Sep 2023 18:10:35 GMT
- Title: Context-Aware Chart Element Detection
- Authors: Pengyu Yan, Saleem Ahmed, David Doermann
- Abstract summary: We propose a novel method CACHED, which stands for Context-Aware Chart Element Detection.
We refine the existing chart element categorization and standardize 18 classes of basic chart elements, excluding plot elements.
Our method achieves state-of-the-art performance in our experiments, underscoring the importance of context in chart element detection.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As a prerequisite of chart data extraction, the accurate detection of chart
basic elements is essential and mandatory. In contrast to object detection in
the general image domain, chart element detection relies heavily on context
information as charts are highly structured data visualization formats. To
address this, we propose a novel method CACHED, which stands for Context-Aware
Chart Element Detection, by integrating a local-global context fusion module
consisting of visual context enhancement and positional context encoding with
the Cascade R-CNN framework. To improve the generalization of our method for
broader applicability, we refine the existing chart element categorization and
standardize 18 classes of basic chart elements, excluding plot elements. Our
CACHED method, with the updated category of chart elements, achieves
state-of-the-art performance in our experiments, underscoring the importance of
context in chart element detection. Extending our method to the bar plot
detection task, we obtain the best result on the PMC test dataset.
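The abstract's local-global context fusion idea can be illustrated with a minimal sketch: encode each candidate box's position relative to the whole chart, then fuse the per-box visual features with a shared global context vector. This is an assumption-laden illustration in NumPy with invented function names, not the paper's actual Cascade R-CNN module.

```python
import numpy as np

def positional_context_encoding(boxes, img_w, img_h):
    """Encode each box's position relative to the full chart image.

    boxes: (N, 4) array-like of [x1, y1, x2, y2] in pixels.
    Returns an (N, 8) array of [x1, y1, x2, y2, cx, cy, w, h],
    each coordinate normalized by the image width or height.
    """
    b = np.asarray(boxes, dtype=np.float64)
    x1, y1, x2, y2 = b[:, 0], b[:, 1], b[:, 2], b[:, 3]
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    feats = np.stack([x1, y1, x2, y2, cx, cy, w, h], axis=1)
    # Alternate width/height divisors to match the x/y order above.
    scale = np.array([img_w, img_h] * 4, dtype=np.float64)
    return feats / scale

def fuse_local_global(visual_feats, pos_feats):
    """Concatenate per-box visual features with a global (mean-pooled)
    context vector shared across boxes and the positional encoding."""
    global_ctx = visual_feats.mean(axis=0, keepdims=True)
    global_ctx = np.repeat(global_ctx, visual_feats.shape[0], axis=0)
    return np.concatenate([visual_feats, global_ctx, pos_feats], axis=1)
```

In the paper's setting, the fused features would feed the detection heads so that, for example, a text box near the left edge can be disambiguated as a y-axis label rather than a legend entry.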
Related papers
- ChartKG: A Knowledge-Graph-Based Representation for Chart Images [9.781118203308438]
We propose a knowledge graph (KG) based representation for chart images, which can model the visual elements in a chart image and semantic relations among them.
It integrates a series of image processing techniques to identify visual elements and relations, e.g., CNNs to classify charts, and YOLOv5 and optical character recognition to parse them.
We present four cases to illustrate how our knowledge-graph-based representation can model the detailed visual elements and semantic relations in charts.
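The triple-based representation ChartKG describes can be sketched as nodes (detected visual elements) plus (subject, relation, object) edges. The element IDs, fields, and relation names below are invented for illustration; they are not ChartKG's actual schema.

```python
def build_chart_kg(elements, relations):
    """Assemble a minimal knowledge-graph view of a chart.

    elements: list of dicts, each with at least an "id" key.
    relations: iterable of (subject_id, relation_name, object_id) triples.
    Returns a dict with an id-indexed node table and the validated triples.
    """
    nodes = {e["id"]: e for e in elements}
    triples = []
    for subj, rel, obj in relations:
        # Keep only edges whose endpoints are known elements.
        if subj in nodes and obj in nodes:
            triples.append((subj, rel, obj))
    return {"nodes": nodes, "triples": triples}

# Hypothetical usage: one bar anchored on the x-axis.
elements = [
    {"id": "bar_1", "type": "bar", "label": "2020"},
    {"id": "axis_x", "type": "x-axis"},
]
kg = build_chart_kg(elements, [("bar_1", "anchored_on", "axis_x")])
```

A representation like this lets downstream queries traverse from a semantic question ("which bar belongs to 2020?") to the pixel-level elements the detectors produced.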
arXiv Detail & Related papers (2024-10-13T07:38:44Z) - ChartEye: A Deep Learning Framework for Chart Information Extraction [2.4936576553283287]
In this study, we propose a deep learning-based framework that provides a solution for key steps in the chart information extraction pipeline.
The proposed framework utilizes hierarchical vision transformers for chart-type and text-role classification, and YOLOv7 for text detection.
Our proposed framework achieves excellent performance at every stage with F1-scores of 0.97 for chart-type classification, 0.91 for text-role classification, and a mean Average Precision of 0.95 for text detection.
arXiv Detail & Related papers (2024-08-28T20:22:39Z) - Hypergraph based Understanding for Document Semantic Entity Recognition [65.84258776834524]
We build a novel hypergraph attention document semantic entity recognition framework, HGA, which uses hypergraph attention to focus on entity boundaries and entity categories at the same time.
Our results on FUNSD, CORD, and XFUND show that our method can effectively improve performance on semantic entity recognition tasks.
arXiv Detail & Related papers (2024-07-09T14:35:49Z) - FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding [52.35520385083425]
FlowLearn dataset is a resource tailored to enhance the understanding of flowcharts.
The scientific subset contains 3,858 flowcharts sourced from scientific literature.
The simulated subset contains 10,000 flowcharts created using a customizable script.
arXiv Detail & Related papers (2024-07-06T20:58:51Z) - TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification [59.779532652634295]
We propose an embarrassingly simple approach to better align image and text features that requires no data formats beyond image-text pairs.
We parse objects and attributes from the description, which are highly likely to exist in the image.
Experiments demonstrate an average 5.2% improvement of our framework over existing alternatives.
arXiv Detail & Related papers (2023-12-21T18:59:06Z) - StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding [58.38480335579541]
Current chart-related tasks focus on either chart perception, which extracts information from visual charts, or reasoning over the extracted data.
In this paper, we aim to establish a unified and label-efficient learning paradigm for joint perception and reasoning tasks.
Experiments are conducted on various chart-related tasks, demonstrating the effectiveness and promising potential for a unified chart perception-reasoning paradigm.
arXiv Detail & Related papers (2023-09-20T12:51:13Z) - UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning [29.947053208614246]
We present UniChart, a pretrained model for chart comprehension and reasoning.
UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language.
We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills.
arXiv Detail & Related papers (2023-05-24T06:11:17Z) - Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model [7.587729429265939]
Pictorial visualization seamlessly integrates data and semantic context into visual representation.
We propose ChartSpark, a novel system that embeds semantic context into chart based on text-to-image generative model.
We develop an interactive visual interface that integrates a text analyzer, editing module, and evaluation module to enable users to generate, modify, and assess pictorial visualizations.
arXiv Detail & Related papers (2023-04-28T05:18:30Z) - ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z) - Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.