MatGD: Materials Graph Digitizer
- URL: http://arxiv.org/abs/2311.12806v1
- Date: Tue, 19 Sep 2023 07:19:16 GMT
- Title: MatGD: Materials Graph Digitizer
- Authors: Jaewoong Lee, Wonseok Lee, Jihan Kim
- Abstract summary: MatGD (Material Graph Digitizer) is a tool for digitizing a data line from scientific graphs.
From 62,534 papers in the areas of batteries, catalysis, and MOFs, 501,045 figures were mined.
The tool achieved over 99% accuracy in legend marker and text detection.
- Score: 2.4857235004269165
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We have developed MatGD (Material Graph Digitizer), which is a tool for
digitizing a data line from scientific graphs. The algorithm behind the tool
consists of four steps: (1) identifying graphs within subfigures, (2)
separating axes and data sections, (3) discerning the data lines by eliminating
irrelevant graph objects and matching with the legend, and (4) data extraction
and saving. From the 62,534 papers in the areas of batteries, catalysis, and
MOFs, 501,045 figures were mined. Remarkably, our tool showcased performance
with over 99% accuracy in legend marker and text detection. Moreover, its
capability for data line separation stood at 66%, which is much higher than that
of other existing figure mining tools. We believe that this tool will be
integral to collecting both past and future data from publications, and these
data can be used to train various machine learning models that can enhance
material predictions and new materials discovery.
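The last two steps of the abstract (matching a data line to its legend entry, then extracting its values) can be illustrated with a small sketch. This is not MatGD's code: the abstract does not describe the implementation, so the colour-distance matching, the linear two-point axis calibration, and all names (`make_axis_mapper`, `trace_line`, `digitize`, `tolerance`) are assumptions made for illustration, and the image is a toy nested list rather than a real figure.

```python
from typing import Callable, List, Sequence, Tuple

RGB = Tuple[int, int, int]
Image = Sequence[Sequence[RGB]]  # indexed as image[row][col]


def make_axis_mapper(pixel_a: float, value_a: float,
                     pixel_b: float, value_b: float) -> Callable[[float], float]:
    """Map pixel positions to data values on a linear axis (assumed),
    calibrated from two tick marks with known pixel positions and values."""
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


def color_distance(a: RGB, b: RGB) -> int:
    """Sum of absolute channel differences between two colours."""
    return sum(abs(x - y) for x, y in zip(a, b))


def trace_line(image: Image, legend_color: RGB,
               tolerance: int = 100) -> List[Tuple[int, float]]:
    """For each column, average the rows whose colour is close to the
    legend colour. Columns with no matching pixel are skipped."""
    points = []
    n_cols = len(image[0]) if image else 0
    for col in range(n_cols):
        rows = [r for r, line in enumerate(image)
                if color_distance(line[col], legend_color) <= tolerance]
        if rows:
            points.append((col, sum(rows) / len(rows)))
    return points


def digitize(image: Image, legend_color: RGB,
             x_map: Callable[[float], float],
             y_map: Callable[[float], float],
             tolerance: int = 100) -> List[Tuple[float, float]]:
    """Convert the traced pixel positions into data coordinates."""
    return [(x_map(col), y_map(row))
            for col, row in trace_line(image, legend_color, tolerance)]


# Toy 5x5 image: a reddish diagonal line on a white background.
WHITE, RED = (255, 255, 255), (220, 30, 30)
image = [[RED if row == 4 - col else WHITE for col in range(5)]
         for row in range(5)]

x_map = make_axis_mapper(0, 0.0, 4, 4.0)  # column 0 is x=0, column 4 is x=4
y_map = make_axis_mapper(4, 0.0, 0, 8.0)  # row 4 is y=0, row 0 is y=8 (rows grow downward)
points = digitize(image, (255, 0, 0), x_map, y_map)
print(points)
```

Averaging the matching rows per column keeps one value per x position, which suits single-valued curves; overlapping lines of similar colour, log axes, and markers would need more than this sketch handles.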
Related papers
- Towards Data-centric Graph Machine Learning: Review and Outlook [120.64417630324378]
We introduce a systematic framework, Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of the graph data lifecycle.
A thorough taxonomy of each stage is presented to answer three critical graph-centric questions.
We pinpoint the future prospects of the DC-GML domain, providing insights to navigate its advancements and applications.
arXiv Detail & Related papers (2023-09-20T00:40:13Z)
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
- Line Graphics Digitization: A Step Towards Full Automation [29.017383766914406]
We present the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories.
Our dataset covers 520 images of mathematical graphics collected from 450 documents from different disciplines.
Our proposed dataset can support two different computer vision tasks, i.e., semantic segmentation and object detection.
arXiv Detail & Related papers (2023-07-05T07:08:58Z)
- Bures-Wasserstein Means of Graphs [60.42414991820453]
We propose a novel framework for defining a graph mean via embeddings in the space of smooth graph signal distributions.
By finding a mean in this embedding space, we can recover a mean graph that preserves structural information.
We establish the existence and uniqueness of the novel graph mean, and provide an iterative algorithm for computing it.
arXiv Detail & Related papers (2023-05-31T11:04:53Z)
- Curriculum Graph Machine Learning: A Survey [51.89783017927647]
Curriculum graph machine learning (Graph CL) integrates the strengths of graph machine learning and curriculum learning.
This paper comprehensively overviews approaches to Graph CL and presents a detailed survey of recent advances in this direction.
arXiv Detail & Related papers (2023-02-06T16:59:25Z)
- A Framework for Large Scale Synthetic Graph Dataset Generation [2.248608623448951]
This work proposes a scalable synthetic graph generation tool to scale the datasets to production-size graphs.
The tool learns a series of parametric models from proprietary datasets that can be released to researchers.
We demonstrate the generalizability of the framework across a series of datasets.
arXiv Detail & Related papers (2022-10-04T22:41:33Z)
- Data Augmentation for Deep Graph Learning: A Survey [66.04015540536027]
We first propose a taxonomy for graph data augmentation and then provide a structured review by categorizing the related work based on the augmented information modalities.
Focusing on the two challenging problems in DGL (i.e., optimal graph learning and low-resource graph learning), we also discuss and review the existing learning paradigms which are based on graph data augmentation.
arXiv Detail & Related papers (2022-02-16T18:30:33Z)
- VizExtract: Automatic Relation Extraction from Data Visualizations [7.2241069295727955]
This paper presents a framework for automatically extracting compared variables from statistical charts.
We leverage a computer vision based framework to automatically identify and localize visualization facets in line graphs, scatter plots, or bar graphs.
In controlled experiments, our framework is able to classify, with 87.5% accuracy, the correlation between variables for graphs with 1-3 series per graph, varying colors, and solid line styles.
arXiv Detail & Related papers (2021-12-07T04:27:08Z)
- CHARTER: heatmap-based multi-type chart data extraction [7.838284602257369]
We present a method and a system for end-to-end conversion of document charts into machine readable data format.
Our approach extracts and analyses charts along with their graphical elements and supporting structures.
Our detection system is based on neural networks, trained solely on synthetic data.
arXiv Detail & Related papers (2021-11-28T11:01:21Z)
- Plot2Spectra: an Automatic Spectra Extraction Tool [10.64947007982639]
This paper develops a plot digitizer, named Plot2Spectra, to extract data points from spectroscopy graph images in an automatic fashion.
In the first axis alignment stage, we adopt an anchor-free detector to detect the plot region and then refine the detected bounding boxes.
In the second plot data extraction stage, we first employ semantic segmentation to separate pixels belonging to plot lines from the background.
arXiv Detail & Related papers (2021-07-06T18:17:28Z)
- Promoting Graph Awareness in Linearized Graph-to-Text Generation [72.83863719868364]
We study the ability of linearized models to encode local graph structures.
Our findings motivate solutions to enrich the quality of models' implicit graph encodings.
We find that these denoising scaffolds lead to substantial improvements in downstream generation in low-resource settings.
arXiv Detail & Related papers (2020-12-31T18:17:57Z)