Image and Data Mining in Reticular Chemistry Using GPT-4V
- URL: http://arxiv.org/abs/2312.05468v1
- Date: Sat, 9 Dec 2023 05:05:25 GMT
- Title: Image and Data Mining in Reticular Chemistry Using GPT-4V
- Authors: Zhiling Zheng, Zhiguo He, Omar Khattab, Nakul Rampal, Matei A.
Zaharia, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi
- Abstract summary: GPT-4V is a large language model featuring enhanced vision capabilities, accessible through ChatGPT or an API.
This study demonstrates the remarkable ability of GPT-4V to navigate and obtain complex data for metal-organic frameworks.
- Score: 5.440238820637818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of artificial intelligence into scientific research has
reached a new pinnacle with GPT-4V, a large language model featuring enhanced
vision capabilities, accessible through ChatGPT or an API. This study
demonstrates the remarkable ability of GPT-4V to navigate and obtain complex
data for metal-organic frameworks, especially from graphical sources. Our
approach involved an automated process of converting 346 scholarly articles
into 6240 images, which serve as a benchmark dataset for this task, followed
by deploying GPT-4V to categorize and analyze these images using natural
language prompts. This methodology enabled GPT-4V to accurately identify and
interpret key plots integral to MOF characterization, such as nitrogen
isotherms, PXRD patterns, and TGA curves, among others, with accuracy and
recall above 93%. The model's proficiency in extracting critical information
from these plots not only underscores its capability in data mining but also
highlights its potential in aiding the creation of comprehensive digital
databases for reticular chemistry. In addition, the extracted nitrogen isotherm
data from the selected literature allowed for a comparison between theoretical
and experimental porosity values for over 200 compounds, highlighting certain
discrepancies and underscoring the importance of integrating computational and
experimental data. This work highlights the potential of AI in accelerating
scientific discovery and innovation, bridging the gap between computational
tools and experimental research, and paving the way for more efficient,
inclusive, and comprehensive scientific inquiry.
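The workflow described above (articles converted to images, then categorized with natural-language prompts) can be approximated with off-the-shelf tools. The sketch below is not the authors' code: it assumes pdf2image for rasterizing article PDFs, the OpenAI Python SDK with the gpt-4-vision-preview model name, a simplified single-label prompt, and a hypothetical input file example_mof_article.pdf; the paper's actual prompt design, image preparation, and data-extraction steps are more elaborate.

```python
# Hedged sketch: label figure/page images from a MOF article with GPT-4V.
# The file name, prompt wording, and category list are illustrative assumptions.
import base64
import io

from openai import OpenAI                  # pip install openai
from pdf2image import convert_from_path    # pip install pdf2image (requires poppler)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PLOT_TYPES = ["nitrogen isotherm", "PXRD pattern", "TGA curve", "other"]
PROMPT = (
    "The image comes from a metal-organic framework (MOF) paper. "
    f"Classify it as exactly one of: {', '.join(PLOT_TYPES)}. "
    "Reply with the category name only."
)


def page_images(pdf_path: str, dpi: int = 150):
    """Rasterize every page of an article PDF into PIL images."""
    return convert_from_path(pdf_path, dpi=dpi)


def to_data_url(image) -> str:
    """Encode a PIL image as a base64 PNG data URL for the vision API."""
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()


def classify(image) -> str:
    """Ask GPT-4V for a single plot-type label."""
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=20,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url", "image_url": {"url": to_data_url(image)}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    for page_number, image in enumerate(page_images("example_mof_article.pdf"), start=1):
        print(f"page {page_number}: {classify(image)}")
```

A second round of prompting on the images identified as nitrogen isotherms would then be needed to pull out the numerical uptake values behind the theoretical-versus-experimental porosity comparison reported in the abstract.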
Related papers
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices [43.511428925893675]
This paper presents a novel approach for predicting Power Conversion Efficiency (PCE) of Organic Photovoltaic (OPV) devices, called GLaD: synergizing molecular Graphs and Language Descriptors.
We collect a dataset consisting of 500 pairs of OPV donor and acceptor molecules along with their corresponding PCE values, which we utilize as the training data for our predictive model.
GLaD achieves precise predictions of PCE, thereby facilitating the synthesis of new OPV molecules with improved efficiency.
arXiv Detail & Related papers (2024-05-23T06:02:07Z)
- Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo [0.5110571587151475]
'RetChemQA' is a benchmark dataset designed to evaluate the capabilities of machine learning models in the domain of reticular chemistry.
This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 Q&As for each type.
The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group.
arXiv Detail & Related papers (2024-05-03T14:29:54Z)
- Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information [5.456047952635665]
This work presents a unified, multi-source PPI corpus with vetted interaction definitions augmented by binary interaction type labels.
A Transformer-based deep learning method exploits entities' relational context information for relation representation to improve relation classification performance.
The model's performance is evaluated on four widely studied biomedical relation extraction datasets.
arXiv Detail & Related papers (2024-03-08T01:43:21Z)
- AutoIE: An Automated Framework for Information Extraction from Scientific Literature [6.235887933544583]
AutoIE is a framework designed to automate the extraction of vital data from scientific PDF documents.
Our SBERT model achieves high Macro F1 scores of 87.19 and 89.65 on the CoNLL04 and ADE datasets.
This research paves the way for enhanced data management and interpretation in molecular sieve synthesis.
arXiv Detail & Related papers (2024-01-30T01:45:03Z)
- Mining experimental data from Materials Science literature with Large Language Models: an evaluation study [1.9849264945671101]
This study is dedicated to assessing the capabilities of large language models (LLMs) in extracting structured information from scientific documents in materials science.
We focus on two critical tasks of information extraction: (i) a named entity recognition (NER) of studied materials and physical properties and (ii) a relation extraction (RE) between these entities.
The performance of LLMs in executing these tasks is benchmarked against traditional models based on the BERT architecture and against rule-based approaches (baselines).
arXiv Detail & Related papers (2024-01-19T23:00:31Z)
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs.
GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system.
GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
arXiv Detail & Related papers (2023-09-29T17:34:51Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- ChemVise: Maximizing Out-of-Distribution Chemical Detection with the Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that applying this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution, obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.