Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from
Literature with GPT-3
- URL: http://arxiv.org/abs/2304.13846v1
- Date: Wed, 26 Apr 2023 22:21:33 GMT
- Title: Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from
Literature with GPT-3
- Authors: Nicholas Walker, John Dagdelen, Kevin Cruse, Sanghoon Lee, Samuel
Gleason, Alexander Dunn, Gerbrand Ceder, A. Paul Alivisatos, Kristin A.
Persson, Anubhav Jain
- Abstract summary: We present a dataset of 11,644 entities extracted from 1,137 papers, resulting in 268 papers with at least one complete seed-mediated gold nanorod growth procedure and outcome for a total of 332 complete procedures.
We present a dataset of 11,644 entities extracted from 1,137 papers, resulting in papers with at least one complete seed-mediated gold nanorod growth procedure and outcome for a total of 332 complete procedures.
- Score: 52.59930033705221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although gold nanorods have been the subject of much research, the pathways
for controlling their shape and thereby their optical properties remain largely
heuristically understood. Although it is apparent that the simultaneous
presence of and interaction between various reagents during synthesis control
these properties, computational and experimental approaches for exploring the
synthesis space can be either intractable or too time-consuming in practice.
This motivates an alternative approach leveraging the wealth of synthesis
information already embedded in the body of scientific literature by developing
tools to extract relevant structured data in an automated, high-throughput
manner. To that end, we present an approach using the powerful GPT-3 language
model to extract structured multi-step seed-mediated growth procedures and
outcomes for gold nanorods from unstructured scientific text. GPT-3 prompt
completions are fine-tuned to predict synthesis templates in the form of JSON
documents from unstructured text input with an overall accuracy of $86\%$. The
performance is notable, considering the model is performing simultaneous entity
recognition and relation extraction. We present a dataset of 11,644 entities
extracted from 1,137 papers, resulting in 268 papers with at least one complete
seed-mediated gold nanorod growth procedure and outcome for a total of 332
complete procedures.
Related papers
- Compositional Deep Probabilistic Models of DNA Encoded Libraries [6.206196935093064]
We introduce a compositional deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular representations into their mono-synthon, di-synthon, and tri-synthon building blocks.
Our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights via its intrinsic interpretable structure.
arXiv Detail & Related papers (2023-10-20T19:04:28Z) - Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large
Language Models by Extrapolating Errors from Small Models [69.76066070227452]
*Data Synthesis* is a promising way to train a small model with very little labeled data.
We propose *Synthesis Step by Step* (**S3**), a data synthesis framework that shrinks this distribution gap.
Our approach improves the performance of a small model by reducing the gap between the synthetic dataset and the real data.
arXiv Detail & Related papers (2023-10-20T17:14:25Z) - Structured information extraction from complex scientific text with
fine-tuned large language models [55.96705756327738]
We present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction.
The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts.
This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text.
arXiv Detail & Related papers (2022-12-10T07:51:52Z) - Interdisciplinary Discovery of Nanomaterials Based on Convolutional
Neural Networks [6.350788459498522]
We use CNN to discover valuable experimental-based information about nanomaterials and synthesis methods in energy-material-related publications.
Our first system, TextMaster, extracts opinions from texts and classifies them into challenges and opportunities, achieving 94% and 92% accuracy, respectively.
Our second system, GraphMaster, realizes data extraction of tables and figures from publications with 98.3% classification accuracy and 4.3% data extraction mean square error.
arXiv Detail & Related papers (2022-12-06T07:51:51Z) - PcMSP: A Dataset for Scientific Action Graphs Extraction from
Polycrystalline Materials Synthesis Procedure Text [1.9573380763700712]
This dataset simultaneously contains the synthesis sentences extracted from the experimental paragraphs, as well as the entity mentions and intra-sentence relations.
A two-step human annotation and inter-annotator agreement study guarantee the high quality of the PcMSP corpus.
We introduce four natural language processing tasks: sentence classification, named entity recognition, relation classification, and joint extraction of entities and relations.
arXiv Detail & Related papers (2022-10-22T09:43:54Z) - Machine-Learning-Optimized Perovskite Nanoplatelet Synthesis [55.41644538483948]
We develop an algorithm to improve the quality of CsPbBr3 nanoplatelets (NPLs) using only 200 total syntheses.
The algorithm can predict the resulting PL emission maxima of the NPL dispersions based on the precursor ratios.
arXiv Detail & Related papers (2022-10-18T11:54:11Z) - RelationPrompt: Leveraging Prompts to Generate Synthetic Data for
Zero-Shot Relation Triplet Extraction [65.4337085607711]
We introduce the task setting of Zero-Shot Relation Triplet Extraction (ZeroRTE)
Given an input sentence, each extracted triplet consists of the head entity, relation label, and tail entity where the relation label is not seen at the training stage.
We propose to synthesize relation examples by prompting language models to generate structured texts.
arXiv Detail & Related papers (2022-03-17T05:55:14Z) - Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA)
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z) - Annotating and Extracting Synthesis Process of All-Solid-State Batteries
from Scientific Literature [10.443499579567069]
We present a novel corpus of the synthesis process for all-solid-state batteries and an automated machine reading system.
We define the representation of the synthesis processes using flow graphs, and create a corpus from the experimental sections of 243 papers.
The automated machine-reading system is developed by a deep learning-based sequence tagger and simple rule-based relation extractor.
arXiv Detail & Related papers (2020-02-18T02:30:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.