Analyzing Research Trends in Inorganic Materials Literature Using NLP
- URL: http://arxiv.org/abs/2106.14157v1
- Date: Sun, 27 Jun 2021 06:29:10 GMT
- Title: Analyzing Research Trends in Inorganic Materials Literature Using NLP
- Authors: Fusataka Kuniyoshi and Jun Ozawa and Makoto Miwa
- Abstract summary: This study proposes a large-scale natural language processing (NLP) pipeline for extracting material names and properties from materials science literature.
We build a corpus containing 836 annotated paragraphs extracted from 301 papers for training a named entity recognition (NER) model.
Experimental results demonstrate the utility of this NER model; it achieves successful extraction with a micro-F1 score of 78.1%.
- Score: 8.645705008293838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of inorganic materials science, there is a growing demand to
extract knowledge such as physical properties and synthesis processes of
materials by machine-reading a large number of papers. This is because
materials researchers refer to many papers in order to come up with promising
terms of experiments for material synthesis. However, there are only a few
systems that can extract material names and their properties. This study
proposes a large-scale natural language processing (NLP) pipeline for
extracting material names and properties from materials science literature to
enable the search and retrieval of results in materials science. Therefore, we
propose a label definition for extracting material names and properties and
accordingly build a corpus containing 836 annotated paragraphs extracted from
301 papers for training a named entity recognition (NER) model. Experimental
results demonstrate the utility of this NER model; it achieves successful
extraction with a micro-F1 score of 78.1%. To demonstrate the efficacy of our
approach, we present a thorough evaluation on a real-world automatically
annotated corpus by applying our trained NER model to 12,895 materials science
papers. We analyze the trend in materials science by visualizing the outputs of
the NLP pipeline. For example, the country-by-year analysis indicates that in
recent years, the number of papers on "MoS2," a material used in perovskite
solar cells, has been increasing rapidly in China but decreasing in the United
States. Further, according to the conditions-by-year analysis, the processing
temperature of the catalyst material "PEDOT:PSS" is shifting below 200 degrees,
and the number of reports with a processing time exceeding 5 h is increasing
slightly.
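The micro-F1 score quoted in the abstract pools true positives, false positives, and false negatives over all entity labels before computing precision and recall. The following is a minimal sketch of that computation, not the authors' evaluation code. It assumes exact-match scoring of (start, end, label) spans, and the label names `MATERIAL` and `PROPERTY`, the function `micro_f1`, and the toy spans are all hypothetical illustrations.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entity spans.

    gold, pred: lists with one set per paragraph; each set holds
    (start, end, label) tuples. A prediction counts as correct only if
    the span boundaries and the label both match exactly (an assumption).
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        tp += len(gold_spans & pred_spans)   # exact matches
        fp += len(pred_spans - gold_spans)   # predicted but not annotated
        fn += len(gold_spans - pred_spans)   # annotated but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: two paragraphs with hypothetical character offsets and labels.
gold = [
    {(0, 4, "MATERIAL"), (10, 15, "PROPERTY")},
    {(3, 12, "MATERIAL")},
]
pred = [
    {(0, 4, "MATERIAL")},
    {(3, 12, "MATERIAL"), (20, 25, "PROPERTY")},
]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

In this toy case there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3. Because the counts are pooled across labels, frequent labels weigh more heavily in the result than rare ones.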
Related papers
- SciQu: Accelerating Materials Properties Prediction with Automated Literature Mining for Self-Driving Laboratories [0.7673339435080445]
Assessing different material properties to predict specific attributes is a fundamental requirement for materials science-based applications.
Our study addresses these challenges by leveraging machine learning to analyze material properties with greater precision and efficiency.
By automating the data extraction process and using the extracted information to train machine learning models, our developed model, SciQu, optimizes material properties.
arXiv Detail & Related papers (2024-07-11T08:12:46Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
- Lessons in Reproducibility: Insights from NLP Studies in Materials Science [4.205692673448206]
We aim to comprehend these studies from a reproducibility perspective, acknowledging their significant influence on the field of materials informatics, rather than critiquing them.
Our study indicates that both papers offered thorough workflows, tidy and well-documented codebases, and clear guidance for model evaluation.
We highlight areas for improvement, such as providing access to training data where copyright restrictions permit, greater transparency about model architecture and the training process, and specification of software dependency versions.
arXiv Detail & Related papers (2023-07-28T18:36:42Z)
- Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from Literature with GPT-3 [52.59930033705221]
We present a dataset of 11,644 entities extracted from 1,137 papers, resulting in 268 papers with at least one complete seed-mediated gold nanorod growth procedure and outcome for a total of 332 complete procedures.
arXiv Detail & Related papers (2023-04-26T22:21:33Z)
- Application of Transformers based methods in Electronic Medical Records: A Systematic Literature Review [77.34726150561087]
This work presents a systematic literature review of state-of-the-art advances using transformer-based methods on electronic medical records (EMRs) in different NLP tasks.
arXiv Detail & Related papers (2023-04-05T22:19:42Z)
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
- Interdisciplinary Discovery of Nanomaterials Based on Convolutional Neural Networks [6.350788459498522]
We use CNN to discover valuable experimental-based information about nanomaterials and synthesis methods in energy-material-related publications.
Our first system, TextMaster, extracts opinions from texts and classifies them into challenges and opportunities, achieving 94% and 92% accuracy, respectively.
Our second system, GraphMaster, realizes data extraction of tables and figures from publications with 98.3% classification accuracy and 4.3% data extraction mean square error.
arXiv Detail & Related papers (2022-12-06T07:51:51Z)
- A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing [4.688077134982731]
We used natural language processing methods to automatically extract material property data from the abstracts of polymer literature.
We obtained 300,000 material property records from 130,000 abstracts in 60 hours.
The extracted data was analyzed for a diverse range of applications such as fuel cells, supercapacitors, and polymer solar cells.
arXiv Detail & Related papers (2022-09-27T03:47:03Z)
- Text to Insight: Accelerating Organic Materials Knowledge Extraction via Deep Learning [1.2774526936067927]
This study aims to explore knowledge extraction for organic materials.
We built a research dataset composed of 855 annotated and 708,376 unannotated sentences drawn from 92,667 abstracts.
We used named-entity recognition (NER) with a BiLSTM-CNN-CRF deep learning model to automatically extract key knowledge from the literature.
arXiv Detail & Related papers (2021-09-27T01:58:35Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)