Text to Insight: Accelerating Organic Materials Knowledge Extraction via
Deep Learning
- URL: http://arxiv.org/abs/2109.12758v1
- Date: Mon, 27 Sep 2021 01:58:35 GMT
- Title: Text to Insight: Accelerating Organic Materials Knowledge Extraction via
Deep Learning
- Authors: Xintong Zhao, Steven Lopez, Semion Saikin, Xiaohua Hu and Jane
Greenberg
- Abstract summary: This study aims to explore knowledge extraction for organic materials.
We built a research dataset composed of 855 annotated and 708,376 unannotated sentences drawn from 92,667 abstracts.
We used named-entity recognition (NER) with a BiLSTM-CNN-CRF deep learning model to automatically extract key knowledge from the literature.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific literature is one of the most significant resources for sharing
knowledge. Researchers turn to scientific literature as a first step in
designing an experiment. Given the extensive and growing volume of literature,
the common approach of reading and manually extracting knowledge is too
time-consuming, creating a bottleneck in the research cycle. This challenge spans
nearly every scientific domain. In materials science, experimental data
distributed across millions of publications are extremely helpful for
predicting materials properties and designing novel materials. However,
only recently have researchers explored computational approaches to knowledge
extraction, primarily for inorganic materials. This study aims to explore
knowledge extraction for organic materials. We built a research dataset
composed of 855 annotated and 708,376 unannotated sentences drawn from 92,667
abstracts. We used named-entity recognition (NER) with a BiLSTM-CNN-CRF deep
learning model to automatically extract key knowledge from the literature.
Early-phase results show a high potential for automated knowledge extraction.
The paper presents our findings and a framework for supervised knowledge
extraction that can be adapted to other scientific domains.
Related papers
- From Text to Insight: Large Language Models for Materials Science Data Extraction [4.08853418443192]
The vast majority of materials science knowledge exists in unstructured natural language.
Structured data is crucial for innovative and systematic materials design.
The advent of large language models (LLMs) represents a significant shift.
arXiv Detail & Related papers (2024-07-23T22:23:47Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model [16.030268397865264]
This article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques.
MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology.
By implementing network-based algorithms, MKG not only facilitates efficient link prediction but also significantly reduces reliance on traditional experimental methods.
arXiv Detail & Related papers (2024-04-03T21:46:14Z)
- MatKB: Semantic Search for Polycrystalline Materials Synthesis Procedures [2.578242050187029]
Our goal is to automatically mine structured knowledge from millions of research articles in the field of polycrystalline materials.
The proposed method leverages NLP techniques such as entity recognition and document classification to extract relevant information.
The resulting knowledge base is integrated into a search engine, which enables users to search for information about specific materials, properties, and experiments with greater precision than traditional search engines like Google.
arXiv Detail & Related papers (2023-02-11T04:18:07Z)
- Interdisciplinary Discovery of Nanomaterials Based on Convolutional Neural Networks [6.350788459498522]
We use CNN to discover valuable experimental-based information about nanomaterials and synthesis methods in energy-material-related publications.
Our first system, TextMaster, extracts opinions from texts and classifies them into challenges and opportunities, achieving 94% and 92% accuracy, respectively.
Our second system, GraphMaster, realizes data extraction of tables and figures from publications with 98.3% classification accuracy and 4.3% data extraction mean square error.
arXiv Detail & Related papers (2022-12-06T07:51:51Z)
- Artificial Intelligence in Concrete Materials: A Scientometric View [77.34726150561087]
This chapter aims to uncover the main research interests and knowledge structure of the existing literature on AI for concrete materials.
To begin with, a total of 389 journal articles published from 1990 to 2020 were retrieved from the Web of Science.
Scientometric tools such as keyword co-occurrence analysis and documentation co-citation analysis were adopted to quantify features and characteristics of the research field.
arXiv Detail & Related papers (2022-09-17T18:24:56Z)
- Embedding Knowledge for Document Summarization: A Survey [66.76415502727802]
Previous works proved that knowledge-embedded document summarizers excel at generating superior digests.
We propose novel taxonomies to recapitulate knowledge and knowledge embeddings under the document summarization view.
arXiv Detail & Related papers (2022-04-24T04:36:07Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [1.9004296236396943]
We present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications.
Within this research work, we tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools.
We generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain.
arXiv Detail & Related papers (2020-10-28T08:31:40Z)
- Extracting a Knowledge Base of Mechanisms from COVID-19 Papers [50.17242035034729]
We pursue the construction of a knowledge base (KB) of mechanisms.
We develop a broad, unified schema that strikes a balance between relevance and breadth.
Experiments demonstrate the utility of our KB in supporting interdisciplinary scientific search over COVID-19 literature.
arXiv Detail & Related papers (2020-10-08T07:54:14Z)
- COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation [79.33545724934714]
We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements from scientific literature.
Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence.
arXiv Detail & Related papers (2020-07-01T16:03:20Z)