A Scientific Information Extraction Dataset for Nature Inspired
Engineering
- URL: http://arxiv.org/abs/2005.07753v2
- Date: Tue, 26 May 2020 13:47:47 GMT
- Authors: Ruben Kruiper, Julian F.V. Vincent, Jessica Chen-Burger, Marc P.Y.
Desmulliez, Ioannis Konstas
- Abstract summary: This paper describes a dataset of 1,500 manually-annotated sentences that express domain-independent relations between central concepts in a scientific biology text.
The arguments of these relations can be Multi Word Expressions and have been annotated with modifying phrases to form non-projective graphs.
The dataset allows for training and evaluating Relation Extraction algorithms that aim for coarse-grained typing of scientific biological documents.
- Score: 12.819150283584328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nature has inspired various ground-breaking technological developments in
applications ranging from robotics to aerospace engineering and the
manufacturing of medical devices. However, accessing the information captured
in scientific biology texts is a time-consuming and hard task that requires
domain-specific knowledge. Improving access for outsiders can help
interdisciplinary research like Nature Inspired Engineering. This paper
describes a dataset of 1,500 manually-annotated sentences that express
domain-independent relations between central concepts in a scientific biology
text, such as trade-offs and correlations. The arguments of these relations can
be Multi Word Expressions and have been annotated with modifying phrases to
form non-projective graphs. The dataset allows for training and evaluating
Relation Extraction algorithms that aim for coarse-grained typing of scientific
biological documents, enabling a high-level filter for engineers.
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
(2024-10-28T15:56:49Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
(2024-06-10T15:19:09Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
(2024-03-03T14:59:47Z)
- Agent-based Learning of Materials Datasets from Scientific Literature [0.0]
We develop a chemist AI agent, powered by large language models (LLMs), to create structured datasets from natural language text.
Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles.
(2023-12-18T20:29:58Z)
- Using Natural Language Processing and Networks to Automate Structured Literature Reviews: An Application to Farmers Climate Change Adaptation [0.0]
This work aims to use Natural Language Processing sensibly by extracting relations between variables and synthesizing the findings using networks.
As an example, we apply our methodology to the analysis of farmers' adaptation to climate change.
Results show that the use of Natural Language Processing together with networks in a descriptive manner offers a fast and interpretable way to synthesize literature review findings.
(2023-06-16T10:05:47Z)
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
(2023-01-24T17:13:08Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
(2021-06-03T03:00:12Z)
- Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales [97.41394631426678]
Recent research showed the promise of machine learning tools for analyzing acoustic communication in nonhuman species.
We outline the key elements required for the collection and processing of massive bioacoustic data of sperm whales.
The technological capabilities developed are likely to yield cross-applications and advancements in broader communities investigating non-human communication and animal behavioral research.
(2021-04-17T18:39:22Z)
- Semantic maps and metrics for science using deep transformer encoders [1.599072005190786]
Recent advances in natural language understanding driven by deep transformer networks offer new possibilities for mapping science.
Transformer embedding models capture shades of association and connotation that vary across different linguistic contexts.
We report a procedure for encoding scientific documents with these tools, measuring their improvement over static word embeddings.
(2021-04-13T04:12:20Z)
- Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation [4.178929174617172]
We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs).
Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.
(2020-11-05T14:57:41Z)
- Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [1.9004296236396943]
We present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications.
Within this research work, we tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools.
We generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain.
(2020-10-28T08:31:40Z)