Ontologies for increasing the FAIRness of plant research data
- URL: http://arxiv.org/abs/2309.07129v1
- Date: Fri, 25 Aug 2023 13:08:26 GMT
- Title: Ontologies for increasing the FAIRness of plant research data
- Authors: Kathryn Dumschott, Hannah Dörpholz, Marie-Angélique Laporte,
Dominik Brilhaus, Andrea Schrader, Björn Usadel, Steffen Neumann, Elizabeth
Arnaud and Angela Kranz
- Abstract summary: Ontologies provide concepts for a particular domain as well as the relationships between concepts.
By tagging data with ontology terms, data becomes both human and machine interpretable, allowing for increased reuse and interoperability.
We outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The importance of improving the FAIRness (findability, accessibility,
interoperability, reusability) of research data is undeniable, especially in
the face of large, complex datasets currently being produced by omics
technologies. Facilitating the integration of a dataset with other types of
data increases the likelihood of reuse, and the potential of answering novel
research questions. Ontologies are a useful tool for semantically tagging
datasets as adding relevant metadata increases the understanding of how data
was produced and increases its interoperability. Ontologies provide concepts
for a particular domain as well as the relationships between concepts. By
tagging data with ontology terms, data becomes both human and machine
interpretable, allowing for increased reuse and interoperability. However, the
task of identifying ontologies relevant to a particular research domain or
technology is challenging, especially within the diverse realm of fundamental
plant research. In this review, we outline the ontologies most relevant to the
fundamental plant sciences and how they can be used to annotate data related to
plant-specific experiments within metadata frameworks, such as
Investigation-Study-Assay (ISA). We also outline repositories and platforms
most useful for identifying applicable ontologies or finding ontology terms.
Related papers
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- Plant Disease Recognition Datasets in the Age of Deep Learning: Challenges and Opportunities [1.9578088547147654]
This study explicitly proposes an informative taxonomy to describe potential plant disease datasets.
We provide several directions for future work, such as creating challenge-oriented datasets and the ultimate objective of deploying deep learning in real-world applications with satisfactory performance.
arXiv Detail & Related papers (2023-12-13T05:24:36Z)
- Exploring the Potential of AI-Generated Synthetic Datasets: A Case Study on Telematics Data with ChatGPT [0.0]
This research delves into the construction and utilization of synthetic datasets, specifically within the telematics sphere, leveraging OpenAI's powerful language model, ChatGPT.
To illustrate this data creation process, a hands-on case study is conducted, focusing on the generation of a synthetic telematics dataset.
arXiv Detail & Related papers (2023-06-23T15:15:13Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances [76.34037366117234]
We introduce a new dataset called Robot Control Gestures (RoCoG-v2).
The dataset is composed of both real and synthetic videos from seven gesture classes.
We present results using state-of-the-art action recognition and domain adaptation algorithms.
arXiv Detail & Related papers (2023-03-17T23:23:55Z)
- Synthetic Data in Human Analysis: A Survey [16.562921709882865]
This survey is intended for researchers and practitioners in the field of human analysis.
We conduct a survey that summarises current state-of-the-art methods and the main benefits of using synthetic data.
We also provide an overview of publicly available synthetic datasets and generation models.
arXiv Detail & Related papers (2022-08-19T07:32:34Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Challenges in biomarker discovery and biorepository for Gulf-war-disease studies: a novel data platform solution [48.7576911714538]
We introduce a novel data platform, named ROSALIND, to overcome these challenges, foster healthy and vital collaborations, and advance scientific inquiries.
We follow the principles etched in the platform name - ROSALIND stands for resource organisms with self-governed accessibility, linkability, integrability, neutrality, and dependability.
The deployment of ROSALIND in our GWI study over the past 12 months has accelerated the pace of data experimentation and analysis, removed numerous error sources, and increased research quality and productivity.
arXiv Detail & Related papers (2021-02-04T20:38:30Z)
- Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)