An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web
- URL: http://arxiv.org/abs/2006.04161v1
- Date: Sun, 7 Jun 2020 14:26:32 GMT
- Title: An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web
- Authors: Maulik R. Kamdar and Mark A. Musen
- Abstract summary: We study the Life Sciences Linked Open Data (LSLOD) cloud.
We extract schemas from more than 80 publicly available biomedical linked data graphs.
We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources.
- Score: 1.2964393302157287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the biomedical community has published several "open data" sources in
the last decade, most researchers still endure severe logistical and technical
challenges to discover, query, and integrate heterogeneous data and knowledge
from multiple sources. To tackle these challenges, the community has
experimented with Semantic Web and linked data technologies to create the Life
Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from
more than 80 publicly available biomedical linked data graphs into an LSLOD
schema graph and conduct an empirical meta-analysis to evaluate the extent of
semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD
sources exist as stand-alone data sources that are not inter-linked with other
sources, use unpublished schemas with minimal reuse or mappings, and have
elements that are not useful for data integration from a biomedical
perspective. We envision that the LSLOD schema graph and the findings from this
research will aid researchers who wish to query and integrate data and
knowledge from multiple biomedical sources simultaneously on the Web.
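The abstract's two measurements, whether a source is inter-linked with other sources and how much of its schema reuses existing vocabulary rather than newly minted terms, can be illustrated with a minimal Python sketch. It assumes each graph is available as a list of (subject, predicate, object) strings and that each source owns a single namespace prefix. The function names, source names and example.org IRIs are hypothetical, and this is not the authors' extraction or analysis method.

```python
from collections import defaultdict

# Hypothetical input model: each source is a list of (subject, predicate, object)
# strings, and each source "owns" exactly one namespace prefix. Literals are plain
# strings that do not start with any namespace. This is NOT the authors' pipeline.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def extract_schema(triples):
    """Return (classes, properties): rdf:type objects and the other predicates used."""
    classes, properties = set(), set()
    for _, p, o in triples:
        if p == RDF_TYPE:
            classes.add(o)
        else:
            properties.add(p)
    return classes, properties


def cross_source_links(sources, namespaces):
    """Map each source to the other sources whose IRIs it uses as (non-class) objects."""
    links = defaultdict(set)
    for name, triples in sources.items():
        for _, p, o in triples:
            if p == RDF_TYPE:
                continue  # class references count as schema reuse, not instance links
            for other, ns in namespaces.items():
                if other != name and o.startswith(ns):
                    links[name].add(other)
    return links


def stand_alone(sources, namespaces):
    """Sources that neither link to, nor are linked from, any other source."""
    links = cross_source_links(sources, namespaces)
    connected = set(links)
    for targets in links.values():
        connected |= targets
    return sorted(set(sources) - connected)


def reuse_fraction(triples, own_ns):
    """Share of a source's schema terms (classes + properties) minted outside its own namespace."""
    classes, properties = extract_schema(triples)
    terms = classes | properties
    if not terms:
        return 0.0
    return sum(not t.startswith(own_ns) for t in terms) / len(terms)


# Toy, made-up data for three hypothetical sources.
namespaces = {
    "drugs": "http://example.org/drugs/",
    "pathways": "http://example.org/pathways/",
    "isolated": "http://example.org/isolated/",
}
sources = {
    "drugs": [
        ("http://example.org/drugs/d1", RDF_TYPE, "http://example.org/drugs/Drug"),
        ("http://example.org/drugs/d1", "http://example.org/drugs/targetsPathway",
         "http://example.org/pathways/p1"),
    ],
    "pathways": [
        ("http://example.org/pathways/p1", RDF_TYPE, "http://example.org/pathways/Pathway"),
        ("http://example.org/pathways/p1", RDFS_LABEL, "glycolysis"),
    ],
    "isolated": [
        ("http://example.org/isolated/x1", RDF_TYPE, "http://example.org/isolated/Thing"),
    ],
}

print(stand_alone(sources, namespaces))  # → ['isolated']
for name, triples in sources.items():
    print(name, reuse_fraction(triples, namespaces[name]))
```

Here namespace-prefix matching stands in for deciding which source an IRI belongs to. Real linked data graphs often share vocabularies and connect entities through mapping predicates, so a fuller analysis would likely need a more careful notion of ownership and linking than this sketch uses.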
Related papers
- iASiS: Towards Heterogeneous Big Data Analysis for Personalized Medicine [28.917691563659467]
The iASiS infrastructure is able to convert clinical notes into usable data.
Using semantic integration of data makes it possible to generate information that is rich, auditable, and reliable.
Data resources for two different disease categories are explored within the iASiS use cases, dementia and lung cancer.
Date: 2024-07-09T10:52:19Z
- Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented.
LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones.
We introduce a novel information-retrieval method that leverages a knowledge graph to downsample over-represented knowledge clusters and mitigate the information overload problem.
Date: 2024-02-19T18:31:11Z
- Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process [8.207427766052044]
The proposed approach is demonstrated on and analyzed through two mathematical and two materials science case studies.
Compared with single-source and source-unaware machine learning models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems.
Date: 2024-02-06T16:54:59Z
- CARE: Extracting Experimental Findings From Clinical Literature [29.763929941107616]
This work presents CARE, a new IE dataset for the task of extracting clinical findings.
We develop a new annotation schema capturing fine-grained findings as n-ary relations between entities and attributes.
We collect extensive annotations for 700 abstracts from two sources: clinical trials and case reports.
Date: 2023-11-16T10:06:19Z
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid the analysis of neurological disorders.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
However, acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
Date: 2023-08-24T01:30:18Z
- Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results [0.32593385688760446]
In the health sector, access to individual-level data is often challenging due to privacy concerns.
A promising alternative is the generation of fully synthetic data.
In this study, we use a state-of-the-art synthetic data generation method.
Date: 2023-05-12T13:13:55Z
- Synthcity: facilitating innovative use cases of synthetic data in different data modalities [86.52703093858631]
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation.
Synthcity provides practitioners with a single access point to cutting-edge research and tools in synthetic data.
Date: 2023-01-18T14:49:54Z
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and the associations between them, and (ii) the evidence supporting these associations.
Test data drawn from a subset of DISNET, together with associations extracted automatically from texts, have been transformed into a Knowledge Graph that can be used in real scenarios.
Date: 2022-08-01T18:47:03Z
- DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly-available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
Date: 2022-02-21T12:18:08Z
- Challenges in biomarker discovery and biorepository for Gulf-war-disease studies: a novel data platform solution [48.7576911714538]
We introduce a novel data platform, named ROSALIND, to overcome these challenges, foster healthy and vital collaborations, and advance scientific inquiry.
We follow the principles embedded in the platform name: ROSALIND stands for resource organisms with self-governed accessibility, linkability, integrability, neutrality, and dependability.
Deploying ROSALIND in our Gulf War Illness (GWI) study over the past 12 months has accelerated the pace of data experimentation and analysis, removed numerous sources of error, and increased research quality and productivity.
Date: 2021-02-04T20:38:30Z
- Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review [62.490310870300746]
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare.
Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals.
This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
Date: 2019-12-28T02:44:29Z
This list is automatically generated from the titles and abstracts of the papers on this site.