Biomedical Concept Relatedness -- A large EHR-based benchmark
- URL: http://arxiv.org/abs/2010.16218v1
- Date: Fri, 30 Oct 2020 12:20:18 GMT
- Title: Biomedical Concept Relatedness -- A large EHR-based benchmark
- Authors: Claudia Schulz, Josh Levy-Kramer, Camille Van Assel, Miklos Kepes, Nils Hammerla
- Abstract summary: A promising application of AI to healthcare is the retrieval of information from electronic health records.
The suitability of AI methods for such applications is tested by predicting the relatedness of concepts with known relatedness scores.
All existing biomedical concept relatedness datasets are notoriously small and consist of hand-picked concept pairs.
We open-source a novel concept relatedness benchmark overcoming these issues.
- Score: 10.133874724214984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A promising application of AI to healthcare is the retrieval of information
from electronic health records (EHRs), e.g. to aid clinicians in finding
relevant information for a consultation or to recruit suitable patients for a
study. This requires search capabilities far beyond simple string matching,
including the retrieval of concepts (diagnoses, symptoms, medications, etc.)
related to the one in question. The suitability of AI methods for such
applications is tested by predicting the relatedness of concepts with known
relatedness scores. However, all existing biomedical concept relatedness
datasets are notoriously small and consist of hand-picked concept pairs. We
open-source a novel concept relatedness benchmark overcoming these issues: it
is six times larger than existing datasets and concept pairs are chosen based
on co-occurrence in EHRs, ensuring their relevance for the application of
interest. We present an in-depth analysis of our new dataset and compare it to
existing ones, highlighting that it is not only larger but also complements
existing datasets in terms of the types of concepts included. Initial
experiments with state-of-the-art embedding methods show that our dataset is a
challenging new benchmark for testing concept relatedness models.
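The evaluation idea described in the abstract (predict a relatedness score for each concept pair, then compare against the known scores) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. The concept names, toy embedding vectors, and gold scores are invented for the example. Using cosine similarity as the prediction and Spearman rank correlation as the agreement measure is an assumption; the abstract does not name a metric.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical concept embeddings (toy 3-dimensional vectors, not real data).
embeddings = {
    "asthma": [0.9, 0.1, 0.0],
    "wheezing": [0.8, 0.3, 0.1],
    "salbutamol": [0.7, 0.2, 0.4],
    "fracture": [0.0, 0.9, 0.2],
}

# Hypothetical gold relatedness scores for concept pairs (invented values).
gold_pairs = [
    ("asthma", "wheezing", 4.5),
    ("asthma", "salbutamol", 4.0),
    ("asthma", "fracture", 0.5),
    ("wheezing", "fracture", 1.0),
]

predicted = [cosine(embeddings[a], embeddings[b]) for a, b, _ in gold_pairs]
reference = [score for _, _, score in gold_pairs]
rho = spearman(predicted, reference)
print(f"Spearman correlation with gold scores: {rho:.3f}")
```

A rank-based measure is a common choice here because cosine similarities and human relatedness ratings live on different scales; only the ordering of pairs is compared.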
Related papers
- Generative AI for Synthetic Data Across Multiple Medical Modalities: A Systematic Review of Recent Developments and Challenges [2.1835659964186087]
This paper presents a systematic review of generative models used to synthesize various medical data types.
Our study encompasses a broad array of medical data modalities and explores various generative models.
arXiv Detail & Related papers (2024-06-27T14:00:11Z)
- EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling [22.94521527609479]
EMERGE is a Retrieval-Augmented Generation driven framework aimed at enhancing multimodal EHR predictive modeling.
Our approach extracts entities from both time-series data and clinical notes by prompting Large Language Models.
The extracted knowledge is then used to generate task-relevant summaries of patients' health statuses.
arXiv Detail & Related papers (2024-05-27T10:53:15Z)
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
- Ontology Enrichment from Texts: A Biomedical Dataset for Concept Discovery and Placement [22.074094839360413]
Mentions of new concepts appear regularly in texts and require automated approaches to harvest and place them into Knowledge Bases.
Existing datasets suffer from three issues, the first being that they mostly assume a new concept is pre-discovered and so cannot support out-of-KB mention discovery.
We show how to use the dataset to evaluate out-of-KB mention discovery and concept placement with recent Large Language Model based methods.
arXiv Detail & Related papers (2023-06-26T13:54:47Z)
- ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation [4.1331432182859436]
We present the largest dataset to date tackling the problem of AI-assisted note generation from visit dialogue.
We also present the benchmark performances of several common state-of-the-art approaches.
arXiv Detail & Related papers (2023-06-03T06:42:17Z)
- Rethinking Complex Queries on Knowledge Graphs with Neural Link Predictors [58.340159346749964]
We propose a new neural-symbolic method to support end-to-end learning using complex queries with provable reasoning capability.
We develop a new dataset containing ten new types of queries with features that have never been considered.
Our method significantly outperforms previous methods on the new dataset and also surpasses them on the existing dataset.
arXiv Detail & Related papers (2023-04-14T11:35:35Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Multimodal Machine Learning in Precision Health [10.068890037410316]
This review was conducted to summarize this field and identify topics ripe for future research.
We used a combination of content analysis and literature searches to establish search strings, and searched the PubMed, Google Scholar, and IEEEXplore databases for the period 2011 to 2021.
The most common form of information fusion was early fusion. Notably, predictive performance improved when heterogeneous data fusion was performed.
arXiv Detail & Related papers (2022-04-10T21:56:07Z)
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data based on semantic training data.
The model is evaluated using five real benchmark data sets and the results show that our approach achieves high results on both free text to concept and concept to searching concept vocabularies.
arXiv Detail & Related papers (2022-01-01T05:15:42Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.