Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique
- URL: http://arxiv.org/abs/2312.03303v1
- Date: Wed, 6 Dec 2023 06:07:50 GMT
- Title: Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique
- Authors: Ilya Tyagin, Ilya Safro
- Abstract summary: This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems.
We integrate knowledge from curated databases into a dynamic graph, accompanied by a method to quantify discovery importance.
Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification.
- Score: 2.0077755400451855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel benchmarking framework Dyport for evaluating
biomedical hypothesis generation systems. Utilizing curated datasets, our
approach tests these systems under realistic conditions, enhancing the
relevance of our evaluations. We integrate knowledge from the curated databases
into a dynamic graph, accompanied by a method to quantify discovery importance.
This assesses not only the accuracy of hypotheses but also their potential impact in
biomedical research, which significantly extends traditional link prediction
benchmarks. The applicability of our benchmarking process is demonstrated on
several link prediction systems applied to biomedical semantic knowledge
graphs. Being flexible, our benchmarking system is designed for broad
application in hypothesis generation quality verification, aiming to expand the
scope of scientific discovery within the biomedical research community.
Availability and implementation: The Dyport framework is fully open-source. All
code and datasets are available at: https://github.com/IlyaTyagin/Dyport
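As a rough illustration of the idea in the abstract (scoring predicted links against discoveries that appear later in a time-stamped graph, with each discovery weighted by an importance value), here is a minimal Python sketch. It is not the Dyport implementation: the function names, the year-based cutoff, the toy edges, and the importance values are all assumptions made for illustration, and the importance-weighted recall metric is a stand-in rather than the paper's own importance measure.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def canon(u: str, v: str) -> Edge:
    """Return an undirected edge in canonical (sorted) order."""
    return (u, v) if u <= v else (v, u)


def split_by_year(
    dated_edges: List[Tuple[str, str, int]], cutoff: int
) -> Tuple[Set[Edge], Set[Edge]]:
    """Split edges by the year each one first appears.

    Edges first seen up to `cutoff` form the known graph; edges first
    seen afterwards are the future discoveries a system should predict.
    """
    first_seen: Dict[Edge, int] = {}
    for u, v, year in dated_edges:
        edge = canon(u, v)
        first_seen[edge] = min(year, first_seen.get(edge, year))
    known = {e for e, y in first_seen.items() if y <= cutoff}
    future = {e for e, y in first_seen.items() if y > cutoff}
    return known, future


def weighted_recall_at_k(
    ranked: List[Edge],
    future: Set[Edge],
    importance: Dict[Edge, float],
    k: int,
) -> float:
    """Share of total discovery importance recovered in the top-k predictions."""
    total = sum(importance.get(e, 0.0) for e in future)
    if total == 0:
        return 0.0
    hits = {canon(*e) for e in ranked[:k]} & future
    return sum(importance.get(e, 0.0) for e in hits) / total


# Hypothetical toy data: (concept, concept, year the association was recorded).
dated_edges = [
    ("geneA", "diseaseX", 2015),
    ("drugB", "geneA", 2016),
    ("geneA", "diseaseX", 2019),  # repeat mention; first appearance stays 2015
    ("drugB", "diseaseX", 2020),
    ("geneC", "diseaseX", 2021),
]
known, future = split_by_year(dated_edges, cutoff=2018)

# Hypothetical importance values for the future discoveries.
importance = {
    canon("drugB", "diseaseX"): 3.0,
    canon("geneC", "diseaseX"): 1.0,
}

# Hypothetical ranked output of some link prediction system.
ranked = [("drugB", "diseaseX"), ("geneC", "drugB")]
score = weighted_recall_at_k(ranked, future, importance, k=2)
```

In this toy case the system recovers one of the two future links, so plain recall would be 0.5, but because the recovered link carries three quarters of the total importance, the weighted score is 0.75.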
Related papers
- AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce a novel approach called Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle this issue.
SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query.
We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
arXiv Detail & Related papers (2024-10-26T02:53:20Z)
- The Role of Graph Topology in the Performance of Biomedical Knowledge Graph Completion Models [3.1666540219908272]
We conduct a comprehensive investigation into the properties of publicly available biomedical Knowledge Graphs.
We establish links to the accuracy observed in real-world applications.
We release all model predictions and a new suite of analysis tools.
arXiv Detail & Related papers (2024-09-06T08:09:15Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell).
It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains.
In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
- Towards Biologically Plausible and Private Gene Expression Data Generation [47.72947816788821]
Generative models trained with Differential Privacy (DP) are becoming increasingly prominent in the creation of synthetic data for downstream applications.
Existing literature, however, primarily focuses on basic benchmarking datasets and tends to report promising results only for elementary metrics and relatively simple data distributions.
We initiate a systematic analysis of how DP generative models perform in their natural application scenarios, specifically focusing on real-world gene expression data.
arXiv Detail & Related papers (2024-02-07T14:39:11Z)
- A large dataset curation and benchmark for drug target interaction [0.7699646945563469]
Bioactivity data plays a key role in drug discovery and repurposing.
We propose a way to standardize and represent efficiently a very large dataset curated from multiple public sources.
arXiv Detail & Related papers (2024-01-30T17:06:25Z)
- Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over state-of-the-art methods and could serve as a simple yet strong baseline in this under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidence supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Biomedical Knowledge Graph Refinement and Completion using Graph Representation Learning and Top-K Similarity Measure [1.4660617536303606]
This work demonstrates learning discrete representations of the integrated biomedical knowledge graph Chem2Bio2RD.
We perform a knowledge graph completion and refinement task using a simple top-K cosine similarity measure between the learned embedding vectors.
arXiv Detail & Related papers (2020-12-18T22:19:57Z)
- Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress.
The GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)
- AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach [1.7954335118363964]
We present a hypothesis generation system that can introduce data-driven insights earlier in the discovery process.
AGATHA prioritizes plausible term-pairs among entity sets, allowing us to recommend new research directions.
This system achieves best-in-class performance on an established benchmark.
arXiv Detail & Related papers (2020-02-13T17:06:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.