Related papers: Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique

Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique

URL: http://arxiv.org/abs/2312.03303v1
Date: Wed, 6 Dec 2023 06:07:50 GMT
Title: Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique
Authors: Ilya Tyagin, Ilya Safro
Abstract summary: This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems. We integrate knowledge from curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification.
Score: 2.0077755400451855
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper presents a novel benchmarking framework Dyport for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This not only assesses hypothesis accuracy but also their potential impact in biomedical research which significantly extends traditional link prediction benchmarks. Applicability of our benchmarking process is demonstrated on several link prediction systems applied on biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community. Availability and implementation: Dyport framework is fully open-source. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport

Related papers

Flow Matching Meets Biology and Life Science: A Survey [65.2146737141455]
Flow matching has emerged as a powerful and efficient alternative to diffusion-based generative modeling.<n>This paper presents the first comprehensive survey of recent developments in flow matching and its applications in biological domains.
arXiv Detail & Related papers (2025-07-23T17:44:29Z)
Benchmarking and Evaluation of AI Models in Biology: Outcomes and Recommendations from the CZI Virtual Cells Workshop [18.00029758641004]
We aim to accelerate the development of robust benchmarks for AI driven Virtual Cells.<n>These benchmarks are crucial for ensuring rigor, relevance, and biological relevance.<n>These benchmarks will advance the field toward integrated models that drive new discoveries, therapeutic insights, and a deeper understanding of cellular systems.
arXiv Detail & Related papers (2025-07-14T17:25:28Z)
PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models [59.17570021208177]
PyTDC is a machine-learning platform providing streamlined training, evaluation, and inference software for multimodal biological AI models.<n>This paper discusses the components of PyTDC's architecture and, to our knowledge, the first-of-its-kind case study on the introduced single-cell drug-target nomination ML task.
arXiv Detail & Related papers (2025-05-08T18:15:38Z)
WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking [13.880278087741482]
deep learning has revolutionized computer-aided drug discovery. While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation. We seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate.
arXiv Detail & Related papers (2024-11-14T21:49:41Z)
Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data. We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work. Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels [19.90354530235266]
We introduce a novel approach called Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle this issue. SL-HyDE leverages large language models (LLMs) as generators to generate hypothetical documents based on a given query. We present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios.
arXiv Detail & Related papers (2024-10-26T02:53:20Z)
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues. We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space. A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
UniCell: Universal Cell Nucleus Classification via Prompt Learning [76.11864242047074]
We propose a universal cell nucleus classification framework (UniCell) It employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains. In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets.
arXiv Detail & Related papers (2024-02-20T11:50:27Z)
Towards Biologically Plausible and Private Gene Expression Data Generation [47.72947816788821]
Generative models trained with Differential Privacy (DP) are becoming increasingly prominent in the creation of synthetic data for downstream applications. Existing literature, however, primarily focuses on basic benchmarking datasets and tends to report promising results only for elementary metrics and relatively simple data distributions. We initiate a systematic analysis of how DP generative models perform in their natural application scenarios, specifically focusing on real-world gene expression data.
arXiv Detail & Related papers (2024-02-07T14:39:11Z)
A large dataset curation and benchmark for drug target interaction [0.7699646945563469]
Bioactivity data plays a key role in drug discovery and repurposing. We propose a way to standardize and represent efficiently a very large dataset curated from multiple public sources.
arXiv Detail & Related papers (2024-01-30T17:06:25Z)
EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations. Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. GitHub repository has reached 1,800 stars and 339 forks, which demonstrates the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach [1.7954335118363964]
We present a hypothesis generation system that can introduce data-driven insights earlier in the discovery process. AGATHA prioritizes plausible term-pairs among entity sets, allowing us to recommend new research directions. This system achieves best-in-class performance on an established benchmark.
arXiv Detail & Related papers (2020-02-13T17:06:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.