AGATHA: Automatic Graph-mining And Transformer based Hypothesis
generation Approach
- URL: http://arxiv.org/abs/2002.05635v1
- Date: Thu, 13 Feb 2020 17:06:47 GMT
- Title: AGATHA: Automatic Graph-mining And Transformer based Hypothesis
generation Approach
- Authors: Justin Sybrandt, Ilya Tyagin, Michael Shtutman, Ilya Safro
- Abstract summary: We present a hypothesis generation system that can introduce data-driven insights earlier in the discovery process.
AGATHA prioritizes plausible term-pairs among entity sets, allowing us to recommend new research directions.
This system achieves best-in-class performance on an established benchmark.
- Score: 1.7954335118363964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical research is risky and expensive. Drug discovery, as an example,
requires that researchers efficiently winnow thousands of potential targets to
a small candidate set for more thorough evaluation. However, research groups
spend significant time and money to perform the experiments necessary to
determine this candidate set long before seeing intermediate results.
Hypothesis generation systems address this challenge by mining the wealth of
publicly available scientific information to predict plausible research
directions. We present AGATHA, a deep-learning hypothesis generation system
that can introduce data-driven insights earlier in the discovery process.
Through a learned ranking criterion, this system quickly prioritizes plausible
term-pairs among entity sets, allowing us to recommend new research directions.
We massively validate our system with a temporal holdout wherein we predict
connections first introduced after 2015 using data published beforehand. We
additionally explore biomedical sub-domains, and demonstrate AGATHA's
predictive capacity across the twenty most popular relationship types. This
system achieves best-in-class performance on an established benchmark, and
demonstrates high recommendation scores across subdomains. Reproducibility: All
code, experimental data, and pre-trained models are available online:
sybrandt.com/2020/agatha
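The temporal holdout described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the record format, the toy data, the choice of unconnected term pairs as negatives, and the common-neighbor scorer (a simple stand-in for AGATHA's learned ranking model) are all assumptions made for illustration. Only the 2015 cutoff comes from the abstract, and ROC AUC is used here simply as a common ranking metric.

```python
from collections import defaultdict
from itertools import combinations

CUTOFF_YEAR = 2015  # from the abstract: predict connections first introduced after 2015


def temporal_split(dated_pairs, cutoff=CUTOFF_YEAR):
    """Split (term_a, term_b, year) records into pre-cutoff and held-out pair sets."""
    train, future = set(), set()
    for a, b, year in dated_pairs:
        pair = tuple(sorted((a, b)))
        (train if year <= cutoff else future).add(pair)
    # Only pairs never seen up to the cutoff count as held-out positives.
    return train, future - train


def make_common_neighbor_scorer(train):
    """Hypothetical stand-in ranker (NOT AGATHA's learned model): count shared neighbors."""
    neighbors = defaultdict(set)
    for a, b in train:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def score(a, b):
        return len(neighbors.get(a, set()) & neighbors.get(b, set()))

    return score


def roc_auc(pos_scores, neg_scores):
    """Chance that a random positive outscores a random negative; ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Toy records, invented for illustration only: (term_a, term_b, year first connected).
records = [
    ("a", "b", 2010), ("b", "c", 2012), ("a", "d", 2013),
    ("d", "c", 2014), ("e", "f", 2011),
    ("a", "c", 2017),  # new connection after the cutoff: a held-out positive
    ("a", "b", 2018),  # already known before the cutoff: not a positive
]

train, positives = temporal_split(records)
terms = sorted({t for pair in train | positives for t in pair})
# Assumed negatives: term pairs that are never connected in the toy data.
negatives = [p for p in combinations(terms, 2) if p not in train and p not in positives]

score = make_common_neighbor_scorer(train)
auc = roc_auc([score(*p) for p in positives], [score(*p) for p in negatives])
print(f"held-out positives: {sorted(positives)}, ROC AUC: {auc:.3f}")
```

The scorer sees only pre-cutoff pairs, so the evaluation measures whether connections published later would have ranked above unconnected pairs using earlier data alone.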
Related papers
- Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine [10.692728349388297]
Pandemic PACT project aims to track and analyse research funding and clinical evidence for a wide range of diseases with outbreak potential.
This paper introduces the Pandemic PACT Advanced Categorisation Engine (PPACE) along with its associated dataset.
arXiv Detail & Related papers (2024-07-14T05:22:53Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Data-driven Discovery with Large Generative Models [47.324203863823335]
This position paper urges the Machine Learning (ML) community to exploit the capabilities of large generative models (LGMs).
We demonstrate how LGMs fulfill several desiderata for an ideal data-driven discovery system.
We advocate for fail-proof tool integration, along with active user moderation through feedback mechanisms.
arXiv Detail & Related papers (2024-02-21T08:26:43Z)
- Dyport: Dynamic Importance-based Hypothesis Generation Benchmarking Technique [2.0077755400451855]
This paper presents Dyport, a novel benchmarking framework for evaluating biomedical hypothesis generation systems.
We integrate knowledge from curated databases into a dynamic graph, accompanied by a method to quantify discovery importance.
Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification.
arXiv Detail & Related papers (2023-12-06T06:07:50Z)
- GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypothesis generation, and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Benchmarking Predictive Risk Models for Emergency Departments with Large Public Electronic Health Records [7.928862476020428]
There is an absence of widely accepted ED benchmarks based on large-scale public EHR.
We proposed a public ED benchmark suite and obtained a benchmark dataset containing over 500,000 ED visit episodes from 2011 to 2019.
Our code is open source, so anyone with access to MIMIC-IV-ED can follow the same data processing steps, build the benchmarks, and reproduce the experiments.
arXiv Detail & Related papers (2021-11-22T06:51:11Z)
- Accelerating COVID-19 research with graph mining and transformer-based learning [2.493740042317776]
We present two automated general-purpose hypothesis generation systems, AGATHA-C and AGATHA-GP, for COVID-19 research.
Both systems achieve high-quality predictions across domains (in some domains up to 0.97 ROC AUC) in fast computational time.
We show that the systems are able to discover on-going research findings such as the relationship between COVID-19 and oxytocin hormone.
arXiv Detail & Related papers (2021-02-10T15:11:36Z)
- Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review [62.490310870300746]
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare.
Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals.
This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
arXiv Detail & Related papers (2019-12-28T02:44:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.