Biomedical Hypothesis Explainability with Graph-Based Context Retrieval
- URL: http://arxiv.org/abs/2511.05498v1
- Date: Mon, 15 Sep 2025 09:50:51 GMT
- Title: Biomedical Hypothesis Explainability with Graph-Based Context Retrieval
- Authors: Ilya Tyagin, Saeideh Valipour, Aliaksandra Sikirzhytskaya, Michael Shtutman, Ilya Safro,
- Abstract summary: We introduce an explainability method for biomedical hypothesis generation systems.<n>Our approach combines semantic graph-based retrieval and relevant data-restrictive training to simulate real-world discovery constraints.
- Score: 13.805590287547792
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We introduce an explainability method for biomedical hypothesis generation systems, built on top of the novel Hypothesis Generation Context Retriever framework. Our approach combines semantic graph-based retrieval and relevant data-restrictive training to simulate real-world discovery constraints. Integrated with large language models (LLMs) via retrieval-augmented generation, the system explains hypotheses with contextual evidence using published scientific literature. We also propose a novel feedback loop approach, which iteratively identifies and corrects flawed parts of LLM-generated explanations, refining both the evidence paths and supporting context. We demonstrate the performance of our method with multiple large language models and evaluate the explanation and context retrieval quality through both expert-curated assessment and large-scale automated analysis. Our code is available at: https://github.com/IlyaTyagin/HGCR.
Related papers
- Biomedical Literature Q&A System Using Retrieval-Augmented Generation (RAG) [0.0]
This work presents a Biomedical Literature Question Answering (Q&A) system based on a Retrieval-Augmented Generation architecture.<n>The system integrates diverse sources, including PubMed articles, curated Q&A datasets, and medical encyclopedias.<n>The system supports both general medical queries and domain-specific tasks, with a focused evaluation on breast cancer literature.
arXiv Detail & Related papers (2025-09-05T21:29:52Z) - BioDisco: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation [0.0]
Existing automated methods struggle to generate novel and evidence-grounded hypotheses.<n>BioDisco is a multi-agent framework that draws upon language model-based reasoning and a dual-mode evidence system.
arXiv Detail & Related papers (2025-08-02T09:32:52Z) - HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis [55.2480439325792]
HySemRAG is a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG)<n>System addresses limitations in existing RAG architectures through a multi-layered approach.
arXiv Detail & Related papers (2025-08-01T20:30:42Z) - Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation [52.51005875755718]
We focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse.<n>Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings.<n>Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences.
arXiv Detail & Related papers (2025-05-21T05:29:55Z) - Advancing AI Research Assistants with Expert-Involved Learning [84.30323604785646]
Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear.<n>We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework.<n>We find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning.
arXiv Detail & Related papers (2025-05-03T14:21:48Z) - Contextualizing biological perturbation experiments through language [3.704686482174365]
PerturbQA is a benchmark for structured reasoning over perturbation experiments.<n>We evaluate state-of-the-art machine learning and statistical approaches for modeling perturbations.<n>As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR), a simple, domain-informed LLM framework.
arXiv Detail & Related papers (2025-02-28T18:15:31Z) - Analysis of Plan-based Retrieval for Grounded Text Generation [78.89478272104739]
hallucinations occur when a language model is given a generation task outside its parametric knowledge.
A common strategy to address this limitation is to infuse the language models with retrieval mechanisms.
We analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations.
arXiv Detail & Related papers (2024-08-20T02:19:35Z) - Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented.
LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones.
We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z) - Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z) - SciMON: Scientific Inspiration Machines Optimized for Novelty [68.46036589035539]
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.
We take a dramatic departure with a novel setting in which models use as input background contexts.
We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers.
arXiv Detail & Related papers (2023-05-23T17:12:08Z) - Ontology-based Interpretable Machine Learning for Textual Data [35.01650633374998]
We introduce a novel interpreting framework that learns an interpretable model based on sampling technique to explain prediction models.
To narrow down the search space for explanations, we design a learnable anchor algorithm.
A set of regulations is further introduced, regarding combining learned interpretable representations with anchors to generate comprehensible explanations.
arXiv Detail & Related papers (2020-04-01T02:51:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.