Biomedical Question Answering via Multi-Level Summarization on a Local Knowledge Graph
- URL: http://arxiv.org/abs/2504.01309v1
- Date: Wed, 02 Apr 2025 02:40:19 GMT
- Title: Biomedical Question Answering via Multi-Level Summarization on a Local Knowledge Graph
- Authors: Lingxiao Guan, Yuanhao Huang, Jie Liu,
- Abstract summary: We propose a novel method that utilizes propositional claims to construct a local knowledge graph from retrieved documents.<n>We achieved comparable or superior performance with our method over RAG baselines on several biomedical QA benchmarks.
- Score: 3.22027269248374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Question Answering (QA), Retrieval Augmented Generation (RAG) has revolutionized performance in various domains. However, how to effectively capture multi-document relationships, particularly critical for biomedical tasks, remains an open question. In this work, we propose a novel method that utilizes propositional claims to construct a local knowledge graph from retrieved documents. Summaries are then derived via layerwise summarization from the knowledge graph to contextualize a small language model to perform QA. We achieved comparable or superior performance with our method over RAG baselines on several biomedical QA benchmarks. We also evaluated each individual step of our methodology over a targeted set of metrics, demonstrating its effectiveness.
Related papers
- In-depth Analysis of Graph-based RAG in a Unified Framework [17.941941997783267]
Graph-based Retrieval-Augmented Generation (RAG) has proven effective in integrating external knowledge into large language models.<n>We first summarize a unified framework to incorporate all graph-based RAG methods from a high-level perspective.<n>We then extensively compare representative graph-based RAG methods over a range of questing-answering (QA) datasets.
arXiv Detail & Related papers (2025-03-06T11:34:49Z) - Multiple Abstraction Level Retrieve Augment Generation [4.516242893120263]
A Retrieval-Augmented Generation (RAG) model powered by a large language model (LLM) provides a faster and more cost-effective solution for adapting to new data and knowledge.
We propose a novel RAG approach that uses chunks of multiple abstraction levels (MAL), including multi-sentence-level, paragraph-level, section-level, and document-level.
Compared to traditional single-level RAG approaches, our approach improves AI evaluated answer correctness of Q/A by 25.739% on Glyco-related papers.
arXiv Detail & Related papers (2025-01-28T13:49:39Z) - NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering [0.14999444543328289]
We introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization.
We employ the distilled MiniLM model, fine-tuned on domain-specific data, for high-precision answer extraction.
arXiv Detail & Related papers (2024-10-29T14:45:12Z) - Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision.<n>We construct a taxonomy and review the most prominent papers in recent years.<n>We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z) - SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG)
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z) - Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models [18.984165679347026]
Self-BioRAG is a framework reliable for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting generated responses.
We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens.
arXiv Detail & Related papers (2024-01-27T02:29:42Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - Integrating Logical Rules Into Neural Multi-Hop Reasoning for Drug
Repurposing [23.783111050856245]
We propose a novel method that combines these rules with a neural multi-hop reasoning approach that uses reinforcement learning.
We apply our method to the biomedical knowledge graph Hetionet and show that our approach outperforms several baseline methods.
arXiv Detail & Related papers (2020-07-10T10:32:08Z) - CAiRE-COVID: A Question Answering and Query-focused Multi-Document
Summarization System for COVID-19 Scholarly Information Management [48.251211691263514]
We present CAiRE-COVID, a real-time question answering (QA) and multi-document summarization system, which won one of the 10 tasks in the Kaggle COVID-19 Open Research dataset Challenge.
Our system aims to tackle the recent challenge of mining the numerous scientific articles being published on COVID-19 by answering high priority questions from the community.
arXiv Detail & Related papers (2020-05-04T15:07:27Z) - Learning Contextualized Document Representations for Healthcare Answer
Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.