Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification
- URL: http://arxiv.org/abs/2402.04068v4
- Date: Mon, 26 May 2025 20:01:28 GMT
- Title: Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification
- Authors: Ravi Patel, Angus Brayne, Rogier Hintzen, Daniel Jaroslawicz, Georgiana Neculae, Dane Corneil,
- Abstract summary: Retrieve to Explain (R2E) is a retrieval-based model that scores and ranks all possible answers to a research question. R2E represents each answer only in terms of its supporting evidence, with the answer itself masked. We developed R2E for the challenging scientific discovery task of drug target identification.
- Score: 0.791663505497707
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Language models hold incredible promise for enabling scientific discovery by synthesizing massive research corpora. Many complex scientific research questions have multiple plausible answers, each supported by evidence of varying strength. However, existing language models lack the capability to quantitatively and faithfully compare answer plausibility in terms of supporting evidence. To address this, we introduce Retrieve to Explain (R2E), a retrieval-based model that scores and ranks all possible answers to a research question based on evidence retrieved from a document corpus. The architecture represents each answer only in terms of its supporting evidence, with the answer itself masked. This allows us to extend feature attribution methods, such as Shapley values, to transparently attribute answer scores to supporting evidence at inference time. The architecture also allows incorporation of new evidence without retraining, including non-textual data modalities templated into natural language. We developed R2E for the challenging scientific discovery task of drug target identification, a human-in-the-loop process where failures are extremely costly and explainability is paramount. When predicting whether drug targets will subsequently be confirmed as efficacious in clinical trials, R2E not only matches non-explainable literature-based models but also surpasses a genetics-based target identification approach used throughout the pharmaceutical industry.
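The Shapley-value attribution described in the abstract can be illustrated with a minimal sketch. Note the evidence names and the set-level score function below are invented for illustration only; R2E's actual score comes from a neural model over retrieved evidence embeddings. What the sketch shows is the general mechanism: each evidence item's contribution to an answer score is its average marginal effect over all subsets of the other evidence.

```python
from itertools import combinations
from math import factorial

def shapley_values(evidence, score_fn):
    """Exact Shapley attribution of a set-level score to individual evidence items."""
    n = len(evidence)
    values = {e: 0.0 for e in evidence}
    for e in evidence:
        others = [x for x in evidence if x != e]
        for k in range(n):  # subset sizes 0 .. n-1
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = score_fn(set(subset) | {e}) - score_fn(set(subset))
                values[e] += weight * marginal
    return values

# Toy evidence strengths for a hypothetical drug-target answer (illustrative only).
strengths = {"genetics": 0.9, "mouse_model": 0.6, "review": 0.2}

def score(subset):
    """Toy score: strongest single piece of evidence plus a small bonus per extra item."""
    if not subset:
        return 0.0
    return max(strengths[e] for e in subset) + 0.05 * (len(subset) - 1)

attributions = shapley_values(list(strengths), score)
```

By the efficiency property of Shapley values, the attributions sum exactly to the score of the full evidence set (minus the empty-set score), which is what makes the decomposition a faithful explanation of the answer score.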
Related papers
- FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights [63.32178443510396]
We introduce FIRE-Bench (Full-cycle Insight Rediscovery Evaluation), a benchmark that evaluates agents through the rediscovery of established findings. Even the strongest agents achieve limited rediscovery success (50 F1), exhibit high variance across runs, and display recurring failure modes in experimental design, execution, and evidence-based reasoning.
arXiv Detail & Related papers (2026-02-02T23:21:13Z) - SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature [92.88058660627678]
The "Fish-in-the-Ocean" (FITO) paradigm requires models to construct explicit cross-modal evidence chains within scientific documents. We construct SIN-Bench with four progressive tasks covering evidence discovery (SIN-Find), hypothesis verification (SIN-Verify), grounded QA (SIN-QA), and evidence-anchored synthesis (SIN-Summary). We introduce "No Evidence, No Score", which scores predictions only when they are grounded to verifiable anchors and diagnoses evidence quality via matching, relevance, and logic.
arXiv Detail & Related papers (2026-01-15T06:25:25Z) - CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research [25.578430277176988]
Generative language models (LMs) can facilitate the translation of fundamental research into clinically-actionable insights. CGBench is a benchmark that tests the reasoning capabilities of LMs on scientific publications. We test 8 different LMs and find that while models show promise, substantial gaps exist in literature interpretation.
arXiv Detail & Related papers (2025-10-13T22:28:51Z) - VeriCite: Towards Reliable Citations in Retrieval-Augmented Generation via Rigorous Verification [107.75781898355562]
We introduce a novel framework, called VeriCite, designed to rigorously validate supporting evidence and enhance answer attribution. We conduct experiments across five open-source LLMs and four datasets, demonstrating that VeriCite can significantly improve citation quality while maintaining the correctness of the answers.
arXiv Detail & Related papers (2025-10-13T13:38:54Z) - Combating Biomedical Misinformation through Multi-modal Claim Detection and Evidence-based Verification [11.555285143713315]
CER (Combining Evidence and Reasoning) is a novel framework for biomedical fact-checking. It integrates scientific evidence retrieval, reasoning via large language models, and supervised veracity prediction. It effectively mitigates the risk of hallucinations, ensuring that generated outputs are grounded in verifiable, evidence-based sources.
arXiv Detail & Related papers (2025-09-17T10:31:09Z) - Biomedical Hypothesis Explainability with Graph-Based Context Retrieval [13.805590287547792]
We introduce an explainability method for biomedical hypothesis generation systems. Our approach combines semantic graph-based retrieval and relevant data-restrictive training to simulate real-world discovery constraints.
arXiv Detail & Related papers (2025-09-15T09:50:51Z) - Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain [8.094811345546118]
Retrieval-augmented generation (RAG) systems factually ground the responses of a large language model (LLM) by providing retrieved evidence, or context, as support. This design introduces a critical vulnerability: LLMs may absorb and reproduce misinformation present in retrieved evidence. The problem is magnified when the retrieved evidence contains adversarial material explicitly intended to promulgate misinformation.
arXiv Detail & Related papers (2025-09-04T00:45:58Z) - Introducing Answered with Evidence -- a framework for evaluating whether LLM responses to biomedical questions are founded in evidence [1.3250161978024673]
Large language models (LLMs) for biomedical question answering raise concerns about the accuracy and evidentiary support of their responses. We analyzed thousands of physician-submitted questions using a comparative pipeline that included: (1) Alexandria, formerly known as the Atropos Evidence Library, a retrieval-augmented generation (RAG) system based on novel observational studies, and (2) two PubMed-based retrieval-augmented systems (System and Perplexity). We found that the PubMed-based systems provided evidence-supported answers for approximately 44% of questions, while the novel evidence source did so for about 50%.
arXiv Detail & Related papers (2025-06-30T18:00:52Z) - Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine [22.983780823136925]
Evidence-based medicine (EBM) plays a crucial role in the application of large language models (LLMs) in healthcare. We propose using LLMs to gather scattered evidence from multiple sources, and present a knowledge hypergraph-based evidence management model. Our approach outperforms existing RAG techniques in application domains of interest to EBM, such as medical quizzing, hallucination detection, and decision support.
arXiv Detail & Related papers (2025-03-18T09:17:31Z) - Causal Representation Learning from Multimodal Biomedical Observations [57.00712157758845]
We develop flexible identification conditions for multimodal data and principled methods to facilitate the understanding of biomedical datasets. A key theoretical contribution is the structural sparsity of causal connections between modalities. Results on a real-world human phenotype dataset are consistent with established biomedical research.
arXiv Detail & Related papers (2024-11-10T16:40:27Z) - A generative framework to bridge data-driven models and scientific theories in language neuroscience [84.76462599023802]
We present generative explanation-mediated validation, a framework for generating concise explanations of language selectivity in the brain.
We show that explanatory accuracy is closely related to the predictive power and stability of the underlying statistical models.
arXiv Detail & Related papers (2024-10-01T15:57:48Z) - Evidence-Enhanced Triplet Generation Framework for Hallucination Alleviation in Generative Question Answering [41.990482015732574]
We propose a novel evidence-enhanced triplet generation framework, EATQA, to predict all the combinations of (Question, Evidence, Answer) triplet.
We bridge the distribution gap to distill the knowledge from evidence in inference stage.
Our framework ensures the model to learn the logical relation between query, evidence and answer, which simultaneously improves the evidence generation and query answering.
arXiv Detail & Related papers (2024-08-27T13:07:07Z) - Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models [46.05020842978823]
Large Language Models (LLMs) have emerged as powerful tools to navigate this complex data landscape.
RAGGED is a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation.
arXiv Detail & Related papers (2024-07-17T07:44:18Z) - Uncertainty Estimation of Large Language Models in Medical Question Answering [60.72223137560633]
Large Language Models (LLMs) show promise for natural language generation in healthcare, but risk hallucinating factually incorrect information.
We benchmark popular uncertainty estimation (UE) methods with different model sizes on medical question-answering datasets.
Our results show that current approaches generally perform poorly in this domain, highlighting the challenge of UE for medical applications.
arXiv Detail & Related papers (2024-07-11T16:51:33Z) - Answering real-world clinical questions using large language model based systems [2.2605659089865355]
Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD).
We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability.
arXiv Detail & Related papers (2024-06-29T22:39:20Z) - Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study [61.74571814707054]
We evaluate whether every generated sentence is grounded in retrieved documents or the model's pre-training data.
Across 3 datasets and 4 model families, our findings reveal that a significant fraction of generated sentences are consistently ungrounded.
Our results show that while larger models tend to ground their outputs more effectively, a significant portion of correct answers remains compromised by hallucinations.
arXiv Detail & Related papers (2024-04-10T14:50:10Z) - Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables [22.18384189336634]
HeterFC is a word-level Heterogeneous-graph-based model for Fact Checking over unstructured and structured information.
We perform information propagation via a relational graph neural network, capturing interactions between claims and evidence.
We introduce a multitask loss function to account for potential inaccuracies in evidence retrieval.
arXiv Detail & Related papers (2024-02-20T14:10:40Z) - InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification [60.10193972862099]
This work proposes a framework to characterize and recover simplification-induced information loss in the form of question-and-answer pairs.
QA pairs are designed to help readers deepen their knowledge of a text.
arXiv Detail & Related papers (2024-01-29T19:00:01Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Grow-and-Clip: Informative-yet-Concise Evidence Distillation for Answer Explanation [22.20733260041759]
We argue that the evidence for an answer is critical to enhancing the interpretability of QA models.
We are the first to explicitly define the concept of evidence as the supporting facts in a context which are informative, concise, and readable.
We propose the Grow-and-Clip Evidence Distillation (GCED) algorithm to extract evidence from contexts by trading off informativeness, conciseness, and readability.
arXiv Detail & Related papers (2022-01-13T17:18:17Z) - Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z) - Commonsense Evidence Generation and Injection in Reading Comprehension [57.31927095547153]
We propose a Commonsense Evidence Generation and Injection framework in reading comprehension, named CEGI.
The framework injects two kinds of auxiliary commonsense evidence into comprehensive reading to equip the machine with the ability of rational thinking.
Experiments on the CosmosQA dataset demonstrate that the proposed CEGI model outperforms the current state-of-the-art approaches.
arXiv Detail & Related papers (2020-05-11T16:31:08Z) - Evidence Inference 2.0: More Data, Better Models [22.53884716373888]
The Evidence Inference dataset was recently released to facilitate research toward this end.
This paper collects additional annotations to expand the Evidence Inference dataset by 25%.
The updated corpus, documentation, and code for new baselines and evaluations are available at http://evidence-inference.ebm-nlp.com/.
arXiv Detail & Related papers (2020-05-08T17:16:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.