Evaluating the Effectiveness of Retrieval-Augmented Large Language
Models in Scientific Document Reasoning
- URL: http://arxiv.org/abs/2311.04348v1
- Date: Tue, 7 Nov 2023 21:09:57 GMT
- Title: Evaluating the Effectiveness of Retrieval-Augmented Large Language
Models in Scientific Document Reasoning
- Authors: Sai Munikoti, Anurag Acharya, Sridevi Wagle, Sameera Horawalavithana
- Abstract summary: Large Language Model (LLM) often provide seemingly plausible but not factual information, often referred to as hallucinations.
Retrieval-augmented LLMs provide a non-parametric approach to solve these issues by retrieving relevant information from external data sources.
We critically evaluate these models in their ability to perform in scientific document reasoning tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the dramatic progress in Large Language Model (LLM) development, LLMs
often provide seemingly plausible but not factual information, often referred
to as hallucinations. Retrieval-augmented LLMs provide a non-parametric
approach to solve these issues by retrieving relevant information from external
data sources and augment the training process. These models help to trace
evidence from an externally provided knowledge base allowing the model
predictions to be better interpreted and verified. In this work, we critically
evaluate these models in their ability to perform in scientific document
reasoning tasks. To this end, we tuned multiple such model variants with
science-focused instructions and evaluated them on a scientific document
reasoning benchmark for the usefulness of the retrieved document passages. Our
findings suggest that models justify predictions in science tasks with
fabricated evidence and leveraging scientific corpus as pretraining data does
not alleviate the risk of evidence fabrication.
Related papers
- A Debate-Driven Experiment on LLM Hallucinations and Accuracy [7.821303946741665]
This study investigates the phenomenon of hallucination in large language models (LLMs)
Multiple instances of GPT-4o-Mini models engage in a debate-like interaction prompted with questions from the TruthfulQA dataset.
One model is deliberately instructed to generate plausible but false answers while the other models are asked to respond truthfully.
arXiv Detail & Related papers (2024-10-25T11:41:27Z) - Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Reasoning and Tools for Human-Level Forecasting [0.4261908132550109]
We present Reasoning and Tools for Forecasting (RTF), a framework of reasoning-and-acting (ReAct) agents that can retrieve updated information and run numerical simulation with equipped tools.
We evaluate our model with questions from competitive forecasting platforms and demonstrate that our method is competitive with and can outperform human predictions.
arXiv Detail & Related papers (2024-08-21T23:42:06Z) - Multimodal Misinformation Detection using Large Vision-Language Models [7.505532091249881]
Large language models (LLMs) have shown remarkable performance in various tasks.
Few approaches consider evidence retrieval as part of misinformation detection.
We propose a novel re-ranking approach for multimodal evidence retrieval.
arXiv Detail & Related papers (2024-07-19T13:57:11Z) - Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI)
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called textbfSurrogate condItional Data Extraction (SIDE) that leverages a trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - Empirical evaluation of Uncertainty Quantification in
Retrieval-Augmented Language Models for Science [0.0]
This study investigates how uncertainty scores vary when scientific knowledge is incorporated as pretraining and retrieval data.
We observe that an existing RALM finetuned with scientific knowledge as the retrieval data tends to be more confident in generating predictions.
We also found that RALMs are overconfident in their predictions, making inaccurate predictions more confidently than accurate ones.
arXiv Detail & Related papers (2023-11-15T20:42:11Z) - Interpretable Medical Diagnostics with Structured Data Extraction by
Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Measuring Causal Effects of Data Statistics on Language Model's
`Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z) - Scientific Inference With Interpretable Machine Learning: Analyzing Models to Learn About Real-World Phenomena [4.312340306206884]
Interpretable machine learning offers a solution by analyzing models holistically to derive interpretations.
Current IML research is focused on auditing ML models rather than leveraging them for scientific inference.
We present a framework for designing IML methods-termed 'property descriptors' that illuminate not just the model, but also the phenomenon it represents.
arXiv Detail & Related papers (2022-06-11T10:13:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.