Related papers: Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions

URL: http://arxiv.org/abs/2505.15633v1
Date: Wed, 21 May 2025 15:17:38 GMT
Title: Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions
Authors: David Thulke, Jakob Kemmler, Christian Dugast, Hermann Ney
Abstract summary: Large language models that use retrieval augmented generation have the potential to unlock valuable knowledge. This approach can help alleviate factual hallucinations by relying on retrieved passages as additional context. We explore the automatic assessment of faithfulness of different models in this setting. We develop ClimateGPT Faithful+, which achieves an improvement in faithfulness from 30% to 57% in supported atomic claims.
Score: 31.7025759960363
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models that use retrieval augmented generation have the potential to unlock valuable knowledge for researchers, policymakers, and the public by making long and technical climate-related documents more accessible. While this approach can help alleviate factual hallucinations by relying on retrieved passages as additional context, its effectiveness depends on whether the model's output remains faithful to these passages. To address this, we explore the automatic assessment of faithfulness of different models in this setting. We then focus on ClimateGPT, a large language model specialised in climate science, to examine which factors in its instruction fine-tuning impact the model's faithfulness. By excluding unfaithful subsets of the model's training data, we develop ClimateGPT Faithful+, which achieves an improvement in faithfulness from 30% to 57% in supported atomic claims according to our automatic metric.

Related papers

CAIRNS: Balancing Readability and Scientific Accuracy in Climate Adaptation Question Answering [10.31170458584116]
We present Climate Adaptation question-answering with Improved Readability and Noted Sources (CAIRNS). CAIRNS is a framework that enables experts to obtain credible preliminary answers from complex evidence sources from the web. It enhances readability and citation reliability through a structured ScholarGuide prompt and achieves robust evaluation.
arXiv Detail & Related papers (2025-12-01T22:44:43Z)
Improving Diversity in Language Models: When Temperature Fails, Change the Loss [81.73385878967899]
We propose rethinking loss functions in language models by leveraging the Precision-Recall framework. Our results demonstrate that this approach achieves a substantially better trade-off between Precision and Recall than merely combining negative log-likelihood training with temperature scaling.
arXiv Detail & Related papers (2025-08-13T09:37:53Z)
ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method [61.76389719956301]
We contribute a multi-modal climate benchmark, i.e., ClimateBench-M, which aligns time series climate data from ERA5, extreme weather events data from NOAA, and satellite image data from NASA. Under each data modality, we also propose a simple but strong generative method that could produce competitive performance in weather forecasting, thunderstorm alerts, and crop segmentation tasks.
arXiv Detail & Related papers (2025-04-10T02:22:23Z)
Exploring Large Language Models for Climate Forecasting [5.25781442142288]
Large language models (LLMs) present a promising approach to bridging the gap between complex climate data and the general public. This study investigates the capability of GPT-4 in predicting rainfall at short-term (15-day) and long-term (12-month) scales.
arXiv Detail & Related papers (2024-11-20T21:58:19Z)
Pointwise Mutual Information as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that the pointwise mutual information between a context and a question is an effective gauge for language model performance. We propose two methods that use the pointwise mutual information between a document and a question as a gauge for selecting and constructing prompts that lead to better performance.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation. We deem that retrieval-augmented language models have the inherent capability to supply responses according to both contextual and parametric knowledge. Inspired by aligning language models with human preference, we take the first step towards aligning retrieval-augmented language models to a state where they respond relying solely on the external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
Unlearning Climate Misinformation in Large Language Models [17.95497650321137]
Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity. This paper investigates factual accuracy in large language models (LLMs) regarding climate information.
arXiv Detail & Related papers (2024-05-29T23:11:53Z)
ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change [21.827936253363603]
This paper introduces ClimateGPT, a model family of domain-specific large language models that synthesize interdisciplinary research on climate change. We trained two 7B models from scratch on a science-oriented dataset of 300B tokens. ClimateGPT-7B, 13B and 70B are continuously pre-trained from Llama2 on a domain-specific dataset of 4.2B tokens.
arXiv Detail & Related papers (2024-01-17T23:29:46Z)
Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM [77.17254959695218]
Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks. We propose a light-weight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on a conversational-style instruction tuning Arabic dataset Clima500-Instruct. Our model surpasses the baseline LLM in 88.3% of cases during ChatGPT-based evaluation.
arXiv Detail & Related papers (2023-12-14T22:04:07Z)
ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science. It can be pre-trained with a self-supervised learning objective on climate datasets. It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z)
Towards Answering Climate Questionnaires from Unstructured Climate Reports [26.036105166376284]
Activists and policymakers need NLP tools to process the vast and rapidly growing unstructured textual climate reports into structured form. We introduce two new large-scale climate questionnaire datasets and use their existing structure to train self-supervised models. We then use these models to help align texts from unstructured climate documents to the semi-structured questionnaires in a human pilot study.
arXiv Detail & Related papers (2023-01-11T00:22:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.