Related papers: ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models

CAIRNS: Balancing Readability and Scientific Accuracy in Climate Adaptation Question Answering [10.31170458584116]
We present Climate Adaptation question-answering with Improved Readability and Noted Sources (CAIRNS). CAIRNS is a framework that enables experts to obtain credible preliminary answers from complex evidence sources from the web. It enhances readability and citation reliability through a structured ScholarGuide prompt and achieves robust evaluation.
arXiv Detail & Related papers (2025-12-01T22:44:43Z)
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning [118.46980291324148]
ATLAS is a large-scale, high-difficulty, and cross-disciplinary evaluation suite composed of approximately 800 original problems. Its key features include: High Originality and Contamination Resistance, with all questions newly created or substantially adapted to prevent test data leakage. Preliminary results on leading models demonstrate ATLAS's effectiveness in differentiating their advanced scientific reasoning capabilities.
arXiv Detail & Related papers (2025-11-18T11:13:06Z)
ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method [61.76389719956301]
We contribute a multi-modal climate benchmark, i.e., ClimateBench-M, which aligns time series climate data from ERA5, extreme weather events data from NOAA, and satellite image data from NASA. Under each data modality, we also propose a simple but strong generative method that could produce competitive performance in weather forecasting, thunderstorm alerts, and crop segmentation tasks.
arXiv Detail & Related papers (2025-04-10T02:22:23Z)
CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ) [14.065907685322097]
CliME is a first-of-its-kind multimodal dataset, comprising 2579 Twitter and Reddit posts. The benchmark features a diverse collection of humorous memes and skeptical posts, capturing how these formats distill complex issues into viral narratives that shape public opinion and policy discussions. We present the Climate Alignment Quotient (CAQ), a novel metric comprising five distinct dimensions: Articulation, Evidence, Resonance, Transition, and Specificity. Our findings, based on the CAQ metric, indicate that while most evaluated LLMs perform relatively well in Criticality and Justice, they consistently underperform on the Actionability axis.
arXiv Detail & Related papers (2025-04-04T20:01:00Z)
AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science [2.7804525903465964]
AtmosSci-Bench is an evaluation benchmark for large language models (LLMs) in atmospheric science. AtmosSci-Bench features a dual-format design comprising both multiple-choice questions (MCQs) and open-ended questions (OEQs). We conduct a comprehensive evaluation of representative LLMs, categorized into four groups: instruction-tuned models, advanced reasoning models, math-augmented models, and domain-specific climate models.
arXiv Detail & Related papers (2025-02-03T08:50:46Z)
ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering [54.80411755871931]
Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into readily understandable format. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. We introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data.
arXiv Detail & Related papers (2024-07-24T01:46:55Z)
Advancing Data-driven Weather Forecasting: Time-Sliding Data Augmentation of ERA5 [3.3748750222488657]
We introduce a novel strategy that deviates from the common dependence on high-resolution data. This paper improves on conventional approaches by adding more variables and a novel approach to data augmentation and processing. Our findings reveal that despite the lower resolution, the proposed approach demonstrates considerable accuracy in predicting atmospheric conditions.
arXiv Detail & Related papers (2024-02-13T03:01:22Z)
FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting [56.73502043159699]
This work presents FengWu-GHR, the first data-driven global weather forecasting model running at the 0.09$^\circ$ horizontal resolution. It introduces a novel approach that opens the door for operating ML-based high-resolution forecasts by inheriting prior knowledge from a low-resolution model. The hindcast of weather prediction in 2022 indicates that FengWu-GHR is superior to the IFS-HRES.
arXiv Detail & Related papers (2024-01-28T13:23:25Z)
ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change [21.827936253363603]
This paper introduces ClimateGPT, a model family of domain-specific large language models that synthesize interdisciplinary research on climate change. We trained two 7B models from scratch on a science-oriented dataset of 300B tokens. ClimateGPT-7B, 13B and 70B are continuously pre-trained from Llama2 on a domain-specific dataset of 4.2B tokens.
arXiv Detail & Related papers (2024-01-17T23:29:46Z)
Climate Change from Large Language Models [7.190384101545232]
Climate change poses grave challenges, demanding widespread understanding and low-carbon lifestyle awareness. Large language models (LLMs) offer a powerful tool to address this crisis. This paper proposes an automated evaluation framework to assess climate-crisis knowledge.
arXiv Detail & Related papers (2023-12-19T09:26:46Z)
Arabic Mini-ClimateGPT: A Climate Change and Sustainability Tailored Arabic LLM [77.17254959695218]
Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks. We propose a light-weight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on a conversational-style instruction tuning Arabic dataset Clima500-Instruct. Our model surpasses the baseline LLM in 88.3% of cases during ChatGPT-based evaluation.
arXiv Detail & Related papers (2023-12-14T22:04:07Z)
ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning [26.151056828513962]
Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives.
arXiv Detail & Related papers (2023-11-07T04:55:36Z)
ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling [20.63843548201849]
ClimateLearn is an open-source library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science. It is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems.
arXiv Detail & Related papers (2023-07-04T20:36:01Z)
ClimaX: A foundation model for weather and climate [51.208269971019504]
ClimaX is a deep learning model for weather and climate science. It can be pre-trained with a self-supervised learning objective on climate datasets. It can be fine-tuned to address a breadth of climate and weather tasks.
arXiv Detail & Related papers (2023-01-24T23:19:01Z)
Towards Answering Climate Questionnaires from Unstructured Climate Reports [26.036105166376284]
Activists and policymakers need NLP tools to process the vast and rapidly growing unstructured textual climate reports into structured form. We introduce two new large-scale climate questionnaire datasets and use their existing structure to train self-supervised models. We then use these models to help align texts from unstructured climate documents to the semi-structured questionnaires in a human pilot study.
arXiv Detail & Related papers (2023-01-11T00:22:56Z)
Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes [61.31361524229248]
We build on recent scalable spatiotemporal GPs to reduce the computational burden. We successfully employ such a doubly sparse GP to construct a probabilistic model of paleoclimate.
arXiv Detail & Related papers (2022-11-15T14:15:04Z)
Climate-Invariant Machine Learning [0.8831201550856289]
Current climate models require representations of processes that occur at scales smaller than model grid size. Recent machine learning (ML) algorithms hold promise to improve such process representations, but tend to extrapolate poorly to climate regimes they were not trained on. We propose a new framework - termed "climate-invariant" ML - incorporating knowledge of climate processes into ML algorithms.
arXiv Detail & Related papers (2021-12-14T07:02:57Z)
Analyzing Sustainability Reports Using Natural Language Processing [68.8204255655161]
In recent years, companies have increasingly been aiming to both mitigate their environmental impact and adapt to the changing climate context. This is reported via increasingly exhaustive reports, which cover many types of climate risks and exposures under the umbrella of Environmental, Social, and Governance (ESG). We present this tool and the methodology that we used to develop it in the present article.
arXiv Detail & Related papers (2020-11-03T21:22:42Z)
HECT: High-Dimensional Ensemble Consistency Testing for Climate Models [1.7587442088965226]
Climate models play a crucial role in understanding the effect of environmental changes on climate to help mitigate climate risks and inform decisions. Large global climate models such as the Community Earth System Model (CESM) are very complex, with millions of lines of code describing interactions of the atmosphere, land, oceans, and ice. Our work uses probabilistic classifiers like tree-based algorithms and deep neural networks to perform a statistically rigorous goodness-of-fit test of high-dimensional and man-made data.
arXiv Detail & Related papers (2020-10-08T15:16:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.