Assessing the Effectiveness of GPT-4o in Climate Change Evidence Synthesis and Systematic Assessments: Preliminary Insights
- URL: http://arxiv.org/abs/2407.12826v1
- Date: Tue, 2 Jul 2024 13:14:57 GMT
- Title: Assessing the Effectiveness of GPT-4o in Climate Change Evidence Synthesis and Systematic Assessments: Preliminary Insights
- Authors: Elphin Tom Joe, Sai Dileep Koneru, Christine J Kirchhoff
- Abstract summary: GPT-4o is a state-of-the-art large language model (LLM).
We assess the efficacy of GPT-4o in performing climate change adaptation-related extraction from the scientific literature.
Our results indicate that while GPT-4o can achieve high accuracy in low-expertise tasks, its performance in intermediate- and high-expertise tasks, such as stakeholder identification and assessment of the depth of the adaptation response, is less reliable.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this research short, we examine the potential of using GPT-4o, a state-of-the-art large language model (LLM), to undertake evidence synthesis and systematic assessment tasks. Traditional workflows for such tasks involve large groups of domain experts who manually review and synthesize vast amounts of literature. The exponential growth of scientific literature and recent advances in LLMs provide an opportunity to complement these traditional workflows with new-age tools. We assess the efficacy of GPT-4o on these tasks using a sample from the dataset created by the Global Adaptation Mapping Initiative (GAMI), where we check the accuracy of climate change adaptation-related feature extraction from the scientific literature across three levels of expertise. Our results indicate that while GPT-4o can achieve high accuracy in low-expertise tasks like geographic location identification, its performance in intermediate- and high-expertise tasks, such as stakeholder identification and assessment of the depth of the adaptation response, is less reliable. The findings motivate the need for designing assessment workflows that utilize the strengths of models like GPT-4o while also providing refinements to improve their performance on these tasks.
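The extraction workflow described in the abstract can be illustrated with a short, hedged sketch. The snippet below assumes the OpenAI Python SDK (v1+) chat-completions interface with the gpt-4o model; the prompt wording, the three feature names, and the extract_adaptation_features helper are illustrative assumptions for exposition, not the authors' actual GAMI coding protocol or prompts.

```python
# Illustrative sketch only (not the paper's pipeline): one GPT-4o call that extracts
# the three kinds of features discussed above from a single abstract.
import json
from openai import OpenAI  # assumes the OpenAI Python SDK v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """You are assisting with a systematic assessment of climate change adaptation literature.
From the abstract below, extract:
1. geographic_location: where the adaptation takes place (low-expertise task)
2. stakeholders: the actors involved in the adaptation response (intermediate-expertise task)
3. depth_of_adaptation: a brief judgement of how substantial the adaptation response is (high-expertise task)
Return a JSON object with exactly these three keys.

Abstract:
{abstract}
"""

def extract_adaptation_features(abstract: str) -> dict:
    """Ask GPT-4o for structured adaptation features from one abstract (hypothetical schema)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # keep outputs as stable as possible across assessment runs
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT.format(abstract=abstract)}],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    sample_abstract = "Coastal municipalities in Bangladesh piloted embankment upgrades and mangrove restoration ..."
    print(extract_adaptation_features(sample_abstract))
```

Comparing such model outputs against human-coded GAMI labels, separately for each expertise level, is the kind of accuracy check the paper reports.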
Related papers
- An Empirical Study on Information Extraction using Large Language Models [36.090082785047855]
Human-like large language models (LLMs) have proven to be very helpful for many natural language processing (NLP) related tasks.
We propose and analyze the effects of a series of simple prompt-based methods on GPT-4's information extraction ability.
arXiv Detail & Related papers (2024-08-31T07:10:16Z)
- AI based Multiagent Approach for Requirements Elicitation and Analysis [3.9422957660677476]
This study empirically investigates the effectiveness of utilizing Large Language Models (LLMs) to automate requirements analysis tasks.
We deployed four models, namely GPT-3.5, GPT-4 Omni, LLaMA3-70, and Mixtral-8B, and conducted experiments to analyze requirements on four real-world projects.
Preliminary results indicate notable variations in task completion among the models.
arXiv Detail & Related papers (2024-08-18T07:23:12Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs.
GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system.
GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
arXiv Detail & Related papers (2023-09-29T17:34:51Z)
- Is GPT4 a Good Trader? [12.057320450155835]
Large language models (LLMs) have demonstrated significant capabilities in various planning and reasoning tasks.
This study aims to examine the fidelity of GPT-4's comprehension of classic trading theories and its proficiency in applying its code interpreter abilities to real-world trading data analysis.
arXiv Detail & Related papers (2023-09-20T00:47:52Z)
- Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise? [0.8924669503280334]
GPT-4, prompted with annotation guidelines, performs on par with well-trained law student annotators.
We demonstrated how to analyze GPT-4's predictions to identify and mitigate deficiencies in annotation guidelines.
arXiv Detail & Related papers (2023-06-24T08:48:24Z)
- A GPT-4 Reticular Chemist for Guiding MOF Discovery [0.704345141118018]
We present a new framework integrating the AI model GPT-4 into the iterative process of reticular chemistry experimentation.
This GPT-4 Reticular Chemist is an integrated system composed of three phases.
arXiv Detail & Related papers (2023-06-20T05:26:44Z)
- Is GPT-4 a Good Data Analyst? [67.35956981748699]
We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve comparable performance to humans.
arXiv Detail & Related papers (2023-05-24T11:26:59Z)
- LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities [66.36633042421387]
Large Language Models (LLMs) are evaluated for Knowledge Graph (KG) construction and reasoning.
We propose AutoKG, a multi-agent-based approach employing LLMs and external sources for KG construction and reasoning.
arXiv Detail & Related papers (2023-05-22T15:56:44Z)
- GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.