AutoClimDS: Climate Data Science Agentic AI -- A Knowledge Graph is All You Need
- URL: http://arxiv.org/abs/2509.21553v1
- Date: Thu, 25 Sep 2025 20:38:23 GMT
- Title: AutoClimDS: Climate Data Science Agentic AI -- A Knowledge Graph is All You Need
- Authors: Ahmed Jaber, Wangshu Zhu, Karthick Jayavelu, Justin Downes, Sameer Mohamed, Candace Agonafir, Linnia Hawkins, Tian Zheng
- Abstract summary: Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. We present a proof of concept for addressing these barriers through the integration of a curated knowledge graph (KG) with AI agents designed for cloud-native scientific workflows.
- Score: 1.1639172596200853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Climate data science faces persistent barriers stemming from the fragmented nature of data sources, heterogeneous formats, and the steep technical expertise required to identify, acquire, and process datasets. These challenges limit participation, slow discovery, and reduce the reproducibility of scientific workflows. In this paper, we present a proof of concept for addressing these barriers through the integration of a curated knowledge graph (KG) with AI agents designed for cloud-native scientific workflows. The KG provides a unifying layer that organizes datasets, tools, and workflows, while AI agents -- powered by generative AI services -- enable natural language interaction, automated data access, and streamlined analysis. Together, these components drastically lower the technical threshold for engaging in climate data science, enabling non-specialist users to identify and analyze relevant datasets. By leveraging existing cloud-ready API data portals, we demonstrate that "a knowledge graph is all you need" to unlock scalable and agentic workflows for scientific inquiry. The open-source design of our system further supports community contributions, ensuring that the KG and associated tools can evolve as a shared commons. Our results illustrate a pathway toward democratizing access to climate data and establishing a reproducible, extensible framework for human--AI collaboration in scientific research.
Related papers
- The Climate Change Knowledge Graph: Supporting Climate Services [33.331299436929946]
The Climate Change Knowledge Graph is designed to integrate diverse data sources related to climate simulations into a coherent knowledge graph. This innovative resource allows for executing complex queries involving climate models, simulations, variables, configurations, temporal domains, and granularities.
arXiv Detail & Related papers (2026-02-23T12:42:05Z)
- Can Agentic AI Match the Performance of Human Data Scientists? [27.236034079837044]
Large language models (LLMs) have significantly automated data science. Can these agentic AI systems truly match the performance of human data scientists? We show that agentic AI that relies on a generic analytics workflow falls short of methods that use domain-specific insights.
arXiv Detail & Related papers (2025-12-24T05:31:42Z)
- What's the next frontier for Data-centric AI? Data Savvy Agents [71.76058707995398]
We argue that data-savvy capabilities should be a top priority in the design of agentic systems. We propose four key capabilities to realize this vision: proactive data acquisition, sophisticated data processing, interactive test data synthesis, and continual adaptation.
arXiv Detail & Related papers (2025-11-02T17:09:29Z)
- Autonomous Data Agents: A New Opportunity for Smart Data [51.530320431847834]
This report argues that DataAgents represent a paradigm shift toward autonomous data-to-knowledge systems. DataAgents transform complex and unstructured data into coherent and actionable knowledge. We first examine why the convergence of agentic AI and data-to-knowledge systems has emerged as a critical trend.
arXiv Detail & Related papers (2025-09-23T06:46:41Z)
- A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers [221.34650992288505]
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research. This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z)
- Research Knowledge Graphs in NFDI4DataScience: Key Activities, Achievements, and Future Directions [4.258678191793365]
NFDI4DataScience is developing and providing Research Knowledge Graphs (RKGs). RKGs aim to capture and connect complex datasets, models, software, and scientific publications.
arXiv Detail & Related papers (2025-08-04T11:11:51Z)
- A Self-Evolving AI Agent System for Climate Science [59.08800209508371]
We introduce EarthLink, the first self-evolving AI agent system designed as an interactive "copilot" for Earth scientists. Through natural language interaction, EarthLink automates the entire research workflow by integrating planning, code execution, data analysis, and physical reasoning. It exhibits human-like cross-disciplinary analytical ability and proficiency comparable to a junior researcher in expert evaluations on core large-scale climate tasks.
arXiv Detail & Related papers (2025-07-23T08:29:25Z)
- Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms. This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z)
- A GenAI System for Improved FAIR Independent Biological Database Integration [0.0]
We introduce an experimental natural language-based query processing system designed to empower scientists to discover, access, and query biological databases. FAIRBridge harnesses the capabilities of AI to interpret query intents, map them to relevant databases, and generate executable queries. The system also includes robust tools for mitigating low-quality query processing, ensuring high fidelity and responsiveness in the information delivered.
arXiv Detail & Related papers (2025-06-22T08:04:24Z)
- Capturing and Anticipating User Intents in Data Analytics via Knowledge Graphs [0.061446808540639365]
This work explores the usage of Knowledge Graphs (KG) as a basic framework for capturing complex analytics in a human-centered manner.
The data stored in the generated KG can then be exploited to provide assistance (e.g., recommendations) to the users interacting with these systems.
arXiv Detail & Related papers (2024-11-01T20:45:23Z)
- DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour [6.716560115378451]
We introduce a modular, flexible, yet user-friendly software framework specifically developed to streamline computational-driven data exploration for human behavior analysis.
Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency.
arXiv Detail & Related papers (2024-07-18T11:28:52Z)
- Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data, and serves to improve the performance of the discriminator.
To discover implicit entity preference of users, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)