A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
- URL: http://arxiv.org/abs/2510.23587v1
- Date: Mon, 27 Oct 2025 17:54:07 GMT
- Title: A Survey of Data Agents: Emerging Paradigm or Overstated Hype?
- Authors: Yizhang Zhu, Liangwei Wang, Chenyu Yang, Xiaotian Lin, Boyan Li, Wei Zhou, Xinyu Liu, Zhangyang Peng, Tianqi Luo, Yu Li, Chengliang Chai, Chong Chen, Shimin Di, Ju Fan, Ji Sun, Nan Tang, Fugee Tsung, Jiannan Wang, Chenglin Wu, Yanwei Xu, Shaolei Zhang, Yong Zhang, Xuanhe Zhou, Guoliang Li, Yuyu Luo
- Abstract summary: "Data agent" currently suffers from terminological ambiguity and inconsistent adoption. This survey introduces the first systematic hierarchical taxonomy for data agents. We conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.
- Score: 66.1526688475023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of large language models (LLMs) has spurred the emergence of data agents--autonomous systems designed to orchestrate Data + AI ecosystems for tackling complex data-related tasks. However, the term "data agent" currently suffers from terminological ambiguity and inconsistent adoption, conflating simple query responders with sophisticated autonomous architectures. This terminological ambiguity fosters mismatched user expectations, accountability challenges, and barriers to industry growth. Inspired by the SAE J3016 standard for driving automation, this survey introduces the first systematic hierarchical taxonomy for data agents, comprising six levels that delineate and trace progressive shifts in autonomy, from manual operations (L0) to a vision of generative, fully autonomous data agents (L5), thereby clarifying capability boundaries and responsibility allocation. Through this lens, we offer a structured review of existing research arranged by increasing autonomy, encompassing specialized data agents for data management, preparation, and analysis, alongside emerging efforts toward versatile, comprehensive systems with enhanced autonomy. We further analyze critical evolutionary leaps and technical gaps for advancing data agents, especially the ongoing L2-to-L3 transition, where data agents evolve from procedural execution to autonomous orchestration. Finally, we conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.
Related papers
- Dataforge: A Data Agent Platform for Autonomous Data Engineering [22.691284342164334]
Data Agent is a fully autonomous system specialized for tabular data. It automatically performs data cleaning, hierarchical routing, and feature-level optimization through dual feedback loops. It embodies three core principles: automatic, safe, and non-expert friendly, which ensure end-to-end reliability without human supervision.
arXiv Detail & Related papers (2025-11-09T01:58:13Z)
- What's the next frontier for Data-centric AI? Data Savvy Agents [71.76058707995398]
We argue that data-savvy capabilities should be a top priority in the design of agentic systems. We propose four key capabilities to realize this vision: proactive data acquisition, sophisticated data processing, interactive test data synthesis, and continual adaptation.
arXiv Detail & Related papers (2025-11-02T17:09:29Z)
- Autonomous Data Agents: A New Opportunity for Smart Data [50.02229219403014]
This report argues that DataAgents represent a paradigm shift toward autonomous data-to-knowledge systems. DataAgents transform complex and unstructured data into coherent and actionable knowledge. We first examine why the convergence of agentic AI and data-to-knowledge systems has emerged as a critical trend.
arXiv Detail & Related papers (2025-09-23T06:46:41Z)
- Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First [72.85721148326138]
Large Language Model (LLM) agents are likely to become the dominant workload for data systems in the future. Agentic speculation can pose challenges for present-day data systems. We outline a number of new research opportunities for a new agent-first data systems architecture.
arXiv Detail & Related papers (2025-08-31T21:19:40Z)
- A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems [53.37728204835912]
Most existing AI systems rely on manually crafted configurations that remain static after deployment. Recent research has explored agent evolution techniques that aim to automatically enhance agent systems based on interaction data and environmental feedback. This survey aims to provide researchers and practitioners with a systematic understanding of self-evolving AI agents.
arXiv Detail & Related papers (2025-08-10T16:07:32Z)
- AgenticData: An Agentic Data Analytics System for Heterogeneous Data [12.67277567222908]
AgenticData is an agentic data analytics system that allows users to pose natural language (NL) questions while autonomously analyzing data sources across multiple domains. We propose a multi-agent collaboration strategy by utilizing a data profiling agent for discovering relevant data, a semantic cross-validation agent for iterative optimization based on feedback, and a smart memory agent for maintaining short-term context.
arXiv Detail & Related papers (2025-08-07T03:33:59Z)
- Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms. This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z)
- AI Agents for Ground-Based Gamma Astronomy [0.0]
We present two prototypes that integrate with the Cherenkov Telescope Array Observatory pipelines for operations and offline data analysis. These AI agents offer a transformative approach to system management and data analysis by automating complex tasks and providing intelligent assistance.
arXiv Detail & Related papers (2025-03-02T09:55:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.