Autonomous Data Agents: A New Opportunity for Smart Data
- URL: http://arxiv.org/abs/2509.18710v2
- Date: Sat, 04 Oct 2025 00:36:12 GMT
- Title: Autonomous Data Agents: A New Opportunity for Smart Data
- Authors: Yanjie Fu, Dongjie Wang, Wangyang Ying, Xinyuan Wang, Xiangliang Zhang, Huan Liu, Jian Pei,
- Abstract summary: Report argues that DataAgents represent a paradigm shift toward autonomous data-to-knowledge systems.<n>DataAgents transform complex and unstructured data into coherent and actionable knowledge.<n>We first examine why the convergence of agentic AI and data-to-knowledge systems has emerged as a critical trend.
- Score: 50.02229219403014
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: As data continues to grow in scale and complexity, preparing, transforming, and analyzing it remains labor-intensive, repetitive, and difficult to scale. Since data contains knowledge and AI learns knowledge from it, the alignment between AI and data is essential. However, data is often not structured in ways that are optimal for AI utilization. Moreover, an important question arises: how much knowledge can we pack into data through intensive data operations? Autonomous data agents (DataAgents), which integrate LLM reasoning with task decomposition, action reasoning and grounding, and tool calling, can autonomously interpret data task descriptions, decompose tasks into subtasks, reason over actions, ground actions into python code or tool calling, and execute operations. Unlike traditional data management and engineering tools, DataAgents dynamically plan workflows, call powerful tools, and adapt to diverse data tasks at scale. This report argues that DataAgents represent a paradigm shift toward autonomous data-to-knowledge systems. DataAgents are capable of handling collection, integration, preprocessing, selection, transformation, reweighing, augmentation, reprogramming, repairs, and retrieval. Through these capabilities, DataAgents transform complex and unstructured data into coherent and actionable knowledge. We first examine why the convergence of agentic AI and data-to-knowledge systems has emerged as a critical trend. We then define the concept of DataAgents and discuss their architectural design, training strategies, as well as the new skills and capabilities they enable. Finally, we call for concerted efforts to advance action workflow optimization, establish open datasets and benchmark ecosystems, safeguard privacy, balance efficiency with scalability, and develop trustworthy DataAgent guardrails to prevent malicious actions.
Related papers
- Data Science and Technology Towards AGI Part I: Tiered Data Management [53.64581824953229]
We argue that the development of artificial intelligence is entering a new phase of data-model co-evolution.<n>We introduce an L0-L4 tiered data management framework, ranging from raw uncurated resources to organized and verifiable knowledge.<n>We validate the effectiveness of the proposed framework through empirical studies.
arXiv Detail & Related papers (2026-02-09T18:47:51Z) - Dataforge: A Data Agent Platform for Autonomous Data Engineering [22.691284342164334]
Data Agent is a fully autonomous system specialized for tabular data.<n>It automatically performs data cleaning, hierarchical routing, and feature-level optimization through dual feedback loops.<n>It embodies three core principles: automatic, safe, and non-expert friendly, which ensure end-to-end reliability without human supervision.
arXiv Detail & Related papers (2025-11-09T01:58:13Z) - What's the next frontier for Data-centric AI? Data Savvy Agents [71.76058707995398]
We argue that data-savvy capabilities should be a top priority in the design of agentic systems.<n>We propose four key capabilities to realize this vision: Proactive data acquisition, Sophisticated data processing, Interactive test data synthesis, and Continual adaptation.
arXiv Detail & Related papers (2025-11-02T17:09:29Z) - A Survey of Data Agents: Emerging Paradigm or Overstated Hype? [66.1526688475023]
"Data agent" currently suffers from terminological ambiguity and inconsistent adoption.<n>This survey introduces the first systematic hierarchical taxonomy for data agents.<n>We conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.
arXiv Detail & Related papers (2025-10-27T17:54:07Z) - Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First [72.85721148326138]
Large Language Model (LLM) agents are likely to become the dominant workload for data systems in the future.<n>Agentic speculation can pose challenges for present-day data systems.<n>We outline a number of new research opportunities for a new agent-first data systems architecture.
arXiv Detail & Related papers (2025-08-31T21:19:40Z) - AgenticData: An Agentic Data Analytics System for Heterogeneous Data [12.67277567222908]
AgenticData is an agentic data analytics system that allows users to pose natural language (NL) questions while autonomously analyzing data sources across multiple domains.<n>We propose a multi-agent collaboration strategy by utilizing a data profiling agent for discovering relevant data, a semantic cross-validation agent for iterative optimization based on feedback, and a smart memory agent for maintaining short-term context.
arXiv Detail & Related papers (2025-08-07T03:33:59Z) - Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems [8.816332263275305]
Traditional Data+AI systems rely heavily on human experts to orchestrate system pipelines.<n>Existing Data+AI systems have limited capabilities in semantic understanding, reasoning, and planning.<n>We propose the concept of a 'Data Agent' - a comprehensive architecture designed to orchestrate Data+AI ecosystems.
arXiv Detail & Related papers (2025-07-02T11:04:49Z) - Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms.<n>This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z) - DatawiseAgent: A Notebook-Centric LLM Agent Framework for Automated Data Science [4.1431677219677185]
DatawiseAgent is a notebook-centric agent framework that unifies interactions among user, agent and the computational environment.<n>It orchestrates four stages, including DSF-like planning, incremental execution, self-ging, and post-filtering.<n>It consistently outperforms or matches state-of-the-art methods across multiple model settings.
arXiv Detail & Related papers (2025-03-10T08:32:33Z) - Building Multi-Agent Copilot towards Autonomous Agricultural Data Management and Analysis [2.763670421921841]
We build a proof-of-concept multi-agent system called ADMA Copilot, which can understand user's intent.
ADMA Copilot accomplishes tasks automatically, in which three agents: a LLM based controller, an input formatter and an output formatter collaborate.
arXiv Detail & Related papers (2024-10-31T20:15:14Z) - DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback [62.235925602004535]
DataEnvGym is a testbed of teacher environments for data generation agents.<n>It frames data generation as a sequential decision-making task, involving an agent and a data generation engine.<n>Students are iteratively trained and evaluated on generated data, and their feedback is reported to the agent after each iteration.
arXiv Detail & Related papers (2024-10-08T17:20:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.