What's the next frontier for Data-centric AI? Data Savvy Agents
- URL: http://arxiv.org/abs/2511.01015v1
- Date: Sun, 02 Nov 2025 17:09:29 GMT
- Title: What's the next frontier for Data-centric AI? Data Savvy Agents
- Authors: Nabeel Seedat, Jiashuo Liu, Mihaela van der Schaar
- Abstract summary: We argue that data-savvy capabilities should be a top priority in the design of agentic systems. We propose four key capabilities to realize this vision: Proactive data acquisition, Sophisticated data processing, Interactive test data synthesis, and Continual adaptation.
- Score: 71.76058707995398
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent surge in AI agents that autonomously communicate, collaborate with humans and use diverse tools has unlocked promising opportunities in various real-world settings. However, a vital aspect remains underexplored: how agents handle data. Scalable autonomy demands agents that continuously acquire, process, and evolve their data. In this paper, we argue that data-savvy capabilities should be a top priority in the design of agentic systems to ensure reliable real-world deployment. Specifically, we propose four key capabilities to realize this vision: (1) Proactive data acquisition: enabling agents to autonomously gather task-critical knowledge or solicit human input to address data gaps; (2) Sophisticated data processing: requiring context-aware and flexible handling of diverse data challenges and inputs; (3) Interactive test data synthesis: shifting from static benchmarks to dynamically generated interactive test data for agent evaluation; and (4) Continual adaptation: empowering agents to iteratively refine their data and background knowledge to adapt to shifting environments. While current agent research predominantly emphasizes reasoning, we hope to inspire a reflection on the role of data-savvy agents as the next frontier in data-centric AI.
Related papers
- Can Agentic AI Match the Performance of Human Data Scientists? [27.236034079837044]
Large language models (LLMs) have significantly automated data science. Can these agentic AI systems truly match the performance of human data scientists? We show that agentic AI that relies on a generic analytics workflow falls short of methods that use domain-specific insights.
arXiv Detail & Related papers (2025-12-24T05:31:42Z)
- Dataforge: A Data Agent Platform for Autonomous Data Engineering [22.691284342164334]
Data Agent is a fully autonomous system specialized for tabular data. It automatically performs data cleaning, hierarchical routing, and feature-level optimization through dual feedback loops. It embodies three core principles: automatic, safe, and non-expert friendly, which ensure end-to-end reliability without human supervision.
arXiv Detail & Related papers (2025-11-09T01:58:13Z)
- A Survey of Data Agents: Emerging Paradigm or Overstated Hype? [66.1526688475023]
"Data agent" currently suffers from terminological ambiguity and inconsistent adoption. This survey introduces the first systematic hierarchical taxonomy for data agents. We conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.
arXiv Detail & Related papers (2025-10-27T17:54:07Z)
- Autonomous Data Agents: A New Opportunity for Smart Data [50.02229219403014]
This report argues that DataAgents represent a paradigm shift toward autonomous data-to-knowledge systems. DataAgents transform complex and unstructured data into coherent and actionable knowledge. We first examine why the convergence of agentic AI and data-to-knowledge systems has emerged as a critical trend.
arXiv Detail & Related papers (2025-09-23T06:46:41Z)
- Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First [72.85721148326138]
Large Language Model (LLM) agents are likely to become the dominant workload for data systems in the future. Agentic speculation can pose challenges for present-day data systems. We outline a number of new research opportunities for a new agent-first data systems architecture.
arXiv Detail & Related papers (2025-08-31T21:19:40Z)
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery [26.388978716803464]
Can AI agents transcend conventional search to systematically discover any dataset that meets specific user requirements? Our benchmark and comprehensive analysis provide the foundation for the next generation of self-improving AI systems.
arXiv Detail & Related papers (2025-08-09T12:15:08Z)
- Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms. This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z)
- Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.