Can AI autonomously build, operate, and use the entire data stack?
- URL: http://arxiv.org/abs/2512.07926v1
- Date: Mon, 08 Dec 2025 18:59:01 GMT
- Title: Can AI autonomously build, operate, and use the entire data stack?
- Authors: Arvind Agarwal, Lisa Amini, Sameep Mehta, Horst Samulowitz, Kavitha Srinivas,
- Abstract summary: We argue for a paradigm shift from the use of AI in independent data component operations towards a more holistic and autonomous handling of the entire data lifecycle. We explore how each stage of the modern data stack can be autonomously managed by intelligent agents to build self-sufficient systems.
- Score: 16.22441719281088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enterprise data management is a monumental task. It spans data architecture and systems, integration, quality, governance, and continuous improvement. While AI assistants can help specific personas, such as data engineers and stewards, to navigate and configure the data stack, they fall far short of full automation. However, as AI becomes increasingly capable of tackling tasks that have previously resisted automation due to inherent complexities, we believe there is an imminent opportunity to target fully autonomous data estates. Currently, AI is used in different parts of the data stack, but in this paper, we argue for a paradigm shift from the use of AI in independent data component operations towards a more holistic and autonomous handling of the entire data lifecycle. Towards that end, we explore how each stage of the modern data stack can be autonomously managed by intelligent agents to build self-sufficient systems that can be used not only by human end-users, but also by AI itself. We begin by describing the mounting forces and opportunities that demand this paradigm shift, examine how agents can streamline the data lifecycle, and highlight open questions and areas where additional research is needed. We hope this work will inspire lively debate, stimulate further research, motivate collaborative approaches, and facilitate a more autonomous future for data systems.
Related papers
- Can Agentic AI Match the Performance of Human Data Scientists? [27.236034079837044]
Large language models (LLMs) have significantly automated data science. Can these agentic AI systems truly match the performance of human data scientists? We show that agentic AI that relies on generic analytics workflows falls short of methods that use domain-specific insights.
arXiv Detail & Related papers (2025-12-24T05:31:42Z)
- What's the next frontier for Data-centric AI? Data Savvy Agents [71.76058707995398]
We argue that data-savvy capabilities should be a top priority in the design of agentic systems. We propose four key capabilities to realize this vision: Proactive data acquisition, Sophisticated data processing, Interactive test data synthesis, and Continual adaptation.
arXiv Detail & Related papers (2025-11-02T17:09:29Z)
- A Survey of Data Agents: Emerging Paradigm or Overstated Hype? [66.1526688475023]
"Data agent" currently suffers from terminological ambiguity and inconsistent adoption. This survey introduces the first systematic hierarchical taxonomy for data agents. We conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.
arXiv Detail & Related papers (2025-10-27T17:54:07Z)
- Autonomous Data Agents: A New Opportunity for Smart Data [50.02229219403014]
This report argues that DataAgents represent a paradigm shift toward autonomous data-to-knowledge systems. DataAgents transform complex and unstructured data into coherent and actionable knowledge. We first examine why the convergence of agentic AI and data-to-knowledge systems has emerged as a critical trend.
arXiv Detail & Related papers (2025-09-23T06:46:41Z)
- Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms. This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z)
- NeurDB: An AI-powered Autonomous Data System [44.14807794638682]
We present NeurDB, an AI-powered autonomous data system designed to fully embrace AI design in each major system component.
We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report its current development and future plan.
arXiv Detail & Related papers (2024-05-07T00:51:48Z)
- Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [49.28944613907541]
Industries such as finance, meteorology, and energy generate vast amounts of data daily. We propose Data-Copilot, a data analysis agent that autonomously performs querying, processing, and visualization of massive data tailored to diverse human requests.
arXiv Detail & Related papers (2023-06-12T16:12:56Z)
- Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling [73.14508303965683]
We interviewed eighteen data workers from both industry and academia to learn where and how they would like to collaborate with AI. Surprisingly, though the participants showed excitement about collaborating with AI, many of them also expressed reluctance and pointed out nuanced reasons.
arXiv Detail & Related papers (2023-04-17T15:30:05Z)
- Data-centric Artificial Intelligence: A Survey [47.24049907785989]
Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI.
In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals.
We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
arXiv Detail & Related papers (2023-03-17T17:44:56Z)
- Embodied AI-Driven Operation of Smart Cities: A Concise Review [3.441021278275805]
Embodied AI focuses on learning through interaction with the surrounding environment.
We will go through its definitions, its characteristics, and its current achievements along with different algorithms, approaches, and solutions.
We will then explore all the available simulators and 3D interactable databases that will make the research in this area feasible.
arXiv Detail & Related papers (2021-08-22T19:14:59Z)
- Sensor Artificial Intelligence and its Application to Space Systems -- A White Paper [35.78525324168878]
The goal of this white paper is to establish "Sensor AI" as a dedicated research topic.
A closer look at the sensors and their physical properties within AI approaches will lead to more robust and widely applicable algorithms.
Sensor AI will play a decisive role in autonomous driving as well as in areas of automated production, predictive maintenance or space research.
arXiv Detail & Related papers (2020-06-09T14:10:35Z)