AgenticData: An Agentic Data Analytics System for Heterogeneous Data
- URL: http://arxiv.org/abs/2508.05002v1
- Date: Thu, 07 Aug 2025 03:33:59 GMT
- Title: AgenticData: An Agentic Data Analytics System for Heterogeneous Data
- Authors: Ji Sun, Guoliang Li, Peiyao Zhou, Yihui Ma, Jingzhe Xu, Yuan Li,
- Abstract summary: AgenticData is an agentic data analytics system that allows users to pose natural language (NL) questions while autonomously analyzing data sources across multiple domains.<n>We propose a multi-agent collaboration strategy by utilizing a data profiling agent for discovering relevant data, a semantic cross-validation agent for iterative optimization based on feedback, and a smart memory agent for maintaining short-term context.
- Score: 12.67277567222908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing unstructured data analytics systems rely on experts to write code and manage complex analysis workflows, making them both expensive and time-consuming. To address these challenges, we introduce AgenticData, an innovative agentic data analytics system that allows users to simply pose natural language (NL) questions while autonomously analyzing data sources across multiple domains, including both unstructured and structured data. First, AgenticData employs a feedback-driven planning technique that automatically converts an NL query into a semantic plan composed of relational and semantic operators. We propose a multi-agent collaboration strategy by utilizing a data profiling agent for discovering relevant data, a semantic cross-validation agent for iterative optimization based on feedback, and a smart memory agent for maintaining short-term context and long-term knowledge. Second, we propose a semantic optimization model to refine and execute semantic plans effectively. Our system, AgenticData, has been tested using three benchmarks. Experimental results showed that AgenticData achieved superior accuracy on both easy and difficult tasks, significantly outperforming state-of-the-art methods.
Related papers
- I2I-STRADA -- Information to Insights via Structured Reasoning Agent for Data Analysis [0.0]
Real-world data analysis requires a consistent cognitive workflow.<n>We introduce I2I-STRADA, an agentic architecture designed to formalize this reasoning process.
arXiv Detail & Related papers (2025-07-23T18:58:42Z) - Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems [8.816332263275305]
Traditional Data+AI systems rely heavily on human experts to orchestrate system pipelines.<n>Existing Data+AI systems have limited capabilities in semantic understanding, reasoning, and planning.<n>We propose the concept of a 'Data Agent' - a comprehensive architecture designed to orchestrate Data+AI ecosystems.
arXiv Detail & Related papers (2025-07-02T11:04:49Z) - Deep Research Agents: A Systematic Examination And Roadmap [79.04813794804377]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks.<n>In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z) - Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance [54.25184684077833]
We propose an efficient and scalable method for extracting quantitative insights from unstructured financial documents.<n>Our proposed system consists of two specialized agents: the emphExtraction Agent and the emphText-to-Agent
arXiv Detail & Related papers (2025-05-25T15:45:46Z) - TAIJI: MCP-based Multi-Modal Data Analytics on Data Lakes [25.05627023905607]
We envision a new multi-modal data analytics system based on the Model Context Protocol (MCP)<n>First, we define a semantic operator hierarchy tailored for querying multi-modal data in data lakes.<n>Next, we introduce an MCP-based execution framework, in which each MCP server hosts specialized foundation models optimized for specific data modalities.
arXiv Detail & Related papers (2025-05-16T14:03:30Z) - DatawiseAgent: A Notebook-Centric LLM Agent Framework for Automated Data Science [4.1431677219677185]
DatawiseAgent is a notebook-centric agent framework that unifies interactions among user, agent and the computational environment.<n>It orchestrates four stages, including DSF-like planning, incremental execution, self-ging, and post-filtering.<n>It consistently outperforms or matches state-of-the-art methods across multiple model settings.
arXiv Detail & Related papers (2025-03-10T08:32:33Z) - InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features.<n>It consists of 100 datasets representing diverse business use cases such as finance and incident management.<n>Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z) - CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems [10.71630696651595]
Compound AI systems (CASs) that employ LLMs as agents to accomplish knowledge-intensive tasks have garnered significant interest within database and AI communities.
silos of multimodal data sources make it difficult to identify appropriate data sources for accomplishing the task at hand.
We propose CMDBench, a benchmark modeling the complexity of enterprise data platforms.
arXiv Detail & Related papers (2024-06-02T01:10:41Z) - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than SFT model in 57.72% cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z) - InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks [84.7788065721689]
In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks.
This benchmark contains DAEval, a dataset consisting of 257 data analysis questions derived from 52 CSV files.
Building on top of our agent framework, we develop a specialized agent, DAAgent, which surpasses GPT-3.5 by 3.9% on DABench.
arXiv Detail & Related papers (2024-01-10T19:04:00Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person
Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.