Related papers: AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery

URL: http://arxiv.org/abs/2504.07421v1
Date: Thu, 10 Apr 2025 03:27:25 GMT
Title: AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
Authors: Amirhossein Abaskohi, Amrutha Varshini Ramesh, Shailesh Nanisetty, Chirag Goel, David Vazquez, Christopher Pal, Spandana Gella, Giuseppe Carenini, Issam H. Laradji
Abstract summary: We introduce AgentAda, the first analytics agent that can learn and use new analytics skills to extract more specialized insights. Unlike existing methods that require users to manually decide which data analytics method to apply, AgentAda automatically identifies the skill needed to perform the analysis. We conducted a human evaluation demonstrating that AgentAda provides more insightful analytics than existing tools, with 48.78% of evaluators preferring its analyses, compared to 27.67% for the unskilled agent.
Score: 20.333502467911828
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce AgentAda, the first LLM-powered analytics agent that can learn and use new analytics skills to extract more specialized insights. Unlike existing methods that require users to manually decide which data analytics method to apply, AgentAda automatically identifies the skill needed from a library of analytical skills to perform the analysis. This also allows AgentAda to use skills that existing LLMs cannot perform out of the box. The library covers a range of methods, including clustering, predictive modeling, and NLP techniques like BERT, which allow AgentAda to handle complex analytics tasks based on what the user needs. AgentAda's dataset-to-insight extraction strategy consists of three key steps: (I) a question generator to generate queries relevant to the user's goal and persona, (II) a hybrid Retrieval-Augmented Generation (RAG)-based skill matcher to choose the best data analytics skill from the skill library, and (III) a code generator that produces executable code based on the retrieved skill's documentation to extract key patterns. We also introduce KaggleBench, a benchmark of curated notebooks across diverse domains, to evaluate AgentAda's performance. We conducted a human evaluation demonstrating that AgentAda provides more insightful analytics than existing tools, with 48.78% of evaluators preferring its analyses, compared to 27.67% for the unskilled agent. We also propose a novel LLM-as-a-judge approach that we show is aligned with human evaluation as a way to automate insight quality evaluation at larger scale.
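Steps (II) and (III) of the strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the skill library entries, the function names, the `alpha` weight, and the use of bag-of-words cosine similarity as a stand-in for dense embeddings are all assumptions made for illustration. Step (I) and the actual LLM call are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical skill library; names and descriptions are illustrative only.
SKILLS = {
    "clustering": "group similar rows into segments using k-means clustering",
    "predictive_modeling": "train a regression or classification model to predict a target column",
    "sentiment_analysis": "classify the sentiment of text reviews with an NLP model",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(query, doc):
    """Jaccard overlap between token sets (the lexical half of the hybrid)."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def cosine_score(query, doc):
    """Cosine similarity over term counts (assumed stand-in for embeddings)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def match_skill(question, skills=SKILLS, alpha=0.5):
    """Return the skill whose description best matches the question.

    alpha weights the lexical score against the similarity score;
    0.5 is an arbitrary choice for this sketch.
    """
    scores = {
        name: alpha * keyword_score(question, desc)
        + (1 - alpha) * cosine_score(question, desc)
        for name, desc in skills.items()
    }
    return max(scores, key=scores.get)


def build_codegen_prompt(question, skill_name, skills=SKILLS):
    """Assemble a prompt that a code-generating LLM could receive (step III)."""
    return (
        f"Question: {question}\n"
        f"Skill: {skill_name}\n"
        f"Skill documentation: {skills[skill_name]}\n"
        "Write executable Python code that answers the question."
    )


question = "Which customer segments emerge if we group similar rows?"
skill = match_skill(question)
prompt = build_codegen_prompt(question, skill)
```

In this toy setup the question shares the tokens "group", "similar", "rows", and "segments" with the clustering description and none with the other two, so `match_skill` selects `clustering`. A real matcher would likely score against full skill documentation using learned embeddings.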

Related papers

AgenticData: An Agentic Data Analytics System for Heterogeneous Data [12.67277567222908]
AgenticData is an agentic data analytics system that allows users to pose natural language (NL) questions while autonomously analyzing data sources across multiple domains. We propose a multi-agent collaboration strategy by utilizing a data profiling agent for discovering relevant data, a semantic cross-validation agent for iterative optimization based on feedback, and a smart memory agent for maintaining short-term context.
arXiv Detail & Related papers (2025-08-07T03:33:59Z)
Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics [2.7933239275667545]
We present an agentic system that automates the data-to-dashboard pipeline through modular LLM agents. Unlike existing chart systems, our framework simulates the analytical reasoning process of business analysts. Our approach shows improved insightfulness, domain relevance, and analytical depth, as measured by tailored evaluation metrics.
arXiv Detail & Related papers (2025-05-29T17:32:15Z)
IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z)
Agent-centric Information Access [21.876205078570507]
Large language models (LLMs) are becoming more specialized, each trained on proprietary data and excelling in specific domains. This paper introduces a framework for agent-centric information access, where LLMs function as knowledge agents that are dynamically ranked and queried based on their demonstrated expertise. We propose a scalable evaluation framework that leverages retrieval-augmented generation and clustering techniques to construct and assess thousands of specialized models, with the potential to scale toward millions.
arXiv Detail & Related papers (2025-02-26T16:56:19Z)
AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science [5.064778712920176]
Large language models (LLMs) are increasingly used to automate data analysis through executable code generation. We present AIRepr, an Analyst-Inspector framework for automatically evaluating and improving the Reproducibility of LLM-generated data analysis.
arXiv Detail & Related papers (2025-02-23T01:15:50Z)
LAMBDA: A Large Model Based Data Agent [7.240586338370509]
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system. LAMBDA is designed to address data analysis challenges in complex data-driven applications. It has the potential to enhance data analysis paradigms by seamlessly integrating human and artificial intelligence.
arXiv Detail & Related papers (2024-07-24T06:26:36Z)
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features. It consists of 100 datasets representing diverse business use cases such as finance and incident management. Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights. We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs. Human annotators judged that our DACO-RL algorithm produces more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks [84.7788065721689]
In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. This benchmark contains DAEval, a dataset consisting of 257 data analysis questions derived from 52 CSV files. Building on top of our agent framework, we develop a specialized agent, DAAgent, which surpasses GPT-3.5 by 3.9% on DABench.
arXiv Detail & Related papers (2024-01-10T19:04:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.