QUIS: Question-guided Insights Generation for Automated Exploratory Data Analysis
- URL: http://arxiv.org/abs/2410.10270v3
- Date: Mon, 21 Oct 2024 08:13:45 GMT
- Title: QUIS: Question-guided Insights Generation for Automated Exploratory Data Analysis
- Authors: Abhijit Manatkar, Ashlesha Akella, Parthivi Gupta, Krishnasuri Narayanam,
- Abstract summary: We introduce QUIS, a fully automated EDA system that operates in two stages: insight generation (ISGen) driven by question generation (QUGen).
The ISGen module analyzes data to produce multiple relevant insights in response to each question, requiring no prior training and enabling QUIS to adapt to new datasets.
- Score: 1.9521598508325781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discovering meaningful insights from a large dataset, known as Exploratory Data Analysis (EDA), is a challenging task that requires thorough exploration and analysis of the data. Automated Data Exploration (ADE) systems use goal-oriented methods with Large Language Models and Reinforcement Learning towards full automation. However, these methods require human involvement to anticipate goals that may limit insight extraction, while fully automated systems demand significant computational resources and retraining for new datasets. We introduce QUIS, a fully automated EDA system that operates in two stages: insight generation (ISGen) driven by question generation (QUGen). The QUGen module generates questions in iterations, refining them from previous iterations to enhance coverage without human intervention or manually curated examples. The ISGen module analyzes data to produce multiple relevant insights in response to each question, requiring no prior training and enabling QUIS to adapt to new datasets.
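The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy dataset, and the group-by-mean "insight" are all hypothetical stand-ins, and the question generator is a plain stub where the paper uses an LLM. It only shows the shape of the loop, in which each round of question generation sees what earlier rounds already asked, and each question yields several insights.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy dataset; the paper's datasets are not reproduced here.
ROWS = [
    {"region": "north", "product": "a", "sales": 10.0},
    {"region": "north", "product": "b", "sales": 30.0},
    {"region": "south", "product": "a", "sales": 5.0},
    {"region": "south", "product": "b", "sales": 7.0},
]


def generate_questions(columns, measure, asked, per_round=1):
    """Stand-in for QUGen: propose questions not covered in earlier rounds.

    A real system would prompt an LLM; this stub only tracks coverage.
    A question is a (breakdown column, measure column) pair.
    """
    candidates = [
        (col, measure)
        for col in columns
        if col != measure and (col, measure) not in asked
    ]
    return candidates[:per_round]


def generate_insights(rows, question):
    """Stand-in for ISGen: answer one question with several insights.

    Here an insight is just the group with the highest or lowest mean.
    """
    breakdown, measure = question
    groups = defaultdict(list)
    for row in rows:
        groups[row[breakdown]].append(row[measure])
    means = {key: mean(values) for key, values in groups.items()}
    top = max(means, key=means.get)
    bottom = min(means, key=means.get)
    return [
        f"{top} has the highest mean {measure} by {breakdown} ({means[top]:.1f})",
        f"{bottom} has the lowest mean {measure} by {breakdown} ({means[bottom]:.1f})",
    ]


def run_pipeline(rows, measure, iterations=2):
    """Alternate question generation and insight generation for a few rounds."""
    columns = list(rows[0])
    asked, insights = set(), []
    for _ in range(iterations):
        questions = generate_questions(columns, measure, asked)
        if not questions:
            break  # every breakdown column has been covered
        for question in questions:
            asked.add(question)
            insights.extend(generate_insights(rows, question))
    return insights


if __name__ == "__main__":
    for line in run_pipeline(ROWS, "sales"):
        print(line)
```

Passing the set of already-asked questions back into the generator is the simplest way to mimic iterative refinement for coverage; the actual system presumably does something richer with its previous questions.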
Related papers
- Towards automated data analysis: A guided framework for LLM-based risk estimation [0.0]
Large Language Models (LLMs) are increasingly integrated into critical decision-making pipelines. This work proposes a framework for dataset risk estimation that integrates Generative AI under human guidance and supervision.
arXiv Detail & Related papers (2026-03-04T21:44:22Z) - Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey [59.3507264893654]
Issue resolution is a complex Software Engineering task integral to real-world development. Benchmarks like SWE-bench revealed this task as profoundly difficult for large language models. This paper presents a systematic survey of this emerging domain.
arXiv Detail & Related papers (2026-01-15T18:55:03Z) - A Survey of Data Agents: Emerging Paradigm or Overstated Hype? [66.1526688475023]
"Data agent" currently suffers from terminological ambiguity and inconsistent adoption. This survey introduces the first systematic hierarchical taxonomy for data agents. We conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.
arXiv Detail & Related papers (2025-10-27T17:54:07Z) - DeepAnalyze: Agentic Large Language Models for Autonomous Data Science [35.69385623867138]
We introduce DeepAnalyze-8B, the first agentic LLM designed for autonomous data science. We propose a curriculum-based agentic training paradigm that emulates the learning trajectory of human data scientists. We also introduce a data-grounded trajectory synthesis framework that constructs high-quality training data.
arXiv Detail & Related papers (2025-10-19T15:13:42Z) - CoDA: Agentic Systems for Collaborative Data Visualization [57.270599188947294]
Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations. Existing approaches, including simple single- or multi-agent systems, often oversimplify the task. We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection.
arXiv Detail & Related papers (2025-10-03T17:30:16Z) - DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery [26.388978716803464]
Can AI agents transcend conventional search to systematically discover any dataset that meets specific user requirements? Our benchmark and comprehensive analysis provide the foundation for the next generation of self-improving AI systems.
arXiv Detail & Related papers (2025-08-09T12:15:08Z) - Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents [11.783547185760007]
Large Language Models (LLMs) are increasingly used as assistants for data science. Proper automation of some data-science activities is now promised by the rise of LLM agents.
arXiv Detail & Related papers (2025-06-10T13:47:22Z) - AI Agents for Ground-Based Gamma Astronomy [0.0]
We present two prototypes that integrate with the Cherenkov Telescope Array Observatory pipelines for operations and offline data analysis.
These AI agents offer a transformative approach to system management and data analysis by automating complex tasks and providing intelligent assistance.
arXiv Detail & Related papers (2025-03-02T09:55:54Z) - Making Sense of Data in the Wild: Data Analysis Automation at Scale [0.1747623282473278]
We propose a novel approach that combines intelligent agents with retrieval augmented generation to automate data analysis, dataset curation and indexing at scale.
We demonstrate that our approach results in more detailed dataset descriptions, higher hit rates and greater diversity in dataset retrieval tasks.
arXiv Detail & Related papers (2025-01-27T10:04:10Z) - ILAEDA: An Imitation Learning Based Approach for Automatic Exploratory Data Analysis [5.012314384895538]
We argue that not all of the essential features of what makes an operation important can be accurately captured mathematically using rewards.
We propose an AutoEDA model trained through imitation learning from expert EDA sessions, bypassing the need for manually defined interestingness measures.
Our method outperforms the existing state-of-the-art end-to-end EDA approach on benchmarks by up to 3x, showing strong performance and generalization.
arXiv Detail & Related papers (2024-10-15T04:56:13Z) - AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z) - Automated data processing and feature engineering for deep learning and big data applications: a survey [0.0]
The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data.
Not all data processing tasks in conventional deep learning pipelines have been automated.
arXiv Detail & Related papers (2024-03-18T01:07:48Z) - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z) - Automatic Question-Answer Generation for Long-Tail Knowledge [65.11554185687258]
We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
arXiv Detail & Related papers (2024-03-03T03:06:31Z) - Design & Implementation of Automatic Machine Condition Monitoring and Maintenance System in Limited Resource Situations [0.0]
In the era of the fourth industrial revolution, it is essential to automate fault detection and diagnosis of machinery.
Some machine health monitoring systems are used globally, but they are expensive and need trained personnel to operate and analyse.
Predictive maintenance and an occupational health and safety culture are unavailable in developing countries because of inadequate infrastructure, a lack of skilled manpower, financial crises, and other factors.
arXiv Detail & Related papers (2024-01-22T08:06:04Z) - AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z) - Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future [130.87142103774752]
This review systematically assesses over seventy open-source autonomous driving datasets.
It offers insights into various aspects, such as the principles underlying the creation of high-quality datasets.
It also delves into the scientific and technical challenges that warrant resolution.
arXiv Detail & Related papers (2023-12-06T10:46:53Z) - STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z) - Deep Transfer Learning for Automatic Speech Recognition: Towards Better
Generalization [3.6393183544320236]
Speech recognition has become an important challenge when using deep learning (DL).
It requires large-scale training datasets and high computational and storage resources.
Deep transfer learning (DTL) has been introduced to overcome these issues.
arXiv Detail & Related papers (2023-04-27T21:08:05Z) - How Can Subgroup Discovery Help AIOps? [0.0]
We study how Subgroup Discovery can help AIOps.
This project involves both data mining researchers and practitioners from Infologic, a French software editor.
arXiv Detail & Related papers (2021-09-10T14:41:02Z)