TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models
- URL: http://arxiv.org/abs/2503.13262v4
- Date: Mon, 31 Mar 2025 07:02:55 GMT
- Title: TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models
- Authors: Deyin Yi, Yihao Liu, Lang Cao, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang
- Abstract summary: We present TablePilot, a pioneering data analysis framework leveraging large language models to autonomously generate comprehensive and superior analytical results. The framework incorporates key designs in analysis preparation and analysis optimization to enhance accuracy. We also propose Rec-Align, a novel method to further improve recommendation quality and better align with human preferences.
- Score: 44.4199653472754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular data analysis is crucial in many scenarios, yet efficiently identifying the most relevant data analysis queries and results for a new table remains a significant challenge. The complexity of tabular data, diverse analytical operations, and the demand for high-quality analysis make the process tedious. To address these challenges, we aim to recommend query-code-result triplets tailored for new tables in tabular data analysis workflows. In this paper, we present TablePilot, a pioneering tabular data analysis framework leveraging large language models to autonomously generate comprehensive and superior analytical results without relying on user profiles or prior interactions. The framework incorporates key designs in analysis preparation and analysis optimization to enhance accuracy. Additionally, we propose Rec-Align, a novel method to further improve recommendation quality and better align with human preferences. Experiments on DART, a dataset specifically designed for comprehensive tabular data analysis recommendation, demonstrate the effectiveness of our framework. Based on GPT-4o, the tuned TablePilot achieves 77.0% top-5 recommendation recall. Human evaluations further highlight its effectiveness in optimizing tabular data analysis workflows.
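The abstract reports a top-5 recommendation recall but does not define the metric here. As a minimal sketch, assuming the common definition (the fraction of reference items that appear among the first k recommendations, averaged over tables), it could be computed as follows. The function names and the sample query identifiers are hypothetical and are not taken from the paper or from DART.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of relevant items found among the first k ranked items.

    Assumed definition; the paper may compute recall differently.
    """
    if not relevant:
        return 0.0
    top_k = set(ranked[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    cases: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]], k: int
) -> float:
    """Average recall@k over (ranked recommendations, reference set) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in cases]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: six recommended queries for one table,
# three of which a human annotator marked as preferred.
recommended = ["q1", "q2", "q3", "q4", "q5", "q6"]
preferred = {"q2", "q6", "q9"}
score = recall_at_k(recommended, preferred, k=5)
```

In this example only one of the three preferred queries falls within the top five, so the score is one third; a reported figure such as 77.0% would be the average of such scores over a whole test set under this assumed definition.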
Related papers
- Data Analysis Prediction over Multiple Unseen Datasets: A Vector Embedding Approach [0.3683202928838613]
We propose a novel methodology that infers the outcome of analytics operators by creating a model from datasets similar to the queried one.
Our model can project different real-world scenarios to a lower vector embedding representation and distinguish between them.
arXiv Detail & Related papers (2025-02-24T11:21:08Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Human annotators judged our DACO-RL algorithm to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
- Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
arXiv Detail & Related papers (2023-12-21T08:50:41Z)
- JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization [2.3501230561204522]
JarviX is designed to employ Large Language Models (LLMs) to provide automated guidance and execute high-precision data analyses.
JarviX incorporates an automated machine learning (AutoML) pipeline for predictive modeling.
The efficacy and adaptability of JarviX are substantiated through a series of practical use case studies.
arXiv Detail & Related papers (2023-12-03T07:03:04Z)
- TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
- Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective [3.1468618177952785]
A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions.
We propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective.
This allows us to systematically evaluate different and optimal cardinality- and budget-constrained feature selection procedures.
arXiv Detail & Related papers (2023-10-09T07:13:40Z)
- ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization [32.06228510098419]
We propose analytical semantics over tables to uncover common analysis patterns behind user-created analyses.
Here, we design analytical semantics by separating data focus from user intent, which extract user motivation from the data and human perspectives, respectively.
We also present the recommendation of conditional formatting for the first time, together with chart recommendation, to exemplify intelligent table analysis.
arXiv Detail & Related papers (2022-08-01T13:32:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.