Related papers: TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models

URL: http://arxiv.org/abs/2503.13262v4
Date: Mon, 31 Mar 2025 07:02:55 GMT
Title: TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models
Authors: Deyin Yi, Yihao Liu, Lang Cao, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang
Abstract summary: We present TablePilot, a pioneering data analysis framework leveraging large language models to autonomously generate comprehensive and superior analytical results. The framework incorporates key designs in analysis preparation and analysis optimization to enhance accuracy. We also propose Rec-Align, a novel method to further improve recommendation quality and better align with human preferences.
Score: 44.4199653472754
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tabular data analysis is crucial in many scenarios, yet efficiently identifying the most relevant data analysis queries and results for a new table remains a significant challenge. The complexity of tabular data, diverse analytical operations, and the demand for high-quality analysis make the process tedious. To address these challenges, we aim to recommend query-code-result triplets tailored for new tables in tabular data analysis workflows. In this paper, we present TablePilot, a pioneering tabular data analysis framework leveraging large language models to autonomously generate comprehensive and superior analytical results without relying on user profiles or prior interactions. The framework incorporates key designs in analysis preparation and analysis optimization to enhance accuracy. Additionally, we propose Rec-Align, a novel method to further improve recommendation quality and better align with human preferences. Experiments on DART, a dataset specifically designed for comprehensive tabular data analysis recommendation, demonstrate the effectiveness of our framework. Based on GPT-4o, the tuned TablePilot achieves 77.0% top-5 recommendation recall. Human evaluations further highlight its effectiveness in optimizing tabular data analysis workflows.

Related papers

JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models [58.408398005993455]
JT-DA-8B is a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. We construct a comprehensive and diverse training corpus with 34 well-defined table reasoning tasks, by aggregating 29 public table QA datasets and 3 million tables. Experimental results show that JT-DA-8B achieves strong performance in various table reasoning tasks.
arXiv Detail & Related papers (2025-12-07T14:29:23Z)
Analytical Survey of Learning with Low-Resource Data: From Analysis to Investigation [192.53529928861818]
Learning with high-resource data has demonstrated substantial success in artificial intelligence (AI). However, the costs associated with data annotation and model training remain significant. This survey employs active sampling theory to analyze the generalization error and label complexity associated with learning from low-resource data.
arXiv Detail & Related papers (2025-10-10T03:15:42Z)
Data Analysis Prediction over Multiple Unseen Datasets: A Vector Embedding Approach [0.3683202928838613]
We propose a novel methodology that infers the outcome of analytics operators by creating a model from datasets similar to the queried one. Our model can project different real-world scenarios to a lower vector embedding representation and distinguish between them.
arXiv Detail & Related papers (2025-02-24T11:21:08Z)
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights. We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs. Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection. We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks. Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization. We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks. We also develop five innovative and effective annotation methods. We evaluate five state-of-the-art models using three different metrics.
arXiv Detail & Related papers (2023-12-21T08:50:41Z)
JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization [2.3501230561204522]
JarviX is designed to employ Large Language Models (LLMs) to facilitate an automated guide and execute high-precision data analyses. JarviX incorporates an automated machine learning (AutoML) pipeline for predictive modeling. The efficacy and adaptability of JarviX are substantiated through a series of practical use case studies.
arXiv Detail & Related papers (2023-12-03T07:03:04Z)
TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors. TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score. We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective [3.1468618177952785]
A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions. We propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective. This allows us to systematically evaluate different and optimal cardinality- and budget-constrained feature selection procedures.
arXiv Detail & Related papers (2023-10-09T07:13:40Z)
ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization [32.06228510098419]
We propose analytical semantics over tables to uncover common analysis patterns behind user-created analyses. Here, we design analytical semantics by separating data focus from user intent, which extract the user motivation from the data and human perspectives, respectively. We also present the recommendation of conditional formatting for the first time, together with chart recommendation, to exemplify intelligent table analysis.
arXiv Detail & Related papers (2022-08-01T13:32:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.