ASTA: Learning Analytical Semantics over Tables for Intelligent Data
Analysis and Visualization
- URL: http://arxiv.org/abs/2208.01043v1
- Date: Mon, 1 Aug 2022 13:32:36 GMT
- Title: ASTA: Learning Analytical Semantics over Tables for Intelligent Data
Analysis and Visualization
- Authors: Lingbo Li, Tianle Li, Xinyi He, Mengyu Zhou, Shi Han, Dongmei Zhang
- Abstract summary: We propose analytical semantics over tables to uncover the common analysis patterns behind user-created analyses.
Here, we design analytical semantics by separating data focus from user intent, which capture the user's motivation from the data perspective and the human perspective, respectively.
We also present the recommendation of conditional formatting for the first time, together with chart recommendation, to exemplify intelligent table analysis.
- Score: 32.06228510098419
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intelligent analysis and visualization of tables use techniques to
automatically recommend useful knowledge from data, thus freeing users from
tedious multi-dimension data mining. While many studies have succeeded in
automating recommendations through rules or machine learning, it is difficult
to generalize expert knowledge and provide explainable recommendations. In this
paper, we present the recommendation of conditional formatting for the first
time, together with chart recommendation, to exemplify intelligent table
analysis. We propose analytical semantics over tables to uncover the common
analysis patterns behind user-created analyses. Here, we design analytical
semantics by separating data focus from user intent, which capture the user's
motivation from the data perspective and the human perspective, respectively.
Furthermore, we design the ASTA framework to apply analytical semantics to
multiple automated recommendations. The ASTA framework extracts data features
through signatures designed from expert knowledge, and uses pre-trained models
to enable data referencing at the field level (charts) or the cell level
(conditional formatting). Experiments show that our framework achieves a recall
at top 1 of 62.86% on public chart corpora, outperforming the best baseline by
about 14%, and of 72.31% on the collected corpus ConFormT, validating that the
ASTA framework is effective in providing accurate and explainable
recommendations.
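The "recall at top 1" figures above can be read with a small sketch. The abstract does not spell out the exact metric definition, so the version below assumes one common reading: a table counts as a hit when at least one of its top-k recommendations matches a user-created analysis. The function name `recall_at_k` and the toy chart-type labels are illustrative assumptions, not taken from the paper.

```python
from typing import Hashable, Sequence, Set


def recall_at_k(
    ranked: Sequence[Sequence[Hashable]],
    truth: Sequence[Set[Hashable]],
    k: int = 1,
) -> float:
    """Fraction of tables whose top-k recommendations hit a user-created analysis.

    Assumed definition (hypothetical, not confirmed by the abstract):
    a table is a hit if any of its first k ranked items is in its ground-truth set.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not ranked:
        return 0.0
    hits = sum(
        1
        for recs, gold in zip(ranked, truth)
        if any(item in gold for item in recs[:k])
    )
    return hits / len(ranked)


# Toy data: one ranked list of recommended chart types per table (made up).
ranked = [["bar", "line"], ["pie", "bar"], ["scatter", "line"]]
truth = [{"bar"}, {"bar"}, {"line"}]

print(recall_at_k(ranked, truth, k=1))
print(recall_at_k(ranked, truth, k=2))
```

In this toy data only the first table is hit at k=1, while all three are hit at k=2, which is why top-1 recall is the stricter of the two numbers.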
Related papers
- TablePilot: Recommending Human-Preferred Tabular Data Analysis with Large Language Models [44.4199653472754]
We present TablePilot, a pioneering data analysis framework leveraging large language models to autonomously generate comprehensive and superior analytical results.
The framework incorporates key designs in analysis preparation and analysis optimization to enhance accuracy.
We also propose Rec-Align, a novel method to further improve recommendation quality and better align with human preferences.
arXiv Detail & Related papers (2025-03-17T15:16:59Z)
- InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation [79.09622602860703]
We introduce InsightBench, a benchmark dataset with three key features.
It consists of 100 datasets representing diverse business use cases such as finance and incident management.
Unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics.
arXiv Detail & Related papers (2024-07-08T22:06:09Z)
- KET-QA: A Dataset for Knowledge Enhanced Table Question Answering [63.56707527868466]
We propose to use a knowledge base (KB) as the external knowledge source for TableQA.
Every question requires the integration of information from both the table and the sub-graph to be answered.
We design a retriever-reasoner structured pipeline model to extract pertinent information from the vast knowledge sub-graph.
arXiv Detail & Related papers (2024-05-13T18:26:32Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
arXiv Detail & Related papers (2023-12-21T08:50:41Z)
- Instruct and Extract: Instruction Tuning for On-Demand Information Extraction [86.29491354355356]
On-Demand Information Extraction aims to fulfill the personalized demands of real-world users.
We present a benchmark named InstructIE, comprising both automatically generated training data and a human-annotated test set.
Building on InstructIE, we further develop an On-Demand Information Extractor, ODIE.
arXiv Detail & Related papers (2023-10-24T17:54:25Z)
- ReTAG: Reasoning Aware Table to Analytic Text Generation [12.603569641254417]
ReTAG is a table and reasoning aware model that uses vector-quantization to infuse different types of analytical reasoning into the output.
We extend the ToTTo and InfoTabs datasets with the reasoning categories used in each reference sentence, and open-source 35.6K analytical and 55.9K descriptive instances.
arXiv Detail & Related papers (2023-05-19T17:03:09Z)
- A Visual Analytics Approach to Building Logistic Regression Models and its Application to Health Records [0.0]
We present an open unified approach for generating, evaluating, and applying regression models in high-dimensional data sets.
The approach is based on exposing a broad correlation panorama for attributes, by which the user can select relevant attributes to build and evaluate prediction models.
We demonstrate the effectiveness and efficiency of UCReg by applying our framework to the analysis of Covid-19 and other synthetic and real health records data.
arXiv Detail & Related papers (2022-01-20T19:53:41Z)
- Categorical exploratory data analysis on goodness-of-fit issues [0.6091702876917279]
We propose to utilize the data analysis paradigm called Categorical Exploratory Data Analysis (CEDA).
CEDA brings out where and how each data fits or deviates from the model shape via several important distributional aspects.
We use graphic displays to illuminate the advantages of using CEDA as one primary way of data analysis in Data Science education.
arXiv Detail & Related papers (2020-11-19T06:11:06Z)
- Goal-driven Command Recommendations for Analysts [2.1751694495249914]
We propose a framework to provide goal-driven data command recommendations to the user by leveraging unstructured logs.
We use the log data of a web-based analytics software to train our neural network models and quantify their performance.
We also propose an evaluation metric that captures the degree of goal orientation of the recommendations.
arXiv Detail & Related papers (2020-11-12T07:26:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.