DataTales: A Benchmark for Real-World Intelligent Data Narration
- URL: http://arxiv.org/abs/2410.17859v1
- Date: Wed, 23 Oct 2024 13:30:02 GMT
- Title: DataTales: A Benchmark for Real-World Intelligent Data Narration
- Authors: Yajing Yang, Qian Liu, Min-Yen Kan,
- Abstract summary: DataTales is a benchmark designed to assess the proficiency of language models in data narration.
Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration.
- Score: 26.64184785980865
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the requisite analytical complexity for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding market data, showcasing the demand for models to create clear narratives and analyze large datasets while understanding specialized terminology in the field. Our findings highlights the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration, suggesting promising avenues for future model development and evaluation methodologies.
Related papers
- MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based on Large language Model [1.33134751838052]
This paper introduces the Multidimensional Data Storytelling Framework (MDSF) based on large language models for automated insight generation and context-aware storytelling.
The framework incorporates advanced preprocessing techniques, augmented analysis algorithms, and a unique scoring mechanism to identify and prioritize actionable insights.
arXiv Detail & Related papers (2025-01-02T02:35:38Z) - Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize Malicious Network Traffic.
Our approach transforms numerical data into text, re-framing data generation as a language modeling task.
Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z) - DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG)
arXiv Detail & Related papers (2024-09-12T02:08:00Z) - PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation [2.1184929769291294]
This paper presents a novel synthetic dataset designed to evaluate the proficiency of large language models in interpreting data visualizations.
Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios.
We employ multimodal text prompts with questions related to visual data in images to benchmark several state-of-the-art models.
arXiv Detail & Related papers (2024-09-04T11:19:17Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries [0.0]
We evaluate OpenAI's GPT-3.5 as a "Language Data Scientist" (LDS)
The model was tested on a diverse set of benchmark datasets to evaluate its performance across multiple standards.
arXiv Detail & Related papers (2024-03-29T22:59:34Z) - Benchmarking Data Science Agents [11.582116078653968]
Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing.
Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical process.
We introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents.
arXiv Detail & Related papers (2024-02-27T03:03:06Z) - Text2Analysis: A Benchmark of Table Question Answering with Advanced
Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
arXiv Detail & Related papers (2023-12-21T08:50:41Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A
Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - DataTales: Investigating the use of Large Language Models for Authoring
Data-Driven Articles [19.341156634212364]
Large language models (LLMs) present an opportunity to assist the authoring of data-driven articles.
We designed a prototype system, DataTales, that leverages a LLM to generate textual narratives accompanying a given chart.
Using DataTales as a design probe, we conducted a qualitative study with 11 professionals to evaluate the concept.
arXiv Detail & Related papers (2023-08-08T06:21:58Z) - Weakly Supervised Data Augmentation Through Prompting for Dialogue
Understanding [103.94325597273316]
We present a novel approach that iterates on augmentation quality by applying weakly-supervised filters.
We evaluate our methods on the emotion and act classification tasks in DailyDialog and the intent classification task in Facebook Multilingual Task-Oriented Dialogue.
For DailyDialog specifically, using 10% of the ground truth data we outperform the current state-of-the-art model which uses 100% of the data.
arXiv Detail & Related papers (2022-10-25T17:01:30Z) - ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning [85.33459673197149]
We introduce a new Reading dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations.
In this paper, we propose to identify biased data points and separate them into EASY set and the rest as HARD set.
Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set.
However, they struggle on HARD set with poor performance near that of random guess, indicating more research is needed to essentially enhance the logical reasoning ability of current models.
arXiv Detail & Related papers (2020-02-11T11:54:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.