DataTales: Investigating the use of Large Language Models for Authoring
Data-Driven Articles
- URL: http://arxiv.org/abs/2308.04076v1
- Date: Tue, 8 Aug 2023 06:21:58 GMT
- Title: DataTales: Investigating the use of Large Language Models for Authoring
Data-Driven Articles
- Authors: Nicole Sultanum, Arjun Srinivasan
- Abstract summary: Large language models (LLMs) present an opportunity to assist the authoring of data-driven articles.
We designed a prototype system, DataTales, that leverages an LLM to generate textual narratives accompanying a given chart.
Using DataTales as a design probe, we conducted a qualitative study with 11 professionals to evaluate the concept.
- Score: 19.341156634212364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Authoring data-driven articles is a complex process requiring authors to not
only analyze data for insights but also craft a cohesive narrative that
effectively communicates the insights. Text generation capabilities of
contemporary large language models (LLMs) present an opportunity to assist the
authoring of data-driven articles and expedite the writing process. In this
work, we investigate the feasibility and perceived value of leveraging LLMs to
support authors of data-driven articles. We designed a prototype system,
DataTales, that leverages an LLM to generate textual narratives accompanying a
given chart. Using DataTales as a design probe, we conducted a qualitative
study with 11 professionals to evaluate the concept, from which we distilled
affordances and opportunities to further integrate LLMs as valuable data-driven
article authoring assistants.
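As a rough illustration of the concept the abstract describes (serializing a chart's underlying data into a prompt and asking an LLM to draft an accompanying narrative), the following is a minimal sketch. It is not the DataTales implementation: the data serialization, prompt wording, and the `call_llm` callable are hypothetical placeholders for whichever text-generation endpoint an author has available.

```python
# Minimal sketch (assumed, not the DataTales system): turn a chart's data
# into a prompt and ask an LLM for a short narrative paragraph.
from typing import Callable

def build_chart_prompt(title: str, x_label: str, y_label: str,
                       rows: list[tuple[str, float]]) -> str:
    """Serialize the chart's data and metadata into a narrative-writing prompt."""
    data_lines = "\n".join(f"{x}: {y}" for x, y in rows)
    return (
        f"The chart is titled '{title}'. "
        f"The x-axis is '{x_label}' and the y-axis is '{y_label}'.\n"
        f"Data:\n{data_lines}\n\n"
        "Write a short, factual narrative paragraph describing the main "
        "trends and notable data points, suitable for a data-driven article."
    )

def narrate_chart(call_llm: Callable[[str], str], title: str, x_label: str,
                  y_label: str, rows: list[tuple[str, float]]) -> str:
    # call_llm is any function that maps a prompt string to generated text.
    return call_llm(build_chart_prompt(title, x_label, y_label, rows))

if __name__ == "__main__":
    demo_rows = [("2019", 1.2), ("2020", 0.8), ("2021", 1.9), ("2022", 2.4)]
    placeholder_llm = lambda prompt: "[generated narrative would appear here]"
    print(narrate_chart(placeholder_llm, "Annual revenue (USD millions)",
                        "Year", "Revenue", demo_rows))
```

In the study described above, such generated drafts served as a design probe for how professionals would review, edit, and integrate LLM-written narrative into their articles, rather than as finished prose.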
Related papers
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes.
Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
arXiv Detail & Related papers (2025-03-15T10:19:15Z) - Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z) - DataTales: A Benchmark for Real-World Intelligent Data Narration [26.64184785980865]
DataTales is a benchmark designed to assess the proficiency of language models in data narration.
Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration.
arXiv Detail & Related papers (2024-10-23T13:30:02Z) - Improving Pinterest Search Relevance Using Large Language Models [15.24121687428178]
We integrate Large Language Models (LLMs) into our search relevance model.
Our approach uses search queries alongside content representations that include captions extracted from a generative visual language model.
We distill from the LLM-based model into real-time servable model architectures and features.
arXiv Detail & Related papers (2024-10-22T16:29:33Z) - PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation [2.1184929769291294]
This paper presents a novel synthetic dataset designed to evaluate the proficiency of large language models in interpreting data visualizations.
Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios.
We employ multimodal text prompts with questions related to visual data in images to benchmark several state-of-the-art models.
arXiv Detail & Related papers (2024-09-04T11:19:17Z) - DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts [27.218934418961197]
We introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources.
To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents.
While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.
arXiv Detail & Related papers (2024-08-09T21:31:33Z) - Large Language Models for Data Annotation: A Survey [49.8318827245266]
The emergence of advanced Large Language Models (LLMs) presents an unprecedented opportunity to automate the complicated process of data annotation.
This survey includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation.
arXiv Detail & Related papers (2024-02-21T00:44:04Z) - Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted
Approach for Qualitative Data Analysis [6.592797748561459]
Large Language Models (LLMs) have enabled collaborative human-bot interactions in Software Engineering (SE).
We introduce a new dimension of scalability and accuracy in qualitative research, potentially transforming data interpretation methodologies in SE.
arXiv Detail & Related papers (2024-02-02T13:10:46Z) - Source Attribution for Large Language Model-Generated Data [57.85840382230037]
It is imperative to be able to perform source attribution by identifying the data provider who contributed to the generation of a synthetic text.
We show that this problem can be tackled by watermarking.
We propose a source attribution framework that satisfies these key properties due to our algorithmic designs.
arXiv Detail & Related papers (2023-10-01T12:02:57Z) - Demonstration of InsightPilot: An LLM-Empowered Automated Data
Exploration System [48.62158108517576]
We introduce InsightPilot, an automated data exploration system designed to simplify the data exploration process.
InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining.
In brief, an IQuery is an abstraction that automates data analysis operations, mimicking the approach of data analysts.
arXiv Detail & Related papers (2023-04-02T07:27:49Z) - AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.