Related papers: DataTales: Investigating the use of Large Language Models for Authoring Data-Driven Articles

DataTales: Investigating the use of Large Language Models for Authoring Data-Driven Articles

URL: http://arxiv.org/abs/2308.04076v1
Date: Tue, 8 Aug 2023 06:21:58 GMT
Title: DataTales: Investigating the use of Large Language Models for Authoring Data-Driven Articles
Authors: Nicole Sultanum, Arjun Srinivasan
Abstract summary: Large language models (LLMs) present an opportunity to assist the authoring of data-driven articles. We designed a prototype system, DataTales, that leverages a LLM to generate textual narratives accompanying a given chart. Using DataTales as a design probe, we conducted a qualitative study with 11 professionals to evaluate the concept.
Score: 19.341156634212364
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Authoring data-driven articles is a complex process requiring authors to not only analyze data for insights but also craft a cohesive narrative that effectively communicates the insights. Text generation capabilities of contemporary large language models (LLMs) present an opportunity to assist the authoring of data-driven articles and expedite the writing process. In this work, we investigate the feasibility and perceived value of leveraging LLMs to support authors of data-driven articles. We designed a prototype system, DataTales, that leverages a LLM to generate textual narratives accompanying a given chart. Using DataTales as a design probe, we conducted a qualitative study with 11 professionals to evaluate the concept, from which we distilled affordances and opportunities to further integrate LLMs as valuable data-driven article authoring assistants.

Related papers

MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [54.5729817345543]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic.<n>Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z)
An AI-Powered Research Assistant in the Lab: A Practical Guide for Text Analysis Through Iterative Collaboration with LLMs [0.7255608805275865]
Here we present a step-by-step tutorial to efficiently develop, test, and apply for analyzing unstructured data using LLMs.<n>We demonstrate how to write prompts to review datasets and generate a taxonomy of life domains, evaluate and refine the taxonomy through prompt and direct modifications, test the taxonomy and assess intercoder agreements, and apply the taxonomy to categorize an entire dataset with high intercoder reliability.
arXiv Detail & Related papers (2025-05-14T18:32:18Z)
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
arXiv Detail & Related papers (2025-03-15T10:19:15Z)
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data. We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document. Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative. Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
DataTales: A Benchmark for Real-World Intelligent Data Narration [26.64184785980865]
DataTales is a benchmark designed to assess the proficiency of language models in data narration. Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration.
arXiv Detail & Related papers (2024-10-23T13:30:02Z)
Improving Pinterest Search Relevance Using Large Language Models [15.24121687428178]
We integrate Large Language Models (LLMs) into our search relevance model. Our approach uses search queries alongside content representations that include captions extracted from a generative visual language model. We distill from the LLM-based model into real-time servable model architectures and features.
arXiv Detail & Related papers (2024-10-22T16:29:33Z)
PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation [2.1184929769291294]
This paper presents a novel synthetic dataset designed to evaluate the proficiency of large language models in interpreting data visualizations. Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios. We employ multimodal text prompts with questions related to visual data in images to benchmark several state-of-the-art models.
arXiv Detail & Related papers (2024-09-04T11:19:17Z)
DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts [27.218934418961197]
We introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.
arXiv Detail & Related papers (2024-08-09T21:31:33Z)
Large Language Models for Data Annotation: A Survey [49.8318827245266]
The emergence of advanced Large Language Models (LLMs) presents an unprecedented opportunity to automate the complicated process of data annotation. This survey includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation.
arXiv Detail & Related papers (2024-02-21T00:44:04Z)
Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis [6.592797748561459]
Large Language Models (LLMs) have enabled collaborative human-bot interactions in Software Engineering (SE) We introduce a new dimension of scalability and accuracy in qualitative research, potentially transforming data interpretation methodologies in SE.
arXiv Detail & Related papers (2024-02-02T13:10:46Z)
Source Attribution for Large Language Model-Generated Data [57.85840382230037]
It is imperative to be able to perform source attribution by identifying the data provider who contributed to the generation of a synthetic text. We show that this problem can be tackled by watermarking. We propose a source attribution framework that satisfies these key properties due to our algorithmic designs.
arXiv Detail & Related papers (2023-10-01T12:02:57Z)
Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System [48.62158108517576]
We introduce InsightPilot, an automated data exploration system designed to simplify the data exploration process. InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining. In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts.
arXiv Detail & Related papers (2023-04-02T07:27:49Z)
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks. We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate. We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.