Conversational AI Threads for Visualizing Multidimensional Datasets
- URL: http://arxiv.org/abs/2311.05590v1
- Date: Thu, 9 Nov 2023 18:47:46 GMT
- Title: Conversational AI Threads for Visualizing Multidimensional Datasets
- Authors: Matt-Heun Hong, Anamaria Crisan
- Abstract summary: Generative Large Language Models (LLMs) show potential in data analysis, yet their full capabilities remain uncharted.
Our work explores the capabilities of LLMs for creating and refining visualizations via conversational interfaces.
- Score: 10.533569558002798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Large Language Models (LLMs) show potential in data analysis, yet
their full capabilities remain uncharted. Our work explores the capabilities of
LLMs for creating and refining visualizations via conversational interfaces. We
used an LLM to conduct a re-analysis of a prior Wizard-of-Oz study examining
the use of chatbots for conducting visual analysis. We surfaced the strengths
and weaknesses of LLM-driven analytic chatbots, finding that they fell short in
supporting progressive visualization refinements. From these findings, we
developed AI Threads, a multi-threaded analytic chatbot that enables analysts
to proactively manage conversational context and improve the efficacy of its
outputs. We evaluate its usability through a crowdsourced study (n=40) and
in-depth interviews with expert analysts (n=10). We further demonstrate the
capabilities of AI Threads on a dataset outside the LLM's training corpus. Our
findings show the potential of LLMs while also surfacing challenges and
fruitful avenues for future research.
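The abstract does not describe how AI Threads is implemented. As a rough, hypothetical sketch of the core idea, the Python below keeps a separate message history per analytic thread so that only the selected thread's context is sent to the model; the `ThreadedChat` class, the `call_llm` placeholder, and the chat-message format are assumptions for illustration, not the authors' code. Isolating histories per thread is one way an analyst could proactively scope the conversational context, in the spirit of the abstract's description.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    """One analytic conversation thread with its own isolated history."""
    name: str
    messages: list = field(default_factory=list)  # [{"role": ..., "content": ...}]

class ThreadedChat:
    """Keeps several conversation threads so that only the relevant
    context is sent to the LLM when refining a visualization."""

    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.threads: dict[str, Thread] = {}

    def ask(self, thread_name: str, prompt: str, call_llm) -> str:
        # Create the thread on first use.
        thread = self.threads.setdefault(thread_name, Thread(thread_name))
        thread.messages.append({"role": "user", "content": prompt})
        # Only this thread's history goes to the model, keeping
        # unrelated turns out of its context window.
        reply = call_llm(
            [{"role": "system", "content": self.system_prompt}] + thread.messages
        )
        thread.messages.append({"role": "assistant", "content": reply})
        return reply

# Usage (call_llm is a stand-in for any chat-completion API):
# chat = ThreadedChat("You write Vega-Lite specs for the analyst's dataset.")
# chat.ask("scatter-thread", "Plot horsepower against MPG.", call_llm)
# chat.ask("bar-thread", "Show average MPG by origin.", call_llm)
```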
Related papers
- LLM Augmentations to support Analytical Reasoning over Multiple Documents [8.99490805653946]
We investigate the application of large language models (LLMs) to enhance in-depth analytical reasoning within the context of intelligence analysis.
We develop an architecture to augment the capabilities of an LLM with a memory module called dynamic evidence trees (DETs) to develop and track multiple investigation threads.
arXiv Detail & Related papers (2024-11-25T06:00:42Z)
- NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews [65.35458530702442]
We focus on journalistic interviews, a domain rich in grounding communication and abundant in data.
We curate a dataset of 40,000 two-person informational interviews from NPR and CNN.
LLMs are significantly less likely than human interviewers to use acknowledgements and to pivot to higher-level questions.
arXiv Detail & Related papers (2024-11-21T01:37:38Z)
- The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z)
- Automated test generation to evaluate tool-augmented LLMs as conversational AI agents [0.27309692684728615]
We present a test generation pipeline to evaluate conversational AI agents.
Our framework uses LLMs to generate diverse tests grounded in user-defined procedures.
Our results show that while tool-augmented LLMs perform well in single interactions, they often struggle to handle complete conversations.
arXiv Detail & Related papers (2024-09-24T09:57:43Z)
- LLM-Assisted Visual Analytics: Opportunities and Challenges [4.851427485686741]
We explore the integration of large language models (LLMs) into visual analytics (VA) systems.
We highlight the new possibilities that LLMs bring to VA, especially how they can change VA processes beyond the usual use cases.
We carefully consider the prominent challenges of using current LLMs in VA tasks.
arXiv Detail & Related papers (2024-09-04T13:24:03Z)
- CIBench: Evaluating Your LLMs with a Code Interpreter Plugin [68.95137938214862]
We propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks.
The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions.
We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.
arXiv Detail & Related papers (2024-07-15T07:43:55Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The ability to explain in natural language allows LLMs to convey patterns of greater scale and complexity to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Exploring the Potential of Large Language Models in Computational Argumentation [54.85665903448207]
Large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language.
This work assesses LLMs such as ChatGPT, Flan models, and LLaMA2 models in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-11-15T15:12:15Z)
- Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics [32.123919380959485]
Multi-modal large language models (MLLMs) are trained on top of large language models (LLMs).
While they excel in multi-modal tasks, the pure NLP abilities of MLLMs are often underestimated and left untested.
We show that visual instruction tuning, a prevailing strategy for transitioning LLMs into MLLMs, unexpectedly and interestingly helps models attain both improved truthfulness and ethical alignment.
arXiv Detail & Related papers (2023-09-13T17:57:21Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.