Conversational AI Threads for Visualizing Multidimensional Datasets
- URL: http://arxiv.org/abs/2311.05590v1
- Date: Thu, 9 Nov 2023 18:47:46 GMT
- Title: Conversational AI Threads for Visualizing Multidimensional Datasets
- Authors: Matt-Heun Hong, Anamaria Crisan
- Abstract summary: Generative Large Language Models (LLMs) show potential in data analysis, yet their full capabilities remain uncharted.
Our work explores the capabilities of LLMs for creating and refining visualizations via conversational interfaces.
- Score: 10.533569558002798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Large Language Models (LLMs) show potential in data analysis, yet
their full capabilities remain uncharted. Our work explores the capabilities of
LLMs for creating and refining visualizations via conversational interfaces. We
used an LLM to conduct a re-analysis of a prior Wizard-of-Oz study examining
the use of chatbots for conducting visual analysis. We surfaced the strengths
and weaknesses of LLM-driven analytic chatbots, finding that they fell short in
supporting progressive visualization refinements. From these findings, we
developed AI Threads, a multi-threaded analytic chatbot that enables analysts
to proactively manage conversational context and improve the efficacy of its
outputs. We evaluate its usability through a crowdsourced study (n=40) and
in-depth interviews with expert analysts (n=10). We further demonstrate the
capabilities of AI Threads on a dataset outside the LLM's training corpus. Our
findings show the potential of LLMs while also surfacing challenges and
fruitful avenues for future research.
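The abstract does not describe how AI Threads is implemented. As a rough, hypothetical sketch of the core idea, the Python below keeps a separate message history per analytic thread so that only the selected thread's context is sent to the model; the `ThreadedChat` class, the `call_llm` placeholder, and the chat-message format are assumptions for illustration, not the authors' code. Isolating histories per thread is one way an analyst could proactively scope the conversational context, in the spirit of the abstract's description.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    """One analytic conversation thread with its own isolated history."""
    name: str
    messages: list = field(default_factory=list)  # [{"role": ..., "content": ...}]

class ThreadedChat:
    """Keeps several conversation threads so that only the relevant
    context is sent to the LLM when refining a visualization."""

    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.threads: dict[str, Thread] = {}

    def ask(self, thread_name: str, prompt: str, call_llm) -> str:
        # Create the thread on first use.
        thread = self.threads.setdefault(thread_name, Thread(thread_name))
        thread.messages.append({"role": "user", "content": prompt})
        # Only this thread's history goes to the model, keeping
        # unrelated turns out of its context window.
        reply = call_llm(
            [{"role": "system", "content": self.system_prompt}] + thread.messages
        )
        thread.messages.append({"role": "assistant", "content": reply})
        return reply

# Usage (call_llm is a stand-in for any chat-completion API):
# chat = ThreadedChat("You write Vega-Lite specs for the analyst's dataset.")
# chat.ask("scatter-thread", "Plot horsepower against MPG.", call_llm)
# chat.ask("bar-thread", "Show average MPG by origin.", call_llm)
```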
Related papers
- LLM Augmentations to support Analytical Reasoning over Multiple Documents [8.99490805653946]
We investigate the application of large language models (LLMs) to enhance in-depth analytical reasoning within the context of intelligence analysis.
We develop an architecture to augment the capabilities of an LLM with a memory module called dynamic evidence trees (DETs) to develop and track multiple investigation threads.
arXiv Detail & Related papers (2024-11-25T06:00:42Z)
- NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews [65.35458530702442]
We focus on journalistic interviews, a domain rich in grounding communication and abundant in data.
We curate a dataset of 40,000 two-person informational interviews from NPR and CNN.
LLMs are significantly less likely than human interviewers to use acknowledgements and to pivot to higher-level questions.
arXiv Detail & Related papers (2024-11-21T01:37:38Z)
- The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? [60.01746782465275]
Large Language Models (LLMs) have shown capabilities close to human performance in various analytical tasks.
This paper investigates the efficiency and accuracy of LLMs in specialized tasks through a structured user study focusing on Human-LLM partnership.
arXiv Detail & Related papers (2024-10-07T02:30:18Z)
- Automated test generation to evaluate tool-augmented LLMs as conversational AI agents [0.27309692684728615]
We present a test generation pipeline to evaluate conversational AI agents.
Our framework uses LLMs to generate diverse tests grounded in user-defined procedures.
Our results show that while tool-augmented LLMs perform well in single interactions, they often struggle to handle complete conversations.
arXiv Detail & Related papers (2024-09-24T09:57:43Z)
- LLM-Assisted Visual Analytics: Opportunities and Challenges [4.851427485686741]
We explore the integration of large language models (LLMs) into visual analytics (VA) systems.
We highlight the new possibilities that LLMs bring to VA, especially how they can change VA processes beyond the usual use cases.
We carefully consider the prominent challenges of using current LLMs in VA tasks.
arXiv Detail & Related papers (2024-09-04T13:24:03Z)
- CIBench: Evaluating Your LLMs with a Code Interpreter Plugin [68.95137938214862]
We propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks.
The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions.
We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.
arXiv Detail & Related papers (2024-07-15T07:43:55Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The ability to explain in natural language allows LLMs to convey patterns of greater scale and complexity to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Exploring the Potential of Large Language Models in Computational Argumentation [54.85665903448207]
Large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language.
This work assesses LLMs such as ChatGPT, Flan models, and LLaMA2 models in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-11-15T15:12:15Z)
- Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics [32.123919380959485]
Multi-modal large language models (MLLMs) are trained on top of large language models (LLMs).
While they excel in multi-modal tasks, the pure NLP abilities of MLLMs are often underestimated and left untested.
We show that visual instruction tuning, a prevailing strategy for transitioning LLMs into MLLMs, unexpectedly and interestingly helps models attain both improved truthfulness and ethical alignment.
arXiv Detail & Related papers (2023-09-13T17:57:21Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.