Designing a Dashboard for Transparency and Control of Conversational AI
- URL: http://arxiv.org/abs/2406.07882v3
- Date: Mon, 14 Oct 2024 17:46:28 GMT
- Title: Designing a Dashboard for Transparency and Control of Conversational AI
- Authors: Yida Chen, Aoyu Wu, Trevor DePodesta, Catherine Yeh, Kenneth Li, Nicholas Castillo Marin, Oam Patel, Jan Riecke, Shivam Raval, Olivia Seow, Martin Wattenberg, Fernanda ViƩgas
- Abstract summary: We present an end-to-end prototype connecting interpretability techniques with user experience design.
Our results suggest that users appreciate seeing internal states, which helped them expose biased behavior and increased their sense of control.
- Score: 39.01999161106776
- Abstract: Conversational LLMs function as black box systems, leaving users guessing about why they see the output they do. This lack of transparency is potentially problematic, especially given concerns around bias and truthfulness. To address this issue, we present an end-to-end prototype, connecting interpretability techniques with user experience design, that seeks to make chatbots more transparent. We begin by showing evidence that a prominent open-source LLM has a "user model": examining the internal state of the system, we can extract data related to a user's age, gender, educational level, and socioeconomic status. Next, we describe the design of a dashboard that accompanies the chatbot interface, displaying this user model in real time. The dashboard can also be used to control the user model and the system's behavior. Finally, we discuss a study in which users conversed with the instrumented system. Our results suggest that users appreciate seeing internal states, which helped them expose biased behavior and increased their sense of control. Participants also made valuable suggestions that point to future directions for both design and machine learning research. The project page and video demo of our TalkTuner system are available at https://bit.ly/talktuner-project-page
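As a concrete illustration of the probing idea in the abstract, the sketch below trains a linear probe on hidden activations to predict one user attribute. This is a minimal sketch under my own assumptions (model choice, layer, and label set), not the released TalkTuner code:

```python
# Minimal probing sketch (assumed setup, not the TalkTuner implementation):
# extract a hidden-layer activation for each conversation, then fit a linear
# probe that predicts a user attribute (e.g., age group) from it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "meta-llama/Llama-2-13b-chat-hf"  # assumed open-source chat model

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def last_token_activation(conversation: str, layer: int = 20) -> torch.Tensor:
    """Hidden state of the final token at a chosen layer."""
    inputs = tok(conversation, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

def fit_probe(conversations, labels, layer: int = 20):
    """conversations: chat transcripts; labels: known attribute per transcript."""
    X = torch.stack([last_token_activation(c, layer) for c in conversations])
    probe = LogisticRegression(max_iter=1000).fit(X.float().numpy(), labels)
    return probe  # probe.predict_proba(...) can drive a dashboard-style readout
```

Control of the user model, as described in the abstract, would then plausibly involve intervening on the same internal representation (for example, steering activations along an attribute-linked direction), but that mechanism is only hinted at in this sketch.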
Related papers
- Large Language Models for User Interest Journeys [14.219969535206861]
Large language models (LLMs) have shown impressive capabilities in natural language understanding and generation.
This paper argues that LLMs can reason through user activities, and describe their interests in nuanced and interesting ways, similar to how a human would.
We introduce a framework in which we first perform personalized extraction of interest journeys, and then summarize the extracted journeys via LLMs.
arXiv Detail & Related papers (2023-05-24T18:40:43Z)
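To make the extract-then-summarize framework above concrete, here is a rough sketch; the grouping heuristic, prompt wording, and the generate() callback are my own placeholders, not the paper's pipeline:

```python
# Hypothetical sketch of "extract interest journeys, then summarize via an LLM".
# `generate` stands in for any LLM completion call.
from collections import defaultdict

def extract_journeys(activities):
    """Group a user's activities into candidate journeys.
    activities: list of (title, coarse_topic) pairs; the real system would use
    a personalized extraction step rather than a ready-made topic label."""
    journeys = defaultdict(list)
    for title, topic in activities:
        journeys[topic].append(title)
    return journeys

def summarize_journey(titles, generate):
    """Ask the LLM for a nuanced, human-readable description of one journey."""
    prompt = (
        "Here are items a user engaged with, in order:\n"
        + "\n".join(f"- {t}" for t in titles)
        + "\nIn one sentence, describe the interest journey they suggest."
    )
    return generate(prompt)
```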
- Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation [17.41434948048325]
We conduct an interactive user study to unveil how vulnerable task-oriented dialogue (TOD) systems are to realistic scenarios.
Our study reveals that conversations in open-goal settings lead to catastrophic failures of the system.
We discover a novel "pretending" behavior, in which the system pretends to handle the user requests even though they are beyond the system's capabilities.
arXiv Detail & Related papers (2023-05-23T09:24:53Z)
- The System Model and the User Model: Exploring AI Dashboard Design [79.81291473899591]
We argue that sophisticated AI systems should have dashboards, just like all other complicated devices.
We conjecture that, for many systems, the two most important models will be of the user and of the system itself.
Finding ways to identify, interpret, and display these two models should be a core part of interface research for AI.
arXiv Detail & Related papers (2023-05-04T00:22:49Z)
- First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization [112.40598205054994]
We formalize this idea as a completely unsupervised objective for optimizing interfaces.
We conduct an observational study on 540K examples of users operating various keyboard and eye gaze interfaces for typing, controlling simulated robots, and playing video games.
The results show that our mutual information scores are predictive of the ground-truth task completion metrics in a variety of domains.
arXiv Detail & Related papers (2022-05-24T21:57:18Z)
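As a toy illustration of the mutual-information objective described above: if user commands and interface responses are discretized, the plug-in MI between them already distinguishes an interface that tracks the user from one that ignores them. The paper learns this signal without ground-truth labels; sklearn's estimator here is only a stand-in:

```python
# Toy illustration: score an interface by the mutual information between
# discretized user inputs and the interface's responses. A responsive
# interface carries more information about user intent than one that
# ignores the user.
from sklearn.metrics import mutual_info_score

def interface_score(user_inputs, interface_outputs):
    return mutual_info_score(user_inputs, interface_outputs)

responsive = interface_score([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2])
ignoring = interface_score([0, 1, 2, 0, 1, 2], [0, 0, 0, 0, 0, 0])
assert responsive > ignoring
```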
- X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback [83.95599156217945]
We focus on assistive typing applications in which a user cannot operate a keyboard, but can supply other inputs.
Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes.
We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user.
arXiv Detail & Related papers (2022-03-04T00:07:20Z)
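A minimal version of an interface that "learns from its mistakes" might look like the online loop below; the feature encoding, label space, and feedback signal are assumptions for illustration, not the X2T method itself:

```python
# Sketch of an assistive typing interface that updates online from feedback.
# Each round: predict the intended character from the user's input features,
# observe the user's correction (or confirmation), and update the model.
import numpy as np
from sklearn.linear_model import SGDClassifier

class OnlineTyper:
    def __init__(self, n_classes: int = 27):  # assumed: 26 letters + space
        self.model = SGDClassifier(loss="log_loss")
        self.classes = np.arange(n_classes)
        self.fitted = False

    def step(self, features, get_user_feedback):
        """features: 1-D array encoding the raw input (gaze, gesture, ...);
        get_user_feedback(pred) returns the character the user actually wanted."""
        x = np.asarray(features, dtype=float).reshape(1, -1)
        pred = int(self.model.predict(x)[0]) if self.fitted else 0
        target = get_user_feedback(pred)          # correction or confirmation
        self.model.partial_fit(x, [target], classes=self.classes)
        self.fitted = True
        return pred
```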
- GANSlider: How Users Control Generative Models for Images using Multiple Sliders with and without Feedforward Information [33.28541180149195]
We investigate how multiple sliders with and without feedforward visualizations influence users' control of generative models.
We found that more control dimensions (sliders) significantly increase task difficulty and user actions.
Visualizations alone are not always sufficient for users to understand individual control dimensions.
arXiv Detail & Related papers (2022-02-02T11:25:07Z)
- GenNI: Human-AI Collaboration for Data-Backed Text Generation [102.08127062293111]
Table2Text systems use machine learning to generate textual output from structured data.
GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text.
arXiv Detail & Related papers (2021-10-19T18:07:07Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect of exploiting interactions with real users and improving conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
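The feedback-weighted learning idea above (importance sampling over logged answers with binary feedback) can be sketched roughly as follows; the exact weighting used in the paper may differ, and the tensors here are placeholders:

```python
# Rough sketch of an importance-weighted update from binary user feedback.
# Assumed logging format: for each interaction we keep the probability the
# deployed model assigned to its answer and a 0/1 user-feedback flag.
import torch

def feedback_weighted_loss(new_logprobs, old_probs, feedback):
    """new_logprobs: log p_theta(answer | question) under the model being trained.
    old_probs:      p_old(answer | question) logged by the deployed system.
    feedback:       1.0 if the user accepted the answer, 0.0 otherwise.
    Accepted answers are reweighted by 1 / p_old (importance sampling), so
    answers the old model was unsure about but users liked count for more."""
    weights = feedback / old_probs.clamp_min(1e-6)
    return -(weights * new_logprobs).mean()

# Dummy usage:
loss = feedback_weighted_loss(
    new_logprobs=torch.tensor([-0.36, -1.61, -0.11], requires_grad=True),
    old_probs=torch.tensor([0.6, 0.3, 0.8]),
    feedback=torch.tensor([1.0, 0.0, 1.0]),
)
loss.backward()
```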
- NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions [36.00476428803116]
In this work, we attempt to build a user-centric dialogue system.
We first model the user preferences as estimated distributions over the system ontology and map the users' utterances to such distributions.
We build a new dataset named NUANCED that focuses on such realistic settings for conversational recommendation.
arXiv Detail & Related papers (2020-10-24T03:23:14Z)
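To illustrate "user preferences as estimated distributions over the system ontology": instead of committing to one slot value, each slot holds a normalized distribution that an utterance is mapped onto. The slot names and scores below are invented for illustration:

```python
# Toy example: represent a user's stated preference as distributions over
# ontology slots rather than single hard values. Slots and scores are made up.
def normalize(scores):
    total = sum(scores.values())
    return {value: s / total for value, s in scores.items()}

# "Something not too spicy, maybe Italian or Mediterranean" could map to:
preference = {
    "cuisine": normalize({"italian": 2.0, "mediterranean": 2.0, "thai": 0.5}),
    "spice_level": normalize({"mild": 3.0, "medium": 1.0, "hot": 0.2}),
}
# A recommender can then rank candidate items by expected match under these
# distributions instead of filtering on a single guessed value.
```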
This list is automatically generated from the titles and abstracts of the papers on this site.