Supporting Human-AI Collaboration in Auditing LLMs with LLMs
- URL: http://arxiv.org/abs/2304.09991v3
- Date: Thu, 30 Nov 2023 16:30:09 GMT
- Title: Supporting Human-AI Collaboration in Auditing LLMs with LLMs
- Authors: Charvi Rastogi, Marco Tulio Ribeiro, Nicholas King, Harsha Nori,
Saleema Amershi
- Abstract summary: Large language models have been shown to be biased and behave irresponsibly.
It is crucial to audit these language models rigorously.
Existing auditing tools leverage humans, AI, or both to find failures.
- Score: 33.56822240549913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models are becoming increasingly pervasive and ubiquitous in
society via deployment in sociotechnical systems. Yet these language models, be
it for classification or generation, have been shown to be biased and behave
irresponsibly, causing harm to people at scale. It is crucial to audit these
language models rigorously. Existing auditing tools leverage humans, AI, or both
to find failures. In this work, we draw upon literature in
human-AI collaboration and sensemaking, and conduct interviews with research
experts in safe and fair AI, to build upon the auditing tool: AdaTest (Ribeiro
and Lundberg, 2022), which is powered by a generative large language model
(LLM). Through the design process we highlight the importance of sensemaking
and human-AI communication to leverage complementary strengths of humans and
generative models in collaborative auditing. To evaluate the effectiveness of
the augmented tool, AdaTest++, we conduct user studies with participants
auditing two commercial language models: OpenAI's GPT-3 and Azure's sentiment
analysis model. Qualitative analysis shows that AdaTest++ effectively leverages
human strengths such as schematization, hypothesis formation and testing.
Further, with our tool, participants identified a variety of failure modes
covering 26 different topics across 2 tasks, including failures documented in
prior formal audits as well as ones previously under-reported.
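To make the collaborative auditing loop concrete, below is a minimal Python sketch of the generate-run-review cycle that tools like AdaTest and AdaTest++ embody: an LLM proposes test inputs within a topic the auditor is exploring, the target model is run on them, and failing cases are surfaced for human review. This is an illustrative sketch, not the AdaTest++ implementation; the `llm_suggest_tests` stub, the `TestCase` record, and the toy sentiment model are hypothetical stand-ins for the generative backend and the commercial models audited in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TestCase:
    """One audit test: an input, the label the auditor expects, and the topic it probes."""
    text: str
    expected: str
    topic: str
    model_output: str = ""
    passed: bool = True


def llm_suggest_tests(seed_tests: List[TestCase], topic: str, n: int = 3) -> List[str]:
    """Hypothetical stand-in for an LLM call that proposes inputs similar to the seeds.

    A real tool would prompt a generative model with the seed tests and the topic;
    here we return trivial variations so the sketch runs end to end.
    """
    return [f"{seed.text} (variation {i})" for i, seed in enumerate(seed_tests[:n], start=1)]


def audit_round(target_model: Callable[[str], str],
                seed_tests: List[TestCase],
                topic: str) -> List[TestCase]:
    """One generate-and-run round; failing cases are returned for human review."""
    expected = seed_tests[0].expected  # auditor's hypothesis: variations keep the same label
    results = []
    for text in llm_suggest_tests(seed_tests, topic):
        case = TestCase(text=text, expected=expected, topic=topic)
        case.model_output = target_model(text)
        case.passed = (case.model_output == case.expected)
        results.append(case)
    return results


if __name__ == "__main__":
    # Toy sentiment "model" standing in for a commercial API under audit.
    def toy_sentiment(text: str) -> str:
        return "negative" if "not" in text.lower() else "positive"

    seeds = [TestCase("I am not unhappy with the service.", expected="positive", topic="negation")]
    for case in audit_round(toy_sentiment, seeds, topic="negation"):
        status = "PASS" if case.passed else "FAIL -> human review"
        print(f"[{status}] {case.text!r} -> {case.model_output}")
```

In the workflow the paper describes, the auditor additionally reorganizes passing and failing tests into topics (the sensemaking step), and the prompt to the generative model is built from those curated tests rather than from simple string variations as in this stub.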
Related papers
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of the data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Lessons from the Trenches on Reproducible Evaluation of Language Models [60.522749986793094]
We draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers.
We present the Language Model Evaluation Harness (lm-eval), an open source library for independent, reproducible, and extensible evaluation of language models.
arXiv Detail & Related papers (2024-05-23T16:50:49Z)
- Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AI [20.21053807133341]
We try to provide an account of what constitutes a human-aware AI system.
We see that human-aware AI is a design-oriented paradigm, one that focuses on the need for modeling the humans it may interact with.
arXiv Detail & Related papers (2024-05-13T14:17:52Z)
- Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work [0.38850145898707145]
Large language models (LLMs) and generative AI tools present challenges in identifying and addressing biases.
Generative AI tools are susceptible to goal misgeneralization, hallucinations, and adversarial attacks such as red-teaming prompts.
We find that incorporating generative AI in the process of writing research manuscripts introduces a new type of context-induced algorithmic bias.
arXiv Detail & Related papers (2023-12-04T04:05:04Z)
- Personality of AI [0.0]
This research paper delves into the evolving landscape of fine-tuning large language models to align with human users.
Acknowledging the impact of training methods on the formation of undefined personality traits in AI models, the study draws parallels with human fitting processes using personality tests.
The paper serves as a starting point for discussions and developments in the burgeoning field of AI personality alignment.
arXiv Detail & Related papers (2023-12-03T18:23:45Z)
- Can AI Serve as a Substitute for Human Subjects in Software Engineering Research? [24.39463126056733]
This vision paper proposes a novel approach to qualitative data collection in software engineering research by harnessing the capabilities of artificial intelligence (AI).
We explore the potential of AI-generated synthetic text as an alternative source of qualitative data.
We discuss the prospective development of new foundation models aimed at emulating human behavior in observational studies and user evaluations.
arXiv Detail & Related papers (2023-11-18T14:05:52Z)
- Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- Towards Fair and Explainable AI using a Human-Centered AI Approach [5.888646114353372]
We present 5 research projects that aim to enhance explainability and fairness in classification systems and word embeddings.
The first project explores the utility/downsides of introducing local model explanations as interfaces for machine teachers.
The second project presents D-BIAS, a causality-based human-in-the-loop visual tool for identifying and mitigating social biases in datasets.
The third project presents WordBias, a visual interactive tool that helps audit pre-trained static word embeddings for biases against groups.
The fourth project presents DramatVis Personae, a visual analytics tool that helps identify social biases.
arXiv Detail & Related papers (2023-06-12T21:08:55Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)