Supporting Human-AI Collaboration in Auditing LLMs with LLMs
- URL: http://arxiv.org/abs/2304.09991v3
- Date: Thu, 30 Nov 2023 16:30:09 GMT
- Title: Supporting Human-AI Collaboration in Auditing LLMs with LLMs
- Authors: Charvi Rastogi, Marco Tulio Ribeiro, Nicholas King, Harsha Nori,
Saleema Amershi
- Abstract summary: Large language models have been shown to be biased and behave irresponsibly.
It is crucial to audit these language models rigorously.
Existing auditing tools leverage humans, AI, or both to find failures.
- Score: 33.56822240549913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models are becoming increasingly pervasive and ubiquitous in
society via deployment in sociotechnical systems. Yet these language models, be
it for classification or generation, have been shown to be biased and behave
irresponsibly, causing harm to people at scale. It is crucial to audit these
language models rigorously. Existing auditing tools leverage humans, AI, or both
to find failures. In this work, we draw upon literature in
human-AI collaboration and sensemaking, and conduct interviews with research
experts in safe and fair AI, to build upon the auditing tool: AdaTest (Ribeiro
and Lundberg, 2022), which is powered by a generative large language model
(LLM). Through the design process we highlight the importance of sensemaking
and human-AI communication to leverage complementary strengths of humans and
generative models in collaborative auditing. To evaluate the effectiveness of
the augmented tool, AdaTest++, we conduct user studies with participants
auditing two commercial language models: OpenAI's GPT-3 and Azure's sentiment
analysis model. Qualitative analysis shows that AdaTest++ effectively leverages
human strengths such as schematization, hypothesis formation and testing.
Further, with our tool, participants identified a variety of failure modes
covering 26 different topics across 2 tasks, including failures documented in
prior formal audits as well as ones previously under-reported.
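To make the collaborative auditing loop concrete, below is a minimal Python sketch of the generate-run-review cycle that tools like AdaTest and AdaTest++ embody: an LLM proposes test inputs within a topic the auditor is exploring, the target model is run on them, and failing cases are surfaced for human review. This is an illustrative sketch, not the AdaTest++ implementation; the `llm_suggest_tests` stub, the `TestCase` record, and the toy sentiment model are hypothetical stand-ins for the generative backend and the commercial models audited in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TestCase:
    """One audit test: an input, the label the auditor expects, and the topic it probes."""
    text: str
    expected: str
    topic: str
    model_output: str = ""
    passed: bool = True


def llm_suggest_tests(seed_tests: List[TestCase], topic: str, n: int = 3) -> List[str]:
    """Hypothetical stand-in for an LLM call that proposes inputs similar to the seeds.

    A real tool would prompt a generative model with the seed tests and the topic;
    here we return trivial variations so the sketch runs end to end.
    """
    return [f"{seed.text} (variation {i})" for i, seed in enumerate(seed_tests[:n], start=1)]


def audit_round(target_model: Callable[[str], str],
                seed_tests: List[TestCase],
                topic: str) -> List[TestCase]:
    """One generate-and-run round; failing cases are returned for human review."""
    expected = seed_tests[0].expected  # auditor's hypothesis: variations keep the same label
    results = []
    for text in llm_suggest_tests(seed_tests, topic):
        case = TestCase(text=text, expected=expected, topic=topic)
        case.model_output = target_model(text)
        case.passed = (case.model_output == case.expected)
        results.append(case)
    return results


if __name__ == "__main__":
    # Toy sentiment "model" standing in for a commercial API under audit.
    def toy_sentiment(text: str) -> str:
        return "negative" if "not" in text.lower() else "positive"

    seeds = [TestCase("I am not unhappy with the service.", expected="positive", topic="negation")]
    for case in audit_round(toy_sentiment, seeds, topic="negation"):
        status = "PASS" if case.passed else "FAIL -> human review"
        print(f"[{status}] {case.text!r} -> {case.model_output}")
```

In the workflow the paper describes, the auditor additionally reorganizes passing and failing tests into topics (the sensemaking step), and the prompt to the generative model is built from those curated tests rather than from simple string variations as in this stub.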
Related papers
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of the data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Lessons from the Trenches on Reproducible Evaluation of Language Models [60.522749986793094]
We draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers.
We present the Language Model Evaluation Harness (lm-eval), an open source library for independent, reproducible, and extensible evaluation of language models.
arXiv Detail & Related papers (2024-05-23T16:50:49Z)
- Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AI [20.21053807133341]
We try to provide an account of what constitutes a human-aware AI system.
We see that human-aware AI is a design-oriented paradigm, one that focuses on the need for modeling the humans it may interact with.
arXiv Detail & Related papers (2024-05-13T14:17:52Z)
- Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work [0.38850145898707145]
Large language models (LLMs) and generative AI tools present challenges in identifying and addressing biases.
Generative AI tools are susceptible to goal misgeneralization, hallucinations, and adversarial attacks such as red-teaming prompts.
We find that incorporating generative AI in the process of writing research manuscripts introduces a new type of context-induced algorithmic bias.
arXiv Detail & Related papers (2023-12-04T04:05:04Z)
- Personality of AI [0.0]
This research paper delves into the evolving landscape of fine-tuning large language models to align with human users.
Acknowledging the impact of training methods on the formation of undefined personality traits in AI models, the study draws parallels with human fitting processes using personality tests.
The paper serves as a starting point for discussions and developments in the burgeoning field of AI personality alignment.
arXiv Detail & Related papers (2023-12-03T18:23:45Z)
- Can AI Serve as a Substitute for Human Subjects in Software Engineering Research? [24.39463126056733]
This vision paper proposes a novel approach to qualitative data collection in software engineering research by harnessing the capabilities of artificial intelligence (AI).
We explore the potential of AI-generated synthetic text as an alternative source of qualitative data.
We discuss the prospective development of new foundation models aimed at emulating human behavior in observational studies and user evaluations.
arXiv Detail & Related papers (2023-11-18T14:05:52Z)
- Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- Towards Fair and Explainable AI using a Human-Centered AI Approach [5.888646114353372]
We present 5 research projects that aim to enhance explainability and fairness in classification systems and word embeddings.
The first project explores the utility/downsides of introducing local model explanations as interfaces for machine teachers.
The second project presents D-BIAS, a causality-based human-in-the-loop visual tool for identifying and mitigating social biases in datasets.
The third project presents WordBias, a visual interactive tool that helps audit pre-trained static word embeddings for biases against groups.
The fourth project presents DramatVis Personae, a visual analytics tool that helps identify social biases.
arXiv Detail & Related papers (2023-06-12T21:08:55Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)