Related papers: Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks

URL: http://arxiv.org/abs/2405.10632v5
Date: Fri, 12 Jul 2024 15:59:53 GMT
Title: Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks
Authors: Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung,
Abstract summary: "Human interaction evaluations" focus on the assessment of human-model interactions. We propose a safety-focused HIE design framework with three stages. We conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.
Score: 1.3309842610191835
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.

Related papers

Interactive AI and Human Behavior: Challenges and Pathways for AI Governance [3.799233413990495]
Generative AI systems increasingly engage in long-term, personal, and relational interactions.<n>These Interactive AI systems adapt to users over time, build ongoing relationships, and even can take proactive actions on behalf of users.<n>This new paradigm requires us to rethink how such human-AI interactions can be studied effectively to inform governance and policy development.
arXiv Detail & Related papers (2025-08-12T19:15:35Z)
Interaction as Intelligence: Deep Research With Human-AI Partnership [25.28272178646003]
"Interaction as Intelligence" research series presents a reconceptualization of human-AI relationships in deep research tasks.<n>We introduce Deep Cognition, a system that transforms the human role from giving instructions to cognitive oversight.
arXiv Detail & Related papers (2025-07-21T16:15:18Z)
Confirmation Bias in Generative AI Chatbots: Mechanisms, Risks, Mitigation Strategies, and Future Research Directions [0.0]
The article analyzes the mechanisms by which confirmation bias may manifest in generative AI chatbots. It assesses the ethical and practical risks associated with such bias, and proposes a range of mitigation strategies. The article concludes by emphasizing the need for interdisciplinary collaboration and empirical evaluation to better understand and address confirmation bias in generative AI systems.
arXiv Detail & Related papers (2025-04-12T21:08:36Z)
The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning [49.32390524168273]
Our work aims to advance the social reasoning of embodied artificial intelligence (AI) agents in real-world social interactions. We introduce a large-scale real-world Human Robot Social Interaction (HSRI) dataset to benchmark the capabilities of language models (LMs) and foundational models (FMs) Our dataset consists of 400 real-world human social robot interaction videos and over 10K annotations, detailing the robot's social errors, competencies, rationale, and corrective actions.
arXiv Detail & Related papers (2025-04-07T06:27:02Z)
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML)
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking. We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert. We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z)
AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents [65.16893197330589]
Large Language Models (LLMs) have demonstrated their ability to replicate human behaviors across a wide range of scenarios. However, their capability in handling complex, multi-character social interactions has yet to be fully explored. We introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.
arXiv Detail & Related papers (2024-01-12T11:18:00Z)
Human-AI collaboration is not very collaborative yet: A taxonomy of interaction patterns in AI-assisted decision making from a systematic review [6.013543974938446]
Leveraging Artificial Intelligence in decision support systems has disproportionately focused on technological advancements. A human-centered perspective attempts to alleviate this concern by designing AI solutions for seamless integration with existing processes.
arXiv Detail & Related papers (2023-10-30T17:46:38Z)
Sociotechnical Safety Evaluation of Generative AI Systems [13.546708226350963]
Generative AI systems produce a range of risks. To ensure the safety of generative AI systems, these risks must be evaluated. We propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks.
arXiv Detail & Related papers (2023-10-18T14:13:58Z)
It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation [15.8765167340819]
Human annotator simulation (HAS) serves as a cost-effective substitute for human evaluation such as data annotation and system assessment. Human perception and behaviour during human evaluation exhibit inherent variability due to diverse cognitive processes and subjective interpretations. This paper introduces a novel meta-learning framework that treats HAS as a zero-shot density estimation problem.
arXiv Detail & Related papers (2023-09-30T20:54:59Z)
Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors [26.003947740875482]
This paper focuses on assessing the human-likeness of the robot as the primary evaluation metric. Our approach aims to evaluate the robot's human-likeness based on observable user behaviors indirectly, thus enhancing objectivity and objectivity.
arXiv Detail & Related papers (2023-08-21T20:21:07Z)
Human-AI Coevolution [48.74579595505374]
Coevolution AI is a process in which humans and AI algorithms continuously influence each other. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science.
arXiv Detail & Related papers (2023-06-23T18:10:54Z)
Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP. This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z)
Evaluating Human-Language Model Interaction [79.33022878034627]
We develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive systems. We design five tasks to cover different forms of interaction: social dialogue, question answering, crossword puzzles, summarization, and metaphor generation. We find that better non-interactive performance does not always translate to better human-LM interaction.
arXiv Detail & Related papers (2022-12-19T18:59:45Z)
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation [136.16507050034755]
Existing human evaluation studies for summarization either exhibit a low inter-annotator agreement or have insufficient scale. We propose a modified summarization salience protocol, Atomic Content Units (ACUs), which is based on fine-grained semantic units. We curate the Robust Summarization Evaluation (RoSE) benchmark, a large human evaluation dataset consisting of 22,000 summary-level annotations over 28 top-performing systems.
arXiv Detail & Related papers (2022-12-15T17:26:05Z)
Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models. We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
A Mental-Model Centric Landscape of Human-AI Symbiosis [31.14516396625931]
We introduce a significantly general version of human-aware AI interaction scheme, called generalized human-aware interaction (GHAI) We will see how this new framework allows us to capture the various works done in the space of human-AI interaction and identify the fundamental behavioral patterns supported by these works.
arXiv Detail & Related papers (2022-02-18T22:08:08Z)
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning. ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
Adversarial Interaction Attack: Fooling AI to Misinterpret Human Intentions [46.87576410532481]
We show that, despite their current huge success, deep learning based AI systems can be easily fooled by subtle adversarial noise. Based on a case study of skeleton-based human interactions, we propose a novel adversarial attack on interactions. Our study highlights potential risks in the interaction loop with AI and humans, which need to be carefully addressed when deploying AI systems in safety-critical applications.
arXiv Detail & Related papers (2021-01-17T16:23:20Z)
Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions. We propose two knowledge-based data-driven methods to effectively capture these social interactions. We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.