Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
- URL: http://arxiv.org/abs/2502.15620v1
- Date: Fri, 21 Feb 2025 17:44:05 GMT
- Title: Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
- Authors: John Burden, Marko Tešić, Lorenzo Pacchiardi, José Hernández-Orallo
- Abstract summary: We survey recent work in the AI evaluation landscape and identify six main paradigms. We aim to increase awareness of the breadth of current evaluation approaches and foster cross-pollination between different paradigms.
- Score: 16.361352880545073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research in AI evaluation has grown increasingly complex and multidisciplinary, attracting researchers with diverse backgrounds and objectives. As a result, divergent evaluation paradigms have emerged, often developing in isolation, adopting conflicting terminologies, and overlooking each other's contributions. This fragmentation has led to insular research trajectories and communication barriers both among different paradigms and with the general public, contributing to unmet expectations for deployed AI systems. To help bridge this insularity, in this paper we survey recent work in the AI evaluation landscape and identify six main paradigms. We characterise major recent contributions within each paradigm across key dimensions related to their goals, methodologies and research cultures. By clarifying the unique combination of questions and approaches associated with each paradigm, we aim to increase awareness of the breadth of current evaluation approaches and foster cross-pollination between different paradigms. We also identify potential gaps in the field to inspire future research directions.
Related papers
- Towards deployment-centric multimodal AI beyond vision and language [67.02589156099391]
We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions.
We identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases.
By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
arXiv Detail & Related papers (2025-04-04T17:20:05Z)
- Bridging the Gap: Integrating Ethics and Environmental Sustainability in AI Research and Practice [57.94036023167952]
We argue that the efforts aiming to study AI's ethical ramifications should be made in tandem with those evaluating its impacts on the environment.
We propose best practices to better integrate AI ethics and sustainability in AI research and practice.
arXiv Detail & Related papers (2025-04-01T13:53:11Z)
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey [124.23247710880008]
Multimodal CoT (MCoT) reasoning has recently garnered significant research attention.
Existing MCoT studies design various methodologies to address the challenges of image, video, speech, audio, 3D, and structured data.
We present the first systematic survey of MCoT reasoning, elucidating the relevant foundational concepts and definitions.
arXiv Detail & Related papers (2025-03-16T18:39:13Z)
- On Generalization Across Environments In Multi-Objective Reinforcement Learning [6.686583184622338]
We formalize the concept of generalization in Multi-Objective Reinforcement Learning (MORL) and how it can be evaluated.
We contribute a novel benchmark featuring diverse multi-objective domains with parameterized environment configurations.
Our baseline evaluations of state-of-the-art MORL algorithms on this benchmark reveal limited generalization capabilities, suggesting significant room for improvement.
arXiv Detail & Related papers (2025-03-02T08:50:14Z)
- Survey on AI-Generated Media Detection: From Non-MLLM to MLLM [51.91311158085973]
Methods for detecting AI-generated media have evolved rapidly. General-purpose detectors based on MLLMs integrate authenticity verification, explainability, and localization capabilities. Ethical and security considerations have emerged as critical global concerns.
arXiv Detail & Related papers (2025-02-07T12:18:20Z)
- The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models [28.743404185915697]
This paper provides a comprehensive overview of recent works on the evaluation of Attitudes, Opinions, Values (AOVs) in Large Language Models (LLMs).
By doing so, we address the potential and challenges with respect to understanding the model, human-AI alignment, and downstream application in social sciences.
arXiv Detail & Related papers (2024-06-16T22:59:18Z)
- Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions [101.67121669727354]
Recent advancements in AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment.
The lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment.
We introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), and Machine Learning (ML).
arXiv Detail & Related papers (2024-06-13T16:03:25Z)
- Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey [5.168741399695988]
Utility is a unifying concept in economics, game theory, and operations research, and also in robotics and AI.
This paper introduces a utility-oriented needs paradigm to describe and evaluate inter and outer relationships among agents' interactions.
arXiv Detail & Related papers (2023-06-15T18:55:48Z)
- Empathetic Conversational Systems: A Review of Current Advances, Gaps, and Opportunities [2.741266294612776]
A growing number of studies have recognized the benefits of empathy and started to incorporate empathy in conversational systems.
This paper examines this rapidly growing field using five review dimensions.
arXiv Detail & Related papers (2022-05-09T05:19:48Z)
- Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective [69.44384540002358]
We provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem.
We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks.
We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches.
arXiv Detail & Related papers (2021-04-23T11:07:07Z)
- Transdisciplinary AI Observatory -- Retrospective Analyses and Future-Oriented Contradistinctions [22.968817032490996]
This paper motivates the need for an inherently transdisciplinary AI observatory approach.
Building on these AI observatory tools, we present near-term transdisciplinary guidelines for AI safety.
arXiv Detail & Related papers (2020-11-26T16:01:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.