Related papers: Hues and Cues: Human vs. CLIP

URL: http://arxiv.org/abs/2509.02305v2
Date: Wed, 03 Sep 2025 09:16:08 GMT
Title: Hues and Cues: Human vs. CLIP
Authors: Nuria Alabau-Bosque, Jorge Vila-Tomás, Paula Daudén-Oliver, Pablo Hernández-Cámara, Jose Manuel Jaén-Lorites, Valero Laparra, Jesús Malo
Abstract summary: This work proposes a new approach to evaluate artificial models via board games. We test the color perception and color naming capabilities of CLIP by playing the board game Hues & Cues.
Score: 2.51105685855894
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Playing games is inherently human, and a lot of games are created to challenge different human characteristics. However, these tasks are often left out when evaluating the human-like nature of artificial models. The objective of this work is to propose a new approach to evaluate artificial models via board games. To this effect, we test the color perception and color naming capabilities of CLIP by playing the board game Hues & Cues and assess its alignment with humans. Our experiments show that CLIP is generally well aligned with human observers, but our approach brings to light certain cultural biases and inconsistencies when dealing with different abstraction levels that are hard to identify with other testing strategies. Our findings indicate that assessing models with different tasks like board games can make certain deficiencies in the models stand out in ways that are difficult to test with the commonly used benchmarks.

Related papers

People use fast, flat goal-directed simulation to reason about novel problems [68.55490343866545]
We show that people are systematic and adaptively rational in how they play a game for the first time. We explain these capacities via a computational cognitive model that we call the "Intuitive Gamer". Our work offers new insights into how people rapidly evaluate, act, and make suggestions when encountering novel problems.
arXiv Detail & Related papers (2025-10-13T15:12:08Z)
COLIBRI Fuzzy Model: Color Linguistic-Based Representation and Interpretation [0.0]
This paper introduces the Human Perception-Based Fuzzy Color Model, COLIBRI, to bridge the gap between computational color representations and human visual perception. The proposed model uses fuzzy sets and logic to create a framework for color categorization. Our findings are significant for fields such as design, artificial intelligence, marketing, and human-computer interaction.
arXiv Detail & Related papers (2025-07-15T17:01:45Z)
CogniPlay: a work-in-progress Human-like model for General Game Playing [0.5524804393257919]
This paper presents an overview of findings from cognitive psychology and previous efforts to model human-like behavior in artificial agents. It discusses their applicability to General Game Playing (GGP) and introduces our work-in-progress model based on these observations: CogniPlay.
arXiv Detail & Related papers (2025-07-08T10:48:29Z)
Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests [89.09172401497213]
We examine three evaluation paradigms: standard benchmarks, interactive games, and cognitive tests. Our analyses reveal that interactive games are superior to standard benchmarks in discriminating models. We advocate for the development of new interactive benchmarks and targeted cognitive tasks inspired by assessing human abilities.
arXiv Detail & Related papers (2025-02-20T08:36:58Z)
Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape. We collect 35K trials of behavioral data from over 500 participants. We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
Generation of Games for Opponent Model Differentiation [2.164100958962259]
Previous results show that modeling human behavior can significantly improve the performance of the algorithms. In this work, we use data gathered by psychologists who identified personality types that increase the likelihood of performing malicious acts. We created a novel model that links its parameters to psychological traits.
arXiv Detail & Related papers (2023-11-28T13:45:03Z)
GameEval: Evaluating LLMs on Conversational Games [93.40433639746331]
We propose GameEval, a novel approach to evaluating large language models (LLMs). GameEval treats LLMs as game players and assigns them distinct roles with specific goals achieved by launching conversations of various forms. We show that GameEval can effectively differentiate the capabilities of various LLMs, providing a comprehensive assessment of their integrated abilities to solve complex problems.
arXiv Detail & Related papers (2023-08-19T14:33:40Z)
Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess [4.793072503820555]
We present a transformer-based approach to behavioral stylometry in the context of chess. Our method operates in a few-shot classification framework, and can correctly identify a player from among thousands of candidate players. We consider more broadly what our resulting embeddings reveal about human style in chess, as well as the potential ethical implications.
arXiv Detail & Related papers (2022-08-02T11:18:16Z)
WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models [91.92346150646007]
In this work, we introduce WinoGAViL: an online game to collect vision-and-language associations. We use the game to collect 3.5K instances, finding that they are intuitive for humans but challenging for state-of-the-art AI models. Our analysis as well as the feedback we collect from players indicate that the collected associations require diverse reasoning skills.
arXiv Detail & Related papers (2022-07-25T23:57:44Z)
Action similarity judgment based on kinematic primitives [48.99831733355487]
We investigate to which extent a computational model based on kinematics can determine action similarity. The chosen model has its roots in developmental robotics and performs action classification based on learned kinematic primitives. The results show that both the model and human performance are highly accurate in an action similarity task based on kinematic-level features.
arXiv Detail & Related papers (2020-08-30T13:58:47Z)
