Related papers: Quantification and object perception in Multimodal Large Language Models deviate from human linguistic cognition


URL: http://arxiv.org/abs/2511.08126v1
Date: Wed, 12 Nov 2025 01:41:14 GMT
Title: Quantification and object perception in Multimodal Large Language Models deviate from human linguistic cognition
Authors: Raquel Montero, Natalia Moskvina, Paolo Morosi, Tamara Serrano, Elena Pagliarini, Evelina Leivada
Abstract summary: Quantification has been proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). This paper looks at three key features of human quantification shared cross-linguistically that have remained so far unexplored in the (M)LLM literature.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Quantification has been proven to be a particularly difficult linguistic phenomenon for (Multimodal) Large Language Models (MLLMs). However, given that quantification interfaces with the logical, pragmatic, and numerical domains, the exact reasons for the poor performance are still unclear. This paper looks at three key features of human quantification shared cross-linguistically that have so far remained unexplored in the (M)LLM literature: the ordering of quantifiers into scales, the ranges of use and prototypicality, and the biases inherent in the human approximate number system. The aim is to determine how these features are encoded in the models' architecture, how they may differ from humans, and whether the results are affected by the type of model and language under investigation. We find that there are clear differences between humans and MLLMs with respect to these features across various tasks that tap into the representation of quantification in vivo vs. in silico. This work thus paves the way for addressing the nature of MLLMs as semantic and pragmatic agents, while the cross-linguistic lens can elucidate whether their abilities are robust and stable across different languages.

Related papers

Benchmarking Concept-Spilling Across Languages in LLMs
Large Language Models (LLMs) exhibit remarkable cross-lingual abilities, yet often show a systematic bias toward representations from other languages. This paper presents a novel comparative framework for evaluating multilingual semantic robustness by measuring how models handle polysemous words across languages.
arXiv (2026-01-18T19:28:26Z)
Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders
Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does performance still favor the dominant training language? We analyze their internal mechanisms using cross-layer transcoders (CLT) and attribution graphs.
arXiv (2025-11-13T22:51:06Z)
Quantifier Scope Interpretation in Language Learners and LLMs
This study examines how large language models handle quantifier scope interpretation in English and Chinese. Results reveal that most LLMs prefer the surface scope interpretations, aligning with human tendencies. HS scores highlight variability in LLMs' approximation of human behavior, but their overall potential to align with humans is notable.
arXiv (2025-09-13T15:32:25Z)
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
We introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories. These types of questions can be evaluated to assess the visual reasoning capabilities of MLLMs from multiple perspectives. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans.
arXiv (2025-04-21T17:59:53Z)
Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility
We examine whether large language models process language similarly to humans. We find that some LLMs do quantitatively and qualitatively reflect human-like asymmetries between production and interpretation.
arXiv (2025-03-21T23:25:42Z)
Unnatural Languages Are Not Bugs but Features for LLMs
Large Language Models (LLMs) have been observed to process non-human-readable text sequences, such as jailbreak prompts. We present a systematic investigation challenging this perception, demonstrating that unnatural languages contain latent features usable by models.
arXiv (2025-03-02T12:10:17Z)
LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder
We propose a framework for analyzing the linguistic mechanisms of large language models, based on Sparse Auto-Encoders (SAEs). We extract a broad set of Chinese and English linguistic features across four dimensions (morphology, syntax, semantics, and pragmatics). Our findings reveal intrinsic representations of linguistic knowledge in LLMs, uncover patterns of cross-layer and cross-lingual distribution, and demonstrate the potential to control model outputs.
arXiv (2025-02-27T18:16:47Z)
Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models
Multilingual large language models (MLLMs) are able to leverage in-context learning (ICL) to achieve high performance by leveraging cross-lingual knowledge transfer without parameter updates. Three key factors influence multilingual ICL: (1) semantic similarity, (2) linguistic alignment, and (3) language-specific performance. We propose balanced multi-factor ICL (BMF-ICL), a method that quantifies and optimally balances these factors for improved example selection.
arXiv (2025-02-17T06:56:33Z)
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
In large language models (LLMs), how are multiple languages learned and encoded? We train sparse autoencoders on Llama-3-8B and Aya-23-8B, and demonstrate that abstract grammatical concepts are often encoded in feature directions shared across many languages.
arXiv (2025-01-10T21:18:21Z)
Evaluating Morphological Compositional Generalization in Large Language Models
We investigate the morphological generalization abilities of large language models (LLMs) through the lens of compositionality. We focus on agglutinative languages such as Turkish and Finnish. Our analysis shows that LLMs struggle with morphological compositional generalization, particularly when applied to novel word roots. While models can identify individual morphological combinations better than chance, their performance lacks systematicity, leading to significant accuracy gaps compared to humans.
arXiv (2024-10-16T15:17:20Z)
High-Dimension Human Value Representation in Large Language Models
We propose UniVaR, a high-dimensional neural representation of symbolic human value distributions in LLMs. This is a continuous and scalable representation, self-supervised from the value-relevant output of 8 LLMs. We explore how LLMs prioritize different values in 25 languages and cultures, shedding light on the complex interplay between human values and language modeling.
arXiv (2024-04-11T16:39:00Z)
Naming, Describing, and Quantifying Visual Objects in Humans and LLMs
We evaluate Vision & Language Large Language Models (VLLMs) on three categories (nouns, attributes, and quantifiers). We find mixed evidence on the ability of VLLMs to capture human naming preferences at generation time.
arXiv (2024-03-11T17:20:12Z)
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples
We present AM2iCo, Adversarial and Multilingual Meaning in Context. It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts. Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv (2021-04-17T20:23:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.