Understanding the Cognitive Complexity in Language Elicited by Product Images
- URL: http://arxiv.org/abs/2409.16521v1
- Date: Wed, 25 Sep 2024 00:26:11 GMT
- Title: Understanding the Cognitive Complexity in Language Elicited by Product Images
- Authors: Yan-Ying Chen, Shabnam Hakimi, Monica Van, Francine Chen, Matthew Hong, Matt Klenk, Charlene Wu
- Abstract summary: This work offers an approach for measuring and validating the cognitive complexity of human language elicited by product images.
We introduce a large dataset that includes diverse descriptive labels for product images, including human-rated complexity.
We demonstrate that human-rated cognitive complexity can be approximated using a set of natural language models.
- Score: 4.420255770397967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Product images (e.g., a phone) can be used to elicit a diverse set of consumer-reported features expressed through language, including surface-level perceptual attributes (e.g., "white") and more complex ones, like perceived utility (e.g., "battery"). The cognitive complexity of elicited language reveals the nature of cognitive processes and the context required to understand them; cognitive complexity also predicts consumers' subsequent choices. This work offers an approach for measuring and validating the cognitive complexity of human language elicited by product images, providing a tool for understanding the cognitive processes of human respondents as well as virtual respondents simulated by Large Language Models (LLMs). We also introduce a large dataset that includes diverse descriptive labels for product images, including human-rated complexity. We demonstrate that human-rated cognitive complexity can be approximated using a set of natural language models that, combined, roughly capture the complexity construct. Moreover, this approach is minimally supervised and scalable, even in use cases with limited human assessment of complexity.
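As a rough illustration of the combination step, the sketch below approximates human-rated complexity by fitting a lightly supervised linear combiner over a few per-description signals. It is a minimal sketch under stated assumptions, not the authors' implementation: simple surface proxies stand in for the paper's language-model signals, and the example descriptions, ratings, and function names are hypothetical.

```python
# Illustrative sketch only, not the authors' pipeline: the paper combines
# signals from several natural language models; here, crude surface proxies
# (token count, lexical diversity, mean word length) stand in for those
# signals, and a small ridge regressor plays the role of the combiner fitted
# on a handful of human complexity ratings. All names and data are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

def proxy_features(descriptions):
    """Per-description complexity proxies (stand-ins for model-based signals)."""
    feats = []
    for text in descriptions:
        tokens = text.lower().split()
        n_tokens = len(tokens)
        type_token_ratio = len(set(tokens)) / max(n_tokens, 1)  # lexical diversity
        mean_word_len = float(np.mean([len(t) for t in tokens])) if tokens else 0.0
        feats.append([n_tokens, type_token_ratio, mean_word_len])
    return np.asarray(feats, dtype=float)

def fit_complexity_combiner(descriptions, human_ratings, alpha=1.0):
    """Fit a linear combiner on a small human-rated subset of descriptions."""
    X = proxy_features(descriptions)
    return Ridge(alpha=alpha).fit(X, np.asarray(human_ratings, dtype=float))

def score_complexity(model, descriptions):
    """Approximate cognitive complexity scores for new, unlabeled descriptions."""
    return model.predict(proxy_features(descriptions))

if __name__ == "__main__":
    labeled = ["white", "battery that easily lasts a full day of heavy use"]
    ratings = [1.0, 4.0]  # hypothetical human complexity ratings (1 = simple)
    combiner = fit_complexity_combiner(labeled, ratings)
    print(score_complexity(combiner, ["shiny", "camera tuned for low-light travel photos"]))
    # Validation against held-out human ratings could use a rank correlation
    # such as scipy.stats.spearmanr between predicted and human scores.
```

The design point mirrored here is that only a small human-rated subset is needed to fit the combiner, after which scoring scales to the full set of descriptions, in line with the abstract's claim that the approach is minimally supervised and scalable.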
Related papers
- Multi-scale structural complexity as a quantitative measure of visual complexity [1.3499500088995464]
We suggest adopting the multi-scale structural complexity (MSSC) measure, an approach that defines structural complexity of an object as the amount of dissimilarities between distinct scales in its hierarchical organization.
We demonstrate that MSSC correlates with subjective complexity on par with other computational complexity measures, while being more intuitive by definition, consistent across categories of images, and easier to compute.
arXiv Detail & Related papers (2024-08-07T20:26:35Z)
- Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models [55.20626448358655]
This study explores universal interaction recognition in an open-world setting through the use of Vision-Language (VL) foundation models and large language models (LLMs).
Our design includes an HO Prompt-guided Decoder (HOPD), which facilitates the association of high-level relation representations in the foundation model with various HO pairs within the image.
For open-category interaction recognition, our method supports either of two input types: interaction phrase or interpretive sentence.
arXiv Detail & Related papers (2023-11-07T08:27:32Z)
- SNeL: A Structured Neuro-Symbolic Language for Entity-Based Multimodal Scene Understanding [0.0]
We introduce SNeL (Structured Neuro-symbolic Language), a versatile query language designed to facilitate nuanced interactions with neural networks processing multimodal data.
SNeL's expressive interface enables the construction of intricate queries, supporting logical and arithmetic operators, comparators, nesting, and more.
Our evaluations demonstrate SNeL's potential to reshape the way we interact with complex neural networks.
arXiv Detail & Related papers (2023-06-09T17:01:51Z)
- Natural Language Decomposition and Interpretation of Complex Utterances [47.30126929007346]
We introduce an approach to handle complex-intent-bearing utterances from a user via a process of hierarchical natural language decomposition.
Our approach uses a pre-trained language model to decompose a complex utterance into a sequence of simpler natural language steps.
Experiments show that the proposed approach enables the interpretation of complex utterances with almost no complex training data.
arXiv Detail & Related papers (2023-05-15T14:35:00Z)
- Cross-Lingual Transfer of Cognitive Processing Complexity [11.939409227407769]
We use sentence-level eye-tracking patterns as a cognitive indicator for structural complexity.
We show that the multilingual model XLM-RoBERTa can successfully predict varied patterns for 13 typologically diverse languages.
arXiv Detail & Related papers (2023-02-24T15:48:23Z)
- LISA: Learning Interpretable Skill Abstractions from Language [85.20587800593293]
We propose a hierarchical imitation learning framework that can learn diverse, interpretable skills from language-conditioned demonstrations.
Our method demonstrates a more natural way to condition on language in sequential decision-making problems.
arXiv Detail & Related papers (2022-02-28T19:43:24Z)
- Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z)
- CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)
- Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs [106.15931418425906]
We present the first study focused on generating natural language rationales across several complex visual reasoning tasks.
We present RationaleVT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs.
Our experiments show that the base pretrained language model benefits from visual adaptation and that free-text rationalization is a promising research direction to complement model interpretability for complex visual-textual reasoning tasks.
arXiv Detail & Related papers (2020-10-15T05:08:56Z)
- Human-like general language processing [0.6510507449705342]
We propose a human-like general language processing (HGLP) architecture, which contains sensorimotor, association, and cognitive systems.
The HGLP network learns from easy to hard like a child, understands word meaning by coactivating multimodal neurons, and comprehends and generates sentences by constructing a virtual world model in real time.
HGLP rapidly learned 10+ different tasks, including object recognition, sentence comprehension, imagination, attention control, query, inference, motion judgement, mixed arithmetic operations, digit tracing and writing, and a human-like iterative thinking process guided by language.
arXiv Detail & Related papers (2020-05-19T02:44:55Z)