For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives
- URL: http://arxiv.org/abs/2407.03268v1
- Date: Wed, 3 Jul 2024 16:57:38 GMT
- Title: For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives
- Authors: Lia Morra, Antonio Santangelo, Pietro Basci, Luca Piano, Fabio Garcea, Fabrizio Lamberti, Massimo Leone
- Abstract summary: This work presents FRESCO, a framework designed to explore the socio-cultural implications of images on social media platforms at scale.
FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques.
The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer.
- Score: 3.418398936676879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social networks are creating a digital world in which the cognitive, emotional, and pragmatic value of the imagery of human faces and bodies is arguably changing. However, researchers in the digital humanities are often ill-equipped to study these phenomena at scale. This work presents FRESCO (Face Representation in E-Societies through Computational Observation), a framework designed to explore the socio-cultural implications of images on social media platforms at scale. FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques, aligning with the principles of visual semiotics. The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer. These levels are analyzed to discern deeper narrative layers within the imagery. Experimental validation confirms the reliability and utility of FRESCO, and we assess its consistency and precision across two public datasets. Subsequently, we introduce the FRESCO score, a metric derived from the framework's output that serves as a reliable measure of similarity in image content.
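To make the abstract's pipeline concrete, the following is a minimal illustrative sketch (not the authors' implementation) of the two ideas it describes: an image deconstructed into per-level feature vectors for the three semiotic levels, and a FRESCO-like similarity score computed over them. The level names come from the abstract; the feature contents, the `fresco_like_score` function, and the averaging of per-level cosine similarities are assumptions made purely for illustration.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def fresco_like_score(features_a, features_b):
    # Hypothetical aggregate: average the per-level similarities across
    # the three semiotic levels named in the abstract. The real FRESCO
    # score is derived from the framework's output in an unspecified way.
    levels = ("plastic", "figurative", "enunciation")
    sims = [cosine_similarity(features_a[lv], features_b[lv]) for lv in levels]
    return sum(sims) / len(sims)

# Toy per-level features (placeholders for the framework's numerical
# and categorical variables).
image_a = {
    "plastic": [0.8, 0.1, 0.3],     # e.g. line/colour statistics
    "figurative": [1.0, 0.0, 0.0],  # e.g. detected-entity indicators
    "enunciation": [0.2, 0.9],      # e.g. gaze / point-of-view cues
}
image_b = {
    "plastic": [0.7, 0.2, 0.4],
    "figurative": [1.0, 0.0, 0.1],
    "enunciation": [0.3, 0.8],
}

score = fresco_like_score(image_a, image_b)
print(round(score, 3))
```

Cosine similarity is used here only because it is a standard choice for comparing feature vectors; the paper's actual metric may differ.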
Related papers
- Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification [0.1843404256219181]
We leverage situated perceptual knowledge of cultural images to enhance performance and interpretability in AC image classification.
This resource captures situated perceptual semantics gleaned from over 14,000 cultural images labeled with ACs.
We demonstrate the synergy and complementarity between KGE embeddings' situated perceptual knowledge and deep visual model's sensory-perceptual understanding for AC image classification.
arXiv Detail & Related papers (2024-02-29T16:46:48Z)
- Semiotics Networks Representing Perceptual Inference [0.0]
We present a computational model designed to track and simulate the perception of objects.
Our model is not limited to persons and can be applied to any system featuring a loop involving the processing from "internal" to "external" representations.
arXiv Detail & Related papers (2023-10-08T16:05:17Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior [88.9319150230121]
Object-centric vision aims to construct an explicit representation of the objects in a scene.
We incorporate a spatial-locality prior into state-of-the-art object-centric vision models.
We obtain significant improvements in segmenting objects in both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-31T04:35:50Z)
- On Human Visual Contrast Sensitivity and Machine Vision Robustness: A Comparative Study [68.41864523774164]
How color differences affect machine vision has not been well explored.
Our work tries to bridge this gap between the human color vision aspect of visual recognition and that of the machine.
We devise a new framework in two dimensions to perform extensive analyses on the effect of color contrast and corrupted images.
arXiv Detail & Related papers (2022-12-16T18:51:41Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
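The zero-shot assessment this entry describes can be sketched as follows: an image embedding is scored against an antonym prompt pair (e.g. "Good photo." vs. "Bad photo.") by taking a softmax over the image-text cosine similarities. The placeholder vectors below stand in for CLIP encoder outputs; the function name, the specific prompts, and the temperature value are illustrative assumptions, not the paper's exact procedure.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def zero_shot_quality(image_emb, good_emb, bad_emb, temperature=100.0):
    # Softmax over scaled image-text similarities for an antonym prompt
    # pair; returns the probability assigned to the "good" prompt.
    logits = [temperature * cosine(image_emb, good_emb),
              temperature * cosine(image_emb, bad_emb)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return exps[0] / sum(exps)

# Toy embeddings standing in for CLIP image/text encoder outputs.
img = [0.6, 0.8, 0.1]
good = [0.5, 0.85, 0.2]   # embedding of a prompt like "Good photo."
bad = [-0.3, 0.1, 0.9]    # embedding of a prompt like "Bad photo."

p_good = zero_shot_quality(img, good, bad)
print(round(p_good, 3))
```

In practice the embeddings would come from a pretrained CLIP model's image and text encoders; everything downstream of them is ordinary similarity-plus-softmax arithmetic.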
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames [1.4502611532302037]
Social concepts referring to non-physical objects are powerful tools to describe, index, and query the content of visual data.
We propose a software approach to represent social concepts as multimodal frames, by integrating multisensory data.
Our method focuses on the extraction, analysis, and integration of multimodal features from visual art material tagged with the concepts of interest.
arXiv Detail & Related papers (2021-10-14T14:50:22Z)
- Visual resemblance and communicative context constrain the emergence of graphical conventions [21.976382800327965]
Drawing provides a versatile medium for communicating about the visual world.
Do viewers understand drawings based solely on their ability to resemble the entities they refer to (i.e., as images)?
Do they understand drawings based on shared but arbitrary associations with these entities (i.e. as symbols)?
arXiv Detail & Related papers (2021-09-17T23:05:36Z)
- Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation [56.25878966006678]
We propose an approach of Practical Inference in Social Relation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.