For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives
- URL: http://arxiv.org/abs/2407.03268v1
- Date: Wed, 3 Jul 2024 16:57:38 GMT
- Title: For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives
- Authors: Lia Morra, Antonio Santangelo, Pietro Basci, Luca Piano, Fabio Garcea, Fabrizio Lamberti, Massimo Leone
- Abstract summary: This work presents FRESCO, a framework designed to explore the socio-cultural implications of images on social media platforms at scale.
FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques.
The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer.
- Score: 3.418398936676879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social networks are creating a digital world in which the cognitive, emotional, and pragmatic value of the imagery of human faces and bodies is arguably changing. However, researchers in the digital humanities are often ill-equipped to study these phenomena at scale. This work presents FRESCO (Face Representation in E-Societies through Computational Observation), a framework designed to explore the socio-cultural implications of images on social media platforms at scale. FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques, aligning with the principles of visual semiotics. The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer. These levels are analyzed to discern deeper narrative layers within the imagery. Experimental validation confirms the reliability and utility of FRESCO, and we assess its consistency and precision across two public datasets. Subsequently, we introduce the FRESCO score, a metric derived from the framework's output that serves as a reliable measure of similarity in image content.
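To make the abstract's pipeline concrete, the following is a minimal illustrative sketch (not the authors' implementation) of the two ideas it describes: an image deconstructed into per-level feature vectors for the three semiotic levels, and a FRESCO-like similarity score computed over them. The level names come from the abstract; the feature contents, the `fresco_like_score` function, and the averaging of per-level cosine similarities are assumptions made purely for illustration.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def fresco_like_score(features_a, features_b):
    # Hypothetical aggregate: average the per-level similarities across
    # the three semiotic levels named in the abstract. The real FRESCO
    # score is derived from the framework's output in an unspecified way.
    levels = ("plastic", "figurative", "enunciation")
    sims = [cosine_similarity(features_a[lv], features_b[lv]) for lv in levels]
    return sum(sims) / len(sims)

# Toy per-level features (placeholders for the framework's numerical
# and categorical variables).
image_a = {
    "plastic": [0.8, 0.1, 0.3],     # e.g. line/colour statistics
    "figurative": [1.0, 0.0, 0.0],  # e.g. detected-entity indicators
    "enunciation": [0.2, 0.9],      # e.g. gaze / point-of-view cues
}
image_b = {
    "plastic": [0.7, 0.2, 0.4],
    "figurative": [1.0, 0.0, 0.1],
    "enunciation": [0.3, 0.8],
}

score = fresco_like_score(image_a, image_b)
print(round(score, 3))
```

Cosine similarity is used here only because it is a standard choice for comparing feature vectors; the paper's actual metric may differ.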
Related papers
- Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification [0.1843404256219181]
We leverage situated perceptual knowledge of cultural images to enhance performance and interpretability in AC image classification.
This resource captures situated perceptual semantics gleaned from over 14,000 cultural images labeled with ACs.
We demonstrate the synergy and complementarity between KGE embeddings' situated perceptual knowledge and deep visual model's sensory-perceptual understanding for AC image classification.
arXiv Detail & Related papers (2024-02-29T16:46:48Z)
- Semiotics Networks Representing Perceptual Inference [0.0]
We present a computational model designed to track and simulate the perception of objects.
Our model is not limited to persons and can be applied to any system featuring a loop involving the processing from "internal" to "external" representations.
arXiv Detail & Related papers (2023-10-08T16:05:17Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior [88.9319150230121]
Object-centric vision aims to construct an explicit representation of the objects in a scene.
We incorporate a spatial-locality prior into state-of-the-art object-centric vision models.
We obtain significant improvements in segmenting objects in both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-31T04:35:50Z)
- On Human Visual Contrast Sensitivity and Machine Vision Robustness: A Comparative Study [68.41864523774164]
How color differences affect machine vision has not been well explored.
Our work tries to bridge this gap between the human color vision aspect of visual recognition and that of the machine.
We devise a new framework in two dimensions to perform extensive analyses on the effect of color contrast and corrupted images.
arXiv Detail & Related papers (2022-12-16T18:51:41Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
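The zero-shot assessment this entry describes can be sketched as follows: an image embedding is scored against an antonym prompt pair (e.g. "Good photo." vs. "Bad photo.") by taking a softmax over the image-text cosine similarities. The placeholder vectors below stand in for CLIP encoder outputs; the function name, the specific prompts, and the temperature value are illustrative assumptions, not the paper's exact procedure.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def zero_shot_quality(image_emb, good_emb, bad_emb, temperature=100.0):
    # Softmax over scaled image-text similarities for an antonym prompt
    # pair; returns the probability assigned to the "good" prompt.
    logits = [temperature * cosine(image_emb, good_emb),
              temperature * cosine(image_emb, bad_emb)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return exps[0] / sum(exps)

# Toy embeddings standing in for CLIP image/text encoder outputs.
img = [0.6, 0.8, 0.1]
good = [0.5, 0.85, 0.2]   # embedding of a prompt like "Good photo."
bad = [-0.3, 0.1, 0.9]    # embedding of a prompt like "Bad photo."

p_good = zero_shot_quality(img, good, bad)
print(round(p_good, 3))
```

In practice the embeddings would come from a pretrained CLIP model's image and text encoders; everything downstream of them is ordinary similarity-plus-softmax arithmetic.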
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames [1.4502611532302037]
Social concepts referring to non-physical objects are powerful tools to describe, index, and query the content of visual data.
We propose a software approach to represent social concepts as multimodal frames, by integrating multisensory data.
Our method focuses on the extraction, analysis, and integration of multimodal features from visual art material tagged with the concepts of interest.
arXiv Detail & Related papers (2021-10-14T14:50:22Z)
- Visual resemblance and communicative context constrain the emergence of graphical conventions [21.976382800327965]
Drawing provides a versatile medium for communicating about the visual world.
Do viewers understand drawings based solely on their ability to resemble the entities they refer to (i.e., as images)?
Do they understand drawings based on shared but arbitrary associations with these entities (i.e. as symbols)?
arXiv Detail & Related papers (2021-09-17T23:05:36Z)
- Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation [56.25878966006678]
We propose an approach of Practical Inference in Social Relation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.