An Exploratory Study on Abstract Images and Visual Representations Learned from Them
- URL: http://arxiv.org/abs/2509.14149v1
- Date: Wed, 17 Sep 2025 16:30:34 GMT
- Title: An Exploratory Study on Abstract Images and Visual Representations Learned from Them
- Authors: Haotian Li, Jianbo Jiao
- Abstract summary: We study the reasons behind this performance gap and investigate how much high-level semantic content can be captured at different abstraction levels. We then train and evaluate conventional vision systems on HAID across various tasks including classification, segmentation, and object detection. We also discuss whether abstract images can be considered a potentially effective format for conveying visual semantic information and contributing to vision tasks.
- Score: 27.912006389955945
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imagine living in a world composed solely of primitive shapes: could you still recognise familiar objects? Recent studies have shown that abstract images, constructed from primitive shapes, can indeed convey visual semantic information to deep learning models. However, representations obtained from such images often fall short compared to those derived from traditional raster images. In this paper, we study the reasons behind this performance gap and investigate how much high-level semantic content can be captured at different abstraction levels. To this end, we introduce the Hierarchical Abstraction Image Dataset (HAID), a novel data collection that comprises abstract images generated from normal raster images at multiple levels of abstraction. We then train and evaluate conventional vision systems on HAID across various tasks including classification, segmentation, and object detection, providing a comprehensive comparison between rasterised and abstract image representations. We also discuss whether abstract images can be considered a potentially effective format for conveying visual semantic information and contributing to vision tasks.
Related papers
- What Makes a Maze Look Like a Maze? [92.80800000328277]
We introduce Deep Schema Grounding (DSG), a framework that leverages explicit structured representations of visual abstractions for grounding and reasoning. At the core of DSG are schemas: dependency-graph descriptions of abstract concepts that decompose them into more primitive-level symbols. We show that DSG significantly improves the abstract visual reasoning performance of vision-language models.
arXiv Detail & Related papers (2024-09-12T16:41:47Z)
- An Image-based Typology for Visualization [23.716718517642878]
We derive an image-based typology of visualizations based on a qualitative analysis of visualization images. The resulting image typology can serve a number of purposes.
arXiv Detail & Related papers (2024-03-07T04:33:42Z)
- Text-to-Image Generation for Abstract Concepts [76.32278151607763]
We propose a framework for Text-to-Image generation for Abstract Concepts (TIAC).
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
arXiv Detail & Related papers (2023-09-26T02:22:39Z)
- StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning [69.06749934902464]
We propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL.
StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents.
In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations.
arXiv Detail & Related papers (2023-08-06T03:22:46Z)
- Systematic Visual Reasoning through Object-Centric Relational Abstraction [5.914610036560008]
We introduce OCRA, a model that extracts explicit representations of both objects and abstract relations.
It achieves strong systematic generalizations in tasks involving complex visual displays.
arXiv Detail & Related papers (2023-06-04T22:47:17Z)
- Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions [3.7957452405531256]
This paper explores the potential of a state-of-the-art Vision and Language model, VinVL, to caption images at the scene level.
We show that a small amount of curated data suffices to generate scene descriptions without losing the capability to identify object-level concepts in the scene.
We discuss the parallels between these results and insights from computational and cognitive science research on scene perception.
arXiv Detail & Related papers (2022-11-09T15:33:51Z)
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z)
- Quantitative analysis of visual representation of sign elements in COVID-19 context [2.9409535911474967]
We propose using computer analysis to perform a quantitative analysis of the elements used in the visual creations produced in reference to the epidemic.
The images compiled in The Covid Art Museum's Instagram account are analyzed to identify the different elements used to represent subjective experiences with regard to a global event.
This research reveals the elements that are repeated in images to create narratives, and the relations of association that are established in the sample.
arXiv Detail & Related papers (2021-12-15T15:54:53Z)
- Constellation: Learning relational abstractions over objects for compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z)
- SketchEmbedNet: Learning Novel Concepts by Imitating Drawings [125.45799722437478]
We explore properties of image representations learned by training a model to produce sketches of images.
We show that this generative, class-agnostic model produces informative embeddings of images from novel examples, classes, and even novel datasets in a few-shot setting.
arXiv Detail & Related papers (2020-08-27T16:43:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.