Complexity in Complexity: Understanding Visual Complexity Through Structure, Color, and Surprise
- URL: http://arxiv.org/abs/2501.15890v3
- Date: Thu, 20 Mar 2025 12:06:51 GMT
- Title: Complexity in Complexity: Understanding Visual Complexity Through Structure, Color, and Surprise
- Authors: Karahan Sarıtaş, Peter Dayan, Tingke Shen, Surabhi S Nath
- Abstract summary: Understanding how humans perceive visual complexity is a key area of study in visual cognition. Modeling complexity accurately is not as simple as previously thought, requiring additional perceptual and semantic factors to address dataset biases. Our model improves predictive performance while maintaining interpretability, offering deeper insights into how visual complexity is perceived and assessed.
- Score: 6.324765782436764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how humans perceive visual complexity is a key area of study in visual cognition. Previous approaches to modeling visual complexity assessments have often resulted in intricate, difficult-to-interpret algorithms that employ numerous features or sophisticated deep learning architectures. While these complex models achieve high performance on specific datasets, they often sacrifice interpretability, making it challenging to understand the factors driving human perception of complexity. Recently, Shen et al. (2024) proposed an interpretable segmentation-based model that accurately predicted complexity across various datasets, supporting the idea that complexity can be explained simply. In this work, we investigate the failure of their model to capture structural, color, and surprisal contributions to complexity. To this end, we propose Multi-Scale Sobel Gradient (MSG), which measures spatial intensity variations; Multi-Scale Unique Color (MUC), which quantifies colorfulness across multiple scales; and surprise scores generated using a Large Language Model. We test our features on existing benchmarks and on a novel dataset, Surprising Visual Genome, containing surprising images from Visual Genome. Our experiments demonstrate that modeling complexity accurately is not as simple as previously thought, requiring additional perceptual and semantic factors to address dataset biases. Our model improves predictive performance while maintaining interpretability, offering deeper insights into how visual complexity is perceived and assessed. Our code, analysis and data are available at https://github.com/Complexity-Project/Complexity-in-Complexity.
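The abstract names MSG and MUC without giving their formulas. The following is a minimal NumPy sketch of how such multi-scale features could be computed; it is not the authors' implementation (their code is in the linked repository). The following choices are assumptions: scales are built by 2x2 block averaging, MSG is the mean Sobel gradient magnitude of a channel-averaged grayscale image averaged over scales, and MUC is the number of distinct colors after quantizing each channel to `bits` bits, averaged over scales. The function names `multi_scale_sobel_gradient` and `multi_scale_unique_color` are hypothetical.

```python
import numpy as np


def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Block-average an (H, W) or (H, W, C) array by an integer factor.

    Rows/columns that do not fill a whole block are cropped.
    """
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    img = img[:h, :w]
    shape = (h // factor, factor, w // factor, factor) + img.shape[2:]
    return img.reshape(shape).mean(axis=(1, 3))


def sobel_magnitude(gray: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude on the interior pixels of a 2-D array (no padding)."""
    g = gray
    # Horizontal derivative: right column minus left column, weights 1-2-1.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    # Vertical derivative: bottom row minus top row, weights 1-2-1.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return np.hypot(gx, gy)


def multi_scale_sobel_gradient(rgb: np.ndarray, n_scales: int = 3) -> float:
    """Hypothetical MSG: mean Sobel magnitude, averaged over dyadic scales.

    Expects an (H, W, 3) uint8 image of at least 3x3 pixels.
    """
    gray = rgb.astype(float).mean(axis=2) / 255.0  # assumption: plain channel average
    scores = []
    for s in range(n_scales):
        level = downsample(gray, 2 ** s)
        if min(level.shape) < 3:  # Sobel needs at least a 3x3 neighborhood
            break
        scores.append(sobel_magnitude(level).mean())
    return float(np.mean(scores))


def multi_scale_unique_color(rgb: np.ndarray, n_scales: int = 3, bits: int = 5) -> float:
    """Hypothetical MUC: count of distinct quantized colors, averaged over scales."""
    scores = []
    for s in range(n_scales):
        level = downsample(rgb.astype(float), 2 ** s).astype(np.uint8)
        quantized = level >> (8 - bits)  # keep the top `bits` bits of each channel
        scores.append(len(np.unique(quantized.reshape(-1, 3), axis=0)))
    return float(np.mean(scores))
```

For a uniform image both measures bottom out (MSG is 0 and MUC is 1), while a random-noise image scores higher on both. Averaging over scales is one simple way to make the features reflect structure at several granularities; weighting scales differently would be an equally plausible design.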
Related papers
- Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization [56.17811386955609]
Graph-structured challenges are inherently difficult due to their nonlinear and intricate nature.
In this study, we propose transforming graphs into images to preserve their higher-order structural features accurately.
By combining the innovative paradigm powered by multimodal large language models with simple search techniques, we aim to develop a novel and effective framework.
Date: 2025-01-21T08:28:10Z
- Multi-scale structural complexity as a quantitative measure of visual complexity [1.3499500088995464]
We suggest adopting the multi-scale structural complexity (MSSC) measure, an approach that defines the structural complexity of an object as the amount of dissimilarity between distinct scales in its hierarchical organization.
We demonstrate that MSSC correlates with subjective complexity on par with other computational complexity measures, while being more intuitive by definition, consistent across categories of images, and easier to compute.
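As a rough illustration of the "dissimilarity between distinct scales" idea, here is a minimal NumPy sketch under assumptions of our own: the image is coarse-grained by block averaging at scales 1, 2, 4, ..., each level is upsampled back to full resolution, and the dissimilarity between consecutive levels k and k+1 is |O(k, k+1) - ½(O(k, k) + O(k+1, k+1))|, where O(a, b) is the mean pixel-wise product of two levels. The function name is hypothetical, and the paper's exact coarse-graining scheme may differ.

```python
import numpy as np


def coarse_grain(f: np.ndarray, k: int) -> np.ndarray:
    """Average f over non-overlapping k x k blocks, then upsample back to f's shape."""
    h, w = f.shape
    blocks = f.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    return np.kron(blocks, np.ones((k, k)))


def multi_scale_structural_complexity(gray: np.ndarray, n_scales: int = 5) -> float:
    """Hypothetical MSSC: summed dissimilarity between consecutive coarse-grained levels."""
    k_max = 2 ** (n_scales - 1)
    h = gray.shape[0] // k_max * k_max
    w = gray.shape[1] // k_max * k_max
    if h == 0 or w == 0:
        raise ValueError("image is smaller than the coarsest block size")
    f = gray[:h, :w].astype(float)  # crop so every scale tiles the image exactly
    levels = [coarse_grain(f, 2 ** s) for s in range(n_scales)]

    complexity = 0.0
    for fine, coarse in zip(levels[:-1], levels[1:]):
        overlap_fc = np.mean(fine * coarse)
        overlap_ff = np.mean(fine * fine)
        overlap_cc = np.mean(coarse * coarse)
        complexity += abs(overlap_fc - 0.5 * (overlap_ff + overlap_cc))
    return float(complexity)
```

Each term equals half the mean squared difference between consecutive levels, so a uniform image scores 0, while a one-pixel 0/1 checkerboard scores 0.125, all of it from the finest step, because every coarser level is already flat.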
Date: 2024-08-07T20:26:35Z
- Understanding Visual Feature Reliance through the Lens of Complexity [14.282243225622093]
We introduce a new metric for quantifying feature complexity, based on $\mathscr{V}$-information.
We analyze the complexities of 10,000 features, represented as directions in the penultimate layer, that were extracted from a standard ImageNet-trained vision model.
Date: 2024-07-08T16:21:53Z
- Meta Operator for Complex Query Answering on Knowledge Graphs [58.340159346749964]
We argue that different logical operator types, rather than the different complex query types, are the key to improving generalizability.
We propose a meta-learning algorithm to learn the meta-operators with limited data and adapt them to different instances of operators under various complex queries.
Empirical results show that learning meta-operators is more effective than learning original CQA or meta-CQA models.
Date: 2024-03-15T08:54:25Z
- Simplicity in Complexity: Explaining Visual Complexity using Deep Segmentation Models [6.324765782436764]
We propose to model complexity using segment-based representations of images.
We find that complexity is well explained by a simple linear model with two features, the number of segments and the number of semantic classes in an image, across six diverse image sets.
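In Shen et al.'s model the two features are the number of segments and the number of semantic classes in an image. A linear model of this kind can be fitted by ordinary least squares; below is a minimal NumPy sketch, assuming per-image feature values and mean human complexity ratings are already computed. The function names are hypothetical, and the authors' preprocessing (for example any feature transformation or normalization) is not reproduced here.

```python
import numpy as np


def fit_linear_complexity_model(n_segments, n_classes, ratings) -> np.ndarray:
    """Least-squares fit of rating ~ b0 + b1 * n_segments + b2 * n_classes.

    Returns the coefficient vector [b0, b1, b2].
    """
    n_segments = np.asarray(n_segments, dtype=float)
    n_classes = np.asarray(n_classes, dtype=float)
    # Design matrix: intercept column first, then one column per feature.
    X = np.column_stack([np.ones_like(n_segments), n_segments, n_classes])
    coef, *_ = np.linalg.lstsq(X, np.asarray(ratings, dtype=float), rcond=None)
    return coef


def predict_complexity(coef, n_segments, n_classes) -> np.ndarray:
    """Apply a fitted coefficient vector to new feature values."""
    return (coef[0]
            + coef[1] * np.asarray(n_segments, dtype=float)
            + coef[2] * np.asarray(n_classes, dtype=float))
```

Because each coefficient maps to one named feature, the fitted model stays interpretable; extending it with further features, such as the MSG, MUC, and surprise scores described in the abstract above, amounts to adding columns to `X`.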
Date: 2024-03-05T17:21:31Z
- Inferring Local Structure from Pairwise Correlations [0.0]
We show that pairwise correlations provide enough information to recover local relations.
This succeeds even though higher-order interaction structures are present in our data.
Date: 2023-05-07T22:38:29Z
- On the Complexity of Bayesian Generalization [141.21610899086392]
We consider concept generalization at a large scale in the diverse and natural visual spectrum.
We study two modes that arise as the problem space scales up and the complexity of concepts becomes diverse.
Date: 2022-11-20T17:21:37Z
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
Date: 2022-10-07T17:56:53Z
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order and ambidextrous grasp labels for a parallel-jaw and vacuum gripper.
We also provide a real dataset of over 2.3k fully annotated, high-quality RGBD images, divided into five difficulty levels and an unseen object set, to evaluate different object and layout properties.
Date: 2022-08-08T08:15:34Z
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
Date: 2022-05-25T17:37:08Z
- Structural Landmarking and Interaction Modelling: on Resolution Dilemmas in Graph Classification [50.83222170524406]
We study the intrinsic difficulty in graph classification under the unified concept of "resolution dilemmas".
We propose "SLIM", an inductive neural network model for Structural Landmarking and Interaction Modelling.
Date: 2020-06-29T01:01:42Z