Related papers: Synthesis and Perceptual Scaling of High Resolution Natural Images Using Stable Diffusion

Synthesis and Perceptual Scaling of High Resolution Natural Images Using Stable Diffusion

URL: http://arxiv.org/abs/2410.13034v1
Date: Wed, 16 Oct 2024 20:49:19 GMT
Title: Synthesis and Perceptual Scaling of High Resolution Natural Images Using Stable Diffusion
Authors: Leonardo Pettini, Carsten Bogler, Christian Doeller, John-Dylan Haynes,
Abstract summary: We develop a custom stimulus set of photorealistic images from six categories with 18 objects each. For each object we generated 10 graded variants that are ordered along a perceptual continuum. This image set is of interest for studies on visual perception, attention and short- and long-term memory.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Natural scenes are of key interest for visual perception. Previous work on natural scenes has frequently focused on collections of discrete images with considerable physical differences from stimulus to stimulus. For many purposes it would, however, be desirable to have sets of natural images that vary smoothly along a continuum (for example in order to measure quantitative properties such as thresholds or precisions). This problem has typically been addressed by morphing a source into a target image. However, this approach yields transitions between images that primarily follow their low-level physical features and that can be semantically unclear or ambiguous. Here, in contrast, we used a different approach (Stable Diffusion XL) to synthesise a custom stimulus set of photorealistic images that are characterized by gradual transitions where each image is a clearly interpretable but unique exemplar from the same category. We developed natural scene stimulus sets from six categories with 18 objects each. For each object we generated 10 graded variants that are ordered along a perceptual continuum. We validated the image set psychophysically in a large sample of participants, ensuring that stimuli for each exemplar have varying levels of perceptual confusability. This image set is of interest for studies on visual perception, attention and short- and long-term memory.

Related papers

MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models [73.20126092411776]
We conduct the first systematic study of hallucinations in multi-image MLLMs.<n>We propose MIHBench, a benchmark specifically tailored for evaluating object-related hallucinations across multiple images.<n>MIHBench comprises three core tasks: Multi-Image Object Existence Hallucination, Multi-Image Object Count Hallucination, and Object Identity Consistency Hallucination.
arXiv Detail & Related papers (2025-08-01T15:49:29Z)
Hidden Bias in the Machine: Stereotypes in Text-to-Image Models [0.0]
Text-to-Image (T2I) models have transformed visual content creation, producing highly realistic images from natural language prompts.<n>We curated a diverse set of prompts spanning thematic categories such as occupations, traits, actions, ideologies, emotions, family roles, place descriptions, spirituality, and life events.<n>For each of the 160 unique topics, we crafted multiple prompt variations to reflect a wide range of meanings and perspectives.<n>Our analysis reveals significant disparities in the representation of gender, race, age, somatotype, and other human-centric factors across generated images.
arXiv Detail & Related papers (2025-06-09T23:06:04Z)
Image Segmentation via Divisive Normalization: dealing with environmental diversity [0.8796261172196743]
We put segmentation U-nets augmented with Divisive Normalization to work far from training conditions. We categorize scenes according to their radiance level and dynamic range (day/night), and according to their achromatic/chromatic contrasts. Results show that neural networks with Divisive Normalization get better results in all the scenarios.
arXiv Detail & Related papers (2024-07-25T07:38:27Z)
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning [71.14084801851381]
Change captioning aims to succinctly describe the semantic change between a pair of similar images. Most existing methods directly capture the difference between them, which risk obtaining error-prone difference features. We propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations.
arXiv Detail & Related papers (2024-07-16T13:00:33Z)
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images [5.529078451095096]
understanding the semantics of visual scenes is a fundamental challenge in Computer Vision. Recent advancements in text-to-image frameworks have led to models that implicitly capture natural scene statistics. Our work presents StableSemantics, a dataset comprising 224 thousand human-curated prompts, processed natural language captions, over 2 million synthetic images, and 10 million attention maps corresponding to individual noun chunks.
arXiv Detail & Related papers (2024-06-19T17:59:40Z)
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation [60.943159830780154]
We introduce Bounded Attention, a training-free method for bounding the information flow in the sampling process. We demonstrate that our method empowers the generation of multiple subjects that better align with given prompts and layouts.
arXiv Detail & Related papers (2024-03-25T17:52:07Z)
Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images [34.02058539403381]
We leverage human semantic knowledge to investigate the possibility of being included in frameworks of fake image detection. A preliminary statistical analysis is conducted to explore the distinctive patterns in how humans perceive genuine and altered images.
arXiv Detail & Related papers (2024-03-13T19:56:30Z)
Describing Images $\textit{Fast and Slow}$: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes [4.518404103861656]
We study the nature of variation in visuo-linguistic signals, and find that they correlate with each other. Given this result, we hypothesize that variation stems partly from the properties of the images, and explore whether image representations encoded by pretrained vision encoders can capture such variation. Our results indicate that pretrained models do so to a weak-to-moderate degree, suggesting that the models lack biases about what makes a stimulus complex for humans and what leads to variations in human outputs.
arXiv Detail & Related papers (2024-02-02T12:11:16Z)
Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion [6.491645162078057]
Text-to-image (TTI) systems have made it possible to create realistic images with simple text prompts. In all of the experiments performed to date, classifiers trained solely with synthetic images perform poorly at inference. We find four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept.
arXiv Detail & Related papers (2023-10-31T18:05:15Z)
Cones 2: Customizable Image Synthesis with Multiple Subjects [50.54010141032032]
We study how to efficiently represent a particular subject as well as how to appropriately compose different subjects. By rectifying the activations in the cross-attention map, the layout appoints and separates the location of different subjects in the image.
arXiv Detail & Related papers (2023-05-30T18:00:06Z)
A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings. We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans. The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
Ensembling with Deep Generative Views [72.70801582346344]
generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose. Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification. We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring. We present a differentiable reblur model for self-supervised motion deblurring. Our experiments demonstrate that self-supervised single image deblurring is really feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.