Neural Additive Image Model: Interpretation through Interpolation
- URL: http://arxiv.org/abs/2405.02295v1
- Date: Wed, 6 Mar 2024 16:46:07 GMT
- Title: Neural Additive Image Model: Interpretation through Interpolation
- Authors: Arik Reuter, Anton Thielmann, Benjamin Saefken,
- Abstract summary: We propose a holistic modeling approach utilizing Neural Additive Models and Diffusion Autoencoders.
We demonstrate that the proposed method can precisely identify complex image effects in an ablation study.
To further showcase the practical applicability of our proposed model, we conduct a case study investigating how the distinctive features and attributes captured in host images influence the pricing of Airbnb rentals.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how images influence the world, interpreting what effects their semantics have on various quantities, and exploring the reasons behind changes in image-based predictions are highly difficult yet extremely interesting problems. By adopting a holistic modeling approach that combines Neural Additive Models with Diffusion Autoencoders, we can effectively identify the hidden latent semantics behind image effects and achieve full intelligibility of additional tabular effects. Our approach offers a high degree of flexibility, empowering us to comprehensively explore the impact of various image characteristics. We demonstrate in an ablation study that the proposed method can precisely identify complex image effects. To further showcase the practical applicability of our proposed model, we conduct a case study investigating how the distinctive features and attributes captured in host images influence the pricing of Airbnb rentals.
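For intuition, here is a minimal sketch (in PyTorch) of how such an additive decomposition could be set up, assuming the semantic latent code of each host image has already been extracted with a pretrained diffusion autoencoder; the class names, dimensions, and toy data below are illustrative assumptions, not the authors' released implementation.

```python
# A minimal sketch of an additive model over diffusion-autoencoder latents plus
# tabular covariates. Assumes the image latents z are precomputed; all names and
# sizes here are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class ShapeFunction(nn.Module):
    """One small MLP per input dimension, as in a Neural Additive Model."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xi: torch.Tensor) -> torch.Tensor:  # xi: (batch, 1)
        return self.net(xi)


class NeuralAdditiveImageModel(nn.Module):
    """Sum of per-dimension contributions from image latents and tabular features."""

    def __init__(self, latent_dim: int, tabular_dim: int):
        super().__init__()
        self.latent_fns = nn.ModuleList([ShapeFunction() for _ in range(latent_dim)])
        self.tabular_fns = nn.ModuleList([ShapeFunction() for _ in range(tabular_dim)])
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Contributions are computed per dimension, so each effect stays inspectable.
        contribs = [f(z[:, i : i + 1]) for i, f in enumerate(self.latent_fns)]
        contribs += [f(x[:, j : j + 1]) for j, f in enumerate(self.tabular_fns)]
        return torch.stack(contribs, dim=-1).sum(dim=-1) + self.bias


# Toy usage: 512-dim image latents (e.g. from a diffusion autoencoder) and
# 5 standardized tabular listing features, predicting a price-like target.
model = NeuralAdditiveImageModel(latent_dim=512, tabular_dim=5)
z = torch.randn(8, 512)   # stand-in for encoded host images
x = torch.randn(8, 5)     # stand-in for tabular covariates
price_pred = model(z, x)  # shape (8, 1)
```

Because the prediction is a plain sum of per-feature contributions, the effect of each latent dimension or tabular covariate can be plotted and interpreted on its own.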
Related papers
- Data Augmentation via Latent Diffusion for Saliency Prediction [67.88936624546076]
Saliency prediction models are constrained by the limited diversity and quantity of labeled data.
We propose a novel data augmentation method for deep saliency prediction that edits natural images while preserving the complexity and variability of real-world scenes.
arXiv Detail & Related papers (2024-09-11T14:36:24Z)
- DiffusionPID: Interpreting Diffusion via Partial Information Decomposition [24.83767778658948]
We apply information-theoretic principles to decompose the input text prompt into its elementary components.
We analyze how individual tokens and their interactions shape the generated image.
We show that PID is a potent tool for evaluating and diagnosing text-to-image diffusion models.
arXiv Detail & Related papers (2024-06-07T18:17:17Z)
- Intrinsic Image Diffusion for Indoor Single-view Material Estimation [55.276815106443976]
We present Intrinsic Image Diffusion, a generative model for appearance decomposition of indoor scenes.
Given a single input view, we sample multiple possible material explanations represented as albedo, roughness, and metallic maps.
Our method produces significantly sharper, more consistent, and more detailed materials, outperforming state-of-the-art methods by 1.5 dB in PSNR and by 45% in FID score on albedo prediction.
arXiv Detail & Related papers (2023-12-19T15:56:19Z)
- Foveation in the Era of Deep Learning [6.602118206533142]
We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images.
Our model learns to iteratively attend to regions of the image relevant for classification.
We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameter count for a given pixel or computation budget.
arXiv Detail & Related papers (2023-12-03T16:48:09Z)
- Impressions: Understanding Visual Semiotics and Aesthetic Impact [66.40617566253404]
We present Impressions, a novel dataset through which to investigate the semiotics of images.
We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images.
This dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
arXiv Detail & Related papers (2023-10-27T04:30:18Z)
- Causality-Driven One-Shot Learning for Prostate Cancer Grading from MRI [1.049712834719005]
We present a novel method to automatically classify medical images that learns and leverages weak causal signals in the image.
Our framework consists of a convolutional neural network backbone and a causality-extractor module.
Our findings show that causal relationships among features play a crucial role in enhancing the model's ability to discern relevant information.
arXiv Detail & Related papers (2023-09-19T16:08:33Z)
- Exploiting Causality Signals in Medical Images: A Pilot Study with Empirical Results [1.2400966570867322]
We present a novel technique to discover and exploit weak causal signals directly from images via neural networks for classification purposes.
This way, we model how the presence of a feature in one part of the image affects the appearance of another feature in a different part of the image.
Our method consists of a convolutional neural network backbone and a causality-factors extractor module, which computes weights to enhance each feature map according to its causal influence in the scene.
arXiv Detail & Related papers (2023-09-19T08:00:26Z)
- Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation [53.62256887837659]
Kinship verification is a long-standing research challenge in computer vision.
We propose a novel deep collaborative multi-modal learning (DCML) to integrate the underlying information presented in facial properties.
Our DCML method consistently outperforms several state-of-the-art kinship verification methods.
arXiv Detail & Related papers (2021-09-07T01:34:51Z)
- What Image Features Boost Housing Market Predictions? [81.32205133298254]
We propose a set of techniques for the extraction of visual features for efficient numerical inclusion in predictive algorithms.
We discuss techniques such as Shannon's entropy, the center of gravity, image segmentation, and Convolutional Neural Networks (the first two are sketched after this list).
The set of 40 image features selected here carries a significant amount of predictive power and outperforms some of the strongest metadata predictors.
arXiv Detail & Related papers (2021-07-15T06:32:10Z)
- Proactive Pseudo-Intervention: Causally Informed Contrastive Learning for Interpretable Vision Models [103.64435911083432]
We present a novel contrastive learning strategy called Proactive Pseudo-Intervention (PPI).
PPI leverages proactive interventions to guard against image features with no causal relevance.
We also devise a novel causally informed salience mapping module to identify key image pixels to intervene, and show it greatly facilitates model interpretability.
arXiv Detail & Related papers (2020-12-06T20:30:26Z)
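As a side note on the housing-features entry above, the sketch below computes two of the simpler hand-crafted features it mentions, Shannon entropy of the intensity distribution and the intensity-weighted center of gravity, with NumPy; the function names and binning choices are illustrative assumptions, not taken from that paper.

```python
# A self-contained sketch of two hand-crafted image features: Shannon entropy
# and the intensity-weighted center of gravity. Assumes a grayscale image with
# values in [0, 1]; the 256-bin histogram is an illustrative choice.
import numpy as np


def shannon_entropy(gray: np.ndarray, bins: int = 256) -> float:
    """Entropy (in bits) of the pixel-intensity distribution."""
    hist, _ = np.histogram(gray, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())


def center_of_gravity(gray: np.ndarray) -> tuple[float, float]:
    """Intensity-weighted centroid (row, col) of the image."""
    rows, cols = np.indices(gray.shape)
    total = gray.sum()
    return float((rows * gray).sum() / total), float((cols * gray).sum() / total)


# Toy usage on a random image normalized to [0, 1].
img = np.random.rand(64, 64)
features = [shannon_entropy(img), *center_of_gravity(img)]
```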
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.