Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
- URL: http://arxiv.org/abs/2508.10637v1
- Date: Thu, 14 Aug 2025 13:34:13 GMT
- Title: Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
- Authors: Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos, Yuta Nakashima, Giorgos Tolias, Noa Garcia
- Abstract summary: Prior work has analyzed the robustness of visual encoders to image transformations and corruptions. We take a different perspective by analyzing parameters of the image acquisition process and transformations that may be subtle or even imperceptible to the human eye.
- Score: 28.34664538014526
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior work has analyzed the robustness of visual encoders to image transformations and corruptions, particularly in cases where such alterations are not seen during training. When this occurs, they introduce a form of distribution shift at test time, often leading to performance degradation. The primary focus has been on severe corruptions that, when applied aggressively, distort useful signals necessary for accurate semantic predictions. We take a different perspective by analyzing parameters of the image acquisition process and transformations that may be subtle or even imperceptible to the human eye. We find that such parameters are systematically encoded in the learned visual representations and can be easily recovered. More strikingly, their presence can have a profound impact, either positively or negatively, on semantic predictions. This effect depends on whether there is a strong correlation or anti-correlation between semantic labels and these acquisition-based or processing-based labels. Our code and data are available at: https://github.com/ryan-caesar-ramos/visual-encoder-traces
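The recoverability claim maps directly onto a standard probing setup: freeze the encoder, extract image embeddings, and fit a linear classifier on acquisition labels. The sketch below is a minimal illustration of that idea; the JPEG-quality buckets and the precomputed-embedding files are assumptions for illustration, not the authors' exact protocol.

```python
# Minimal linear-probe sketch: can an acquisition parameter (here, a
# hypothetical JPEG-quality bucket per image) be read out of frozen
# CLIP embeddings? File names and labels are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

embeddings = np.load("clip_embeddings.npy")          # (N, D) frozen CLIP features, assumed precomputed
quality_labels = np.load("jpeg_quality_labels.npy")  # (N,) ints, e.g. 0=low, 1=medium, 2=high quality

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, quality_labels, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_tr, y_tr)
# Accuracy well above chance indicates the acquisition parameter is
# linearly decodable from the learned representation.
print("probe accuracy:", probe.score(X_te, y_te))
```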
Related papers
- Traces of Image Memorability in Vision Encoders: Activations, Attention Distributions and Autoencoder Losses [5.369009163979958]
This paper explores the correlates of image memorability in pretrained vision encoders. We find that model-internal features such as activations, attention distributions, and autoencoder losses correlate with memorability to some extent. Results shed light on the relationship between model-internal features and memorability.
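As a concrete (and assumed, not necessarily the paper's exact) instance of such a correlate, one can test whether a simple statistic of encoder activations tracks memorability scores:

```python
# Sketch: correlate a model-internal statistic (mean activation
# magnitude of an encoder layer) with per-image memorability scores.
# Both arrays are assumed precomputed placeholders.
import numpy as np
from scipy.stats import spearmanr

activations = np.load("layer_activations.npy")      # (N, D), assumed
memorability = np.load("memorability_scores.npy")   # (N,), assumed

activation_magnitude = np.linalg.norm(activations, axis=1)
rho, p = spearmanr(activation_magnitude, memorability)
print(f"Spearman rho = {rho:.3f} (p = {p:.2g})")
```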
arXiv Detail & Related papers (2025-09-01T13:11:59Z)
- From Images to Perception: Emergence of Perceptual Properties by Reconstructing Images [1.77513002450736]
PerceptNet, a bio-inspired architecture that accommodates several known facts about the retina-V1 cortex, has been end-to-end optimized for different tasks related to image reconstruction. Our results show that the encoder stage consistently exhibits the highest correlation with human perceptual judgments on image distortion.
arXiv Detail & Related papers (2025-08-14T08:37:30Z)
- Visual Context-Aware Person Fall Detection [52.49277799455569]
Background objects such as beds, chairs, or wheelchairs can challenge fall detection systems, leading to false positive alarms.
We present a segmentation pipeline to semi-automatically separate individuals from objects in images.
We demonstrate that object-specific contextual transformations during training effectively mitigate this challenge.
arXiv Detail & Related papers (2024-04-11T19:06:36Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection [52.91782218300844]
We propose UIA-ViT, a novel Unsupervised Inconsistency-Aware method based on the Vision Transformer.
Because of the self-attention mechanism, the attention map among patch embeddings naturally represents consistency relations, making the Vision Transformer well suited to consistency representation learning.
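The consistency signal described here is the patch-to-patch self-attention map itself. A minimal sketch, with random stand-in weights where a pretrained ViT block's Q/K projections would go:

```python
# Compute a patch-to-patch attention map as a consistency signal.
# Q/K weights are random stand-ins; in practice they come from a
# pretrained ViT block, not from this sketch.
import torch
import torch.nn.functional as F

num_patches, dim = 196, 768                   # 14x14 patches, ViT-Base width
patches = torch.randn(1, num_patches, dim)    # placeholder patch embeddings

w_q = torch.randn(dim, dim) / dim ** 0.5
w_k = torch.randn(dim, dim) / dim ** 0.5
q, k = patches @ w_q, patches @ w_k

# attn[i, j]: how strongly patch i attends to patch j. Inconsistent
# (e.g., forged) regions tend to attend differently to the rest.
attn = F.softmax(q @ k.transpose(-2, -1) / dim ** 0.5, dim=-1)
print(attn.shape)  # (1, 196, 196) patch-to-patch consistency map
```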
arXiv Detail & Related papers (2022-10-23T15:24:47Z)
- How do Variational Autoencoders Learn? Insights from Representational Similarity [2.969705152497174]
We study the internal behaviour of Variational Autoencoders (VAEs) using representational similarity techniques.
Using the CKA and Procrustes similarities, we found that the encoders' representations are learned long before the decoders'.
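Linear CKA, one of the two similarity measures named above, reduces to a few lines; the sketch below uses random placeholder activations and is not tied to the paper's models.

```python
# Linear CKA between two representation matrices (rows = same inputs,
# columns = features), as used to compare layers across training.
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    x = x - x.mean(axis=0)  # center features
    y = y - y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(num / den)

enc_early = np.random.randn(500, 64)              # placeholder activations
enc_late = enc_early + 0.1 * np.random.randn(500, 64)
print(linear_cka(enc_early, enc_late))            # near 1.0 for similar reps
```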
arXiv Detail & Related papers (2022-05-17T14:31:57Z)
- Causal Transportability for Visual Recognition [70.13627281087325]
We show that standard classifiers fail because the association between images and labels is not transportable across settings.
We then show that the causal effect, which severs all sources of confounding, remains invariant across domains.
This motivates us to develop an algorithm to estimate the causal effect for image classification.
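The adjustment this line alludes to can be made concrete with a toy backdoor computation, P(y | do(x)) = Σ_z P(y | x, z) P(z). The probability tables below are invented for illustration; the paper's estimator operates on learned representations, not explicit tables.

```python
# Toy backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z).
# Binary variables and made-up probability tables, for illustration only.
import numpy as np

p_z = np.array([0.7, 0.3])               # P(z): confounder prior
p_y_given_xz = np.array([[0.2, 0.6],     # P(y=1 | x=0, z)
                         [0.5, 0.9]])    # P(y=1 | x=1, z)

for x in (0, 1):
    p_do = (p_y_given_xz[x] * p_z).sum()  # average out the confounder
    print(f"P(y=1 | do(x={x})) = {p_do:.2f}")
```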
arXiv Detail & Related papers (2022-04-26T15:02:11Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
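A natural way to use such views, sketched below: run the classifier on the original image plus several synthesized variants and average the predicted probabilities. The `generate_views` callable stands in for a StyleGAN2-based augmenter and is a placeholder, not a real API.

```python
# Ensemble over generated "views": average a classifier's softmax
# outputs across synthesized variants of one image.
import torch

def ensemble_predict(classifier, image, generate_views, n_views=8):
    # image: (C, H, W) tensor; generate_views: placeholder augmenter
    views = [image] + [generate_views(image) for _ in range(n_views)]
    with torch.no_grad():
        probs = [classifier(v.unsqueeze(0)).softmax(dim=-1) for v in views]
    return torch.stack(probs).mean(dim=0)  # averaged class probabilities
```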
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
- Dissecting Image Crops [22.482090207522358]
The elementary operation of cropping underpins nearly every computer vision system.
This paper investigates the subtle traces introduced by this operation.
We study how to detect these traces, and investigate the impact that cropping has on the image distribution.
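A simple self-supervised recipe for studying such traces, sketched below: synthesize positives by cropping then resizing, negatives by resizing alone, and train any classifier to separate them. The crop and output sizes are arbitrary choices, not the paper's configuration.

```python
# Self-supervised setup for probing crop traces: crop-then-resize images
# become the positive class, resize-only images the negative class.
import torchvision.transforms as T

crop_then_resize = T.Compose([
    T.RandomCrop(192),        # introduces cropping traces
    T.Resize(224),            # restore a common size
])
resize_only = T.Resize((224, 224))  # no crop: negative class
# A classifier trained on (crop_then_resize -> 1, resize_only -> 0)
# probes how detectable the traces left by cropping are.
```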
arXiv Detail & Related papers (2020-11-24T01:33:47Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the outcome of a transformation is predictable by an auxiliary network.
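A minimal rendering of that training signal, under assumed shapes and a placeholder auxiliary network: predict the latent code of a transformed image from the original latent and the transformation parameters, and add the prediction error to the VAE objective.

```python
# Auxiliary predictability loss for a VAE: predict the latent of the
# augmented image from the original latent plus augmentation parameters.
# Shapes and the auxiliary net are placeholders, not the paper's design.
import torch
import torch.nn as nn

latent_dim, param_dim = 32, 4
aux = nn.Sequential(nn.Linear(latent_dim + param_dim, 128),
                    nn.ReLU(),
                    nn.Linear(128, latent_dim))

z_orig = torch.randn(16, latent_dim)   # stands in for encoder(x)
z_aug = torch.randn(16, latent_dim)    # stands in for encoder(augment(x, params))
params = torch.randn(16, param_dim)    # augmentation parameters

z_pred = aux(torch.cat([z_orig, params], dim=1))
aux_loss = nn.functional.mse_loss(z_pred, z_aug)  # added to the VAE loss
```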
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
- Improving Image Autoencoder Embeddings with Perceptual Loss [0.1529342790344802]
This work investigates perceptual loss from the perspective of encoder embeddings themselves.
Autoencoders are trained to embed images from three different computer vision datasets using perceptual loss.
Results show that, on the task of object positioning of a small-scale feature, perceptual loss can improve the results by a factor of 10.
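A minimal sketch of a perceptual loss in this spirit: compare reconstruction and target in the feature space of a frozen pretrained network rather than in pixel space. The choice of VGG16 and the layer cut are assumptions, not necessarily the paper's configuration.

```python
# Perceptual loss sketch: L2 distance between mid-level features of a
# frozen pretrained VGG16 (slice up to an assumed mid-level layer).
import torch
import torchvision.models as models

vgg_features = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)  # keep the feature extractor frozen

def perceptual_loss(reconstruction, target):
    # inputs: (B, 3, H, W) image batches, normalized as VGG expects
    return torch.nn.functional.mse_loss(
        vgg_features(reconstruction), vgg_features(target))
```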
arXiv Detail & Related papers (2020-01-10T13:48:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.