PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
- URL: http://arxiv.org/abs/2506.14842v1
- Date: Mon, 16 Jun 2025 08:57:03 GMT
- Title: PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
- Authors: Lukas Schiesser, Cornelius Wolff, Sophie Haas, Simon Pukrop
- Abstract summary: In-context learning (ICL) has emerged as a promising paradigm for few-shot image classification (FSIC). We present PictSure, an ICL framework that places the embedding model -- its architecture, pretraining, and training dynamics -- at the center of analysis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building image classification models remains cumbersome in data-scarce domains, where collecting large labeled datasets is impractical. In-context learning (ICL) has emerged as a promising paradigm for few-shot image classification (FSIC), enabling models to generalize across domains without gradient-based adaptation. However, prior work has largely overlooked a critical component of ICL-based FSIC pipelines: the role of image embeddings. In this work, we present PictSure, an ICL framework that places the embedding model -- its architecture, pretraining, and training dynamics -- at the center of analysis. We systematically examine the effects of different visual encoder types, pretraining objectives, and fine-tuning strategies on downstream FSIC performance. Our experiments show that the training success and the out-of-domain performance are highly dependent on how the embedding models are pretrained. Consequently, PictSure manages to outperform existing ICL-based FSIC models on out-of-domain benchmarks that differ significantly from the training distribution, while maintaining comparable results on in-domain tasks. Code can be found at https://github.com/PictSure/pictsure-library.
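The core recipe is easy to state in code: a pretrained image encoder produces embeddings, and a transformer attends over labeled support embeddings plus an unlabeled query to predict the query's label, with no gradient-based adaptation at test time. The following is a minimal PyTorch sketch of that general pattern, not PictSure's actual implementation; the encoder interface, dimensions, and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ICLFewShotClassifier(nn.Module):
    """Minimal in-context learner: a transformer reads (embedding, label)
    pairs for the support set plus an unlabeled query embedding, and
    predicts the query's label without gradient-based adaptation."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder  # pretrained image encoder; assumed to map (N,C,H,W) -> (N, embed_dim)
        self.label_embed = nn.Embedding(num_classes, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, support_imgs, support_labels, query_img):
        # support_imgs: (B, K, C, H, W), support_labels: (B, K), query_img: (B, C, H, W)
        B, K = support_imgs.shape[:2]
        s = self.encoder(support_imgs.flatten(0, 1)).view(B, K, -1)
        s = s + self.label_embed(support_labels)   # fuse label info into support tokens
        q = self.encoder(query_img).unsqueeze(1)   # query token carries no label
        tokens = torch.cat([s, q], dim=1)          # (B, K+1, D) in-context sequence
        out = self.transformer(tokens)
        return self.head(out[:, -1])               # logits read off the query position
```

Because the support set is consumed purely through attention, a new label set at inference requires no fine-tuning, which is what makes the pretraining of the encoder the decisive design choice the paper studies.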
Related papers
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
To be specific, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
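One common way to extract such implicit knowledge from a frozen foundation model is to correlate support and query patch features, using the support mask to obtain a coarse foreground prior for the query. The sketch below illustrates that generic idea under those assumptions; it is not the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def coarse_correspondence(feat_q, feat_s, mask_s):
    """Cosine-similarity correspondence between query and support patch
    features from a frozen foundation model; returns a coarse foreground
    prior for the query. feat_*: (D, H, W); mask_s: (H, W) in {0, 1},
    assumed to contain at least one foreground pixel."""
    D, H, W = feat_q.shape
    q = F.normalize(feat_q.view(D, -1), dim=0)   # (D, HW)
    s = F.normalize(feat_s.view(D, -1), dim=0)   # (D, HW)
    sim = q.t() @ s                              # (HW_q, HW_s) patch affinities
    fg = mask_s.view(-1).bool()
    # each query patch's best match among support foreground patches
    prior = sim[:, fg].max(dim=1).values         # (HW_q,)
    return prior.view(H, W)
```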
arXiv Detail & Related papers (2024-09-10T08:04:11Z) - Physically Feasible Semantic Segmentation [58.17907376475596]
State-of-the-art semantic segmentation models are typically optimized in a data-driven fashion, minimizing solely per-pixel or per-segment classification objectives on their training data. This purely data-driven paradigm often leads to absurd segmentations, especially when the domain of input images is shifted from the one encountered during training. Our method, Physically Feasible Semantic Segmentation (PhyFea), first extracts explicit constraints that govern spatial class relations from the semantic segmentation training set at hand in an offline data-driven fashion, and then enforces a morphological yet differentiable loss that penalizes violations of these constraints during training.
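The "morphological yet differentiable loss" idea can be approximated with standard pooling operators: max-pooling acts as a differentiable dilation, so overlap between a dilated class map and another class's map measures forbidden adjacency. The sketch below illustrates that idea only, not PhyFea's exact operator; the forbidden pairs would come from the offline constraint-extraction step.

```python
import torch
import torch.nn.functional as F

def adjacency_violation_loss(probs, forbidden_pairs, kernel=3):
    """Penalize spatial adjacency of class pairs never adjacent in the
    training set. probs: (B, C, H, W) softmax output; forbidden_pairs:
    list of (class_a, class_b). Max-pooling acts as a differentiable dilation."""
    loss = probs.new_zeros(())
    for a, b in forbidden_pairs:
        dil_a = F.max_pool2d(probs[:, a:a+1], kernel, stride=1, padding=kernel // 2)
        # overlap of dilated class-a mass with class-b mass ~ adjacency strength
        loss = loss + (dil_a * probs[:, b:b+1]).mean()
    return loss
```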
arXiv Detail & Related papers (2024-08-26T22:39:08Z) - HazeCLIP: Towards Language Guided Real-World Image Dehazing [62.4454483961341]
Existing methods have achieved remarkable performance in image dehazing, particularly on synthetic datasets. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks.
arXiv Detail & Related papers (2024-07-18T17:18:25Z) - Memory-guided Network with Uncertainty-based Feature Augmentation for Few-shot Semantic Segmentation [12.653336728447654]
We propose a class-shared memory (CSM) module consisting of a set of learnable memory vectors.
These memory vectors learn elemental object patterns from base classes during training whilst re-encoding query features during both training and inference.
We integrate CSM and uncertainty-based feature augmentation (UFA) into representative few-shot segmentation (FSS) methods and report experimental results on the widely used PASCAL-5$^i$ and COCO-20$^i$ datasets.
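A class-shared memory of this kind can be sketched as learnable memory slots that features re-encode through cross-attention; the module below illustrates the idea only, with slot count and attention settings as assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ClassSharedMemory(nn.Module):
    """Learnable memory vectors that re-encode features via cross-attention:
    the memory stores elemental object patterns learned from base classes.
    A sketch of the CSM idea, not the paper's exact module."""

    def __init__(self, dim: int, num_slots: int = 64):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feats):
        # feats: (B, N, D) flattened query features
        mem = self.memory.unsqueeze(0).expand(feats.size(0), -1, -1)
        reencoded, _ = self.attn(query=feats, key=mem, value=mem)
        return feats + reencoded   # residual re-encoding against the memory
```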
arXiv Detail & Related papers (2024-06-01T19:53:25Z) - Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation [19.20874993309959]
Vision-language foundation models, such as CLIP, have showcased remarkable effectiveness in numerous zero-shot image-level tasks.
We propose a baseline for training-free open-vocabulary semantic segmentation (OVSS), termed Neighbour-Aware CLIP (NACLIP).
Our method enforces localization of patches in the self-attention of CLIP's vision transformer which, despite being crucial for dense prediction tasks, has been overlooked in the OVSS literature.
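Enforcing locality in a ViT's self-attention can be done by adding a log-Gaussian spatial prior over patch positions to the attention logits, which is the spirit of the neighbour-aware modification. The sketch below shows that generic mechanism (class-token handling omitted); it is not NACLIP's exact formulation.

```python
import torch

def neighbour_aware_logits(attn_logits, h, w, sigma=1.0):
    """Add a Gaussian spatial prior so each patch attends mostly to its
    neighbours. attn_logits: (B, heads, N, N) over N = h*w patch tokens
    (the CLS token is assumed to have been stripped beforehand)."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()   # (N, 2) grid coords
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)          # (N, N) squared distances
    prior = -d2 / (2 * sigma ** 2)                                   # log-Gaussian neighbourhood
    return attn_logits + prior.to(attn_logits)                       # broadcast over B, heads
```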
arXiv Detail & Related papers (2024-04-12T01:08:04Z) - Placing Objects in Context via Inpainting for Out-of-distribution Segmentation [59.00092709848619]
Placing Objects in Context (POC) is a pipeline to realistically add objects to an image.
POC can be used to extend any dataset with an arbitrary number of objects.
We present different anomaly segmentation datasets based on POC-generated data and show that POC can improve the performance of recent state-of-the-art anomaly fine-tuning methods.
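In spirit, the pipeline amounts to text-conditioned inpainting: paint a named object into a chosen region of a real scene and reuse the region mask as its label. A minimal sketch with Hugging Face diffusers follows; the checkpoint, prompt, and file names are placeholders, and POC's actual pipeline includes additional steps for realistic placement.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Placeholder inputs: a scene image and a mask whose white pixels get repainted.
scene = Image.open("scene.png").convert("RGB").resize((512, 512))
region_mask = Image.open("mask.png").convert("L").resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

# Inpaint the named object into the masked region; the mask doubles as
# the segmentation label for the newly placed object.
augmented = pipe(
    prompt="a dog standing on the road",
    image=scene,
    mask_image=region_mask,
).images[0]
```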
arXiv Detail & Related papers (2024-02-26T08:32:41Z) - Scaling Laws of Synthetic Images for Model Training ... for Now [54.43596959598466]
We study the scaling laws of synthetic images generated by state-of-the-art text-to-image models.
We observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training.
arXiv Detail & Related papers (2023-12-07T18:59:59Z) - In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification [5.323049242720532]
Self-supervised learning has emerged as a promising approach for remote sensing image classification.
We present a study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets.
arXiv Detail & Related papers (2023-07-04T10:57:52Z) - Foundational Models for Continual Learning: An Empirical Study of Latent Replay [17.322679682451597]
We study the efficacy of pre-trained vision models as a foundation for downstream continual learning scenarios.
We compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space.
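Latent replay itself is simple to sketch: store embeddings from a frozen pretrained encoder instead of raw images, and rehearse them alongside each new task's batches. Below is a minimal reservoir-sampled buffer illustrating the idea; the capacity and sampling details are assumptions, not the paper's setup.

```python
import random
import torch

class LatentReplayBuffer:
    """Reservoir buffer over (embedding, label) pairs. Storing latents from
    a frozen pretrained encoder is far cheaper than raw images and lets the
    downstream head rehearse past tasks during continual learning."""

    def __init__(self, capacity: int = 10_000):
        self.capacity, self.seen, self.buf = capacity, 0, []

    def add(self, z: torch.Tensor, y: int):
        self.seen += 1
        if len(self.buf) < self.capacity:
            self.buf.append((z.detach().cpu(), y))
        else:                                   # reservoir sampling keeps a uniform subsample
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buf[j] = (z.detach().cpu(), y)

    def sample(self, n: int):
        batch = random.sample(self.buf, min(n, len(self.buf)))
        zs, ys = zip(*batch)
        return torch.stack(zs), torch.tensor(ys)
```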
arXiv Detail & Related papers (2022-04-30T19:11:37Z) - On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has driven numerous state-of-the-art results in high-level computer vision.
We present an in-depth study of image pre-training.
We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z) - DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z) - Benchmarking the Robustness of Instance Segmentation Models [7.1699725781322465]
This paper presents a comprehensive evaluation of instance segmentation models with respect to real-world image corruptions and out-of-domain image collections. We find that group normalization enhances the robustness of networks across corruptions where the image contents stay the same but corruptions are added on top. We also find that single-stage detectors do not generalize well to larger image resolutions than their training size.
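The group-normalization finding suggests a simple, generally applicable recipe: swap BatchNorm layers for GroupNorm before training. The helper below is a generic sketch of that swap (the group-count fallback is an assumption), not the paper's experimental code.

```python
import torch.nn as nn

def bn_to_gn(module: nn.Module, groups: int = 32) -> nn.Module:
    """Recursively replace BatchNorm2d layers with GroupNorm, which the
    benchmark found more robust to image corruptions."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            g = groups
            while child.num_features % g != 0:   # fall back to a divisor of the channel count
                g //= 2
            setattr(module, name, nn.GroupNorm(g, child.num_features))
        else:
            bn_to_gn(child, groups)
    return module
```

For example, `bn_to_gn(torchvision.models.resnet50())` yields a GroupNorm variant of ResNet-50 ready for training.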
arXiv Detail & Related papers (2021-09-02T17:50:07Z) - Explanation-Guided Training for Cross-Domain Few-Shot Classification [96.12873073444091]
Cross-domain few-shot classification task (CD-FSC) combines few-shot classification with the requirement to generalize across domains represented by datasets.
We introduce a novel training approach for existing FSC models.
We show that explanation-guided training effectively improves the model generalization.
arXiv Detail & Related papers (2020-07-17T07:28:08Z)