Related papers: Enhancing Environmental Robustness in Few-shot Learning via Conditional Representation Learning

Enhancing Environmental Robustness in Few-shot Learning via Conditional Representation Learning

URL: http://arxiv.org/abs/2502.01183v1
Date: Mon, 03 Feb 2025 09:18:03 GMT
Title: Enhancing Environmental Robustness in Few-shot Learning via Conditional Representation Learning
Authors: Qianyu Guo, Jingrong Wu, Tianxing Wu, Haofen Wang, Weifeng Ge, Wenqiang Zhang,
Abstract summary: Few-shot learning has been extensively utilized to overcome the scarcity of training data in domain-specific visual recognition.<n>In real-world scenarios, environmental factors such as complex backgrounds, varying lighting conditions, long-distance shooting, and moving targets often cause test images to exhibit numerous incomplete targets or noise disruptions.<n>We propose a novel conditional representation learning network (CRLNet) that integrates the interactions between training and testing images as conditional information in their respective representation processes.
Score: 27.549889991320203
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Few-shot learning (FSL) has recently been extensively utilized to overcome the scarcity of training data in domain-specific visual recognition. In real-world scenarios, environmental factors such as complex backgrounds, varying lighting conditions, long-distance shooting, and moving targets often cause test images to exhibit numerous incomplete targets or noise disruptions. However, current research on evaluation datasets and methodologies has largely ignored the concept of "environmental robustness", which refers to maintaining consistent performance in complex and diverse physical environments. This neglect has led to a notable decline in the performance of FSL models during practical testing compared to their training performance. To bridge this gap, we introduce a new real-world multi-domain few-shot learning (RD-FSL) benchmark, which includes four domains and six evaluation datasets. The test images in this benchmark feature various challenging elements, such as camouflaged objects, small targets, and blurriness. Our evaluation experiments reveal that existing methods struggle to utilize training images effectively to generate accurate feature representations for challenging test images. To address this problem, we propose a novel conditional representation learning network (CRLNet) that integrates the interactions between training and testing images as conditional information in their respective representation processes. The main goal is to reduce intra-class variance or enhance inter-class variance at the feature representation level. Finally, comparative experiments reveal that CRLNet surpasses the current state-of-the-art methods, achieving performance improvements ranging from 6.83% to 16.98% across diverse settings and backbones. The source code and dataset are available at https://github.com/guoqianyu-alberta/Conditional-Representation-Learning.

Related papers

A Contrastive Learning Foundation Model Based on Perfectly Aligned Sample Pairs for Remote Sensing Images [18.191222010916405]
We present a novel self-supervised method called PerA, which produces all-purpose Remote Sensing features through semantically Perfectly Aligned sample pairs.<n>Our framework provides high-quality features by ensuring consistency between teacher and student.<n>We collect an unlabeled pre-training dataset, which contains about 5 million RS images.
arXiv Detail & Related papers (2025-05-26T03:12:49Z)
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities. We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details. We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z)
LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations [4.680881326162484]
Contrastive instance discrimination methods outperform supervised learning in downstream tasks such as image classification and object detection. A common augmentation technique in contrastive learning is random cropping followed by resizing. We introduce LeOCLR, a framework that employs a novel instance discrimination approach and an adapted loss function.
arXiv Detail & Related papers (2024-03-11T15:33:32Z)
Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval [11.696941841000985]
Two-stage methods following retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient to real-world applications. We propose a mechanism which attentively selects prominent local descriptors and infuse fine-grained semantic relations into the global representation. Our method achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris.
arXiv Detail & Related papers (2023-08-08T03:06:10Z)
Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects. In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Leaning (VIL) A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
Cap2Aug: Caption guided Image to Image data Augmentation [41.53127698828463]
Cap2Aug is an image-to-image diffusion model-based data augmentation strategy using image captions as text prompts. We generate captions from the limited training images and using these captions edit the training images using an image-to-image stable diffusion model. This strategy generates augmented versions of images similar to the training images yet provides semantic diversity across the samples.
arXiv Detail & Related papers (2022-12-11T04:37:43Z)
Semantic-aware Dense Representation Learning for Remote Sensing Image Change Detection [20.761672725633936]
Training deep learning-based change detection model heavily depends on labeled data. Recent trend is using remote sensing (RS) data to obtain in-domain representations via supervised or self-supervised learning (SSL) We propose dense semantic-aware pre-training for RS image CD via sampling multiple class-balanced points.
arXiv Detail & Related papers (2022-05-27T06:08:33Z)
On Efficient Transformer and Image Pre-training for Low-level Vision [74.22436001426517]
Pre-training has marked numerous state of the arts in high-level computer vision. We present an in-depth study of image pre-training. We find pre-training plays strikingly different roles in low-level tasks.
arXiv Detail & Related papers (2021-12-19T15:50:48Z)
Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes. We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works. We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
Augmented Bi-path Network for Few-shot Learning [16.353228724916505]
We propose Augmented Bi-path Network (ABNet) for learning to compare both global and local features on multi-scales. Specifically, the salient patches are extracted and embedded as the local features for every image. Then, the model learns to augment the features for better robustness.
arXiv Detail & Related papers (2020-07-15T11:13:38Z)
Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning. Current contrastive models are ineffective at localizing the foreground object. We propose a data-driven approach for learning in variance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.