Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data
- URL: http://arxiv.org/abs/2404.16637v1
- Date: Thu, 25 Apr 2024 14:24:41 GMT
- Title: Zero-Shot Distillation for Image Encoders: How to Make Effective Use of Synthetic Data
- Authors: Niclas Popp, Jan Hendrik Metzen, Matthias Hein,
- Abstract summary: We focus on training smaller variants of the image encoder, which suffices for efficient zero-shot classification.
The use of synthetic data has shown promise in distilling representations from larger teachers, resulting in strong few-shot and linear probe performance.
We find that this approach surprisingly fails in true zero-shot settings when using contrastive losses.
- Score: 40.37396692278567
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Multi-modal foundation models such as CLIP have showcased impressive zero-shot capabilities. However, their applicability in resource-constrained environments is limited due to their large number of parameters and high inference time. While existing approaches have scaled down the entire CLIP architecture, we focus on training smaller variants of the image encoder, which suffices for efficient zero-shot classification. The use of synthetic data has shown promise in distilling representations from larger teachers, resulting in strong few-shot and linear probe performance. However, we find that this approach surprisingly fails in true zero-shot settings when using contrastive losses. We identify the exploitation of spurious features as being responsible for poor generalization between synthetic and real data. However, by using the image feature-based L2 distillation loss, we mitigate these problems and train students that achieve zero-shot performance which on four domain-specific datasets is on-par with a ViT-B/32 teacher model trained on DataCompXL, while featuring up to 92% fewer parameters.
Related papers
- Data-Efficient Generation for Dataset Distillation [12.106527496044473]
We train a conditional latent diffusion model capable of generating realistic synthetic images with labels.
We demonstrate that models can be effectively trained using only a small set of synthetic images and evaluated on a large real test set.
arXiv Detail & Related papers (2024-09-05T22:31:53Z) - Transductive Zero-Shot and Few-Shot CLIP [24.592841797020203]
This paper addresses the transductive zero-shot and few-shot CLIP classification challenge.
Inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently.
Our approach yields near 20% improvement in ImageNet accuracy over CLIP's zero-shot performance.
arXiv Detail & Related papers (2024-04-08T12:44:31Z) - DataDAM: Efficient Dataset Distillation with Attention Matching [15.300968899043498]
Researchers have long tried to minimize training costs in deep learning by maintaining strong generalization across diverse datasets.
Emerging research on dataset aims to reduce training costs by creating a small synthetic set that contains the information of a larger real dataset.
However, the synthetic data generated by previous methods are not guaranteed to distribute and discriminate as well as the original training data.
arXiv Detail & Related papers (2023-09-29T19:07:48Z) - Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimize for the L2 metric without the need of generating pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z) - Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z) - {\mu}Split: efficient image decomposition for microscopy data [50.794670705085835]
muSplit is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images.
We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory efficient incorporation of large image-context.
We apply muSplit to five decomposition tasks, one on a synthetic dataset, four others derived from real microscopy data.
arXiv Detail & Related papers (2022-11-23T11:26:24Z) - Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complimentary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z) - Tensor feature hallucination for few-shot learning [17.381648488344222]
Few-shot classification addresses the challenge of classifying examples given limited supervision and limited data.
Previous works on synthetic data generation for few-shot classification focus on exploiting complex models.
We investigate how a simple and straightforward synthetic data generation method can be used effectively.
arXiv Detail & Related papers (2021-06-09T18:25:08Z) - Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [92.15895515035795]
CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality.
Due to challenges in obtaining real world fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data.
We propose a Gaussian Process-based semi-supervised learning framework which enables the network in learning to derain using synthetic dataset.
arXiv Detail & Related papers (2020-06-10T00:33:18Z) - Supervised Contrastive Learning [42.27949000093086]
We extend the self-supervised batch contrastive approach to the fully-supervised setting.
We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss.
On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture.
arXiv Detail & Related papers (2020-04-23T17:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.