Related papers: Class-Incremental Exemplar Compression for Class-Incremental Learning

Class-Incremental Exemplar Compression for Class-Incremental Learning

URL: http://arxiv.org/abs/2303.14042v2
Date: Sat, 8 Apr 2023 03:25:41 GMT
Title: Class-Incremental Exemplar Compression for Class-Incremental Learning
Authors: Zilin Luo, Yaoyao Liu, Bernt Schiele, Qianru Sun
Abstract summary: We propose an adaptive mask generation model called class-incremental masking (CIM) We conduct experiments on high-resolution CIL benchmarks including Food-101, ImageNet-100, and ImageNet-1000. We show that using the compressed exemplars by CIM can achieve a new state-of-the-art CIL accuracy, e.g., 4.8 percentage points higher than FOSTER on 10-Phase ImageNet-1000.
Score: 90.93462714376078
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Exemplar-based class-incremental learning (CIL) finetunes the model with all samples of new classes but few-shot exemplars of old classes in each incremental phase, where the "few-shot" abides by the limited memory budget. In this paper, we break this "few-shot" limit based on a simple yet surprisingly effective idea: compressing exemplars by downsampling non-discriminative pixels and saving "many-shot" compressed exemplars in the memory. Without needing any manual annotation, we achieve this compression by generating 0-1 masks on discriminative pixels from class activation maps (CAM). We propose an adaptive mask generation model called class-incremental masking (CIM) to explicitly resolve two difficulties of using CAM: 1) transforming the heatmaps of CAM to 0-1 masks with an arbitrary threshold leads to a trade-off between the coverage on discriminative pixels and the quantity of exemplars, as the total memory is fixed; and 2) optimal thresholds vary for different object classes, which is particularly obvious in the dynamic environment of CIL. We optimize the CIM model alternatively with the conventional CIL model through a bilevel optimization problem. We conduct extensive experiments on high-resolution CIL benchmarks including Food-101, ImageNet-100, and ImageNet-1000, and show that using the compressed exemplars by CIM can achieve a new state-of-the-art CIL accuracy, e.g., 4.8 percentage points higher than FOSTER on 10-Phase ImageNet-1000. Our code is available at https://github.com/xfflzl/CIM-CIL.

Related papers

Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation [27.795313102716726]
We introduce 1D binary image latents for compact discrete representation of images.<n>Our approach preserves high-resolution details while maintaining the compactness of 1D latents.<n>Our text-to-image models are the first to achieve competitive performance in both diffusion and auto-regressive generation.
arXiv Detail & Related papers (2025-06-26T05:48:36Z)
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification [65.46685389276443]
We ground our work on CLIP, a vision-language pre-trained encoder model that can perform zero-shot classification by matching an image with text prompts. We then formulate purification risk as the KL divergence between the joint distributions purification process. We propose two variants for our CLIPure approach: CLI-Diff which models the likelihood of images' latent vectors, and CLIPure-Cos which models the likelihood with the cosine similarity between the embeddings of an image and a photo of a.''
arXiv Detail & Related papers (2025-02-25T13:09:34Z)
Quantization-free Lossy Image Compression Using Integer Matrix Factorization [8.009813033356478]
We introduce a variant of integer matrix factorization (IMF) to develop a novel quantization-free lossy image compression method. IMF provides a low-rank representation of the image data as a product of two smaller factor matrices with bounded integer elements. Our method consistently outperforms JPEG at low bit rates below 0.25 bits per pixel (bpp) and remains comparable at higher bit rates.
arXiv Detail & Related papers (2024-08-22T19:08:08Z)
Transductive Zero-Shot and Few-Shot CLIP [24.592841797020203]
This paper addresses the transductive zero-shot and few-shot CLIP classification challenge. Inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently. Our approach yields near 20% improvement in ImageNet accuracy over CLIP's zero-shot performance.
arXiv Detail & Related papers (2024-04-08T12:44:31Z)
Learning Mask-aware CLIP Representations for Zero-Shot Segmentation [120.97144647340588]
Mask-awareProposals CLIP (IP-CLIP) is proposed to handle arbitrary numbers of image and mask proposals simultaneously. mask-aware loss and self-distillation loss are designed to fine-tune IP-CLIP, ensuring CLIP is responsive to different mask proposals. We conduct extensive experiments on the popular zero-shot benchmarks.
arXiv Detail & Related papers (2023-09-30T03:27:31Z)
Image Compression with Product Quantized Masked Image Modeling [44.15706119017024]
Recent neural compression methods have been based on the popular hyperprior framework. It relies on Scalar Quantization and offers a very strong compression performance. This contrasts from recent advances in image generation and representation learning, where Vector Quantization is more commonly employed.
arXiv Detail & Related papers (2022-12-14T17:50:39Z)
Improving Zero-shot Generalization and Robustness of Multi-modal Models [70.14692320804178]
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks. We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts. We propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy.
arXiv Detail & Related papers (2022-12-04T07:26:24Z)
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers [117.79456335844439]
We propose to use a semantic-rich visual tokenizer as the reconstruction target for masked prediction. We then pretrain vision Transformers by predicting the original visual tokens for the masked image patches. Experiments on image classification and semantic segmentation show that our approach outperforms all compared MIM methods.
arXiv Detail & Related papers (2022-08-12T16:48:10Z)
Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complimentary sources of supervision: pseudo-labels and raw images. MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules [22.818632387206257]
Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures. We propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations. In the encoding/decoding network design part, we propose a residual blocks (CRB) where multiple residual blocks are serially connected with additional shortcut connections.
arXiv Detail & Related papers (2021-07-14T02:54:22Z)
Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image. We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.