Class-Incremental Exemplar Compression for Class-Incremental Learning
- URL: http://arxiv.org/abs/2303.14042v2
- Date: Sat, 8 Apr 2023 03:25:41 GMT
- Title: Class-Incremental Exemplar Compression for Class-Incremental Learning
- Authors: Zilin Luo, Yaoyao Liu, Bernt Schiele, Qianru Sun
- Abstract summary: We propose an adaptive mask generation model called class-incremental masking (CIM).
We conduct experiments on high-resolution CIL benchmarks including Food-101, ImageNet-100, and ImageNet-1000.
We show that using the compressed exemplars by CIM can achieve a new state-of-the-art CIL accuracy, e.g., 4.8 percentage points higher than FOSTER on 10-Phase ImageNet-1000.
- Score: 90.93462714376078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exemplar-based class-incremental learning (CIL) finetunes the model with all
samples of new classes but few-shot exemplars of old classes in each
incremental phase, where the "few-shot" abides by the limited memory budget. In
this paper, we break this "few-shot" limit based on a simple yet surprisingly
effective idea: compressing exemplars by downsampling non-discriminative pixels
and saving "many-shot" compressed exemplars in the memory. Without needing any
manual annotation, we achieve this compression by generating 0-1 masks on
discriminative pixels from class activation maps (CAM). We propose an adaptive
mask generation model called class-incremental masking (CIM) to explicitly
resolve two difficulties of using CAM: 1) transforming the heatmaps of CAM to
0-1 masks with an arbitrary threshold leads to a trade-off between the coverage
on discriminative pixels and the quantity of exemplars, as the total memory is
fixed; and 2) optimal thresholds vary for different object classes, which is
particularly obvious in the dynamic environment of CIL. We optimize the CIM
model alternately with the conventional CIL model through a bilevel
optimization problem. We conduct extensive experiments on high-resolution CIL
benchmarks including Food-101, ImageNet-100, and ImageNet-1000, and show that
using the compressed exemplars by CIM can achieve a new state-of-the-art CIL
accuracy, e.g., 4.8 percentage points higher than FOSTER on 10-Phase
ImageNet-1000. Our code is available at https://github.com/xfflzl/CIM-CIL.
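
The trade-off described in the abstract (mask coverage versus number of exemplars under a fixed memory budget) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed downsampling factor, and the cost model (a full-resolution pixel counts as 1, a downsampled pixel as 1/factor^2) are assumptions made here for illustration, and a single hand-picked threshold stands in for the adaptive per-class masks that CIM learns.

```python
import numpy as np

def cam_to_mask(cam, threshold):
    """Binarize a CAM heatmap (values in [0, 1]) into a 0-1 mask."""
    return (cam >= threshold).astype(np.uint8)

def compress_exemplar(image, mask, factor=2):
    """Keep masked (discriminative) pixels at full resolution and replace
    the rest with a block-averaged, i.e. downsampled, version.
    Assumes height and width are divisible by `factor`."""
    h, w = mask.shape
    # Downsample by averaging non-overlapping factor x factor blocks.
    low = image.reshape(h // factor, factor, w // factor, factor, -1).mean(axis=(1, 3))
    # Upsample back by nearest-neighbour repetition, only for compositing.
    up = np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)
    return np.where(mask[..., None] == 1, image, up)

def compressed_cost(mask, factor=2):
    """Assumed storage cost in units of one full-resolution image."""
    keep = float(mask.mean())
    return keep + (1.0 - keep) / factor ** 2

def exemplars_per_budget(budget_images, cost):
    """How many compressed exemplars fit into a budget given in full images."""
    return int(budget_images // cost)

if __name__ == "__main__":
    # Toy 4x4 heatmap with a 2x2 high-activation centre.
    cam = np.zeros((4, 4))
    cam[1:3, 1:3] = 1.0
    image = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)

    for threshold in (0.0, 0.5):
        mask = cam_to_mask(cam, threshold)
        cost = compressed_cost(mask)
        print(threshold, cost, exemplars_per_budget(20, cost))
```

Under this assumed cost model, a lower threshold keeps more pixels at full resolution, so each exemplar costs more and fewer fit in memory; a higher threshold does the opposite. That is the tension the abstract attributes to choosing one arbitrary threshold.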
Related papers
- Quantization-free Lossy Image Compression Using Integer Matrix Factorization [8.009813033356478]
We introduce a variant of integer matrix factorization (IMF) to develop a novel quantization-free lossy image compression method.
IMF provides a low-rank representation of the image data as a product of two smaller factor matrices with bounded integer elements.
Our method consistently outperforms JPEG at low bit rates below 0.25 bits per pixel (bpp) and remains comparable at higher bit rates.
arXiv Detail & Related papers (2024-08-22T19:08:08Z)
- Transductive Zero-Shot and Few-Shot CLIP [24.592841797020203]
This paper addresses the transductive zero-shot and few-shot CLIP classification challenge.
Inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently.
Our approach yields near 20% improvement in ImageNet accuracy over CLIP's zero-shot performance.
arXiv Detail & Related papers (2024-04-08T12:44:31Z)
- Learning Mask-aware CLIP Representations for Zero-Shot Segmentation [120.97144647340588]
An Image-Proposals CLIP Encoder (IP-CLIP Encoder) is proposed to handle arbitrary numbers of image and mask proposals simultaneously.
A mask-aware loss and a self-distillation loss are designed to fine-tune the IP-CLIP Encoder, ensuring CLIP is responsive to different mask proposals.
We conduct extensive experiments on the popular zero-shot benchmarks.
arXiv Detail & Related papers (2023-09-30T03:27:31Z)
- Image Compression with Product Quantized Masked Image Modeling [44.15706119017024]
Recent neural compression methods have been based on the popular hyperprior framework.
It relies on Scalar Quantization and offers a very strong compression performance.
This contrasts with recent advances in image generation and representation learning, where Vector Quantization is more commonly employed.
arXiv Detail & Related papers (2022-12-14T17:50:39Z)
- Improving Zero-shot Generalization and Robustness of Multi-modal Models [70.14692320804178]
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks.
We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts.
We propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy.
arXiv Detail & Related papers (2022-12-04T07:26:24Z)
- BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers [117.79456335844439]
We propose to use a semantic-rich visual tokenizer as the reconstruction target for masked prediction.
We then pretrain vision Transformers by predicting the original visual tokens for the masked image patches.
Experiments on image classification and semantic segmentation show that our approach outperforms all compared MIM methods.
arXiv Detail & Related papers (2022-08-12T16:48:10Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules [22.818632387206257]
Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures.
We propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations.
In the encoding/decoding network design part, we propose concatenated residual blocks (CRB), in which multiple residual blocks are serially connected with additional shortcut connections.
arXiv Detail & Related papers (2021-07-14T02:54:22Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)