Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation
- URL: http://arxiv.org/abs/2407.13363v2
- Date: Tue, 3 Sep 2024 05:10:50 GMT
- Title: Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation
- Authors: Chang Liu, Giulia Rizzoli, Pietro Zanuttigh, Fu Li, Yi Niu,
- Abstract summary: We argue that widely available web images can also be considered for the learning of new classes.
To our knowledge, this is the first work to rely solely on web images for both the learning of new concepts and the preservation of the already learned ones.
- Score: 33.955384040748946
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current weakly-supervised incremental learning for semantic segmentation (WILSS) approaches only consider replacing pixel-level annotations with image-level labels, while the training images are still from well-designed datasets. In this work, we argue that widely available web images can also be considered for the learning of new classes. To achieve this, firstly we introduce a strategy to select web images which are similar to previously seen examples in the latent space using a Fourier-based domain discriminator. Then, an effective caption-driven reharsal strategy is proposed to preserve previously learnt classes. To our knowledge, this is the first work to rely solely on web images for both the learning of new concepts and the preservation of the already learned ones in WILSS. Experimental results show that the proposed approach can reach state-of-the-art performances without using manually selected and annotated data in the incremental steps.
Related papers
- Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
Cross-domain nature of sketch-based image retrieval is challenging.
We present an effective Adapt and Align'' approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
arXiv Detail & Related papers (2023-05-09T03:10:15Z) - Object discovery and representation networks [78.16003886427885]
We propose a self-supervised learning paradigm that discovers the structure encoded in priors by itself.
Our method, Odin, couples object discovery and representation networks to discover meaningful image segmentations without any supervision.
arXiv Detail & Related papers (2022-03-16T17:42:55Z) - Semantic-Aware Generation for Self-Supervised Visual Representation
Learning [116.5814634936371]
We advocate for Semantic-aware Generation (SaGe) to facilitate richer semantics rather than details to be preserved in the generated image.
SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
We execute SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks including nearest neighbor test, linear classification, and fine-scaled image recognition.
arXiv Detail & Related papers (2021-11-25T16:46:13Z) - A Pixel-Level Meta-Learner for Weakly Supervised Few-Shot Semantic
Segmentation [40.27705176115985]
Few-shot semantic segmentation addresses the learning task in which only few images with ground truth pixel-level labels are available for the novel classes of interest.
We propose a novel meta-learning framework, which predicts pseudo pixel-level segmentation masks from a limited amount of data and their semantic labels.
Our proposed learning model can be viewed as a pixel-level meta-learner.
arXiv Detail & Related papers (2021-11-02T08:28:11Z) - WEDGE: Web-Image Assisted Domain Generalization for Semantic
Segmentation [72.88657378658549]
We propose a WEb-image assisted Domain GEneralization scheme, which is the first to exploit the diversity of web-crawled images for generalizable semantic segmentation.
We also present a method which injects styles of the web-crawled images into training images on-the-fly during training, which enables the network to experience images of diverse styles with reliable labels for effective training.
arXiv Detail & Related papers (2021-09-29T05:19:58Z) - Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z) - Few-Shot Semantic Segmentation Augmented with Image-Level Weak
Annotations [23.02986307143718]
Recent progress in fewshot semantic segmentation tackles the issue by only a few pixel-level annotated examples.
Our key idea is to learn a better prototype representation of the class by fusing the knowledge from the image-level labeled data.
We propose a new framework, called PAIA, to learn the class prototype representation in a metric space by integrating image-level annotations.
arXiv Detail & Related papers (2020-07-03T04:58:20Z) - Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning in variance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.