Weakly Supervised Semantic Segmentation for Driving Scenes
- URL: http://arxiv.org/abs/2312.13646v3
- Date: Thu, 18 Jan 2024 10:13:51 GMT
- Title: Weakly Supervised Semantic Segmentation for Driving Scenes
- Authors: Dongseob Kim, Seungho Lee, Junsuk Choe, Hyunjung Shim
- Abstract summary: State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS) exhibit severe performance degradation on driving scene datasets.
We develop a new WSSS framework tailored to driving scene datasets.
- Score: 27.0285166404621
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS)
using image-level labels exhibit severe performance degradation on driving
scene datasets such as Cityscapes. To address this challenge, we develop a new
WSSS framework tailored to driving scene datasets. Based on extensive analysis
of dataset characteristics, we employ Contrastive Language-Image Pre-training
(CLIP) as our baseline to obtain pseudo-masks. However, CLIP introduces two key
challenges: (1) pseudo-masks from CLIP struggle to represent small object
classes, and (2) these masks contain notable noise. We propose solutions for
each issue as follows. (1) We devise Global-Local View Training that seamlessly
incorporates small-scale patches during model training, thereby enhancing the
model's capability to handle small-sized yet critical objects in driving scenes
(e.g., traffic lights). (2) We introduce Consistency-Aware Region Balancing
(CARB), a novel technique that discerns reliable and noisy regions through
evaluating the consistency between CLIP masks and segmentation predictions. It
prioritizes reliable pixels over noisy pixels via adaptive loss weighting.
Notably, the proposed method achieves 51.8% mIoU on the Cityscapes test
dataset, showcasing its potential as a strong WSSS baseline on driving scene
datasets. Experimental results on CamVid and WildDash2 demonstrate the
effectiveness of our method across diverse datasets, even with small-scale
datasets or visually challenging conditions. The code is available at
https://github.com/k0u-id/CARB.
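The two proposed components map naturally onto short code sketches. Below is a minimal, hypothetical PyTorch sketch of Global-Local View Training: alongside the full image, small random crops are upsampled and fed to the model so that tiny but critical classes occupy a larger share of some training views. Function names, crop scales, and tensor shapes are illustrative assumptions, not the authors' implementation (see the repository above for the official code).
```python
# Hypothetical sketch of Global-Local View Training; not the authors' code.
import torch
import torch.nn.functional as F

def global_local_views(image, patch_scale=0.25, num_patches=4):
    """Return the global view plus upsampled random local crops.

    image: float tensor of shape (B, C, H, W).
    """
    _, _, h, w = image.shape
    ph, pw = int(h * patch_scale), int(w * patch_scale)
    views = [image]  # the original, global view
    for _ in range(num_patches):
        top = int(torch.randint(0, h - ph + 1, (1,)))
        left = int(torch.randint(0, w - pw + 1, (1,)))
        patch = image[:, :, top:top + ph, left:left + pw]
        # Upsample the small crop to the training resolution so small
        # objects (e.g., traffic lights) appear at a larger scale.
        views.append(F.interpolate(patch, size=(h, w), mode="bilinear",
                                   align_corners=False))
    return views
```
Consistency-Aware Region Balancing can likewise be sketched as an adaptively weighted cross-entropy: pixels where the segmentation prediction agrees with the CLIP pseudo-mask are treated as reliable and weighted more heavily, while disagreeing (noisy) pixels are down-weighted rather than discarded. The two-level weighting below is a simplification assumed for illustration, not the paper's exact formulation.
```python
# Simplified sketch of CARB's adaptive loss weighting; weights are assumed.
import torch.nn.functional as F

def carb_loss(logits, clip_mask, reliable_w=1.0, noisy_w=0.1):
    """Cross-entropy against CLIP pseudo-masks, re-weighted per pixel.

    logits: (B, C, H, W) segmentation scores; clip_mask: (B, H, W) int64.
    """
    pred = logits.argmax(dim=1)              # hard prediction, (B, H, W)
    consistent = pred.eq(clip_mask).float()  # 1.0 where mask looks reliable
    weights = noisy_w + (reliable_w - noisy_w) * consistent
    per_pixel = F.cross_entropy(logits, clip_mask, reduction="none")
    return (weights * per_pixel).mean()
```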
Related papers
- Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras [45.063747874243276]
We present EV-WSSS: a novel weakly supervised approach for event-based semantic segmentation.
The proposed framework performs asymmetric dual-student learning between 1) the original forward event data and 2) the longer reversed event data.
We show that the proposed method achieves strong segmentation results even without relying on dense pixel-level ground truth.
arXiv Detail & Related papers (2024-07-15T20:00:50Z)
- CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation [60.0893353960514]
We study the task of weakly-supervised point cloud semantic segmentation with sparse annotations.
We propose a Contextual Point Cloud Modeling (CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method.
arXiv Detail & Related papers (2023-07-19T04:41:18Z)
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by relying on surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, which substitutes for GAP, and a feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance improves, and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- 1st Place Solution of The Robust Vision Challenge (RVC) 2022 Semantic Segmentation Track [67.56316745239629]
This report describes the winning solution to the semantic segmentation track of the Robust Vision Challenge at ECCV 2022.
Our method adopts the FAN-B-Hybrid model as the encoder and uses Segformer as the segmentation framework.
The proposed method could serve as a strong baseline for the multi-domain segmentation task and benefit future works.
arXiv Detail & Related papers (2022-10-23T20:52:22Z)
- A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition work well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target zero-shot semantic segmentation by building on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses the previous state of the art by a large margin.
arXiv Detail & Related papers (2021-12-29T18:56:18Z)
- Large-scale Unsupervised Semantic Segmentation [163.3568726730319]
We propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to track the research progress.
Based on the ImageNet dataset, we propose the ImageNet-S dataset with 1.2 million training images and 40k high-quality semantic segmentation annotations for evaluation.
arXiv Detail & Related papers (2021-06-06T15:02:11Z)
- Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation [16.560870740946275]
Explicit Pseudo-pixel Supervision (EPS) learns from pixel-level feedback by combining two forms of weak supervision.
We devise a joint training strategy to fully exploit the complementary relationship between the two.
Our method can obtain accurate object boundaries and discard co-occurring pixels, thereby significantly improving the quality of pseudo-masks.
arXiv Detail & Related papers (2021-05-19T07:31:11Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
- Reinforced active learning for image segmentation [34.096237671643145]
We present a new active learning strategy for semantic segmentation based on deep reinforcement learning (RL).
An agent learns a policy to select a subset of small, informative image regions (as opposed to entire images) to be labeled from a pool of unlabeled data.
We propose a new modification of the deep Q-network (DQN) formulation for active learning, adapting it to the large-scale nature of semantic segmentation problems.
arXiv Detail & Related papers (2020-02-16T14:03:06Z)
- SVIRO: Synthetic Vehicle Interior Rear Seat Occupancy Dataset and Benchmark [11.101588888002045]
We release SVIRO, a synthetic dataset of scenes in the passenger compartment of ten different vehicles.
We analyze machine learning-based approaches for their generalization capacities and reliability when trained on a limited number of variations.
arXiv Detail & Related papers (2020-01-10T14:44:23Z)