BaSAL: Size-Balanced Warm Start Active Learning for LiDAR Semantic
Segmentation
- URL: http://arxiv.org/abs/2310.08035v2
- Date: Wed, 13 Mar 2024 02:05:59 GMT
- Title: BaSAL: Size-Balanced Warm Start Active Learning for LiDAR Semantic
Segmentation
- Authors: Jiarong Wei, Yancong Lin, Holger Caesar
- Abstract summary: Existing active learning methods overlook the severe class imbalance inherent in LiDAR semantic segmentation datasets.
We propose BaSAL, a size-balanced warm start active learning model, based on the observation that each object class has a characteristic size.
Results show that we are able to improve the performance of the initial model by a large margin.
- Score: 2.9290232815049926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning strives to reduce the need for costly data annotation, by
repeatedly querying an annotator to label the most informative samples from a
pool of unlabeled data, and then training a model from these samples. We
identify two problems with existing active learning methods for LiDAR semantic
segmentation. First, they overlook the severe class imbalance inherent in LiDAR
semantic segmentation datasets. Second, to bootstrap the active learning loop
when there is no labeled data available, they train their initial model from
randomly selected data samples, leading to low performance. This situation is
referred to as the cold start problem. To address these problems we propose
BaSAL, a size-balanced warm start active learning model, based on the
observation that each object class has a characteristic size. By sampling
object clusters according to their size, we can thus create a size-balanced
dataset that is also more class-balanced. Furthermore, in contrast to existing
information measures like entropy or CoreSet, size-based sampling does not
require a pretrained model, thus addressing the cold start problem effectively.
Results show that we are able to improve the performance of the initial model
by a large margin. Combining warm start and size-balanced sampling with
established information measures, our approach achieves comparable performance
to training on the entire SemanticKITTI dataset, despite using only 5% of the
annotations, outperforming existing active learning methods. We also match the
existing state-of-the-art in active learning on nuScenes. Our code is available
at: https://github.com/Tony-WJR/BaSAL.
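To make the mechanism concrete, below is a minimal sketch of size-balanced sampling as the abstract describes it: cluster the unlabeled point cloud into objects, bin the clusters by their characteristic size, and draw evenly across bins so the initial labeled pool is size-balanced (and therefore more class-balanced) without any pretrained model. The `Cluster` type, the bin thresholds, and the assumption that clusters are already extracted (e.g., via a method like DBSCAN) are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of size-balanced sampling for the first active learning
# round. Cluster extraction, the Cluster type, and the size-bin thresholds are
# assumptions for illustration; they are not BaSAL's actual implementation.
import random
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Cluster:
    cluster_id: int
    diameter_m: float  # characteristic size of the object cluster, in meters

def size_bin(diameter_m: float, thresholds=(0.5, 1.0, 2.0, 4.0, 8.0)) -> int:
    """Map a cluster to a coarse size bin (thresholds in meters, illustrative)."""
    for i, t in enumerate(thresholds):
        if diameter_m < t:
            return i
    return len(thresholds)

def size_balanced_sample(clusters: list[Cluster], budget: int) -> list[Cluster]:
    """Draw roughly equal numbers of clusters from each size bin, so that the
    labeled pool is balanced over object sizes, a proxy for class balance.
    Needs no pretrained model, which is what sidesteps the cold start."""
    if not clusters:
        return []
    bins: defaultdict[int, list[Cluster]] = defaultdict(list)
    for c in clusters:
        bins[size_bin(c.diameter_m)].append(c)
    per_bin = max(1, budget // len(bins))
    selected: list[Cluster] = []
    for members in bins.values():
        random.shuffle(members)  # random within a bin; no model scores needed
        selected.extend(members[:per_bin])
    return selected[:budget]
```

In later rounds the abstract says this sampling is combined with established information measures; ranking clusters within each size bin by, e.g., mean per-point softmax entropy instead of shuffling randomly would be one plausible way to realize that combination.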
Related papers
- S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning [51.84977135926156]
We introduce S$^2$R, an efficient framework that enhances LLM reasoning by teaching models to self-verify and self-correct during inference.
Our results demonstrate that Qwen2.5-math-7B achieves an accuracy improvement from 51.0% to 81.6%, outperforming models trained on an equivalent amount of long-CoT distilled data.
arXiv Detail & Related papers (2025-02-18T13:40:22Z)
- Class Balance Matters to Active Class-Incremental Learning [61.11786214164405]
We aim to start from a pool of large-scale unlabeled data and then annotate the most informative samples for incremental learning.
We propose Class-Balanced Selection (CBS) strategy to achieve both class balance and informativeness in chosen samples.
Our CBS is a plug-and-play addition to CIL methods that are based on pretrained models with prompt tuning.
arXiv Detail & Related papers (2024-12-09T16:37:27Z)
- Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options.
Our method is able to work under black-box conditions without access to model training data or weights.
We evaluate the degree of data leakage of 31 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
- Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods.
We verify the proposed IST on five datasets and two types of backbones, effectively improving recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
- An Information Theoretic Approach to Machine Unlearning [43.423418819707784]
To comply with AI and data regulations, the need to forget private or copyrighted information from trained machine learning models is increasingly important.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z)
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
- Iterative Loop Learning Combining Self-Training and Active Learning for Domain Adaptive Semantic Segmentation [1.827510863075184]
Self-training and active learning have been proposed to alleviate the annotation cost of domain adaptive semantic segmentation.
This paper proposes an iterative loop learning method combining Self-Training and Active Learning.
arXiv Detail & Related papers (2023-01-31T01:31:43Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are commonly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Minority Class Oriented Active Learning for Imbalanced Datasets [6.009262446889319]
We introduce a new active learning method which is designed for imbalanced datasets.
It favors samples likely to be in minority classes so as to reduce the imbalance of the labeled subset (a minimal scoring sketch follows this list).
We also compare two training schemes for active learning.
arXiv Detail & Related papers (2022-02-01T13:13:41Z)
- Optimizing Active Learning for Low Annotation Budgets [6.753808772846254]
In deep learning, active learning is usually implemented as an iterative process in which successive deep models are updated via fine-tuning.
We tackle this issue by using an approach inspired by transfer learning.
We introduce a novel acquisition function which exploits the iterative nature of AL process to select samples in a more robust fashion.
arXiv Detail & Related papers (2022-01-18T18:53:10Z)
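As referenced in the Minority Class Oriented Active Learning entry above, here is a hedged sketch of one way such a minority-favoring selection criterion could look: unlabeled samples whose predicted probability mass falls on classes that are rare in the current labeled subset are scored higher. The rarity weighting below is an illustration, not the cited paper's exact criterion.

```python
# Hedged sketch of minority-class oriented selection; the rarity weighting is
# an illustrative assumption, not the cited paper's exact criterion.
import numpy as np

def minority_scores(probs: np.ndarray, labeled_counts: np.ndarray) -> np.ndarray:
    """probs: (N, C) softmax outputs for N unlabeled samples over C classes.
    labeled_counts: (C,) per-class label counts in the current labeled subset.
    Samples whose probability mass sits on under-represented classes score high."""
    inv_freq = 1.0 / (labeled_counts + 1.0)  # rarer class -> larger weight
    inv_freq /= inv_freq.sum()               # normalize to a distribution
    return probs @ inv_freq                  # expected class rarity per sample

def select_batch(probs: np.ndarray, labeled_counts: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k samples most likely to reduce class imbalance."""
    return np.argsort(-minority_scores(probs, labeled_counts))[:k]
```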