Learn 3D VQA Better with Active Selection and Reannotation
- URL: http://arxiv.org/abs/2507.04630v1
- Date: Mon, 07 Jul 2025 03:18:54 GMT
- Title: Learn 3D VQA Better with Active Selection and Reannotation
- Authors: Shengli Zhou, Yang Liu, Feng Zheng
- Abstract summary: In 3D VQA, the free-form nature of answers often leads to improper annotations that can confuse or mislead models when training on the entire dataset. We propose a multi-turn interactive active learning strategy that selects data based on models' semantic uncertainty to form a solid knowledge foundation. Experiments show better model performance and a substantial reduction in training costs, with training costs halved when targeting relatively high accuracy.
- Score: 46.687613392366174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D Visual Question Answering (3D VQA) is crucial for enabling models to perceive the physical world and perform spatial reasoning. In 3D VQA, the free-form nature of answers often leads to improper annotations that can confuse or mislead models when training on the entire dataset. While other text generation tasks can mitigate this issue by learning on large-scale datasets, the scarcity of 3D scene data amplifies the negative effect of misleading annotations. Although active learning strategies can select valuable instances for training, they fail to identify and resolve misleading labels, which the oracle inevitably provides in practice. To address this issue, we propose a multi-turn interactive active learning strategy. This strategy selects data based on models' semantic uncertainty to form a solid knowledge foundation more effectively, and actively requests reannotation from an oracle to resolve potentially misleading labels. For uncertainty assessment, we utilize a variance-based metric that takes semantic relationships between terms into consideration, thus avoiding the uniform inter-class similarity assumption of previous assessment metrics. Extensive experiments show better model performance and a substantial reduction in training costs, with training costs halved when targeting relatively high accuracy. The code is available at https://github.com/fz-zsl/AQuA.
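The variance-based semantic uncertainty described in the abstract can be sketched as follows. The function name, inputs, and probability-weighted-variance form are illustrative assumptions, not the released AQuA implementation: the idea is simply that uncertainty is measured as the spread of answer-term embeddings under the predicted distribution, so semantically similar candidate answers contribute less disagreement than distant ones.

```python
import numpy as np

def semantic_uncertainty(probs, embeddings):
    """Variance-based uncertainty over answer-term embeddings (sketch).

    probs:      (C,) predicted probability for each candidate answer
    embeddings: (C, D) semantic embedding of each candidate answer
    """
    mu = probs @ embeddings                          # probability-weighted mean embedding
    diffs = embeddings - mu                          # (C, D) deviation of each answer
    return float(probs @ np.sum(diffs ** 2, axis=1))  # weighted embedding variance
```

A one-hot prediction yields zero uncertainty, and probability mass split between semantically distant answers yields a larger score than the same split between near-synonyms, unlike entropy-style metrics that treat all class pairs as equally dissimilar.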
Related papers
- Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection [25.520626014113585]
Evolvable Symbolic Visual Grounder (EaSe) is a training-free symbolic framework for 3D visual grounding.
EaSe achieves 52.9% accuracy on the Nr3D dataset and 49.2% Acc@0.25 on ScanRefer.
It substantially reduces inference time and cost, offering a balanced trade-off between performance and efficiency.
arXiv Detail & Related papers (2025-02-03T14:32:36Z)
- iMatching: Imperative Correspondence Learning [5.568520539073218]
We introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence.
It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels.
We demonstrate superior performance on tasks including feature matching and pose estimation.
arXiv Detail & Related papers (2023-12-04T18:58:20Z)
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
- Data Augmentation-free Unsupervised Learning for 3D Point Cloud Understanding [61.30276576646909]
We propose an augmentation-free unsupervised approach for point clouds to learn transferable point-level features via soft clustering, named SoftClu.
We exploit the affiliation of points to their clusters as a proxy to enable self-training through a pseudo-label prediction task.
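The cluster-affiliation idea can be sketched generically. This is a hypothetical soft-clustering pseudo-labeler, not SoftClu's exact formulation; the squared-distance affinity and temperature are assumptions for illustration:

```python
import numpy as np

def soft_cluster_pseudo_labels(feats, centroids, temp=0.1):
    """Soft affiliation of points to clusters, used as pseudo-labels (sketch).

    feats:     (N, D) per-point features
    centroids: (K, D) cluster centers
    Returns soft affiliations (N, K) and hard pseudo-labels (N,).
    """
    # Squared distance from every point to every centroid
    d2 = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    logits = -d2 / temp
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    q = np.exp(logits)
    q /= q.sum(axis=1, keepdims=True)                # soft affiliations
    return q, q.argmax(axis=1)                       # pseudo-label = nearest cluster
```

The soft affiliations `q` can supervise a pseudo-label prediction head, enabling self-training without any data augmentation.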
arXiv Detail & Related papers (2022-10-06T10:18:16Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
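A minimal closed-form analogue: for ridge regression, removing one training point is an exact rank-one Sherman-Morrison update to the parameters. This illustrates the spirit of closed-form unlearning updates, not the paper's influence-function method itself:

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression; returns parameters and the Gram matrix."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y), A

def unlearn_point(A, Xty, x, y_old):
    """Remove one (x, y) pair from a fitted ridge model in closed form.

    Sherman-Morrison: (A - x x^T)^{-1} = A^{-1} + A^{-1}x x^T A^{-1} / (1 - x^T A^{-1} x)
    """
    A_inv = np.linalg.inv(A)
    Ax = A_inv @ x
    A_inv_new = A_inv + np.outer(Ax, Ax) / (1.0 - x @ Ax)  # rank-one downdate
    return A_inv_new @ (Xty - y_old * x)                   # refit without the point
```

The update matches a full retrain on the remaining data exactly, without touching the other samples.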
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
- Semi-supervised 3D Object Detection via Adaptive Pseudo-Labeling [18.209409027211404]
3D object detection is an important task in computer vision.
Most existing methods require a large number of high-quality 3D annotations, which are expensive to collect.
We propose a novel semi-supervised framework based on pseudo-labeling for outdoor 3D object detection tasks.
arXiv Detail & Related papers (2021-08-15T02:58:43Z)
- Hidden Footprints: Learning Contextual Walkability from 3D Human Trails [70.01257397390361]
Current datasets only tell you where people are, not where they could be.
We first augment the set of valid, labeled walkable regions by propagating person observations between images, utilizing 3D information to create what we call hidden footprints.
We devise a training strategy designed for such sparse labels, combining a class-balanced classification loss with a contextual adversarial loss.
arXiv Detail & Related papers (2020-08-19T23:19:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.