Related papers: Multi-label Scene Classification for Autonomous Vehicles: Acquiring and Accumulating Knowledge from Diverse Datasets

Multi-label Scene Classification for Autonomous Vehicles: Acquiring and Accumulating Knowledge from Diverse Datasets

URL: http://arxiv.org/abs/2506.17101v3
Date: Wed, 17 Sep 2025 19:22:23 GMT
Title: Multi-label Scene Classification for Autonomous Vehicles: Acquiring and Accumulating Knowledge from Diverse Datasets
Authors: Ke Li, Chenyu Zhang, Yuxin Ding, Xianbiao Hu, Ruwen Qin,
Abstract summary: This paper introduces a novel deep learning method that integrates Knowledge Acquisition and Accumulation (KAA) with Consistency-based Active Learning (CAL)<n>An ablation study on the newly developed Driving Scene Identification (DSI) dataset demonstrates a 56.1% improvement over an ImageNet-pretrained baseline.<n> KAA-CAL outperforms state-of-the-art multi-label classification methods on the BDD100K and HSD datasets.
Score: 13.41074576587372
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Driving scenes are inherently heterogeneous and dynamic. Multi-attribute scene identification, as a high-level visual perception capability, provides autonomous vehicles (AVs) with essential contextual awareness to understand, reason through, and interact with complex driving environments. Although scene identification is best modeled as a multi-label classification problem via multitask learning, it faces two major challenges: the difficulty of acquiring balanced, comprehensively annotated datasets and the need to re-annotate all training data when new attributes emerge. To address these challenges, this paper introduces a novel deep learning method that integrates Knowledge Acquisition and Accumulation (KAA) with Consistency-based Active Learning (CAL). KAA leverages monotask learning on heterogeneous single-label datasets to build a knowledge foundation, while CAL bridges the gap between single- and multi-label data, adapting the foundation model for multi-label scene classification. An ablation study on the newly developed Driving Scene Identification (DSI) dataset demonstrates a 56.1% improvement over an ImageNet-pretrained baseline. Moreover, KAA-CAL outperforms state-of-the-art multi-label classification methods on the BDD100K and HSD datasets, achieving this with 85% less data and even recognizing attributes unseen during foundation model training. The DSI dataset and KAA-CAL implementation code are publicly available at https://github.com/KELISBU/KAA-CAL .

Related papers

Image Classification with Deep Reinforcement Active Learning [28.924413229981827]
In many real-world scenarios, labeled data are scarce, and their hand-labeling is time, effort and cost demanding.<n>Active learning is an alternative paradigm that mitigates the effort in hand-labeling data, and annotated by an expert.<n>In this work, we devise an adaptive active learning method based on Markov Decision Process (MDP)
arXiv Detail & Related papers (2024-12-27T18:37:51Z)
TSceneJAL: Joint Active Learning of Traffic Scenes for 3D Object Detection [26.059907173437114]
TSceneJAL framework can efficiently sample the balanced, diverse, and complex traffic scenes from both labeled and unlabeled data.<n>Our approach outperforms existing state-of-the-art methods on 3D object detection tasks with up to 12% improvements.
arXiv Detail & Related papers (2024-12-25T11:07:04Z)
Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning [83.10178754323955]
Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to solve the problem of complex interactions between attributes and object visual representations.<n>To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module.<n>To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module.<n>The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
arXiv Detail & Related papers (2024-11-28T09:50:25Z)
UdeerLID+: Integrating LiDAR, Image, and Relative Depth with Semi-Supervised [12.440461420762265]
Road segmentation is a critical task for autonomous driving systems. Our work introduces an innovative approach that integrates LiDAR point cloud data, visual image, and relative depth maps. One of the primary challenges is the scarcity of large-scale, accurately labeled datasets.
arXiv Detail & Related papers (2024-09-10T03:57:30Z)
Incremental Self-training for Semi-supervised Learning [56.57057576885672]
IST is simple yet effective and fits existing self-training-based semi-supervised learning methods. We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed.
arXiv Detail & Related papers (2024-04-14T05:02:00Z)
Multi-Label Knowledge Distillation [86.03990467785312]
We propose a novel multi-label knowledge distillation method. On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems. On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
arXiv Detail & Related papers (2023-08-12T03:19:08Z)
LVLane: Deep Learning for Lane Detection and Classification in Challenging Conditions [2.5641096293146712]
We present an end-to-end lane detection and classification system based on deep learning methodologies. In our study, we introduce a unique dataset meticulously curated to encompass scenarios that pose significant challenges for state-of-the-art (SOTA) lane localization models. We propose a CNN-based classification branch, seamlessly integrated with the detector, facilitating the identification of distinct lane types.
arXiv Detail & Related papers (2023-07-13T16:09:53Z)
Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings. We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
Association Graph Learning for Multi-Task Classification with Category Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously. We learn an association graph to transfer knowledge among tasks for missing classes. Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z)
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation [71.51719469058666]
We propose a representation learning framework called X-Learner. X-Learner learns the universal feature of multiple vision tasks supervised by various sources. X-Learner achieves strong performance on different tasks without extra annotations, modalities and computational costs.
arXiv Detail & Related papers (2022-03-16T17:23:26Z)
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic dataset. Massive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance. Our method can also exploit single-centric-object dataset such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. We reconcile the generalization and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images. A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features. We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes. Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
Adaptive Prototypical Networks with Label Words and Joint Representation Learning for Few-Shot Relation Classification [17.237331828747006]
This work focuses on few-shot relation classification (FSRC) We propose an adaptive mixture mechanism to add label words to the representation of the class prototype. Experiments have been conducted on FewRel under different few-shot (FS) settings.
arXiv Detail & Related papers (2021-01-10T11:25:42Z)
Multi-label Few/Zero-shot Learning with Knowledge Aggregated from Multiple Label Graphs [8.44680447457879]
We present a simple multi-graph aggregation model that fuses knowledge from multiple label graphs encoding different semantic label relationships. We show that methods equipped with the multi-graph knowledge aggregation achieve significant performance improvement across almost all the measures on few/zero-shot labels.
arXiv Detail & Related papers (2020-10-15T01:15:43Z)
Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition [75.44233392355711]
KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks. It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence. Then, it introduces the label semantics to guide learning semantic-specific features. It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z)
Information Condensing Active Learning [4.769747792846005]
We introduce Information Condensing Active Learning (ICAL), a batch mode model Active Learning (AL) method targeted at Deep Bayesian Active Learning. ICAL uses the Hilbert Schmidt Independence Criterion (HSIC) to measure the strength of the dependency between a candidate batch of points and the unlabeled set. We show significant improvements in terms of model accuracy and negative log likelihood (NLL) on several image datasets compared to state of the art batch mode AL methods for deep learning.
arXiv Detail & Related papers (2020-02-18T22:55:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.