Related papers: Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation

URL: http://arxiv.org/abs/2602.02318v1
Date: Mon, 02 Feb 2026 16:46:45 GMT
Title: Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation
Authors: Xiang Li, Yupeng Zheng, Pengfei Li, Yilun Chen, Ya-Qin Zhang, Wenchao Ding
Abstract summary: DiScene is a novel sparse query-based framework for occupancy prediction. Our method incorporates two key innovations: (1) a Multi-level Consistent Knowledge Distillation strategy, and (2) a Teacher-Guided Initialization policy. With depth integration, DiScene attains new SOTA performance, surpassing EmbodiedOcc by 3.7% with 1.62$\times$ faster inference speed.
Score: 29.342333234658682
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Occupancy prediction provides critical geometric and semantic understanding for robotics but faces efficiency-accuracy trade-offs. Current dense methods waste computation on empty voxels, while sparse query-based approaches lack robustness in diverse and complex indoor scenes. In this paper, we propose DiScene, a novel sparse query-based framework that leverages multi-level distillation to achieve efficient and robust occupancy prediction. In particular, our method incorporates two key innovations: (1) a Multi-level Consistent Knowledge Distillation strategy, which transfers hierarchical representations from large teacher models to lightweight students through coordinated alignment across four levels, including encoder-level feature alignment, query-level feature matching, prior-level spatial guidance, and anchor-level high-confidence knowledge transfer, and (2) a Teacher-Guided Initialization policy, employing optimized parameter warm-up to accelerate model convergence. Validated on the Occ-Scannet benchmark, DiScene achieves 23.2 FPS without depth priors while outperforming our baseline method, OPUS, by 36.1% and even surpassing the depth-enhanced version, OPUS†. With depth integration, DiScene† attains new SOTA performance, surpassing EmbodiedOcc by 3.7% with 1.62$\times$ faster inference speed. Furthermore, experiments on the Occ3D-nuScenes benchmark and in-the-wild scenarios demonstrate the versatility of our approach in various environments. Code and models can be accessed at https://github.com/getterupper/DiScene.
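
As a rough illustration of the multi-level distillation idea described in the abstract, the sketch below combines four per-level alignment terms into one weighted loss. It is not taken from the DiScene code: the function names, the use of mean squared error at every level, and the weights are all assumptions made for illustration, and only the four level names come from the abstract.

```python
from typing import Dict, Sequence

# Level names follow the abstract; everything else is a hypothetical illustration.
LEVELS = ("encoder", "query", "prior", "anchor")


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher features must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def multi_level_distillation_loss(
    student_feats: Dict[str, Sequence[float]],
    teacher_feats: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-level alignment terms (assumed form, not the paper's)."""
    return sum(
        weights[level] * mse(student_feats[level], teacher_feats[level])
        for level in LEVELS
    )


# Toy features: the student differs from the teacher in one coordinate per level.
student = {level: [0.0, 1.0] for level in LEVELS}
teacher = {level: [1.0, 1.0] for level in LEVELS}
weights = {"encoder": 1.0, "query": 1.0, "prior": 0.5, "anchor": 0.5}  # assumed values
loss = multi_level_distillation_loss(student, teacher, weights)
```

In practice each level would likely use its own loss and operate on tensors rather than plain lists; a single weighted sum is simply the most common way to combine several distillation terms.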

Related papers

Dual-Branch Center-Surrounding Contrast: Rethinking Contrastive Learning for 3D Point Clouds [55.5576033344795]
We propose a novel Dual-Branch Center-Surrounding Contrast (CSCon) framework for 3D point clouds. Under the FULL and ALL protocols, CSCon achieves performance comparable to generative methods. Our method attains state-of-the-art results, even surpassing cross-modal approaches.
arXiv Detail & Related papers (2025-12-09T14:56:35Z)
Detect Anything via Next Point Prediction [51.55967987350882]
Rex-Omni is a 3B-scale MLLM that achieves state-of-the-art object perception performance. On benchmarks like COCO and LVIS, Rex-Omni attains performance comparable to or exceeding regression-based models.
arXiv Detail & Related papers (2025-10-14T17:59:54Z)
HYPERDOA: Robust and Efficient DoA Estimation using Hyperdimensional Computing [8.27483835715597]
We introduce HYPERDOA, a novel estimator leveraging Hyperdimensional Computing (HDC). It achieves 35.39% higher accuracy than state-of-the-art methods in low-SNR, coherent-source scenarios. It also consumes 93% less energy than competing neural baselines on an embedded NVIDIA Jetson Xavier NX platform.
arXiv Detail & Related papers (2025-10-12T17:42:01Z)
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models [81.74999702045339]
Multi-Level Optimal Transport (MultiLevelOT) is a novel approach that advances the optimal transport for universal cross-tokenizer knowledge distillation. Our method aligns the logit distributions of the teacher and the student at both token and sequence levels. At the token level, MultiLevelOT integrates both global and local information by jointly optimizing all tokens within a sequence to enhance robustness.
arXiv Detail & Related papers (2024-12-19T04:51:06Z)
ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to scene understanding. This paper proposes a vision-based framework with three targeted improvements. Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
OPUS: Occupancy Prediction Using a Sparse Set [64.60854562502523]
We present a framework to simultaneously predict occupied locations and classes using a set of learnable queries. OPUS incorporates a suite of non-trivial strategies to enhance model performance. Our lightest model achieves superior RayIoU on the Occ3D-nuScenes dataset at near 2x FPS, while our heaviest model surpasses previous best results by 6.1 RayIoU.
arXiv Detail & Related papers (2024-09-14T07:44:22Z)
Knowledge Transfer-Driven Few-Shot Class-Incremental Learning [23.163459923345556]
Few-shot class-incremental learning (FSCIL) aims to continually learn new classes using a few samples while not forgetting the old classes. Despite the advance of existing FSCIL methods, the proposed knowledge transfer learning schemes are sub-optimal due to the insufficient optimization for the model's plasticity. We propose a Random Episode Sampling and Augmentation (RESA) strategy that relies on diverse pseudo incremental tasks as agents to achieve the knowledge transfer.
arXiv Detail & Related papers (2023-06-19T14:02:45Z)
HKNAS: Classification of Hyperspectral Imagery Based on Hyper Kernel Neural Architecture Search [104.45426861115972]
We propose to directly generate structural parameters by utilizing the specifically designed hyper kernels. We obtain three kinds of networks to separately conduct pixel-level or image-level classifications with 1-D or 3-D convolutions. A series of experiments on six public datasets demonstrate that the proposed methods achieve state-of-the-art results.
arXiv Detail & Related papers (2023-04-23T17:27:40Z)
Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting labels for the most informative data points. Some proposed approaches include uncertainty-based techniques, geometric methods, and implicit combinations of uncertainty-based and geometric approaches. We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy. Our framework provides two advantages: (1) accurate posterior estimation, and (2) a tunable trade-off between computational overhead and higher accuracy.
arXiv Detail & Related papers (2022-10-11T20:20:20Z)
Boosting the Efficiency of Parametric Detection with Hierarchical Neural Networks [4.1410005218338695]
We propose Hierarchical Detection Network (HDN), a novel approach to efficient detection. The network is trained using a novel loss function, which simultaneously encodes the goals of statistical accuracy and efficiency. We show how training a three-layer HDN using a two-layer model can further boost both accuracy and efficiency.
arXiv Detail & Related papers (2022-07-23T19:23:00Z)
Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation [74.67594286008317]
This article addresses the problem of distilling knowledge from a large teacher model to a slim student network for LiDAR semantic segmentation. We propose the Point-to-Voxel Knowledge Distillation (PVD), which transfers the hidden knowledge from both point level and voxel level.
arXiv Detail & Related papers (2022-06-05T05:28:32Z)
Boosting RANSAC via Dual Principal Component Pursuit [24.942079487458624]
We introduce Dual Principal Component Pursuit (DPCP) as a robust subspace learning method with strong theoretical support and efficient algorithms. Experiments on estimating two-view homographies, fundamental and essential matrices, and three-view homographic tensors show that our approach consistently has higher accuracy than state-of-the-art alternatives.
arXiv Detail & Related papers (2021-10-06T17:04:45Z)
3DSSD: Point-based 3D Single Stage Object Detector [61.67928229961813]
We present a point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency. Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well.
arXiv Detail & Related papers (2020-02-24T12:01:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.