Related papers: CBDES MoE: Hierarchically Decoupled Mixture-of-Experts for Functional Modules in Autonomous Driving

URL: http://arxiv.org/abs/2508.07838v1
Date: Mon, 11 Aug 2025 10:44:25 GMT
Title: CBDES MoE: Hierarchically Decoupled Mixture-of-Experts for Functional Modules in Autonomous Driving
Authors: Qi Xiang, Kunsong Shi, Zhigui Lin, Lei He
Abstract summary: We propose a hierarchically decoupled Mixture-of-Experts architecture at the functional module level. CBDES MoE integrates multiple structurally heterogeneous expert networks with a lightweight Self-Attention Router gating mechanism. It consistently outperforms fixed single-expert baselines in 3D object detection.
Score: 2.9741451632381755
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Bird's Eye View (BEV) perception systems based on multi-sensor feature fusion have become a fundamental cornerstone for end-to-end autonomous driving. However, existing multi-modal BEV methods commonly suffer from limited input adaptability, constrained modeling capacity, and suboptimal generalization. To address these challenges, we propose a hierarchically decoupled Mixture-of-Experts architecture at the functional module level, termed Computing Brain DEvelopment System Mixture-of-Experts (CBDES MoE). CBDES MoE integrates multiple structurally heterogeneous expert networks with a lightweight Self-Attention Router (SAR) gating mechanism, enabling dynamic expert path selection and sparse, input-aware efficient inference. To the best of our knowledge, this is the first modular Mixture-of-Experts framework constructed at the functional module granularity within the autonomous driving domain. Extensive evaluations on the real-world nuScenes dataset demonstrate that CBDES MoE consistently outperforms fixed single-expert baselines in 3D object detection. Compared to the strongest single-expert model, CBDES MoE achieves a 1.6-point increase in mAP and a 4.1-point improvement in NDS, demonstrating the effectiveness and practical advantages of the proposed approach.
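
The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the identity attention projections, the mean pooling, the top-1 selection rule, the toy `experts`, and every function name below are assumptions made for illustration only. The sketch shows a router that runs self-attention over input tokens, scores each expert, and evaluates only the selected one.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_pool(tokens):
    """Single-head self-attention (identity projections, an assumption),
    followed by mean pooling into one vector."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        )
    return [sum(tok[d] for tok in attended) / len(attended) for d in range(dim)]


def route(tokens, gate_weights):
    """Return (selected expert index, gate probabilities) for one input."""
    pooled = self_attention_pool(tokens)
    logits = [sum(w * v for w, v in zip(row, pooled)) for row in gate_weights]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs


# Hypothetical stand-ins for structurally different expert networks.
experts = [
    lambda toks: [[2.0 * x for x in t] for t in toks],
    lambda toks: [[x + 1.0 for x in t] for t in toks],
    lambda toks: [[max(x, 0.0) for x in t] for t in toks],
]


def moe_forward(tokens, gate_weights):
    """Sparse forward pass: only the top-1 expert is evaluated."""
    idx, probs = route(tokens, gate_weights)
    return idx, probs, experts[idx](tokens)


rng = random.Random(0)
tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]        # 6 tokens, dim 4
gate_weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # 3 experts
idx, probs, out = moe_forward(tokens, gate_weights)
print("selected expert:", idx, "gate probabilities:", [round(p, 3) for p in probs])
```

Because only the selected expert runs, the per-input cost stays close to that of a single expert. Whether the paper uses top-1 or a weighted top-k combination is not stated in the abstract.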

Related papers

ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns [68.61814799047956]
Mixture-of-Experts (MoE) effectively scales model capacity while preserving computational efficiency through sparse expert activation. We introduce ExpertWeaver, a training-free framework that partitions neurons according to their activation patterns and constructs shared experts and specialized routed experts with layer-adaptive configurations.
arXiv Detail & Related papers (2026-02-17T11:50:58Z)
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe [51.26601054313749]
Recent efforts on Diffusion MoE models have primarily focused on developing more sophisticated routing mechanisms. Inspired by the MoE design paradigms established in large language models (LLMs), we identify a set of crucial architectural factors for building effective Diffusion MoE models. We present novel architectures that can be efficiently applied to both latent and pixel-space diffusion frameworks.
arXiv Detail & Related papers (2025-12-01T03:52:31Z)
Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms [55.1784306456972]
Mixture-of-Experts (MoE) architectures have emerged as a promising direction, offering efficiency and scalability by activating only a subset of parameters during inference. We use an internal metric to investigate the mechanisms of MoE architecture by explicitly incorporating routing mechanisms and analyzing expert-level behaviors. We uncover several findings: (1) neuron utilization decreases as models evolve, reflecting stronger generalization; (2) training exhibits a dynamic trajectory, where benchmark performance alone provides limited signal; (3) task completion emerges from collaborative contributions of multiple experts, with shared experts driving concentration; and (4) activation patterns at the neuron level provide a fine-grained proxy for data diversity.
arXiv Detail & Related papers (2025-09-28T15:13:38Z)
MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models [52.876185634349575]
We propose to incorporate Mixture of Intra- and Inter-Modality Experts (MoIIE) to Large Vision-Language Models (LVLMs). For each token, expert routing is guided by its modality, directing tokens to their respective intra-modality experts as well as a shared pool of inter-modality experts. Our MoIIE models with 5.5B and 11.3B activated parameters match or even surpass the performance of existing advanced open-source MoE-LLMs based multi-modal models.
arXiv Detail & Related papers (2025-08-13T13:00:05Z)
GMF-Drive: Gated Mamba Fusion with Spatial-Aware BEV Representation for End-to-End Autonomous Driving [5.450011907283289]
This paper introduces GMF-Drive, an end-to-end framework that overcomes challenges through two principled innovations. First, we supersede the information-limited histogram-based LiDAR representation with a geometrically-augmented pillar format. Second, we propose a novel hierarchical mamba fusion architecture that substitutes an expensive transformer with a highly efficient, spatially-aware state-space model.
arXiv Detail & Related papers (2025-08-08T08:17:18Z)
Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution [88.20464308588889]
We propose a Structural Similarity-Inspired Unfolding (SSIU) method for efficient image SR. This method is designed through unfolding an SR optimization function constrained by structural similarity. Our model outperforms current state-of-the-art models, boasting lower parameter counts and reduced memory consumption.
arXiv Detail & Related papers (2025-06-13T14:29:40Z)
RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation [24.48561340129571]
RingMoE is a unified RS foundation model with 14.7 billion parameters, pre-trained on 400 million multi-modal RS images from nine satellites. It has been deployed and trialed in multiple sectors, including emergency response, land management, marine sciences, and urban planning.
arXiv Detail & Related papers (2025-04-04T04:47:54Z)
Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach [14.639659415276533]
Mixture of Experts (MoE) have shown remarkable success in leveraging specialized expert networks for complex machine learning tasks. Their susceptibility to adversarial attacks presents a critical challenge for deployment in robust applications. This paper addresses the question of how to incorporate robustness into MoEs while maintaining high natural accuracy.
arXiv Detail & Related papers (2025-02-05T20:45:52Z)
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging [111.8456671452411]
Multi-task learning (MTL) leverages a shared model to accomplish multiple tasks and facilitate knowledge transfer. We propose a Weight-Ensembling Mixture of Experts (WEMoE) method for multi-task model merging. We show that WEMoE and E-WEMoE outperform state-of-the-art (SOTA) model merging methods in terms of MTL performance, generalization, and robustness.
arXiv Detail & Related papers (2024-10-29T07:16:31Z)
AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection [23.91870504363899]
Double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data. This has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems. We introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network.
arXiv Detail & Related papers (2024-05-21T17:17:17Z)
Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment [103.05005690990271]
Mixture of insighTful Experts (MoTE) is a novel framework that combines reasoning chains and expert mixtures to improve self-alignments. MoTE significantly improves model safety, jailbreak resistance, and over-refusal capabilities, achieving performance comparable to OpenAI's state-of-the-art o1 model.
arXiv Detail & Related papers (2024-05-01T15:06:05Z)
Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation [44.43376913419967]
We propose an efficient Mixture-of-Experts (MoE) architecture with weight sharing across experts. MoFME implicitly instantiates multiple experts via learnable activation modulations on a single shared expert block. Experiments show that our MoFME outperforms the baselines in the image restoration quality by 0.1-0.2 dB.
arXiv Detail & Related papers (2023-12-27T15:23:37Z)
LAMBO: Large AI Model Empowered Edge Intelligence [71.56135386994119]
Next-generation edge intelligence is anticipated to benefit various applications via offloading techniques. Traditional offloading architectures face several issues, including heterogeneous constraints, partial perception, uncertain generalization, and lack of tractability. We propose a Large AI Model-Based Offloading (LAMBO) framework with over one billion parameters for solving these problems.
arXiv Detail & Related papers (2023-08-29T07:25:42Z)
UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving [47.590099762244535]
Masked Autoencoders (MAE) play a pivotal role in learning potent representations, delivering outstanding results across various 3D perception tasks. This research delves into multi-modal Masked Autoencoders tailored for a unified representation space in autonomous driving. To intricately marry the semantics inherent in images with the geometric intricacies of LiDAR point clouds, we propose UniM$^2$AE.
arXiv Detail & Related papers (2023-08-21T02:13:40Z)
