DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding
- URL: http://arxiv.org/abs/2511.11232v1
- Date: Fri, 14 Nov 2025 12:32:45 GMT
- Title: DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding
- Authors: Mingwei Xing, Xinliang Wang, Yifeng Shi
- Abstract summary: DoReMi is a Mixture-of-Experts (MoE) framework that jointly models a domain-aware experts branch and a unified representation branch. DoReMi achieves 80.1% mIoU on ScanNet Val and 77.2% mIoU on S3DIS, demonstrating competitive or superior performance compared to existing approaches.
- Score: 10.259254902492978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalization of 3D deep learning across multiple domains remains constrained by the small scale of existing datasets and the high heterogeneity of multi-source point clouds. Point clouds collected from different sensors (e.g., LiDAR scans and mesh-derived point clouds) exhibit substantial discrepancies in density and noise distribution, causing negative transfer during multi-domain fusion. Most existing approaches focus exclusively on either domain-aware or domain-general features, overlooking the potential synergy between them. To address this, we propose DoReMi (Domain-Representation Mixture), a Mixture-of-Experts (MoE) framework that jointly models a domain-aware experts branch and a unified representation branch to enable cooperative learning between specialized and generalizable knowledge. DoReMi dynamically activates the domain-aware experts branch via Domain-Guided Spatial Routing (DSR) for context-aware expert selection and employs Entropy-Controlled Dynamic Allocation (EDA) for stable and efficient expert utilization, thereby adaptively modeling diverse domain distributions. Complemented by a frozen unified representation branch pretrained through robust multi-attribute self-supervised learning, DoReMi preserves cross-domain geometric and structural priors while maintaining global consistency. We evaluate DoReMi across multiple 3D understanding benchmarks. Notably, DoReMi achieves 80.1% mIoU on ScanNet Val and 77.2% mIoU on S3DIS, demonstrating competitive or superior performance compared to existing approaches and showing strong potential as a foundation framework for future 3D understanding research. The code will be released soon.
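For intuition about how such a routing scheme can be wired up, here is a minimal PyTorch sketch of a domain-guided MoE layer that also exposes the entropy of its gate distribution. The class name, layer shapes, and the exact forms of DSR and EDA are assumptions for illustration; the paper's code is not yet released.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainGuidedMoE(nn.Module):
    """Hypothetical sketch in the spirit of DoReMi's DSR/EDA: the gate is
    conditioned on a learned domain embedding, and the routing entropy is
    returned so a trainer could regularize expert utilization.
    Not the authors' implementation."""

    def __init__(self, dim, num_experts=4, num_domains=3, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.domain_embed = nn.Embedding(num_domains, dim)
        self.gate = nn.Linear(2 * dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x, domain_id):
        # x: (N, dim) per-point features; domain_id: (N,) int64 domain labels
        ctx = self.domain_embed(domain_id)                    # (N, dim)
        probs = F.softmax(self.gate(torch.cat([x, ctx], -1)), dim=-1)
        # Mean routing entropy; a trainer could penalize this to encourage
        # confident yet balanced selection (a stand-in for EDA, which the
        # abstract names but does not specify).
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
        weights, idx = probs.topk(self.top_k, dim=-1)         # sparse activation
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] = out[mask] + weights[mask, k, None] * expert(x[mask])
        return out, entropy
```

In a full model, the returned entropy term would typically enter the training loss, and the experts' output would be fused with features from the frozen, self-supervised representation branch the abstract describes.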
Related papers
- MCI-Net: A Robust Multi-Domain Context Integration Network for Point Cloud Registration [28.6535442193107]
We propose a multi-domain context integration network (MCI-Net) that improves feature representation and registration performance. Specifically, a graph neighborhood aggregation module constructs a global graph to capture the overall structural relationships within point clouds, and a progressive context interaction module then enhances feature discriminability. Experiments on indoor RGB-D and outdoor LiDAR datasets show that MCI-Net significantly outperforms existing state-of-the-art methods.
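As a rough illustration of what neighborhood aggregation over a point-cloud graph can look like, here is a DGCNN-style edge-feature sketch in PyTorch; the function name, k-NN construction, and max-pooling choice are assumptions, not MCI-Net's actual module.

```python
import torch

def knn_graph_aggregate(points, feats, k=16):
    """Build a k-NN graph over the point cloud and max-pool edge features,
    in the spirit of DGCNN edge convolution (illustrative only).
    points: (N, 3) coordinates, feats: (N, C) -> (N, 2C) aggregated."""
    dist = torch.cdist(points, points)                     # (N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]   # k nearest, drop self
    neighbor = feats[idx]                                  # (N, k, C)
    center = feats.unsqueeze(1).expand_as(neighbor)        # (N, k, C)
    edge = torch.cat([center, neighbor - center], dim=-1)  # (N, k, 2C) edge features
    return edge.max(dim=1).values                          # max-pool over neighbors
```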
arXiv Detail & Related papers (2025-12-29T13:55:33Z)
- Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift [62.50795372173394]
We conduct an exhaustive study to identify recipes for exploiting vision foundation models (VFMs) in unsupervised domain adaptation for semantic segmentation of lidar point clouds. The resulting pipeline achieves state-of-the-art results in four widely recognized and challenging settings.
arXiv Detail & Related papers (2025-11-21T17:57:43Z)
- MSCN: Multi-view Structural Convolution Network for Domain-Invariant Point Cloud Recognition of Autonomous Vehicles [1.7616042687330637]
Multi-view Structural Convolution Network (MSCN) is a novel architecture designed to achieve domain-invariant recognition. MSCN consistently outperforms state-of-the-art point cloud classification methods across all domain change scenarios.
arXiv Detail & Related papers (2025-01-27T18:25:35Z)
- Multimodal 3D Object Detection on Unseen Domains [37.142470149311904]
Existing domain adaptation approaches address distribution shift by assuming access to unannotated samples from the test distribution.
We propose CLIX$^\text{3D}$, a multimodal fusion and supervised contrastive learning framework for 3D object detection.
We show that CLIX$^\text{3D}$ yields state-of-the-art domain generalization performance under multiple dataset shifts.
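CLIX$^\text{3D}$ builds on supervised contrastive learning; for reference, here is the generic supervised contrastive loss of Khosla et al. (2020) in PyTorch. This is background on the family of objectives involved, not the paper's exact multimodal formulation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(feats, labels, temperature=0.1):
    """Generic SupCon loss: pull same-label embeddings together, push
    different-label ones apart. feats: (N, D) embeddings, labels: (N,)."""
    z = F.normalize(feats, dim=-1)
    sim = z @ z.t() / temperature                          # (N, N) similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float('-inf'))        # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    pos_count = pos.sum(1).clamp_min(1)                    # avoid div-by-zero
    return -(log_prob.masked_fill(~pos, 0.0).sum(1) / pos_count).mean()
```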
arXiv Detail & Related papers (2024-04-17T21:47:45Z)
- Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting [67.38137379297717]
Multidomain crowd counting aims to learn a general model for multiple diverse datasets.
Deep networks prefer modeling distributions of the dominant domains instead of all domains, which is known as domain bias.
We propose a Modulating Domain-specific Knowledge Network (MDKNet) to handle the domain bias issue in multidomain crowd counting.
arXiv Detail & Related papers (2024-02-06T06:49:04Z)
- DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection [78.09431523221458]
DI-V2X aims to learn Domain-Invariant representations through a new distillation framework.
DI-V2X comprises three essential components: a domain-mixing instance augmentation (DMA) module, a progressive domain-invariant distillation (PDD) module, and a domain-adaptive fusion (DAF) module.
arXiv Detail & Related papers (2023-12-25T14:40:46Z)
- Adapting Self-Supervised Representations to Multi-Domain Setups [47.03992469282679]
Current state-of-the-art self-supervised approaches are effective when trained on individual domains but show limited generalization to unseen domains.
We propose a general-purpose, lightweight Domain Disentanglement Module that can be plugged into any self-supervised encoder.
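As a sketch of what a lightweight, pluggable disentanglement head could look like, the snippet below splits a frozen encoder's embedding into domain-invariant and domain-specific parts with an orthogonality penalty. The names and the penalty form are illustrative assumptions, not the paper's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainDisentangleHead(nn.Module):
    """Hypothetical plug-in head: project an embedding into two subspaces
    and penalize their overlap, so one can specialize to domain style and
    the other to domain-invariant content (illustrative only)."""

    def __init__(self, dim):
        super().__init__()
        self.invariant = nn.Linear(dim, dim)
        self.specific = nn.Linear(dim, dim)

    def forward(self, z):
        zi, zs = self.invariant(z), self.specific(z)
        # Squared cosine similarity between the two projections as an
        # orthogonality penalty to add to the self-supervised loss.
        ortho = (F.normalize(zi, dim=-1) * F.normalize(zs, dim=-1)).sum(-1).pow(2).mean()
        return zi, zs, ortho
```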
arXiv Detail & Related papers (2023-09-07T20:05:39Z)
- Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval [55.122020263319634]
Video moment retrieval (VMR) aims to localize the target moment from an untrimmed video according to a given language query.
In this paper, we focus on a novel task: cross-domain VMR, where fully-annotated datasets are available in one domain but the domain of interest only contains unannotated datasets.
We propose a novel Multi-Modal Cross-Domain Alignment network to transfer the annotation knowledge from the source domain to the target domain.
arXiv Detail & Related papers (2022-09-23T12:58:20Z)
- META: Mimicking Embedding via oThers' Aggregation for Generalizable Person Re-identification [68.39849081353704]
Domain generalizable (DG) person re-identification (ReID) aims to generalize to unseen domains without access to target-domain data at training time.
This paper presents a new approach called Mimicking Embedding via oThers' Aggregation (META) for DG ReID.
arXiv Detail & Related papers (2021-12-16T08:06:50Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
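Domain-adversarial training, which AFAN integrates, is classically implemented with a gradient reversal layer (Ganin & Lempitsky, 2015); a standard PyTorch version is shown below as background, not as AFAN's code.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales and negates the gradient in the
    backward pass, so a domain classifier trained on top drives the feature
    extractor toward domain-invariant representations."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def grad_reverse(x, lam=1.0):
    # Insert between the feature extractor and the domain classifier.
    return GradientReversal.apply(x, lam)
```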
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- Domain Conditioned Adaptation Network [90.63261870610211]
We propose a Domain Conditioned Adaptation Network (DCAN) to excite distinct convolutional channels with a domain-conditioned channel attention mechanism.
This is the first work to explore the domain-wise convolutional channel activation for deep DA networks.
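As a sketch of domain-conditioned channel attention, the snippet below gates channels with an SE-style excitation conditioned on a domain embedding; layer sizes and the conditioning form are illustrative assumptions rather than DCAN's published design.

```python
import torch
import torch.nn as nn

class DomainConditionedChannelAttention(nn.Module):
    """Squeeze-and-excitation gate whose excitation MLP also sees a learned
    domain embedding, so different domains activate different channels
    (illustrative sketch only)."""

    def __init__(self, channels, num_domains, reduction=16):
        super().__init__()
        self.domain_embed = nn.Embedding(num_domains, channels)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x, domain_id):
        # x: (B, C, H, W) feature map; domain_id: (B,) int64 domain labels
        squeeze = x.mean(dim=(2, 3))                       # global average pool
        gate = self.fc(torch.cat([squeeze, self.domain_embed(domain_id)], -1))
        return x * gate[:, :, None, None]                  # per-channel reweighting
```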
arXiv Detail & Related papers (2020-05-14T04:23:24Z)