DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding
- URL: http://arxiv.org/abs/2511.11232v1
- Date: Fri, 14 Nov 2025 12:32:45 GMT
- Title: DoReMi: A Domain-Representation Mixture Framework for Generalizable 3D Understanding
- Authors: Mingwei Xing, Xinliang Wang, Yifeng Shi
- Abstract summary: DoReMi is a Mixture-of-Experts (MoE) framework that jointly models a domain-aware experts branch and a unified representation branch. DoReMi achieves 80.1% mIoU on ScanNet Val and 77.2% mIoU on S3DIS, demonstrating competitive or superior performance compared to existing approaches.
- Score: 10.259254902492978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalization of 3D deep learning across multiple domains remains constrained by the small scale of existing datasets and the high heterogeneity of multi-source point clouds. Point clouds collected from different sensors (e.g., LiDAR scans and mesh-derived point clouds) exhibit substantial discrepancies in density and noise distribution, causing negative transfer during multi-domain fusion. Most existing approaches focus exclusively on either domain-aware or domain-general features, overlooking the potential synergy between them. To address this, we propose DoReMi (Domain-Representation Mixture), a Mixture-of-Experts (MoE) framework that jointly models a domain-aware experts branch and a unified representation branch to enable cooperative learning between specialized and generalizable knowledge. DoReMi dynamically activates the domain-aware experts branch via Domain-Guided Spatial Routing (DSR) for context-aware expert selection and employs Entropy-Controlled Dynamic Allocation (EDA) for stable and efficient expert utilization, thereby adaptively modeling diverse domain distributions. Complemented by a frozen unified representation branch pretrained through robust multi-attribute self-supervised learning, DoReMi preserves cross-domain geometric and structural priors while maintaining global consistency. We evaluate DoReMi across multiple 3D understanding benchmarks. Notably, DoReMi achieves 80.1% mIoU on ScanNet Val and 77.2% mIoU on S3DIS, demonstrating competitive or superior performance compared to existing approaches and showing strong potential as a foundation framework for future 3D understanding research. The code will be released soon.
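For intuition about how such a routing scheme can be wired up, here is a minimal PyTorch sketch of a domain-guided MoE layer that also exposes the entropy of its gate distribution. The class name, layer shapes, and the exact forms of DSR and EDA are assumptions for illustration; the paper's code is not yet released.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainGuidedMoE(nn.Module):
    """Hypothetical sketch in the spirit of DoReMi's DSR/EDA: the gate is
    conditioned on a learned domain embedding, and the routing entropy is
    returned so a trainer could regularize expert utilization.
    Not the authors' implementation."""

    def __init__(self, dim, num_experts=4, num_domains=3, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.domain_embed = nn.Embedding(num_domains, dim)
        self.gate = nn.Linear(2 * dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x, domain_id):
        # x: (N, dim) per-point features; domain_id: (N,) int64 domain labels
        ctx = self.domain_embed(domain_id)                    # (N, dim)
        probs = F.softmax(self.gate(torch.cat([x, ctx], -1)), dim=-1)
        # Mean routing entropy; a trainer could penalize this to encourage
        # confident yet balanced selection (a stand-in for EDA, which the
        # abstract names but does not specify).
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean()
        weights, idx = probs.topk(self.top_k, dim=-1)         # sparse activation
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] = out[mask] + weights[mask, k, None] * expert(x[mask])
        return out, entropy
```

In a full model, the returned entropy term would typically enter the training loss, and the experts' output would be fused with features from the frozen, self-supervised representation branch the abstract describes.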
Related papers
- MCI-Net: A Robust Multi-Domain Context Integration Network for Point Cloud Registration [28.6535442193107]
We propose a multi-domain context integration network (MCI-Net) that improves feature representation and registration performance. Specifically, a graph neighborhood aggregation module constructs a global graph to capture the overall structural relationships within point clouds, and a progressive context interaction module then enhances feature discriminability. Experiments on indoor RGB-D and outdoor LiDAR datasets show that MCI-Net significantly outperforms existing state-of-the-art methods.
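As a rough illustration of what neighborhood aggregation over a point-cloud graph can look like, here is a DGCNN-style edge-feature sketch in PyTorch; the function name, k-NN construction, and max-pooling choice are assumptions, not MCI-Net's actual module.

```python
import torch

def knn_graph_aggregate(points, feats, k=16):
    """Build a k-NN graph over the point cloud and max-pool edge features,
    in the spirit of DGCNN edge convolution (illustrative only).
    points: (N, 3) coordinates, feats: (N, C) -> (N, 2C) aggregated."""
    dist = torch.cdist(points, points)                     # (N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]   # k nearest, drop self
    neighbor = feats[idx]                                  # (N, k, C)
    center = feats.unsqueeze(1).expand_as(neighbor)        # (N, k, C)
    edge = torch.cat([center, neighbor - center], dim=-1)  # (N, k, 2C) edge features
    return edge.max(dim=1).values                          # max-pool over neighbors
```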
arXiv Detail & Related papers (2025-12-29T13:55:33Z)
- Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift [62.50795372173394]
We conduct an exhaustive study to identify recipes for exploiting vision foundation models (VFMs) in unsupervised domain adaptation for semantic segmentation of lidar point clouds. The resulting pipeline achieves state-of-the-art results in four widely recognized and challenging settings.
arXiv Detail & Related papers (2025-11-21T17:57:43Z)
- MSCN: Multi-view Structural Convolution Network for Domain-Invariant Point Cloud Recognition of Autonomous Vehicles [1.7616042687330637]
Multi-view Structural Convolution Network (MSCN) is a novel architecture designed to achieve domain-invariant recognition. MSCN consistently outperforms state-of-the-art point cloud classification methods across all domain change scenarios.
arXiv Detail & Related papers (2025-01-27T18:25:35Z)
- Multimodal 3D Object Detection on Unseen Domains [37.142470149311904]
Existing domain adaptation approaches address distribution shift by assuming access to unannotated samples from the test distribution.
We propose CLIX$^\text{3D}$, a multimodal fusion and supervised contrastive learning framework for 3D object detection.
We show that CLIX$^\text{3D}$ yields state-of-the-art domain generalization performance under multiple dataset shifts.
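CLIX$^\text{3D}$ builds on supervised contrastive learning; for reference, here is the generic supervised contrastive loss of Khosla et al. (2020) in PyTorch. This is background on the family of objectives involved, not the paper's exact multimodal formulation.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(feats, labels, temperature=0.1):
    """Generic SupCon loss: pull same-label embeddings together, push
    different-label ones apart. feats: (N, D) embeddings, labels: (N,)."""
    z = F.normalize(feats, dim=-1)
    sim = z @ z.t() / temperature                          # (N, N) similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float('-inf'))        # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    pos_count = pos.sum(1).clamp_min(1)                    # avoid div-by-zero
    return -(log_prob.masked_fill(~pos, 0.0).sum(1) / pos_count).mean()
```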
arXiv Detail & Related papers (2024-04-17T21:47:45Z)
- Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting [67.38137379297717]
Multidomain crowd counting aims to learn a general model for multiple diverse datasets.
Deep networks prefer modeling distributions of the dominant domains instead of all domains, which is known as domain bias.
We propose a Modulating Domain-specific Knowledge Network (MDKNet) to handle the domain bias issue in multidomain crowd counting.
arXiv Detail & Related papers (2024-02-06T06:49:04Z)
- DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection [78.09431523221458]
DI-V2X aims to learn Domain-Invariant representations through a new distillation framework.
DI-V2X comprises three essential components: a domain-mixing instance augmentation (DMA) module, a progressive domain-invariant distillation (PDD) module, and a domain-adaptive fusion (DAF) module.
arXiv Detail & Related papers (2023-12-25T14:40:46Z)
- Adapting Self-Supervised Representations to Multi-Domain Setups [47.03992469282679]
Current state-of-the-art self-supervised approaches are effective when trained on individual domains but show limited generalization to unseen domains.
We propose a general-purpose, lightweight Domain Disentanglement Module that can be plugged into any self-supervised encoder.
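As a sketch of what a lightweight, pluggable disentanglement head could look like, the snippet below splits a frozen encoder's embedding into domain-invariant and domain-specific parts with an orthogonality penalty. The names and the penalty form are illustrative assumptions, not the paper's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainDisentangleHead(nn.Module):
    """Hypothetical plug-in head: project an embedding into two subspaces
    and penalize their overlap, so one can specialize to domain style and
    the other to domain-invariant content (illustrative only)."""

    def __init__(self, dim):
        super().__init__()
        self.invariant = nn.Linear(dim, dim)
        self.specific = nn.Linear(dim, dim)

    def forward(self, z):
        zi, zs = self.invariant(z), self.specific(z)
        # Squared cosine similarity between the two projections as an
        # orthogonality penalty to add to the self-supervised loss.
        ortho = (F.normalize(zi, dim=-1) * F.normalize(zs, dim=-1)).sum(-1).pow(2).mean()
        return zi, zs, ortho
```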
arXiv Detail & Related papers (2023-09-07T20:05:39Z)
- Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval [55.122020263319634]
Video moment retrieval (VMR) aims to localize the target moment from an untrimmed video according to a given language query.
In this paper, we focus on a novel task: cross-domain VMR, where fully-annotated datasets are available in one domain but the domain of interest only contains unannotated datasets.
We propose a novel Multi-Modal Cross-Domain Alignment network to transfer the annotation knowledge from the source domain to the target domain.
arXiv Detail & Related papers (2022-09-23T12:58:20Z)
- META: Mimicking Embedding via oThers' Aggregation for Generalizable Person Re-identification [68.39849081353704]
Domain generalizable (DG) person re-identification (ReID) aims to generalize to unseen domains without access to target-domain data at training time.
This paper presents a new approach called Mimicking Embedding via oThers' Aggregation (META) for DG ReID.
arXiv Detail & Related papers (2021-12-16T08:06:50Z)
- AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
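Domain-adversarial training, which AFAN integrates, is classically implemented with a gradient reversal layer (Ganin & Lempitsky, 2015); a standard PyTorch version is shown below as background, not as AFAN's code.

```python
import torch

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales and negates the gradient in the
    backward pass, so a domain classifier trained on top drives the feature
    extractor toward domain-invariant representations."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def grad_reverse(x, lam=1.0):
    # Insert between the feature extractor and the domain classifier.
    return GradientReversal.apply(x, lam)
```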
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
- Domain Conditioned Adaptation Network [90.63261870610211]
We propose a Domain Conditioned Adaptation Network (DCAN) to excite distinct convolutional channels with a domain-conditioned channel attention mechanism.
This is the first work to explore the domain-wise convolutional channel activation for deep DA networks.
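As a sketch of domain-conditioned channel attention, the snippet below gates channels with an SE-style excitation conditioned on a domain embedding; layer sizes and the conditioning form are illustrative assumptions rather than DCAN's published design.

```python
import torch
import torch.nn as nn

class DomainConditionedChannelAttention(nn.Module):
    """Squeeze-and-excitation gate whose excitation MLP also sees a learned
    domain embedding, so different domains activate different channels
    (illustrative sketch only)."""

    def __init__(self, channels, num_domains, reduction=16):
        super().__init__()
        self.domain_embed = nn.Embedding(num_domains, channels)
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x, domain_id):
        # x: (B, C, H, W) feature map; domain_id: (B,) int64 domain labels
        squeeze = x.mean(dim=(2, 3))                       # global average pool
        gate = self.fc(torch.cat([squeeze, self.domain_embed(domain_id)], -1))
        return x * gate[:, :, None, None]                  # per-channel reweighting
```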
arXiv Detail & Related papers (2020-05-14T04:23:24Z)