Summary
This week's theme centers on evaluating 3D reconstruction under realistic adverse conditions: noisy video, human-object interaction, and sparse or degraded observations. The representative papers argue that strong results in controlled or isolated tasks do not yet translate into reliable performance when motion, occlusion, sensing imperfections, or entity coupling become central.
Situation
The representative papers frame 3D reconstruction as increasingly constrained not by raw model capacity, but by the mismatch between simplified evaluation settings and real deployment conditions. In dense SLAM and scene reconstruction, existing methods have largely been developed and tested in clean, noise-free environments, while real use cases involve sensor degradation, dynamic motion, synchronization errors, and sparse or corrupted observations. In single-image face reconstruction, the problem remains inherently under-constrained because depth, lighting, albedo, identity, and expression must be disentangled from limited evidence, especially under occlusion or extreme expression.
A parallel concern appears in reconstruction benchmarks centered on interaction. The RHOBIN challenge highlights that human and object reconstruction have progressed substantially as separate problems, but joint human-object reconstruction remains difficult and still requires stronger correspondence estimation and optimization. Taken together, these papers show a field shifting toward evaluation protocols that deliberately expose ambiguity, perturbation, and coupling across entities, and that measure progress by robustness rather than by performance in idealized settings alone.
Progress
NTIRE 2026 3D Restoration and Reconstruction in Real-world Adverse Conditions: RealX3D Challenge Results
The NTIRE 2026 RealX3D challenge, which drew 279 participants, directly evaluates 3D reconstruction pipelines under real-world adverse capture conditions. Unlike cleaner prior benchmarks, it measures methods against explicitly degraded inputs and documents clear gains over existing baselines.
SmokeGS-R: Physics-Guided Pseudo-Clean 3DGS for Real-World Multi-View Smoke Restoration
SmokeGS-R addresses multi-view 3D reconstruction when real smoke disrupts radiance and cross-view consistency. Rather than assuming clear observations, it introduces a physics-guided restoration pipeline for a concrete adverse-condition scenario within the NTIRE 3DRR setting.
Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation
Stitch4D tackles 4D urban reconstruction from sparse multi-location captures with missing spatial coverage. It explicitly restores intermediate coverage before optimization to prevent geometric collapse, addressing a gap left by methods that assume denser spatial sampling.
FunRec: Reconstructing Functional 3D Scenes from Egocentric Interaction Videos
FunRec reconstructs functional 3D scenes from egocentric RGB-D interaction videos, linking geometry recovery to simulation-ready assets. Beyond static geometry, it couples scene reconstruction with hand-affordance mapping and downstream robot interaction, advancing interaction-aware evaluation.
Rendering Multi-Human and Multi-Object with 3D Gaussian Splatting
MM-GS jointly reconstructs multiple interacting humans and objects from sparse views using a hierarchical 3DGS framework. Compared with treating humans and objects separately, it provides a unified multi-entity approach that directly addresses the coupling difficulty highlighted by interaction benchmarks.
Outlook
Near-term progress will likely continue pushing reconstruction evaluation away from idealized inputs toward explicitly adverse capture conditions. The Robust-Ego3D agenda points to adaptive perturbation selection, cross-domain and outdoor expansion, and more efficient processing; this week's challenge results, smoke-aware multi-view restoration, and sparse-coverage 4D urban recovery already move in that direction. Benchmark design and model pipelines are increasingly coupling robustness testing with restoration under realistic corruption, rather than treating reconstruction quality and data quality as separate concerns.
A second likely direction is tighter integration across time, views, and interacting entities. RHOBIN's future directions emphasize video inputs, motion priors, template-free object modeling, and multi-person or multi-object settings, while Pixel3DMM points toward multiview and video extensions with stronger disambiguation priors. This week's work on egocentric functional scenes and unified multi-human-object 3DGS fits that trajectory, suggesting continued movement toward systems that jointly reconstruct geometry, appearance, and interaction from sparse or occluded observations.
References
- RHOBIN Challenge: Reconstruction of Human Object Interaction - Authors: Xianghui Xie, Xi Wang, Nikos Athanasiou, Bharat Lal Bhatnagar, Chun-Hao P. Huang, Kaichun Mo, Hao Chen, Xia Jia, Zerui Zhang, Liangxian Cui, Xiao Lin, Bingqiao Qian, Jie Xiao, Wenfei Yang, Hyeongjin Nam, Daniel Sungho Jung, Kihoon Kim, Kyoung Mu Lee, Otmar Hilliges, Gerard Pons-Moll / License: CC-BY-4.0
- Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video - Authors: Xiaohao Xu, Tianyi Zhang, Shibo Zhao, Xiang Li, Sibo Wang, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Sebastian Scherer, Xiaonan Huang / License: CC-BY-4.0
- Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction - Authors: Simon Giebenhain, Tobias Kirschstein, Martin Rünz, Lourdes Agapito, Matthias Nießner / License: CC-BY-4.0