Emergent Outlier View Rejection in Visual Geometry Grounded Transformers
- URL: http://arxiv.org/abs/2512.04012v1
- Date: Wed, 03 Dec 2025 17:48:25 GMT
- Title: Emergent Outlier View Rejection in Visual Geometry Grounded Transformers
- Authors: Jisang Han, Sunghwan Hong, Jaewoo Jung, Wooseok Jang, Honggyu An, Qianqian Wang, Seungryong Kim, Chen Feng,
- Abstract summary: "Noisy" images (irrelevant inputs with little or no view overlap with others) hinder reliable 3D reconstruction from in-the-wild image collections. Traditional Structure-from-Motion pipelines handle such cases through geometric verification and outlier rejection. In this paper, we discover that the existing feed-forward reconstruction model, VGGT, can inherently distinguish distractor images.
- Score: 45.83800698097105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable 3D reconstruction from in-the-wild image collections is often hindered by "noisy" images: irrelevant inputs with little or no view overlap with others. While traditional Structure-from-Motion pipelines handle such cases through geometric verification and outlier rejection, feed-forward 3D reconstruction models lack these explicit mechanisms, leading to degraded performance under in-the-wild conditions. In this paper, we discover that an existing feed-forward reconstruction model, VGGT, despite lacking explicit outlier-rejection mechanisms or noise-aware training, can inherently distinguish distractor images. Through an in-depth analysis under varying proportions of synthetic distractors, we identify a specific layer that naturally exhibits outlier-suppressing behavior. Further probing reveals that this layer encodes discriminative internal representations that enable an effective noise-filtering capability, which we leverage to perform outlier-view rejection in feed-forward 3D reconstruction without any additional fine-tuning or supervision. Extensive experiments on both controlled and in-the-wild datasets demonstrate that this implicit filtering mechanism is consistent and generalizes well across diverse scenarios.
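The abstract's core idea, scoring each view by how well its internal representation agrees with the other views and rejecting low-scoring ones, can be sketched as a simple similarity-based filter. This is an illustrative assumption, not the paper's actual probing procedure: the function name, the mean-cosine-similarity score, and the threshold are all hypothetical stand-ins for whatever statistic the identified VGGT layer actually exposes.

```python
import numpy as np

def reject_outlier_views(view_features, threshold=0.3):
    """Flag views whose mean cosine similarity to the other views falls
    below a threshold. `view_features` is (num_views, dim), e.g. pooled
    token features taken from one intermediate transformer layer."""
    # L2-normalize each view's feature vector
    norms = np.linalg.norm(view_features, axis=1, keepdims=True)
    normed = view_features / np.clip(norms, 1e-8, None)
    sim = normed @ normed.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, 0.0)                   # ignore self-similarity
    mean_sim = sim.sum(axis=1) / (len(sim) - 1)  # mean similarity to others
    keep = mean_sim >= threshold
    return keep, mean_sim

# Toy example: three correlated "scene" views plus one random distractor
rng = np.random.default_rng(0)
scene = rng.normal(size=(1, 64))
views = np.vstack([scene + 0.1 * rng.normal(size=(3, 64)),
                   rng.normal(size=(1, 64))])
keep, scores = reject_outlier_views(views)
```

In this toy setup the three perturbed copies of the same feature vector score high against each other while the unrelated vector scores near zero, mirroring how a distractor image with no view overlap would stand out in a discriminative feature space.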
Related papers
- Self-Aware Object Detection via Degradation Manifolds [3.8265249634979734]
In safety-critical settings, it is insufficient to produce predictions without assessing whether the input remains within the detector's nominal operating regime. We introduce a degradation-aware self-awareness framework based on degradation manifolds. Our method augments a standard detection backbone with a lightweight embedding head trained via contrastive learning.
arXiv Detail & Related papers (2026-02-20T17:58:46Z) - Rectifying Latent Space for Generative Single-Image Reflection Removal [16.341477336909765]
Single-image reflection removal is a highly ill-posed problem, where existing methods struggle to reason about the composition of corrupted regions. This work reframes an editing-purpose latent diffusion model to effectively perceive and process highly ambiguous, layered image inputs.
arXiv Detail & Related papers (2025-12-06T09:16:14Z) - Revisiting Reconstruction-based AI-generated Image Detection: A Geometric Perspective [50.83711509908479]
We introduce the Jacobian-Spectral Lower Bound for reconstruction error from a geometric perspective. We show that real images off the reconstruction manifold exhibit a non-trivial error lower bound, while generated images on the manifold have near-zero error. We propose ReGap, a training-free method that computes dynamic reconstruction error by leveraging structured editing operations.
arXiv Detail & Related papers (2025-10-29T03:45:03Z) - RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions [67.48495052903534]
We propose a general and efficient multi-view feature enhancement module, RobustGS. It substantially improves the robustness of feedforward 3DGS methods under various adverse imaging conditions. The RobustGS module can be seamlessly integrated into existing pretrained pipelines in a plug-and-play manner.
arXiv Detail & Related papers (2025-08-05T04:50:29Z) - Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation [19.5984577708016]
We propose a multi-range representations-driven adversarial stego generation framework called MRAG for JPEG image hiding. MRAG integrates the local-range characteristics of convolutions with the global-range modeling of transformers. It computes the adversarial loss between covers and stegos based on the surrogate steganalyzer's classified features.
arXiv Detail & Related papers (2025-07-11T06:45:07Z) - Transparency Distortion Robustness for SOTA Image Segmentation Tasks [4.1119273264193685]
We propose a method to synthetically augment existing datasets with spatially varying distortions.
Our experiments show that these distortion effects degrade the performance of state-of-the-art segmentation models.
arXiv Detail & Related papers (2024-05-21T15:30:25Z) - Improved Cryo-EM Pose Estimation and 3D Classification through Latent-Space Disentanglement [14.973360669658561]
We propose a self-supervised variational autoencoder architecture called "HetACUMN" based on amortized inference.
Results on simulated datasets show that HetACUMN generated more accurate conformational classifications than other amortized or non-amortized methods.
arXiv Detail & Related papers (2023-08-09T13:41:30Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - Test-time Adaptation with Slot-Centric Models [63.981055778098444]
Slot-TTA is a semi-supervised scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives.
We show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors, and alternative test-time adaptation methods.
arXiv Detail & Related papers (2022-03-21T17:59:50Z) - SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight: the rectified results of distorted images of the same scene, captured with different lenses, should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it hosts and is not responsible for any consequences of its use.