Emergent Outlier View Rejection in Visual Geometry Grounded Transformers
- URL: http://arxiv.org/abs/2512.04012v1
- Date: Wed, 03 Dec 2025 17:48:25 GMT
- Title: Emergent Outlier View Rejection in Visual Geometry Grounded Transformers
- Authors: Jisang Han, Sunghwan Hong, Jaewoo Jung, Wooseok Jang, Honggyu An, Qianqian Wang, Seungryong Kim, Chen Feng,
- Abstract summary: "Noisy" images (irrelevant inputs with little or no view overlap with others) hinder reliable 3D reconstruction from in-the-wild image collections. Traditional Structure-from-Motion pipelines handle such cases through geometric verification and outlier rejection. In this paper, we discover that the existing feed-forward reconstruction model, VGGT, can inherently distinguish distractor images.
- Score: 45.83800698097105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable 3D reconstruction from in-the-wild image collections is often hindered by "noisy" images: irrelevant inputs with little or no view overlap with others. While traditional Structure-from-Motion pipelines handle such cases through geometric verification and outlier rejection, feed-forward 3D reconstruction models lack these explicit mechanisms, leading to degraded performance under in-the-wild conditions. In this paper, we discover that an existing feed-forward reconstruction model, VGGT, despite lacking explicit outlier-rejection mechanisms or noise-aware training, can inherently distinguish distractor images. Through an in-depth analysis under varying proportions of synthetic distractors, we identify a specific layer that naturally exhibits outlier-suppressing behavior. Further probing reveals that this layer encodes discriminative internal representations that enable an effective noise-filtering capability, which we leverage to perform outlier-view rejection in feed-forward 3D reconstruction without any additional fine-tuning or supervision. Extensive experiments on both controlled and in-the-wild datasets demonstrate that this implicit filtering mechanism is consistent and generalizes well across diverse scenarios.
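The abstract's core idea, scoring each view by how well its internal representation agrees with the other views and rejecting low-scoring ones, can be sketched as a simple similarity-based filter. This is an illustrative assumption, not the paper's actual probing procedure: the function name, the mean-cosine-similarity score, and the threshold are all hypothetical stand-ins for whatever statistic the identified VGGT layer actually exposes.

```python
import numpy as np

def reject_outlier_views(view_features, threshold=0.3):
    """Flag views whose mean cosine similarity to the other views falls
    below a threshold. `view_features` is (num_views, dim), e.g. pooled
    token features taken from one intermediate transformer layer."""
    # L2-normalize each view's feature vector
    norms = np.linalg.norm(view_features, axis=1, keepdims=True)
    normed = view_features / np.clip(norms, 1e-8, None)
    sim = normed @ normed.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, 0.0)                   # ignore self-similarity
    mean_sim = sim.sum(axis=1) / (len(sim) - 1)  # mean similarity to others
    keep = mean_sim >= threshold
    return keep, mean_sim

# Toy example: three correlated "scene" views plus one random distractor
rng = np.random.default_rng(0)
scene = rng.normal(size=(1, 64))
views = np.vstack([scene + 0.1 * rng.normal(size=(3, 64)),
                   rng.normal(size=(1, 64))])
keep, scores = reject_outlier_views(views)
```

In this toy setup the three perturbed copies of the same feature vector score high against each other while the unrelated vector scores near zero, mirroring how a distractor image with no view overlap would stand out in a discriminative feature space.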
Related papers
- Self-Aware Object Detection via Degradation Manifolds [3.8265249634979734]
In safety-critical settings, it is insufficient to produce predictions without assessing whether the input remains within the detector's nominal operating regime. We introduce a degradation-aware self-awareness framework based on degradation manifolds. Our method augments a standard detection backbone with a lightweight embedding head trained via contrastive learning.
arXiv Detail & Related papers (2026-02-20T17:58:46Z) - Rectifying Latent Space for Generative Single-Image Reflection Removal [16.341477336909765]
Single-image reflection removal is a highly ill-posed problem, where existing methods struggle to reason about the composition of corrupted regions. This work reframes an editing-purpose latent diffusion model to effectively perceive and process highly ambiguous, layered image inputs.
arXiv Detail & Related papers (2025-12-06T09:16:14Z) - Revisiting Reconstruction-based AI-generated Image Detection: A Geometric Perspective [50.83711509908479]
We introduce the Jacobian-Spectral Lower Bound for reconstruction error from a geometric perspective. We show that real images off the reconstruction manifold exhibit a non-trivial error lower bound, while generated images on the manifold have near-zero error. We propose ReGap, a training-free method that computes dynamic reconstruction error by leveraging structured editing operations.
arXiv Detail & Related papers (2025-10-29T03:45:03Z) - RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions [67.48495052903534]
We propose a general and efficient multi-view feature enhancement module, RobustGS. It substantially improves the robustness of feedforward 3DGS methods under various adverse imaging conditions. The RobustGS module can be seamlessly integrated into existing pretrained pipelines in a plug-and-play manner.
arXiv Detail & Related papers (2025-08-05T04:50:29Z) - Towards Imperceptible JPEG Image Hiding: Multi-range Representations-driven Adversarial Stego Generation [19.5984577708016]
We propose a multi-range representations-driven adversarial stego generation framework called MRAG for JPEG image hiding. MRAG integrates the local-range characteristics of convolutions with the global-range modeling of transformers. It computes the adversarial loss between covers and stegos based on the surrogate steganalyzer's classified features.
arXiv Detail & Related papers (2025-07-11T06:45:07Z) - Transparency Distortion Robustness for SOTA Image Segmentation Tasks [4.1119273264193685]
We propose a method to synthetically augment existing datasets with spatially varying distortions.
Our experiments show that these distortion effects degrade the performance of state-of-the-art segmentation models.
arXiv Detail & Related papers (2024-05-21T15:30:25Z) - Improved Cryo-EM Pose Estimation and 3D Classification through Latent-Space Disentanglement [14.973360669658561]
We propose a self-supervised variational autoencoder architecture called "HetACUMN" based on amortized inference.
Results on simulated datasets show that HetACUMN generated more accurate conformational classifications than other amortized or non-amortized methods.
arXiv Detail & Related papers (2023-08-09T13:41:30Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - Test-time Adaptation with Slot-Centric Models [63.981055778098444]
Slot-TTA is a semi-supervised scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives.
We show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors, and alternative test-time adaptation methods.
arXiv Detail & Related papers (2022-03-21T17:59:50Z) - SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight: the rectified results of distorted images of the same scene, captured with different lenses, should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it hosts and is not responsible for any consequences of its use.