Latent attention on masked patches for flow reconstruction
- URL: http://arxiv.org/abs/2603.02028v1
- Date: Mon, 02 Mar 2026 16:12:40 GMT
- Title: Latent attention on masked patches for flow reconstruction
- Authors: Ben Eze, Luca Magri, Andrea Nóvoa
- Abstract summary: We introduce the Latent Attention on Masked Patches (LAMP) model, an interpretable regression-based modified vision transformer for masked flow reconstruction. We show that LAMP accurately reconstructs the full flow field from a 90%-masked and noisy input, across signal-to-noise ratios between 10 and 30 dB.
- Score: 8.69419238669827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision transformers have demonstrated outstanding performance in image generation applications, but their adoption in scientific disciplines, such as fluid dynamics, has been limited. We introduce the Latent Attention on Masked Patches (LAMP) model, an interpretable regression-based modified vision transformer designed for masked flow reconstruction. LAMP follows a three-fold strategy: (i) partition of each flow snapshot into patches, (ii) dimensionality reduction of each patch via patch-wise proper orthogonal decomposition, and (iii) reconstruction of the full field from a masked input using a single-layer transformer trained via closed-form linear regression. We test the method on two canonical 2D unsteady wakes: a wake past a bluff body and a chaotic wake past a flat plate. We show that LAMP accurately reconstructs the full flow field from a 90%-masked and noisy input, across signal-to-noise ratios between 10 and 30 dB. Incorporating nonlinear measurement states can reduce the prediction error by up to an order of magnitude. The learned attention matrix yields physically interpretable multi-fidelity optimal sensor-placement maps. The modularity of the framework enables nonlinear compression and deep attention blocks, thereby providing an efficient baseline for nonlinear and high-dimensional masked flow reconstruction.
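The three-fold strategy lends itself to a compact implementation. Below is a minimal NumPy sketch of the general recipe, under illustrative assumptions: non-overlapping square patches, one POD basis per patch location, and a closed-form ridge regression from observed (unmasked) latents to the full latent vector standing in for the paper's single-layer attention; patch size, rank, masking scheme, and regularization are placeholders, not the paper's settings.

```python
import numpy as np

# Minimal sketch of a LAMP-style pipeline, with illustrative assumptions:
# non-overlapping patches, one POD basis per patch location, and a
# closed-form ridge regression standing in for the single-layer attention.

def to_patches(snapshots, ph, pw):
    """(T, H, W) snapshots -> (T, n_patches, ph*pw) by non-overlapping tiling."""
    T, H, W = snapshots.shape
    gh, gw = H // ph, W // pw
    x = snapshots[:, :gh * ph, :gw * pw]
    x = x.reshape(T, gh, ph, gw, pw).transpose(0, 1, 3, 2, 4)
    return x.reshape(T, gh * gw, ph * pw)

def patchwise_pod(patches, r):
    """Rank-r POD basis per patch location; returns bases (P, D, r) and latents (T, P, r)."""
    T, P, D = patches.shape
    bases = np.empty((P, D, r))
    latents = np.empty((T, P, r))
    for p in range(P):
        _, _, Vt = np.linalg.svd(patches[:, p, :], full_matrices=False)
        bases[p] = Vt[:r].T                       # leading spatial modes
        latents[:, p, :] = patches[:, p, :] @ bases[p]
    return bases, latents

def fit_masked_regressor(latents, mask, lam=1e-6):
    """Closed-form ridge map from observed-patch latents to all latents."""
    T = latents.shape[0]
    Z_full = latents.reshape(T, -1)               # (T, P*r)
    Z_obs = latents[:, mask, :].reshape(T, -1)    # (T, P_obs*r)
    A = Z_obs.T @ Z_obs + lam * np.eye(Z_obs.shape[1])
    return np.linalg.solve(A, Z_obs.T @ Z_full)   # (P_obs*r, P*r)

# Usage with synthetic data standing in for flow snapshots.
snaps = np.random.randn(200, 64, 64)
patches = to_patches(snaps, ph=8, pw=8)
bases, latents = patchwise_pod(patches, r=4)
mask = np.zeros(patches.shape[1], dtype=bool)
mask[::10] = True                                 # keep ~10% of patches
W = fit_masked_regressor(latents, mask)
z_hat = latents[:1, mask, :].reshape(1, -1) @ W   # predicted full latent vector
# Full-field recovery: decode each patch as bases[p] @ z_hat_patch, then tile back.
```

The closed-form solve is what keeps training cheap: there is no gradient descent, only one regularized least-squares problem over the latent coefficients.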
Related papers
- MirrorLA: Reflecting Feature Map for Vision Linear Attention [49.41670925034762]
Linear attention significantly reduces the computational complexity of Transformers from quadratic to linear, yet it consistently lags behind softmax-based attention in performance. We propose MirrorLA, a geometric framework that substitutes passive truncation with active reorientation. MirrorLA achieves state-of-the-art performance across standard benchmarks, demonstrating that strictly linear efficiency can be achieved without compromising representational fidelity.
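For context on the efficiency claim, the sketch below shows the standard kernelized linear-attention trick that makes the cost linear in sequence length by reassociating the matrix product; it is a generic baseline, not MirrorLA's reflection-based reorientation, and the positive feature map `phi` is an illustrative choice.

```python
import numpy as np

# Generic kernelized linear attention, shown for contrast with softmax
# attention. Standard O(N) reassociation trick, not MirrorLA itself.

def softmax_attention(Q, K, V):
    # O(N^2) in sequence length: the full N x N attention matrix is formed.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    A = np.exp(S - S.max(-1, keepdims=True))
    return (A / A.sum(-1, keepdims=True)) @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # O(N): reassociate (phi(Q) phi(K)^T) V as phi(Q) (phi(K)^T V),
    # so no N x N matrix is ever materialized.
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                        # (d, d_v), independent of N
    Z = Qp @ Kp.sum(axis=0)              # (N,) per-query normalizer
    return (Qp @ KV) / Z[:, None]

# Usage: same inputs, same output shape, different cost scaling.
N, d = 512, 32
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)          # (N, d)
```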
arXiv Detail & Related papers (2026-02-04T09:14:09Z) - Fast & Efficient Normalizing Flows and Applications of Image Generative Models [0.0]
This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world computer vision challenges. The first part introduces significant improvements to normalizing flow architectures through six key innovations: 1) development of invertible 3x3 convolution layers with mathematically proven necessary and sufficient conditions for invertibility, 2) introduction of a more efficient Quad-coupling layer, 3) design of a fast and efficient parallel inversion algorithm for k×k convolutional layers, 4) a fast and efficient backpropagation algorithm for the inverse of convolution, 5) use of the inverse of convolution in Inverse-…
arXiv Detail & Related papers (2025-12-03T18:29:03Z) - Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications [99.72917069918485]
We propose a novel sparse model inversion strategy to speed up existing dense inversion methods. Specifically, we invert semantic foregrounds while stopping the inversion of noisy backgrounds and potential spurious correlations.
arXiv Detail & Related papers (2025-10-31T05:14:36Z) - Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module. This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed. Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv Detail & Related papers (2025-08-06T16:16:58Z) - SHaDe: Compact and Consistent Dynamic 3D Reconstruction via Tri-Plane Deformation and Latent Diffusion [0.0]
We present a novel framework for dynamic 3D scene reconstruction that integrates three key components: an explicit tri-plane deformation field, a view-conditioned canonical field with spherical harmonics (SH) attention, and a temporally-aware latent diffusion prior. Our method encodes 4D scenes using three 2D feature planes that evolve over time, enabling an efficient, compact representation.
arXiv Detail & Related papers (2025-05-22T11:25:38Z) - PanopticSplatting: End-to-End Panoptic Gaussian Splatting [20.04251473153725]
We propose PanopticSplatting, an end-to-end system for open-vocabulary panoptic reconstruction. Our method introduces query-guided Gaussian segmentation with local cross attention, lifting 2D instance masks without cross-frame association. It demonstrates strong performance in 3D scene panoptic reconstruction on the ScanNet-V2 and ScanNet++ datasets.
arXiv Detail & Related papers (2025-03-23T13:45:39Z) - CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs [65.80187860906115]
We propose a novel approach to improve NeRF's performance with sparse inputs.
We first adopt a voxel-based ray sampling strategy to ensure that the sampled rays intersect with a certain voxel in 3D space.
We then randomly sample additional points within the voxel and apply a Transformer to infer the properties of other points on each ray, which are then incorporated into the volume rendering.
arXiv Detail & Related papers (2024-03-25T15:56:17Z) - Adaptive Multi-step Refinement Network for Robust Point Cloud Registration [82.64560249066734]
Point Cloud Registration estimates the relative rigid transformation between two point clouds of the same scene. We propose an adaptive multi-step refinement network that refines the registration quality at each step by leveraging the information from the preceding step. Our method achieves state-of-the-art performance on both the 3DMatch/3DLoMatch and KITTI benchmarks.
arXiv Detail & Related papers (2023-12-05T18:59:41Z) - Cross-domain Self-supervised Framework for Photoacoustic Computed Tomography Image Reconstruction [4.769412124596113]
We propose a cross-domain unsupervised reconstruction (CDUR) strategy with a pure transformer model.
We implement a self-supervised reconstruction in a model-based form and leverage the self-supervision to enforce the measurement and image consistency.
Experimental results on an in-vivo PACT dataset of mice demonstrate the potential of our unsupervised framework.
arXiv Detail & Related papers (2023-01-17T03:47:01Z) - S^2-Transformer for Mask-Aware Hyperspectral Image Reconstruction [59.39343894089959]
A snapshot compressive imager (CASSI) with a Transformer reconstruction backend achieves high-fidelity sensing performance. However, dominant spatial and spectral attention designs show limitations in hyperspectral modeling. We propose a spatial-spectral (S2-) Transformer implemented by a paralleled attention design and a mask-aware learning strategy.
arXiv Detail & Related papers (2022-09-24T19:26:46Z) - Orthogonal Matrix Retrieval with Spatial Consensus for 3D Unknown-View Tomography [58.60249163402822]
Unknown-view tomography (UVT) reconstructs a 3D density map from its 2D projections at unknown, random orientations.
The proposed OMR is more robust and performs significantly better than the previous state-of-the-art OMR approach.
arXiv Detail & Related papers (2022-07-06T21:40:59Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
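The recipe this entry alludes to is the usual one: fit a low-dimensional model on normal samples only and score inputs by how poorly they are reconstructed, i.e. by their distance to the learned submanifold. Below is a minimal sketch of that idea with a linear (PCA) autoencoder standing in for the paper's deep, discriminatively trained model; the data shapes and rank are illustrative assumptions.

```python
import numpy as np

# Minimal reconstruction-error anomaly scoring: fit a low-rank (PCA)
# "autoencoder" on normal samples only and score inputs by their distance
# to the learned subspace. A linear stand-in for a deep autoencoder.

def fit_pca(X_normal, r):
    mu = X_normal.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_normal - mu, full_matrices=False)
    return mu, Vt[:r].T                          # mean and (D, r) basis

def anomaly_score(X, mu, W):
    Z = (X - mu) @ W                             # encode
    X_hat = Z @ W.T + mu                         # decode
    return np.linalg.norm(X - X_hat, axis=1)     # reconstruction error

# Usage with synthetic data: normal samples lie near a 5-D subspace of R^64.
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 64))
mu, W = fit_pca(X_normal, r=5)
low = anomaly_score(X_normal[:10], mu, W)                # small scores
high = anomaly_score(rng.normal(size=(10, 64)), mu, W)   # large scores
```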
arXiv Detail & Related papers (2022-06-23T14:16:30Z)