Related papers: Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications

URL: http://arxiv.org/abs/2510.27186v1
Date: Fri, 31 Oct 2025 05:14:36 GMT
Title: Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
Authors: Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Lei Li, Chun Yuan, Dacheng Tao
Abstract summary: We propose a novel sparse model inversion strategy to speed up existing dense inversion methods. Specifically, we invert semantic foregrounds while stopping the inversion of noisy backgrounds and potential spurious correlations.
Score: 99.72917069918485
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model inversion, which aims to reconstruct the original training data from pre-trained discriminative models, is especially useful when the original training data is unavailable due to privacy, usage rights, or size constraints. However, existing dense inversion methods attempt to reconstruct the entire image area, making them extremely inefficient when inverting high-resolution images from large-scale Vision Transformers (ViTs). We further identify two underlying causes of this inefficiency: the redundant inversion of noisy backgrounds and the unintended inversion of spurious correlations--a phenomenon we term "hallucination" in model inversion. To address these limitations, we propose a novel sparse model inversion strategy, as a plug-and-play extension to speed up existing dense inversion methods with no need for modifying their original loss functions. Specifically, we selectively invert semantic foregrounds while stopping the inversion of noisy backgrounds and potential spurious correlations. Through both theoretical and empirical studies, we validate the efficacy of our approach in achieving significant inversion acceleration (up to 3.79× faster) while maintaining comparable or even enhanced downstream performance in data-free model quantization and data-free knowledge transfer. Code is available at https://github.com/Egg-Hu/SMI.
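
The idea of inverting only semantic foregrounds while stopping the inversion of background patches can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that). The function names `select_foreground_patches` and `sparse_update`, the `keep_ratio` parameter, and the per-patch importance `scores` are all hypothetical; the abstract does not say how foreground patches are identified, so the scores here are simply given as input. A real implementation would presumably also drop stopped patches from the forward pass to save compute, which this sketch does not model.

```python
import numpy as np


def select_foreground_patches(scores, keep_ratio):
    """Return a boolean mask keeping the highest-scoring patches.

    `scores` is a hypothetical per-patch importance value (an assumption,
    not the paper's criterion); higher means more likely foreground.
    """
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    keep_idx = np.argsort(scores)[-n_keep:]  # indices of the top n_keep scores
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep_idx] = True
    return mask


def sparse_update(patches, grads, mask, lr=0.1):
    """Take a gradient step on kept patches only; stopped patches stay unchanged."""
    updated = patches.copy()
    updated[mask] -= lr * grads[mask]
    return updated


# Toy data: 6 patches with 4 values each, and a constant stand-in gradient.
rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))
grads = np.ones_like(patches)
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.7, 0.2])

mask = select_foreground_patches(scores, keep_ratio=0.5)
new_patches = sparse_update(patches, grads, mask)
```

With `keep_ratio=0.5`, the three highest-scoring patches (indices 0, 2, and 4) continue to be optimized, and the remaining three are frozen at their current values.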

Related papers

DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion [65.5172878666262]
Diffusion inversion is a challenging task due to the lack of viable supervision signals. We propose a novel self-supervised diffusion inversion approach, termed Deep Inversion (DeepInv). DeepInv is also equipped with an iterative and multi-scale training regime to train a parameterized inversion solver.
arXiv Detail & Related papers (2026-01-04T11:27:26Z)
Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers [6.7034293304862755]
Patch Rebirth Inversion (PRI) is a novel approach that incrementally detaches the most important patches during the inversion process. PRI achieves up to 10x faster inversion than standard Dense Model Inversion.
arXiv Detail & Related papers (2025-09-27T10:35:44Z)
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations [41.87051958934507]
This paper addresses two key tasks: (i) inversion and (ii) editing of a real image using rectified flow models (such as Flux). Our inversion method allows for state-of-the-art performance in zero-shot inversion and editing, outperforming prior works in stroke-to-image synthesis and semantic image editing.
arXiv Detail & Related papers (2024-10-14T17:56:24Z)
WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference. Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner. We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z)
Look-Around Before You Leap: High-Frequency Injected Transformer for Image Restoration [46.96362010335177]
In this paper, we propose HIT, a simple yet effective High-frequency Injected Transformer for image restoration. Specifically, we design a window-wise injection module (WIM), which incorporates abundant high-frequency details into the feature map, to provide reliable references for restoring high-quality images. In addition, we introduce a spatial enhancement unit (SEU) to preserve essential spatial relationships that may be lost due to the computations carried out across channel dimensions in the bidirectional interaction module (BIM).
arXiv Detail & Related papers (2024-03-30T08:05:00Z)
Multi-level Memory-augmented Appearance-Motion Correspondence Framework for Video Anomaly Detection [1.9511777443446219]
We propose a multi-level memory-augmented appearance-motion correspondence framework. The latent correspondence between appearance and motion is explored via appearance-motion semantics alignment and semantics replacement training. Our framework outperforms the state-of-the-art methods, achieving AUCs of 99.6%, 93.8%, and 76.3% on UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets.
arXiv Detail & Related papers (2023-03-09T08:43:06Z)
Curvature regularization for Non-line-of-sight Imaging from Under-sampled Data [5.591221518341613]
Non-line-of-sight (NLOS) imaging aims to reconstruct the three-dimensional hidden scenes from the data measured in the line-of-sight. We propose novel NLOS reconstruction models based on curvature regularization. We evaluate the proposed algorithms on both synthetic and real datasets.
arXiv Detail & Related papers (2023-01-01T14:10:43Z)
Dimensionality-Varying Diffusion Process [52.52681373641533]
Diffusion models learn to reverse a signal destruction process to generate new data. We make a theoretical generalization of the forward diffusion process via signal decomposition. We show that our strategy facilitates high-resolution image synthesis and improves the FID of a diffusion model trained on FFHQ at $1024\times1024$ resolution from 52.40 to 10.46.
arXiv Detail & Related papers (2022-11-29T09:05:55Z)
Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold. We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples. We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms. Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z)
Designing Counterfactual Generators using Deep Model Inversion [31.1607056675927]
We develop a deep inversion approach, Deep Inversion for Synthesizing Counterfactuals (DISC), to generate counterfactual explanations for a given query image. We find that, in addition to producing visually meaningful explanations, the counterfactuals from DISC are effective at learning decision boundaries and are robust to unknown test-time corruptions.
arXiv Detail & Related papers (2021-09-29T08:40:50Z)
Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective. Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
