EfficientIML: Efficient High-Resolution Image Manipulation Localization
- URL: http://arxiv.org/abs/2509.08583v1
- Date: Wed, 10 Sep 2025 13:32:02 GMT
- Title: EfficientIML: Efficient High-Resolution Image Manipulation Localization
- Authors: Jinhan Li, Haoyang He, Lei Xie, Jiangning Zhang
- Abstract summary: We propose a novel high-resolution SIF dataset of 1200+ diffusion-generated manipulations with semantically extracted masks. We propose a novel EfficientIML model with a lightweight, three-stage EfficientRWKV backbone. Our approach outperforms ViT-based and other SOTA lightweight baselines in localization performance, FLOPs and inference speed.
- Score: 38.432078329653926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With imaging devices delivering ever-higher resolutions and the emerging diffusion-based forgery methods, current detectors trained only on traditional datasets (with splicing, copy-moving and object removal forgeries) lack exposure to this new manipulation type. To address this, we propose a novel high-resolution SIF dataset of 1200+ diffusion-generated manipulations with semantically extracted masks. However, this also imposes a challenge on existing methods, as they face significant computational resource constraints due to their prohibitive computational complexities. Therefore, we propose a novel EfficientIML model with a lightweight, three-stage EfficientRWKV backbone. EfficientRWKV's hybrid state-space and attention network captures global context and local details in parallel, while a multi-scale supervision strategy enforces consistency across hierarchical predictions. Extensive evaluations on our dataset and standard benchmarks demonstrate that our approach outperforms ViT-based and other SOTA lightweight baselines in localization performance, FLOPs and inference speed, underscoring its suitability for real-time forensic applications.
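The multi-scale supervision mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the loss, so everything here is an assumption: a deep-supervision style objective in which the ground-truth mask is average-pooled to each prediction's resolution and a weighted binary cross-entropy is summed over scales. The function names (`downsample_mask`, `bce`, `multi_scale_loss`) and the equal default weights are hypothetical and not taken from the paper.

```python
import math

def downsample_mask(mask, factor):
    """Average-pool a 2D mask (list of rows) by an integer factor."""
    h, w = len(mask), len(mask[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [mask[y][x]
                     for y in range(i, min(i + factor, h))
                     for x in range(j, min(j + factor, w))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two equally sized 2D maps."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count

def multi_scale_loss(preds, mask, weights=None):
    """Weighted sum of per-scale BCE losses (assumed form, not the paper's).

    preds: list of 2D probability maps; each map's height must divide
    the mask height evenly (e.g. full, half, quarter resolution).
    """
    weights = weights or [1.0] * len(preds)
    loss = 0.0
    for pred, w in zip(preds, weights):
        factor = len(mask) // len(pred)
        target = downsample_mask(mask, factor) if factor > 1 else mask
        loss += w * bce(pred, target)
    return loss
```

Supervising every stage against the same pooled mask is one simple way to push hierarchical predictions toward agreement; the paper's actual consistency term may differ.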
Related papers
- EUGens: Efficient, Unified, and General Dense Layers [56.498769704575544]
We propose a new class of dense layers that generalize standard fully-connected feedforward layers (FFLs): Efficient, Unified and General dense layers (EUGens). EUGens leverage random features to approximate standard FFLs and go beyond them by incorporating a direct dependence on the input norms in their computations.
arXiv Detail & Related papers (2026-01-30T05:01:03Z)
- From Passive Perception to Active Memory: A Weakly Supervised Image Manipulation Localization Framework Driven by Coarse-Grained Annotations [14.0185129202898]
BoxPromptIML is a novel weakly-supervised IML framework that balances annotation cost and localization performance. Inspired by the human subconscious memory mechanism, our feature fusion module employs a dual-guidance strategy that actively contextualizes recalled patterns with real-time observational cues.
arXiv Detail & Related papers (2025-11-25T14:39:17Z)
- Modest-Align: Data-Efficient Alignment for Vision-Language Models [67.48633659305592]
Cross-modal alignment models often suffer from overconfidence and degraded performance when operating in resource-constrained settings. We propose Modest-Align, a lightweight alignment framework designed for robustness and efficiency. Our method offers a practical and scalable solution for cross-modal alignment in real-world, low-resource scenarios.
arXiv Detail & Related papers (2025-10-24T16:11:10Z)
- UGD-IML: A Unified Generative Diffusion-based Framework for Constrained and Unconstrained Image Manipulation Localization [19.797719494981923]
We propose a novel generative framework based on diffusion models, named UGD-IML, which unifies both IML and CIML tasks within a single framework. We show that UGD-IML outperforms the SOTA methods by an average of 9.66 and 4.36 in terms of F1 metrics for IML and CIML tasks, respectively.
arXiv Detail & Related papers (2025-08-08T08:00:28Z)
- Regularizing Subspace Redundancy of Low-Rank Adaptation [54.473090597164834]
We propose ReSoRA, a method that explicitly models redundancy between mapping subspaces and adaptively Regularizes Subspace redundancy of Low-Rank Adaptation. Our proposed method consistently facilitates existing state-of-the-art PETL methods across various backbones and datasets in vision-language retrieval and standard visual classification benchmarks. As a training supervision, ReSoRA can be seamlessly integrated into existing approaches in a plug-and-play manner, with no additional inference costs.
arXiv Detail & Related papers (2025-07-28T11:52:56Z)
- AFLoRA: Adaptive Federated Fine-Tuning of Large Language Models with Resource-Aware Low-Rank Adaption [3.805501490912696]
Federated fine-tuning has emerged as a promising approach to adapt foundation models to downstream tasks using decentralized data. We propose AFLoRA, an adaptive and lightweight federated fine-tuning framework for Large Language Models.
arXiv Detail & Related papers (2025-05-30T16:35:32Z)
- RefiDiff: Refinement-Aware Diffusion for Efficient Missing Data Imputation [13.401822039640297]
Missing values in high-dimensional, mixed-type datasets pose significant challenges for data imputation. We propose an innovative framework, RefiDiff, combining local machine learning predictions with a novel Mamba-based denoising network. RefiDiff outperforms state-of-the-art (SOTA) methods across missing-value settings with a 4x faster training time than DDPM-based approaches.
arXiv Detail & Related papers (2025-05-20T14:51:07Z)
- PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection [68.8373788348678]
Visual instruction tuning adapts pre-trained Multimodal Large Language Models to follow human instructions. PRISM is the first training-free framework for efficient visual instruction selection. It reduces the end-to-end time for data selection and model tuning to just 30% of conventional pipelines.
arXiv Detail & Related papers (2025-02-17T18:43:41Z)
- Low-Light Image Enhancement via Generative Perceptual Priors [75.01646333310073]
We introduce a novel LLIE framework with the guidance of vision-language models (VLMs). We first propose a pipeline that guides VLMs to assess multiple visual attributes of the LL image and quantify the assessment to output the global and local perceptual priors. To incorporate these generative perceptual priors to benefit LLIE, we introduce a transformer-based backbone in the diffusion process, and develop a new layer normalization (LPP-Attn) guided by global and local perceptual priors.
arXiv Detail & Related papers (2024-12-30T12:51:52Z)
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR).
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation.
A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy.
Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMUSeasons dataset.
Our performance is on par with or even outperforms the state-of-the-art image-based localization baselines in medium or high precision.
arXiv Detail & Related papers (2020-09-16T14:43:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.