Photorealistic Inpainting for Perturbation-based Explanations in Ecological Monitoring
- URL: http://arxiv.org/abs/2510.03317v2
- Date: Fri, 24 Oct 2025 11:24:57 GMT
- Title: Photorealistic Inpainting for Perturbation-based Explanations in Ecological Monitoring
- Authors: Günel Aghakishiyeva, Jiayi Zhou, Saagar Arya, Julian Dale, James David Poling, Holly R. Houliston, Jamie N. Womble, Gregory D. Larsen, David W. Johnston, Brinnae Bent
- Abstract summary: We present an inpainting-guided explanation technique that produces photorealistic, mask-localized edits that preserve scene context.
We demonstrate the approach on a YOLOv9 detector fine-tuned for harbor seal detection in Glacier Bay drone imagery.
The resulting explanations localize diagnostic structures, avoid deletion artifacts common to traditional perturbations, and yield domain-relevant insights.
- Score: 3.4574594310498266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ecological monitoring is increasingly automated by vision models, yet opaque predictions limit trust and field adoption. We present an inpainting-guided, perturbation-based explanation technique that produces photorealistic, mask-localized edits that preserve scene context. Unlike masking or blurring, these edits stay in-distribution and reveal which fine-grained morphological cues drive predictions in tasks such as species recognition and trait attribution. We demonstrate the approach on a YOLOv9 detector fine-tuned for harbor seal detection in Glacier Bay drone imagery, using Segment-Anything-Model-refined masks to support two interventions: (i) object removal/replacement (e.g., replacing seals with plausible ice/water or boats) and (ii) background replacement with original animals composited onto new scenes. Explanations are assessed by re-scoring perturbed images (flip rate, confidence drop) and by expert review for ecological plausibility and interpretability. The resulting explanations localize diagnostic structures, avoid deletion artifacts common to traditional perturbations, and yield domain-relevant insights that support expert validation and more trustworthy deployment of AI in ecology.
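To make the evaluation protocol concrete, here is a minimal sketch of the re-scoring step (flip rate and confidence drop), assuming an ultralytics-style YOLO interface; the weights file yolov9-seals.pt, the image pairs, and the 0.25 threshold are illustrative assumptions, not artifacts released with the paper.

```python
# Hypothetical sketch: re-score inpainted perturbations to measure flip
# rate and confidence drop. Weights, file names, and threshold are
# illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolov9-seals.pt")  # hypothetical fine-tuned seal detector
THRESH = 0.25                    # illustrative threshold defining a "flip"

def max_confidence(image_path: str) -> float:
    """Highest detection confidence in one image (0.0 if no detections)."""
    result = model(image_path, verbose=False)[0]
    confs = result.boxes.conf
    return float(confs.max()) if len(confs) else 0.0

def rescore(original: str, perturbed: str) -> dict:
    """Detector confidence before vs. after an inpainting edit."""
    before, after = max_confidence(original), max_confidence(perturbed)
    return {"confidence_drop": before - after,
            "flipped": before >= THRESH and after < THRESH}

# Illustrative file pairs: each original image and its inpainted edit.
pairs = [("seal_001.jpg", "seal_001_inpainted.jpg")]
scores = [rescore(o, p) for o, p in pairs]
flip_rate = sum(s["flipped"] for s in scores) / len(scores)
mean_drop = sum(s["confidence_drop"] for s in scores) / len(scores)
print(f"flip rate: {flip_rate:.2f}, mean confidence drop: {mean_drop:.3f}")
```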
Related papers
- Agentic Retoucher for Text-To-Image Generation [48.80766311858762]
Agentic Retoucher is a hierarchical decision-driven framework that reformulates post-generation correction as a human-like perception-reasoning-action loop.
This design integrates perceptual evidence, linguistic reasoning, and controllable correction into a unified, self-corrective decision process.
Experiments demonstrate that Agentic Retoucher consistently outperforms state-of-the-art methods in perceptual quality, distortion localization and human preference alignment.
arXiv Detail & Related papers (2026-01-05T12:06:43Z)
- INSIGHT: An Interpretable Neural Vision-Language Framework for Reasoning of Generative Artifacts [0.0]
Current forensic systems degrade sharply under real-world conditions.
Most detectors operate as opaque black boxes, offering little insight into why an image is flagged as synthetic.
We introduce INSIGHT, a unified framework for robust detection and transparent explanation of AI-generated images.
arXiv Detail & Related papers (2025-11-27T11:43:50Z)
- On Thin Ice: Towards Explainable Conservation Monitoring via Attribution and Perturbations [3.4574594310498266]
We train a Faster R-CNN to detect harbor seals using aerial imagery from Glacier Bay National Park.
We assess explanations along three axes relevant to field use.
We translate these findings into actionable next steps for model development.
arXiv Detail & Related papers (2025-10-24T17:46:24Z)
- Semantic-Aware Reconstruction Error for Detecting AI-Generated Images [22.83053631078616]
We propose a novel representation, namely Semantic-Aware Reconstruction Error (SARE), that measures the semantic difference between an image and its caption-guided reconstruction.
SARE provides a robust and discriminative feature for detecting fake images across diverse generative models.
We also introduce a fusion module that integrates SARE into the backbone detector via a cross-attention mechanism.
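The abstract does not include code, but the core idea can be sketched with open-source stand-ins: given an image and its caption-guided reconstruction (assumed already generated by a separate captioning and synthesis step), score how far apart they land in a semantic embedding space, here CLIP.

```python
# Illustrative sketch of a semantic-aware reconstruction error, using
# CLIP as a stand-in for the paper's semantic encoder. The reconstruction
# is assumed to exist already on disk.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def reconstruction_error(original_path: str, reconstructed_path: str) -> float:
    """Cosine distance between embeddings of an image and its
    caption-guided reconstruction; larger values mean the reconstruction
    drifted further from the original in semantic space."""
    images = [Image.open(original_path).convert("RGB"),
              Image.open(reconstructed_path).convert("RGB")]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # unit-normalize embeddings
    return float(1.0 - emb[0] @ emb[1])
```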
arXiv Detail & Related papers (2025-08-13T04:37:36Z)
- Underwater Diffusion Attention Network with Contrastive Language-Image Joint Learning for Underwater Image Enhancement [0.8747606955991707]
UDAN-CLIP is an image-to-image diffusion framework pre-trained on synthetic underwater datasets.
It is enhanced with a customized classifier based on a vision-language model, a spatial attention module, and a novel CLIP-Diffusion loss.
The proposed contributions empower our UDAN-CLIP model to perform more effective underwater image enhancement.
arXiv Detail & Related papers (2025-05-26T12:24:56Z)
- Pseudo-Label Guided Real-World Image De-weathering: A Learning Framework with Imperfect Supervision [57.5699142476311]
We propose a unified solution for real-world image de-weathering with non-ideal supervision.
Our method exhibits significant advantages when trained on imperfectly aligned de-weathering datasets.
arXiv Detail & Related papers (2025-04-14T07:24:03Z)
- Bridging Knowledge Gap Between Image Inpainting and Large-Area Visible Watermark Removal [57.84348166457113]
We introduce a novel feature adapting framework that leverages the representation capacity of a pre-trained image inpainting model.
Our approach bridges the knowledge gap between image inpainting and watermark removal by fusing information of the residual background content beneath watermarks into the inpainting backbone model.
To relieve the dependence on high-quality watermark masks, we introduce a new training paradigm that uses coarse watermark masks to guide the inference process.
arXiv Detail & Related papers (2025-04-07T02:37:14Z)
- PIGUIQA: A Physical Imaging Guided Perceptual Framework for Underwater Image Quality Assessment [59.9103803198087]
We propose a Physical Imaging Guided perceptual framework for Underwater Image Quality Assessment (UIQA).
By leveraging underwater radiative transfer theory, we integrate physics-based imaging estimations to establish quantitative metrics for these distortions.
The proposed model accurately predicts image quality scores and achieves state-of-the-art performance.
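The "physics-based imaging estimations" presumably build on a standard underwater image-formation model; the following is a minimal sketch of that common simplification (direct transmission plus backscatter), not necessarily PIGUIQA's exact formulation.

```python
# Common simplified underwater image-formation model (an assumption,
# not PIGUIQA's published equations): observed radiance is attenuated
# scene light plus wavelength-dependent backscatter.
import numpy as np

def underwater_formation(scene: np.ndarray, depth: np.ndarray,
                         beta: np.ndarray, backlight: np.ndarray) -> np.ndarray:
    """scene: (H, W, 3) clean radiance; depth: (H, W) range in meters;
    beta: (3,) per-channel attenuation; backlight: (3,) veiling light."""
    t = np.exp(-beta[None, None, :] * depth[..., None])  # transmission map
    return scene * t + backlight[None, None, :] * (1.0 - t)
```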
arXiv Detail & Related papers (2024-12-20T03:31:45Z)
- LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space [0.0]
Masked image modeling is a self-supervised learning technique that predicts the feature representation of masked regions in an image.
We propose a novel approach that leverages the characteristics of MIM to detect logical anomalies effectively.
We evaluate the proposed method on the MVTecLOCO dataset, achieving an average AUC of 0.867.
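A conceptual sketch of MIM-style anomaly scoring, with toy linear modules standing in for the paper's discrete-latent tokenizer and predictor: regions whose features cannot be predicted from the visible context score as anomalous.

```python
# Conceptual sketch of masked-image-modeling anomaly scoring (not the
# LADMIM implementation). Both modules are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 64                       # illustrative patch-feature dimension
encoder = nn.Linear(D, D)    # stand-in for the feature extractor
predictor = nn.Linear(D, D)  # stand-in for the masked-feature predictor

def anomaly_score(patches: torch.Tensor, mask: torch.Tensor) -> float:
    """patches: (N, D) patch features; mask: (N,) bool, True = masked."""
    with torch.no_grad():
        target = encoder(patches)                  # features of all patches
        visible = patches * (~mask).unsqueeze(-1)  # hide the masked patches
        pred = predictor(visible)                  # predict them from context
    err = F.mse_loss(pred[mask], target[mask], reduction="none").mean(dim=-1)
    return float(err.max())  # poorly predicted regions flag logical anomalies

score = anomaly_score(torch.randn(16, D), torch.arange(16) % 2 == 0)
```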
arXiv Detail & Related papers (2024-10-14T07:50:56Z)
- Image-Based Leopard Seal Recognition: Approaches and Challenges in Current Automated Systems [0.0]
This paper examines the challenges and advancements in recognizing seals within their natural habitats using conventional photography.
We used the leopard seal, Hydrurga leptonyx, a key species within Antarctic ecosystems, to review the available methods.
The advent of machine learning, particularly through the application of vision transformers, heralds a new era of efficiency and precision in species monitoring.
arXiv Detail & Related papers (2024-08-14T03:35:11Z)
- Semantic Segmentation for Fully Automated Macrofouling Analysis on Coatings after Field Exposure [13.732577711665877]
Biofouling is a major challenge for sustainable shipping, filter membranes, heat exchangers, and medical devices.
Here we present an approach for automatic image-based macrofouling analysis.
arXiv Detail & Related papers (2022-11-21T16:03:16Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
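A minimal sketch of the idea, using LPIPS as a stand-in perceptual metric: surrogate samples are weighted by perceptual closeness to the reference image rather than by a pixel-space distance. The kernel form and width here are illustrative, not the paper's.

```python
# Hypothetical sketch: weight surrogate-explainer samples by a learned
# perceptual distance (LPIPS) instead of pixel-space distance.
import lpips
import numpy as np
import torch

perceptual = lpips.LPIPS(net="alex")  # learned perceptual metric

def sample_weight(reference: torch.Tensor, perturbed: torch.Tensor,
                  kernel_width: float = 0.25) -> float:
    """reference, perturbed: (1, 3, H, W) tensors scaled to [-1, 1].
    Samples perceptually close to the reference get more influence on
    the surrogate model's fit."""
    with torch.no_grad():
        d = perceptual(reference, perturbed).item()
    return float(np.exp(-(d ** 2) / kernel_width ** 2))  # RBF-style kernel
```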
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
- SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight: the rectified results of distorted images of the same scene captured by different lenses should be identical.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
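A conceptual sketch of that self-supervision signal, with hypothetical `net` and `warp` stand-ins: rectified views of the same scene should agree, and re-distorting a rectified image should reproduce the input.

```python
# Conceptual sketch of the SIR self-supervision signal (simplified).
# `net` predicts distortion parameters; `warp` is a differentiable
# (un)distortion module. Both are hypothetical stand-ins.
import torch.nn.functional as F

def sir_losses(net, warp, view_a, view_b):
    """view_a, view_b: (B, 3, H, W), two distorted captures of one scene."""
    params_a, params_b = net(view_a), net(view_b)
    rect_a = warp(view_a, params_a, inverse=True)     # undistort view A
    rect_b = warp(view_b, params_b, inverse=True)     # undistort view B
    redist_a = warp(rect_a, params_a, inverse=False)  # re-apply distortion
    consistency = F.l1_loss(rect_a, rect_b)       # same scene once rectified
    reconstruction = F.l1_loss(redist_a, view_a)  # round-trip fidelity
    return consistency + reconstruction
```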
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
- Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes [54.836331922449666]
We propose a Semantic Guidance and Evaluation Network (SGE-Net) to update the structural priors and the inpainted image.
It utilizes a semantic segmentation map as guidance in each scale of inpainting, under which location-dependent inferences are re-evaluated.
Experiments on real-world images of mixed scenes demonstrated the superiority of our proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-15T17:49:20Z)