OSCAR: Optical-aware Semantic Control for Aleatoric Refinement in SAR-to-Optical Translation
- URL: http://arxiv.org/abs/2601.06835v1
- Date: Sun, 11 Jan 2026 09:57:04 GMT
- Title: OSCAR: Optical-aware Semantic Control for Aleatoric Refinement in SAR-to-Optical Translation
- Authors: Hyunseo Lee, Sang Min Kim, Ho Kyung Shin, Taeheon Kim, Woo-Jeoung Nam
- Abstract summary: A novel SAR-to-Optical (S2O) translation framework is proposed, integrating three core technical contributions. Experiments demonstrate that the proposed method achieves superior perceptual quality and semantic consistency compared to state-of-the-art approaches.
- Score: 12.055938312320402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic Aperture Radar (SAR) provides robust all-weather imaging capabilities; however, translating SAR observations into photo-realistic optical images remains a fundamentally ill-posed problem. Current approaches are often hindered by the inherent speckle noise and geometric distortions of SAR data, which frequently result in semantic misinterpretation, ambiguous texture synthesis, and structural hallucinations. To address these limitations, a novel SAR-to-Optical (S2O) translation framework is proposed, integrating three core technical contributions: (i) Cross-Modal Semantic Alignment, which establishes an Optical-Aware SAR Encoder by distilling robust semantic priors from an Optical Teacher into a SAR Student; (ii) Semantically-Grounded Generative Guidance, realized by a Semantically-Grounded ControlNet that integrates class-aware text prompts for global context with hierarchical visual prompts for local spatial guidance; and (iii) an Uncertainty-Aware Objective, which explicitly models aleatoric uncertainty to dynamically modulate the reconstruction focus, effectively mitigating artifacts caused by speckle-induced ambiguity. Extensive experiments demonstrate that the proposed method achieves superior perceptual quality and semantic consistency compared to state-of-the-art approaches.
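The abstract does not state the form of the Uncertainty-Aware Objective. As a rough illustration of how modelling aleatoric uncertainty can modulate the reconstruction focus, the sketch below uses a common heteroscedastic formulation: a per-pixel Gaussian negative log-likelihood in which the network also predicts a log-variance s, so each pixel contributes 0.5 * exp(-s) * (pred - target)^2 + 0.5 * s. This is an assumption for illustration, not necessarily the loss used in OSCAR; the function name `aleatoric_loss` and the sample values are hypothetical.

```python
import math


def aleatoric_loss(pred, target, log_var):
    """Mean Gaussian negative log-likelihood with a per-pixel log-variance.

    Each pixel contributes 0.5 * exp(-s) * (pred - target)^2 + 0.5 * s,
    where s is the predicted log-variance for that pixel. A large s
    down-weights the squared error, and the 0.5 * s term penalizes
    claiming high uncertainty everywhere.
    """
    if not (len(pred) == len(target) == len(log_var)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for p, t, s in zip(pred, target, log_var):
        total += 0.5 * math.exp(-s) * (p - t) ** 2 + 0.5 * s
    return total / len(pred)


# Hypothetical flattened pixel values: the second pixel has a large residual,
# as might happen in a speckle-corrupted region.
pred = [1.0, 3.0]
target = [0.0, 0.0]

# Unit variance everywhere (s = 0): the loss reduces to half the mean squared error.
confident = aleatoric_loss(pred, target, [0.0, 0.0])

# Higher predicted variance on the ambiguous pixel lowers its contribution.
hedged = aleatoric_loss(pred, target, [0.0, math.log(9.0)])
```

In this toy case, raising the predicted variance on the high-residual pixel lowers the total loss, which is the mechanism by which such an objective can shift reconstruction effort away from ambiguous regions.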
Related papers
- ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting [63.138778159026934]
We propose an adaptive optimization framework guided by excess risk decomposition, termed ERGO. ERGO dynamically estimates the view-specific excess risk and adaptively adjusts loss weights during optimization. Experiments on the Google Scanned Objects dataset and the OmniObject3D dataset demonstrate the superiority of ERGO over existing state-of-the-art methods.
arXiv Detail & Related papers (2026-02-10T20:44:43Z)
- SynMind: Reducing Semantic Hallucination in fMRI-Based Image Reconstruction [52.34513874272676]
We argue that existing methods rely too heavily on entangled visual embeddings over explicit semantic identity. We parse fMRI signals into rich, sentence-level semantic descriptions that mirror the hierarchical and compositional nature of human visual understanding. We propose SynMind, a framework that integrates these explicit semantic encodings with visual priors to condition a pretrained diffusion model.
arXiv Detail & Related papers (2026-01-25T14:31:23Z)
- Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding [54.05243949024302]
Existing robust MLLMs rely on implicit training/adaptation that focuses solely on visual encoder generalization. We propose Robust-R1, a novel framework that explicitly models visual degradations through structured reasoning chains. Our approach integrates: (i) supervised fine-tuning for degradation-aware reasoning foundations, (ii) reward-driven alignment for accurately perceiving degradation parameters, and (iii) dynamic reasoning depth scaling adapted to degradation intensity.
arXiv Detail & Related papers (2025-12-19T12:56:17Z)
- INSIGHT: An Interpretable Neural Vision-Language Framework for Reasoning of Generative Artifacts [0.0]
Current forensic systems degrade sharply under real-world conditions. Most detectors operate as opaque black boxes, offering little insight into why an image is flagged as synthetic. We introduce INSIGHT, a unified framework for robust detection and transparent explanation of AI-generated images.
arXiv Detail & Related papers (2025-11-27T11:43:50Z)
- SRSR: Enhancing Semantic Accuracy in Real-World Image Super-Resolution with Spatially Re-Focused Text-Conditioning [59.013863248600046]
We propose a spatially re-focused super-resolution framework that refines text conditioning at inference time. We also introduce a Spatially Targeted-Free Guidance mechanism that selectively bypasses text influences on ungrounded pixels to prevent hallucinations.
arXiv Detail & Related papers (2025-10-26T05:03:55Z)
- Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture. KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z)
- SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion [52.959716866316604]
Camera-based 3D Semantic Scene Completion (SSC) is a critical task in autonomous driving systems. We propose the Semantic-PHysical Engaged REpresentation (SPHERE) for camera-based SSC. SPHERE integrates voxel and Gaussian representations for joint exploitation of semantic and physical information.
arXiv Detail & Related papers (2025-09-14T09:07:41Z)
- Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators [74.65171736966131]
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. Current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano, an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions.
arXiv Detail & Related papers (2025-09-11T23:12:55Z)
- Annotation-Free Open-Vocabulary Segmentation for Remote-Sensing Images [51.74614065919118]
This paper introduces SegEarth-OV, the first framework for annotation-free open-vocabulary segmentation of RS images. We propose SimFeatUp, a universal upsampler that robustly restores high-resolution spatial details from coarse features. We also present a simple yet effective Global Bias Alleviation operation to subtract the inherent global context from patch features.
arXiv Detail & Related papers (2025-08-25T14:22:57Z)
- CLUE: Leveraging Low-Rank Adaptation to Capture Latent Uncovered Evidence for Image Forgery Localization [35.73353140683283]
Increasing accessibility of image editing tools and generative AI has led to a proliferation of visually convincing forgeries. In this paper, we repurpose the mechanism of a state-of-the-art (SOTA) text-to-image synthesis model by exploiting its internal generative process. We propose CLUE, a framework that employs Low-Rank Adaptation (LoRA) to parameter-efficiently reconfigure Stable Diffusion 3 (SD3) as a forensic feature extractor.
arXiv Detail & Related papers (2025-08-10T16:22:30Z)
- Quality-Aware Language-Conditioned Local Auto-Regressive Anomaly Synthesis and Detection [30.77558600436759]
ARAS is a language-conditioned, auto-regressive anomaly synthesis approach. It injects local, text-specified defects into normal images via token-anchored latent editing. It significantly enhances defect realism, preserves fine-grained material textures, and provides continuous semantic control over synthesized anomalies.
arXiv Detail & Related papers (2025-08-05T15:07:32Z)
- Unpaired Object-Level SAR-to-Optical Image Translation for Aircraft with Keypoints-Guided Diffusion Models [4.6570959687411975]
Translating SAR images into optical images is a promising solution to enhance interpretation and support downstream tasks. This study proposes a keypoint-guided diffusion model (KeypointDiff) for SAR-to-optical image translation of unpaired aircraft targets.
arXiv Detail & Related papers (2025-03-25T16:05:49Z)
- Generative Adversarial Networks for Synthesizing InSAR Patches [15.260123615399035]
Generative Adversarial Networks (GANs) have been employed with certain success for image translation tasks between optical and real-valued SAR intensity imagery.
The synthesis of artificial complex-valued InSAR image stacks asks for, besides good perceptual quality, more stringent quality metrics like phase noise and phase coherence.
This paper provides a signal processing model of generative CNN structures, describes effects influencing those quality metrics and presents a mapping scheme of complex-valued data to given CNN structures.
arXiv Detail & Related papers (2020-08-03T20:51:01Z)