Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting
- URL: http://arxiv.org/abs/2410.08612v1
- Date: Fri, 11 Oct 2024 08:27:25 GMT
- Title: Synth-SONAR: Sonar Image Synthesis with Enhanced Diversity and Realism via Dual Diffusion Models and GPT Prompting
- Authors: Purushothaman Natarajan, Kamal Basha, Athira Nambiar
- Abstract summary: This study proposes a new sonar image synthesis framework, Synth-SONAR.
The key novelties are threefold: First, it integrates generative AI-based style injection techniques with publicly available real/simulated data, producing one of the largest sonar data corpora for sonar research.
Second, a dual text-conditioning sonar diffusion model hierarchy synthesizes coarse and fine-grained sonar images with enhanced quality and diversity.
Third, high-level (coarse) and low-level (detailed) text-based sonar generation methods leverage the rich semantic information available in vision-language models (VLMs) and GPT prompting.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sonar image synthesis is crucial for advancing applications in underwater exploration, marine biology, and defence. Traditional methods often rely on extensive and costly data collection using sonar sensors, limiting data quality and diversity. To overcome these limitations, this study proposes a new sonar image synthesis framework, Synth-SONAR, leveraging diffusion models and GPT prompting. The key novelties of Synth-SONAR are threefold: First, it integrates generative AI-based style injection techniques with publicly available real/simulated data, thereby producing one of the largest sonar data corpora for sonar research. Second, a dual text-conditioning sonar diffusion model hierarchy synthesizes coarse and fine-grained sonar images with enhanced quality and diversity. Third, high-level (coarse) and low-level (detailed) text-based sonar generation methods leverage the rich semantic information available in vision-language models (VLMs) and GPT prompting. During inference, the method generates diverse and realistic sonar images from textual prompts, bridging the gap between textual descriptions and sonar image generation. To the best of our knowledge, this is the first application of GPT prompting to sonar imagery. Synth-SONAR achieves state-of-the-art results in producing high-quality synthetic sonar datasets, significantly enhancing their diversity and realism.
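The coarse-to-fine generation loop described above can be pictured with off-the-shelf latent diffusion tooling. Below is a minimal sketch assuming a Stable Diffusion backbone accessed through Hugging Face's diffusers library; the checkpoint name, prompts, and strength value are illustrative placeholders, not the authors' actual models or settings.

```python
# Coarse-to-fine text-conditioned sonar image generation (illustrative sketch).
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = "runwayml/stable-diffusion-v1-5"  # placeholder, not the paper's checkpoint

# Stage 1: coarse image from a high-level (GPT-style) prompt.
coarse_pipe = StableDiffusionPipeline.from_pretrained(backbone).to(device)
coarse_prompt = "side-scan sonar image of a shipwreck on a sandy seabed"
coarse_image = coarse_pipe(coarse_prompt, num_inference_steps=30).images[0]

# Stage 2: refine with a low-level, detail-rich prompt via img2img.
fine_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(backbone).to(device)
fine_prompt = ("side-scan sonar image of a shipwreck, strong acoustic shadow, "
               "speckle noise, seabed ripples, high frequency")
fine_image = fine_pipe(prompt=fine_prompt, image=coarse_image,
                       strength=0.5, num_inference_steps=30).images[0]
fine_image.save("synthetic_sonar.png")
```

The two-stage split mirrors the paper's coarse/fine hierarchy: the first pass fixes the global scene layout, while the second pass, conditioned on a more detailed prompt, adds sonar-specific texture.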
Related papers
- LEGION: Learning to Ground and Explain for Synthetic Image Detection [49.958951540410816]
We introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations.
It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels.
We propose LEGION, a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation.
arXiv Detail & Related papers (2025-03-19T14:37:21Z)
- Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data.
To overcome the first challenge, we align the generations of the text-to-audio (T2A) model with the small-scale dataset using preference optimization.
To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
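The caption-generation step can be sketched with any chat-style LLM API. The snippet below assumes the openai Python client; the model name, prompt wording, and example label are assumptions for illustration, not Synthio's actual implementation.

```python
# Illustrative LLM-based caption diversification (not Synthio's actual code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def diversify_caption(label: str, seed_caption: str, n: int = 5) -> list[str]:
    """Ask the LLM for n varied captions that keep the audio class intact."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite the audio caption '{seed_caption}' in {n} varied ways. "
                f"Each rewrite must still describe the sound class '{label}'. "
                "Return one caption per line."
            ),
        }],
    )
    return response.choices[0].message.content.strip().splitlines()

print(diversify_caption("dog bark", "a dog barking in a park"))
```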
arXiv Detail & Related papers (2024-10-02T22:05:36Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
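The bridged-transfer recipe amounts to two sequential fine-tuning passes. A minimal PyTorch sketch follows; the helper name, loaders, and schedule are illustrative, not the paper's exact protocol.

```python
# Two-stage "bridged transfer" fine-tuning sketch (illustrative).
import torch
from torch import nn
from torch.utils.data import DataLoader

def fine_tune(model: nn.Module, loader: DataLoader, epochs: int, lr: float) -> None:
    """One supervised fine-tuning pass over the given data."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

# Stage 1 bridges on synthetic images; stage 2 adapts to real data.
# `model`, `synthetic_loader`, and `real_loader` must be defined by the caller.
# fine_tune(model, synthetic_loader, epochs=5, lr=1e-4)
# fine_tune(model, real_loader, epochs=10, lr=1e-5)
```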
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Learning from Synthetic Data for Visual Grounding [55.21937116752679]
We show that SynGround can improve the localization capabilities of off-the-shelf vision-and-language models.
Data generated with SynGround improves the pointing-game accuracy of pretrained ALBEF and BLIP models by 4.81 and 17.11 absolute percentage points, respectively.
arXiv Detail & Related papers (2024-03-20T17:59:43Z)
- DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models [6.668241588219693]
Visual storytelling is increasingly desired beyond real-world imagery.
Current techniques, which typically use autoregressive decoders, suffer from low inference speed and are not well-suited for synthetic scenes.
We propose DiffuVST, a novel diffusion-based system that models a series of visual descriptions as a single conditional denoising process.
arXiv Detail & Related papers (2023-12-12T08:40:38Z)
- Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state of the art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
- Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images [37.29348016920314]
We present a new framework leveraging off-the-shelf generative models to generate synthetic training images.
We address class name ambiguity, lack of diversity in naive prompts, and domain shifts.
Our framework consistently enhances recognition model performance with more synthetic data.
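Prompt diversification of this kind can be as simple as composing disambiguated class names with varied contexts and styles before querying a text-to-image model. A toy sketch with made-up lists:

```python
# Toy prompt diversification for synthetic training images (illustrative lists).
import itertools
import random

CLASS_NAMES = {"crane": "crane (construction machine)"}  # resolve ambiguous names
CONTEXTS = ["on a construction site", "at sunset", "seen from below"]
STYLES = ["photograph", "wide-angle shot", "telephoto shot"]

def diverse_prompts(label: str, n: int = 5) -> list[str]:
    """Build n varied prompts for one class to avoid naive, repetitive prompts."""
    name = CLASS_NAMES.get(label, label)
    combos = list(itertools.product(CONTEXTS, STYLES))
    random.shuffle(combos)
    return [f"a {style} of a {name} {ctx}" for ctx, style in combos[:n]]

print(diverse_prompts("crane"))
```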
arXiv Detail & Related papers (2023-12-04T18:35:27Z)
- Generalizable Synthetic Image Detection via Language-guided Contrastive Learning [22.533225521726116]
The malevolent use of synthetic images, such as the dissemination of fake news or the creation of fake profiles, raises significant concerns regarding the authenticity of images.
We propose a simple yet very effective synthetic image detection method based on language-guided contrastive learning.
It is shown that our proposed LanguAge-guided SynThEsis Detection (LASTED) model achieves much improved generalizability to unseen image generation models.
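Language-guided contrastive training can be pictured as pulling each image embedding toward the text embedding of its label (e.g., "real photo" vs. "synthetic image") and away from the other. A minimal InfoNCE-style sketch in PyTorch follows; the encoders, temperature, and label texts are assumptions, not LASTED's actual design.

```python
# Minimal language-guided contrastive loss sketch (not LASTED's actual code).
import torch
import torch.nn.functional as F

def language_guided_loss(image_embs: torch.Tensor,
                         text_embs: torch.Tensor,
                         labels: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """image_embs: (B, D); text_embs: (C, D), one row per label text;
    labels: (B,) integer ids into the label texts."""
    image_embs = F.normalize(image_embs, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    logits = image_embs @ text_embs.t() / temperature  # (B, C) similarities
    return F.cross_entropy(logits, labels)

# Smoke test with random features: 2 label texts, batch of 8 images.
loss = language_guided_loss(torch.randn(8, 512), torch.randn(2, 512),
                            torch.randint(0, 2, (8,)))
print(loss.item())
```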
arXiv Detail & Related papers (2023-05-23T08:13:27Z)
- Text-driven Visual Synthesis with Latent Diffusion Prior [37.736313030226654]
We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks.
We demonstrate the efficacy of our approach on three different applications: text-to-3D synthesis, StyleGAN adaptation, and layered image editing.
arXiv Detail & Related papers (2023-02-16T18:59:58Z)
- StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis [54.39789900854696]
StyleGAN-T addresses the specific requirements of large-scale text-to-image synthesis.
It significantly improves over previous GANs and outperforms distilled diffusion models in terms of sample quality and speed.
arXiv Detail & Related papers (2023-01-23T16:05:45Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We highlight the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output.
Compared with current state-of-the-art methods, our proposed DF-GAN is simpler yet more effective at synthesizing realistic, text-matching images.
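The Matching-Aware Gradient Penalty regularizes the discriminator at real-image/matching-text pairs by penalizing the gradient norm there. A hedged PyTorch sketch follows; the weight k=2 and exponent p=6 follow values reported for DF-GAN, while the surrounding code is illustrative.

```python
# Matching-Aware Gradient Penalty (MA-GP) sketch in the spirit of DF-GAN.
import torch

def ma_gp(discriminator, real_images, matched_text, k: float = 2.0, p: float = 6.0):
    """Penalize the discriminator's gradient at (real image, matching text)."""
    real_images = real_images.detach().requires_grad_(True)
    matched_text = matched_text.detach().requires_grad_(True)
    scores = discriminator(real_images, matched_text)  # (B,) matching scores
    grads = torch.autograd.grad(outputs=scores.sum(),
                                inputs=(real_images, matched_text),
                                create_graph=True)
    grad = torch.cat([g.flatten(1) for g in grads], dim=1)
    return k * grad.norm(2, dim=1).pow(p).mean()  # added to the D loss
```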
arXiv Detail & Related papers (2020-08-13T12:51:17Z)