Related papers: Adaptive Control Attention Network for Underwater Acoustic Localization and Domain Adaptation

URL: http://arxiv.org/abs/2506.17409v1
Date: Fri, 20 Jun 2025 18:13:30 GMT
Title: Adaptive Control Attention Network for Underwater Acoustic Localization and Domain Adaptation
Authors: Quoc Thinh Vo, Joe Woods, Priontu Chowdhury, David K. Han
Abstract summary: Localizing acoustic sound sources in the ocean is a challenging task due to the complex and dynamic nature of the environment. We propose a multi-branch network architecture designed to accurately predict the distance between a moving acoustic source and a receiver. Our proposed method outperforms state-of-the-art (SOTA) approaches in similar settings.
Score: 8.017203108408973
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Localizing acoustic sound sources in the ocean is a challenging task due to the complex and dynamic nature of the environment. Factors such as high background noise, irregular underwater geometries, and varying acoustic properties make accurate localization difficult. To address these obstacles, we propose a multi-branch network architecture designed to accurately predict the distance between a moving acoustic source and a receiver, tested on real-world underwater signal arrays. The network leverages Convolutional Neural Networks (CNNs) for robust spatial feature extraction and integrates Conformers with a self-attention mechanism to effectively capture temporal dependencies. Log-mel spectrogram and generalized cross-correlation with phase transform (GCC-PHAT) features are employed as input representations. To further enhance model performance, we introduce an Adaptive Gain Control (AGC) layer that adaptively adjusts the amplitude of input features, ensuring consistent energy levels across varying ranges, signal strengths, and noise conditions. We assess the model's generalization capability by training it in one domain and testing it in a different domain, using only a limited amount of data from the test domain for fine-tuning. Our proposed method outperforms state-of-the-art (SOTA) approaches in similar settings, establishing new benchmarks for underwater sound localization.
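The GCC-PHAT input feature named in the abstract can be illustrated with a short NumPy sketch. This is a generic textbook formulation, not the authors' code: the function names `gcc_phat` and `rms_gain_normalize`, their parameters, and the default values are assumptions made for illustration. The second function is a fixed RMS normalization included only to show what "consistent energy levels" means; it is a stand-in, not the learned AGC layer the paper describes.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` in seconds (hypothetical helper).

    Returns the delay estimate and the lag-centered correlation curve.
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so negative lags come first and lag 0 sits at index max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc


def rms_gain_normalize(x, target_rms=0.1, eps=1e-8):
    """Scale `x` to a fixed RMS level (assumed stand-in, NOT the paper's learned AGC)."""
    rms = np.sqrt(np.mean(x ** 2))
    return x * (target_rms / (rms + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 1000  # assumed sample rate in Hz
    ref = rng.standard_normal(1024)
    sig = np.concatenate((np.zeros(5), ref[:-5]))  # `ref` delayed by 5 samples
    tau, _ = gcc_phat(sig, ref, fs)
    print("estimated delay (s):", tau)
    print("normalized RMS:", np.sqrt(np.mean(rms_gain_normalize(5 * ref) ** 2)))
```

In a multi-channel pipeline the correlation curve returned by `gcc_phat`, rather than the single delay value, would typically be stacked per hydrophone pair and per frame to form the feature map; how the paper arranges these features is not stated in the abstract.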

Related papers

Explainable Transformer-CNN Fusion for Noise-Robust Speech Emotion Recognition [2.0391237204597363]
Speech Emotion Recognition systems often degrade in performance when exposed to unpredictable acoustic interference. We propose a Hybrid Transformer-CNN framework that unifies the contextual modeling of Wav2Vec 2.0 with the spectral stability of 1D-Convolutional Neural Networks.
arXiv Detail & Related papers (2025-12-20T10:05:58Z)
Simulating Distribution Dynamics: Liquid Temporal Feature Evolution for Single-Domain Generalized Object Detection [58.25418970608328]
Single-Domain Generalized Object Detection (Single-DGOD) aims to transfer a detector trained on one source domain to multiple unknown domains. Existing methods for Single-DGOD typically rely on discrete data augmentation or static perturbation methods to expand data diversity. We propose a new method, which simulates the progressive evolution of features from the source domain to simulated latent distributions.
arXiv Detail & Related papers (2025-11-13T03:10:39Z)
Ivan-ISTD: Rethinking Cross-domain Heteroscedastic Noise Perturbations in Infrared Small Target Detection [53.689841037081834]
Ivan-ISTD is designed to address the dual challenges of cross-domain shift and heteroscedastic noise perturbations in ISTD. Ivan-ISTD demonstrates excellent robustness in cross-domain scenarios.
arXiv Detail & Related papers (2025-10-14T07:48:31Z)
Ecologically Valid Benchmarking and Adaptive Attention: Scalable Marine Bioacoustic Monitoring [2.558238597112103]
GetNetUPAM is a nested cross-validation framework to model stability under realistic variability. Data are partitioned into distinct site-year segments, preserving recording and ensuring each validation fold reflects a unique environmental subset. ARPA-N achieves a 14.4% gain in average precision over DenseNet baselines and a log2-scale order-of-magnitude drop in variability across all metrics.
arXiv Detail & Related papers (2025-09-04T22:03:05Z)
Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z)
TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments [3.3107717550009865]
TOAST (Task-Oriented Adaptive Semantic Transmission) is a unified framework designed to address the core challenge of multi-task optimization in wireless environments. We formulate adaptive task balancing as a Markov decision process, employing deep reinforcement learning to dynamically adjust the trade-off between image reconstruction fidelity and semantic classification accuracy. We integrate module-specific Low-Rank Adaptation (LoRA) mechanisms throughout our Swin Transformer-based joint source-channel coding architecture.
arXiv Detail & Related papers (2025-06-27T04:36:30Z)
DEMONet: Underwater Acoustic Target Recognition based on Multi-Expert Network and Cross-Temporal Variational Autoencoder [22.271499386492533]
Building a robust underwater acoustic recognition system in real-world scenarios is challenging due to the complex underwater environment. We propose DEMONet, which utilizes the detection of envelope modulation on noise (DEMON) to provide robust insights into the shaft frequency or blade counts of targets. To mitigate noise and spurious modulation spectra in DEMON features, we introduce a cross-temporal alignment strategy and employ a variational autoencoder (VAE) to reconstruct noise-resistant DEMON spectra to replace the raw DEMON features.
arXiv Detail & Related papers (2024-11-05T03:04:51Z)
DenoDet: Attention as Deformable Multi-Subspace Feature Denoising for Target Detection in SAR Images [20.11145540094807]
We propose a network aided by explicit frequency domain transform to calibrate convolutional biases and pay more attention to high-frequencies. We design TransDeno, a dynamic frequency domain attention module that performs as a transform domain soft thresholding operation. Our plug-and-play TransDeno sets state-of-the-art scores on multiple SAR target detection datasets.
arXiv Detail & Related papers (2024-06-05T01:05:26Z)
AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection [40.532331552038485]
We present a novel Amplitude-Modulated Perturbation and Vortex Convolutional Network, AMSP-UOD. AMSP-UOD addresses the impact of non-ideal imaging factors on detection accuracy in complex underwater environments. Our method outperforms existing state-of-the-art methods in terms of accuracy and noise immunity.
arXiv Detail & Related papers (2023-08-23T05:03:45Z)
Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification. The proposed method outperforms the baseline model, demonstrating the utility in incorporating statistical contexts for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z)
Adaptive ship-radiated noise recognition with learnable fine-grained wavelet transform [25.887932248706218]
This work proposes an adaptive generalized recognition system - AGNet. By converting fixed wavelet parameters into fine-grained learnable parameters, AGNet learns the characteristics of underwater sound at different frequencies. Experiments reveal that our AGNet outperforms all baseline methods on several underwater acoustic datasets.
arXiv Detail & Related papers (2023-05-31T06:56:01Z)
PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation [67.41325356479229]
We propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix. In a nutshell, our auxiliary network learns to fix local variants intensively by effectively back-propagating local information through the meta-gradient. This network is model-agnostic, so it can be used in any kind of architecture in a plug-and-play manner.
arXiv Detail & Related papers (2022-07-27T07:48:29Z)
AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks. Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo. Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training. We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification [61.54074498090374]
This study introduces a CRSS-Forensics audio dataset collected in multiple acoustic environments. We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics.
arXiv Detail & Related papers (2020-09-05T02:54:33Z)
Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals. Two main challenges are the complex acoustic environment and the real-time processing requirement. We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.