Related papers: An Entropy-Guided Curriculum Learning Strategy for Data-Efficient Acoustic Scene Classification under Domain Shift

URL: http://arxiv.org/abs/2509.11168v1
Date: Sun, 14 Sep 2025 09:01:52 GMT
Title: An Entropy-Guided Curriculum Learning Strategy for Data-Efficient Acoustic Scene Classification under Domain Shift
Authors: Peihong Zhang, Yuxuan Liu, Zhixin Li, Rui Sang, Yiqiang Cai, Yizhou Tan, Shengchen Li
Abstract summary: Acoustic Scene Classification (ASC) faces challenges in generalizing across recording devices. The DCASE 2024 Challenge Task 1 highlights this issue by requiring models to learn from small labeled subsets recorded on a few devices. We propose an entropy-guided curriculum learning strategy to address the domain shift problem in data-efficient ASC.
Score: 12.42019711058722
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Acoustic Scene Classification (ASC) faces challenges in generalizing across recording devices, particularly when labeled data is limited. The DCASE 2024 Challenge Task 1 highlights this issue by requiring models to learn from small labeled subsets recorded on a few devices. These models need to then generalize to recordings from previously unseen devices under strict complexity constraints. While techniques such as data augmentation and the use of pre-trained models are well-established for improving model generalization, optimizing the training strategy represents a complementary yet less-explored path that introduces no additional architectural complexity or inference overhead. Among various training strategies, curriculum learning offers a promising paradigm by structuring the learning process from easier to harder examples. In this work, we propose an entropy-guided curriculum learning strategy to address the domain shift problem in data-efficient ASC. Specifically, we quantify the uncertainty of device domain predictions for each training sample by computing the Shannon entropy of the device posterior probabilities estimated by an auxiliary domain classifier. Using entropy as a proxy for domain invariance, the curriculum begins with high-entropy samples and gradually incorporates low-entropy, domain-specific ones to facilitate the learning of generalizable representations. Experimental results on multiple DCASE 2024 ASC baselines demonstrate that our strategy effectively mitigates domain shift, particularly under limited labeled data conditions. Our strategy is architecture-agnostic and introduces no additional inference cost, making it easily integrable into existing ASC baselines and offering a practical solution to domain shift.
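The entropy-guided ordering described in the abstract can be illustrated with a minimal sketch. It assumes the device posteriors from the auxiliary domain classifier are already available as plain lists of probabilities. The function names, the linear pacing schedule, and the `start_fraction` parameter are hypothetical choices made for illustration; the abstract does not specify how quickly low-entropy samples are introduced.

```python
import math
from typing import List, Sequence


def shannon_entropy(posterior: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one device posterior distribution."""
    # Zero-probability terms contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def curriculum_order(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Sample indices sorted from highest to lowest device entropy."""
    entropies = [shannon_entropy(p) for p in posteriors]
    return sorted(range(len(posteriors)), key=lambda i: entropies[i], reverse=True)


def curriculum_subset(order: Sequence[int], epoch: int, total_epochs: int,
                      start_fraction: float = 0.5) -> List[int]:
    """Indices available at a given epoch.

    Assumption: a linear pacing schedule that starts with the
    `start_fraction` highest-entropy samples and reaches the full
    training set in the final epoch.
    """
    progress = min(1.0, epoch / max(1, total_epochs - 1))
    fraction = start_fraction + (1.0 - start_fraction) * progress
    count = max(1, math.ceil(fraction * len(order)))
    return list(order[:count])


# Toy device posteriors for four training samples over three devices.
posteriors = [
    [1.0, 0.0, 0.0],        # device-specific: entropy 0
    [1 / 3, 1 / 3, 1 / 3],  # device-ambiguous: maximum entropy
    [0.7, 0.2, 0.1],
    [0.5, 0.5, 0.0],
]
order = curriculum_order(posteriors)
first_epoch = curriculum_subset(order, epoch=0, total_epochs=3)
last_epoch = curriculum_subset(order, epoch=2, total_epochs=3)
```

In this toy run the uniform posterior is ranked first and the one-hot posterior last, so early epochs see only the samples whose device is hardest to identify, and the final epoch sees the full set.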

Related papers

Feature-Space Planes Searcher: A Universal Domain Adaptation Framework for Interpretability and Computational Efficiency [7.889121135601528]
Current unsupervised domain adaptation methods rely on fine-tuning feature extractors. We propose Feature-space Planes Searcher (FPS) as a novel domain adaptation framework. We show that FPS achieves competitive or superior performance to state-of-the-art methods.
arXiv Detail & Related papers (2025-08-26T05:39:21Z)
Spatial-Temporal-Spectral Unified Modeling for Remote Sensing Dense Prediction [20.1863553357121]
Current deep learning architectures for remote sensing are fundamentally rigid. We introduce the Spatial-Temporal-Spectral Unified Network (STSUN) for unified modeling. STSUN can adapt to input and output data with arbitrary spatial sizes, temporal lengths, and spectral bands. It unifies various dense prediction tasks and diverse semantic class predictions.
arXiv Detail & Related papers (2025-05-18T07:39:17Z)
A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior [11.859145373647474]
We present the first large-scale benchmarking study designed to provide guidelines for domain shift strategies in seismic interpretation. Our benchmark spans over 200 combinations of model architectures, datasets and training strategies, across three datasets. Our analysis shows that common fine-tuning practices can lead to catastrophic forgetting when source and target datasets are disjoint.
arXiv Detail & Related papers (2025-05-13T13:56:43Z)
CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass [3.0566617373924325]
Recent advances in pre-trained language models (PLMs) have driven remarkable progress in this field. We propose CSE-SFP, an innovative method that exploits the structural characteristics of generative models. We show that CSE-SFP not only produces higher-quality embeddings but also significantly reduces both training time and memory consumption.
arXiv Detail & Related papers (2025-05-01T08:27:14Z)
Adapting In-Domain Few-Shot Segmentation to New Domains without Retraining [53.963279865355105]
Cross-domain few-shot segmentation (CD-FSS) aims to segment objects of novel classes in new domains. Most CD-FSS methods redesign and retrain in-domain FSS models using various domain-generalization techniques. We propose adapting informative model structures of the well-trained FSS model for target domains by learning domain characteristics from few-shot labeled support samples.
arXiv Detail & Related papers (2025-04-30T08:16:33Z)
Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting [107.4034346788744]
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation.
arXiv Detail & Related papers (2025-01-08T20:11:09Z)
DiffClass: Diffusion-Based Class Incremental Learning [30.514281721324853]
Class Incremental Learning (CIL) is challenging due to catastrophic forgetting. Recent exemplar-free CIL methods attempt to mitigate catastrophic forgetting by synthesizing previous task data. We propose a novel exemplar-free CIL method to overcome these issues.
arXiv Detail & Related papers (2024-03-08T03:34:18Z)
Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation. We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
Unsupervised and self-adaptative techniques for cross-domain person re-identification [82.54691433502335]
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task. Unsupervised Domain Adaptation (UDA) is a promising alternative, as it performs feature-learning adaptation from a model trained on a source to a target domain without identity-label annotation. In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
arXiv Detail & Related papers (2021-03-21T23:58:39Z)
Model-Based Domain Generalization [96.84818110323518]
We propose a novel approach for the domain generalization problem called Model-Based Domain Generalization. Our algorithms beat the current state-of-the-art methods on the very-recently-proposed WILDS benchmark by up to 20 percentage points.
arXiv Detail & Related papers (2021-02-23T00:59:02Z)
Improving speech recognition models with small samples for air traffic control systems [9.322392779428505]
In this work, a novel training approach based on pretraining and transfer learning is proposed to address the issue of small training samples. Three real ATC datasets are used to validate the proposed ASR model and training strategies. The experimental results demonstrate that the ASR performance is significantly improved on all three datasets.
arXiv Detail & Related papers (2021-02-16T08:28:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.