Advancing Test-Time Adaptation for Acoustic Foundation Models in Open-World Shifts
- URL: http://arxiv.org/abs/2310.09505v1
- Date: Sat, 14 Oct 2023 06:22:08 GMT
- Title: Advancing Test-Time Adaptation for Acoustic Foundation Models in Open-World Shifts
- Authors: Hongfu Liu, Hengguan Huang, Ye Wang
- Abstract summary: Test-Time Adaptation (TTA) is a critical paradigm for tackling distribution shifts during inference.
We introduce a learning-based adaptation enriched by confidence enhancement.
Our experiments on synthetic and real-world datasets affirm our method's superiority over existing baselines.
- Score: 29.28582280403953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-Time Adaptation (TTA) is a critical paradigm for tackling distribution
shifts during inference, especially in visual recognition tasks. However, while
acoustic models face similar challenges due to distribution shifts in test-time
speech, TTA techniques specifically designed for acoustic modeling in the
context of open-world data shifts remain scarce. This gap is further
exacerbated when considering the unique characteristics of acoustic foundation
models: 1) they are primarily built on transformer architectures with layer
normalization and 2) they deal with test-time speech data of varying lengths in
a non-stationary manner. These aspects make the direct application of
vision-focused TTA methods, which are mostly reliant on batch normalization and
assume independent samples, infeasible. In this paper, we delve into TTA for
pre-trained acoustic models facing open-world data shifts. We find that noisy,
high-entropy speech frames, often non-silent, carry key semantic content.
Traditional TTA methods might inadvertently filter out this information using
potentially flawed heuristics. In response, we introduce a heuristic-free,
learning-based adaptation enriched by confidence enhancement. Noting the
short-term consistency of speech signals, we also apply consistency
regularization during test-time optimization. Our experiments on synthetic and
real-world datasets affirm our method's superiority over existing baselines.
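The abstract's point about normalization can be illustrated with a minimal NumPy sketch (the function and variable names are hypothetical, chosen only for illustration). LayerNorm computes statistics per frame over that frame's own features, while BatchNorm pools statistics across the batch. Vision TTA steps that re-estimate batch statistics therefore assume a batch of independent samples and have no direct counterpart in LayerNorm-based transformers.

```python
import numpy as np

def batch_norm_stats(x):
    # x: (batch, features). BatchNorm pools statistics over the batch axis,
    # so each sample's normalization depends on the other samples.
    return x.mean(axis=0), x.var(axis=0)

def layer_norm(x, eps=1e-5):
    # LayerNorm normalizes each row (one frame/token) over its own features,
    # so the output for a row does not depend on any other row.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 8))
shifted = frames.copy()
shifted[1:] += 3.0  # shift applied to frames 1..3 only

# LayerNorm output for frame 0 is unaffected by the shift on other frames.
print(np.allclose(layer_norm(frames)[0], layer_norm(shifted)[0]))  # True
# The batch mean changes, so batch statistics couple the samples together.
print(np.allclose(batch_norm_stats(frames)[0], batch_norm_stats(shifted)[0]))  # False
```

Because LayerNorm keeps no batch statistics to re-estimate, one option for transformer models is to adapt parameters (for example the LayerNorm affine weights) by gradient steps instead; the abstract does not state which parameters the paper adapts.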
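The contrast between heuristic frame filtering and a heuristic-free objective can be sketched as follows. This is not the paper's method, because the abstract specifies neither its loss nor its confidence-enhancement mechanism. The sketch assumes frame-level logits of shape (frames, classes). It uses a hypothetical entropy-threshold filter (ratio times ln C, in the spirit of some vision TTA methods) to show which frames such a heuristic would discard. It then pairs an all-frames entropy term with a hypothetical adjacent-frame consistency penalty that stands in for the short-term consistency regularization described in the abstract.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def frame_entropy(probs, eps=1e-12):
    # Shannon entropy (nats) of each frame's predictive distribution.
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def heuristic_mask(ent, num_classes, ratio=0.4):
    # Hypothetical heuristic filter: keep only frames whose entropy is below
    # ratio * ln(num_classes); uncertain frames are dropped from the loss.
    return ent < ratio * np.log(num_classes)

def adjacent_consistency(probs):
    # Hypothetical short-term consistency term: squared L2 distance between
    # the predictive distributions of neighbouring frames, averaged over pairs.
    return ((probs[1:] - probs[:-1]) ** 2).sum(axis=-1).mean()

def tta_objective(logits, lam=1.0):
    # Heuristic-free variant: entropy averaged over all frames (no filtering)
    # plus a weighted consistency penalty; lam is an assumed weight.
    probs = softmax(logits)
    return frame_entropy(probs).mean() + lam * adjacent_consistency(probs)

# Five frames, four classes: frames 1 and 2 are uncertain (high entropy).
logits = np.array([
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.2, 0.1, 0.0, 0.0],
    [5.0, 0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 0.0],
])
probs = softmax(logits)
mask = heuristic_mask(frame_entropy(probs), num_classes=4)
print(mask.tolist())  # [True, False, False, True, True]
print(tta_objective(logits))
```

The threshold discards exactly the uncertain frames, which the abstract argues may carry key semantic content. In practice such an objective would be computed in an autograd framework and minimized with respect to a chosen subset of model parameters; NumPy is used here only to keep the sketch self-contained.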
Related papers
- Unveiling and Mitigating Bias in Audio Visual Segmentation [9.427676046134374]
Community researchers have developed a range of advanced audio-visual segmentation models to improve the quality of sounding objects' masks.
While masks created by these models may initially appear plausible, they occasionally exhibit anomalies with incorrect grounding logic.
We attribute this to the models relying on real-world inherent preferences and distributions, which offer a simpler learning signal than complex audio-visual grounding.
(2024-07-23T16:55:04Z)
- Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner [46.866240648471894]
Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
(2024-06-13T02:03:22Z)
- Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner [46.866240648471894]
Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
(2024-05-06T06:23:06Z)
- Test-Time Domain Generalization for Face Anti-Spoofing [60.94384914275116]
Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks.
We introduce a novel Test-Time Domain Generalization framework for FAS, which leverages the testing data to boost the model's generalizability.
Our method, consisting of Test-Time Style Projection (TTSP) and Diverse Style Shifts Simulation (DSSS), effectively projects the unseen data to the seen domain space.
(2024-03-28T11:50:23Z)
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diversified prosodic expression.
(2023-09-27T09:27:03Z)
- AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation [16.85284386728494]
We propose to validate test-time adaptation methods using datasets for autonomous driving, namely CLAD-C and SHIFT.
We observe that current test-time adaptation methods struggle to effectively handle varying degrees of domain shift.
The proposed method, named AR-TTA, outperforms existing approaches on both synthetic and more real-world benchmarks.
(2023-09-18T19:34:23Z)
- OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning [67.07363529640784]
We propose OpenSTL to categorize prevalent approaches into recurrent-based and recurrent-free models.
We conduct standard evaluations on datasets across various domains, including synthetic moving object trajectory, human motion, driving scenes, traffic flow, and weather forecasting.
We find that recurrent-free models achieve a better balance between efficiency and performance than recurrent models.
(2023-06-20T03:02:14Z)
- Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds [88.06545572893455]
In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent.
We first model the mixture of noise patterns by all annotators, and then transfer this modeling to individual annotators.
Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
(2023-06-05T13:43:29Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [143.14128737978342]
Test-time adaptation, an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference.
(2023-03-27T16:32:21Z)