SemanticAC: Semantics-Assisted Framework for Audio Classification
- URL: http://arxiv.org/abs/2302.05940v1
- Date: Sun, 12 Feb 2023 15:30:28 GMT
- Title: SemanticAC: Semantics-Assisted Framework for Audio Classification
- Authors: Yicheng Xiao and Yue Ma and Shuyan Li and Hantao Zhou and Ran Liao and
Xiu Li
- Abstract summary: We propose SemanticAC, a semantics-assisted framework for Audio Classification.
We employ a language model to extract abundant semantics from labels and optimize the semantic consistency between audio signals and their labels.
Our proposed method consistently outperforms the compared audio classification methods.
- Score: 13.622344835167997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose SemanticAC, a semantics-assisted framework for
Audio Classification to better leverage the semantic information. Unlike
conventional audio classification methods that treat class labels as discrete
vectors, we employ a language model to extract abundant semantics from labels
and optimize the semantic consistency between audio signals and their labels.
We verify that simple textual information from labels and advanced pretraining
models enable more abundant semantic supervision for better performance.
Specifically, we design a text encoder to capture the semantic information from
the text extension of labels. Then we map the audio signals to align with the
semantics of corresponding class labels via an audio encoder and a similarity
calculation module so as to enforce the semantic consistency. Extensive
experiments on two audio datasets, ESC-50 and US8K, demonstrate that our
proposed method consistently outperforms the compared audio classification
methods.
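To make the alignment idea concrete, below is a minimal sketch of the semantics-assisted training objective described in the abstract. The toy text and audio encoders, the 256-dimensional joint embedding space, the temperature value, and the CLIP-style cross-entropy over cosine similarities are all illustrative assumptions; the paper's actual language model, audio encoder, label text extensions, and similarity calculation module may differ.

```python
# Minimal sketch of semantic-consistency training between audio and label text.
# All architectures and hyperparameters here are placeholders, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 256  # assumed joint embedding size


class ToyTextEncoder(nn.Module):
    """Stands in for a pretrained language model over label text extensions."""
    def __init__(self, vocab_size=10000, dim=EMBED_DIM):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):                 # (num_classes, seq_len)
        x = self.embed(token_ids).mean(dim=1)     # mean-pool token embeddings
        return F.normalize(self.proj(x), dim=-1)  # unit-norm label embeddings


class ToyAudioEncoder(nn.Module):
    """Stands in for the audio encoder over, e.g., log-mel spectrograms."""
    def __init__(self, n_mels=64, dim=EMBED_DIM):
        super().__init__()
        self.conv = nn.Conv1d(n_mels, dim, kernel_size=3, padding=1)
        self.proj = nn.Linear(dim, dim)

    def forward(self, spec):                      # (batch, n_mels, time)
        x = F.relu(self.conv(spec)).mean(dim=-1)  # pool over time
        return F.normalize(self.proj(x), dim=-1)  # unit-norm audio embeddings


def semantic_consistency_loss(audio_emb, label_emb, targets, temperature=0.07):
    """Cross-entropy over audio-to-label cosine similarities (CLIP-style)."""
    logits = audio_emb @ label_emb.t() / temperature  # (batch, num_classes)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    text_enc, audio_enc = ToyTextEncoder(), ToyAudioEncoder()
    label_tokens = torch.randint(0, 10000, (50, 8))  # e.g., 50 ESC-50 label prompts
    specs = torch.randn(16, 64, 400)                 # a batch of spectrograms
    targets = torch.randint(0, 50, (16,))            # ground-truth class indices
    loss = semantic_consistency_loss(audio_enc(specs), text_enc(label_tokens), targets)
    print(loss.item())
```

In this sketch, classification at inference time would pick the label whose text embedding has the highest similarity to the audio embedding, which is one plausible reading of the similarity calculation module.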
Related papers
- CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer [18.87311136671246]
We propose CLAP-ART, an AAC method that utilizes semantic-rich and discrete tokens as input.
We experimentally confirmed that CLAP-ART outperforms the baseline EnCLAP on two AAC benchmarks.
arXiv Detail & Related papers (2025-06-01T03:01:16Z)
- Label-template based Few-Shot Text Classification with Contrastive Learning [7.964862748983985]
We propose a simple and effective few-shot text classification framework.
Label templates are embedded into input sentences to fully utilize the potential value of class labels.
Supervised contrastive learning is utilized to model the interaction between support samples and query samples.
arXiv Detail & Related papers (2024-12-13T12:51:50Z)
- Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling [21.82879779173242]
The lack of labeled data is a common challenge in speech classification tasks.
We propose a Semi-Supervised Learning (SSL) framework, introducing a novel multi-view pseudo-labeling method.
We evaluate our SSL framework on emotion recognition and dementia detection tasks.
arXiv Detail & Related papers (2024-09-25T13:51:19Z)
- Label-anticipated Event Disentanglement for Audio-Visual Video Parsing [61.08434062821899]
We introduce a new decoding paradigm, label semantic-based projection (LEAP).
LEAP works by iteratively projecting encoded latent features of audio/visual segments onto semantically independent label embeddings.
To facilitate the LEAP paradigm, we propose a semantic-aware optimization strategy, which includes a novel audio-visual semantic similarity loss function.
arXiv Detail & Related papers (2024-07-11T01:57:08Z)
- Learning Speech Representation From Contrastive Token-Acoustic Pretraining [57.08426714676043]
We propose "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space.
The proposed CTAP model is trained on 210k speech and phoneme pairs, achieving minimally-supervised TTS, VC, and ASR.
arXiv Detail & Related papers (2023-09-01T12:35:43Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
We introduce Self-Supervised Learning (SSL) into the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
We propose a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
We exploit external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Unsupervised Improvement of Audio-Text Cross-Modal Representations [19.960695758478153]
We study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio.
We show that when domain-specific curation is used in conjunction with a soft-labeled contrastive loss, we are able to obtain significant improvement in terms of zero-shot classification performance.
arXiv Detail & Related papers (2023-05-03T02:30:46Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Introducing Semantics into Speech Encoders [91.37001512418111]
We propose an unsupervised way of incorporating semantic information from large language models into self-supervised speech encoders without labeled audio transcriptions.
Our approach achieves similar performance as supervised methods trained on over 100 hours of labeled audio transcripts.
arXiv Detail & Related papers (2022-11-15T18:44:28Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Zero-Shot Recognition through Image-Guided Semantic Classification [9.291055558504588]
We present a new embedding-based framework for zero-shot learning (ZSL).
Motivated by the binary relevance method for multi-label classification, we propose to inversely learn the mapping between an image and a semantic classifier.
IGSC is conceptually simple and can be realized by a slight enhancement of an existing deep architecture for classification.
arXiv Detail & Related papers (2020-07-23T06:22:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.