Attentive Max Feature Map for Acoustic Scene Classification with Joint
Learning considering the Abstraction of Classes
- URL: http://arxiv.org/abs/2104.07213v1
- Date: Thu, 15 Apr 2021 03:14:15 GMT
- Authors: Hye-jin Shim, Ju-ho Kim, Jee-weon Jung, Ha-Jin Yu
- Abstract summary: We propose the attentive max feature map, which combines two effective techniques, attention and the max feature map, to elaborate the attention mechanism and mitigate the resulting information loss.
Applying the two proposed techniques, our system achieves state-of-the-art performance among single systems on subtask A.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The attention mechanism has been widely adopted in acoustic scene
classification. However, we find that while attention exclusively emphasizes
certain information, it tends to discard other information excessively, even
though performance improves. We propose a mechanism referred
to as the attentive max feature map which combines two effective techniques,
attention and max feature map, to further elaborate the attention mechanism and
mitigate the abovementioned phenomenon. Furthermore, we explore various joint
learning methods that utilize additional labels originally generated for
subtask B (3-classes) on top of existing labels for subtask A (10-classes) of
the DCASE2020 challenge. We expect that using two kinds of labels
simultaneously would be helpful because the labels of the two subtasks differ
in their degree of abstraction. Applying the two proposed techniques, our
system achieves state-of-the-art performance among single systems on subtask A.
In addition, because the model has a complexity comparable to subtask B's
requirement, it shows the possibility of developing a system that fulfills the
requirements of both subtasks: generalization across multiple devices and
low complexity.
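The abstract combines attention with the max feature map (MFM) activation, which halves the channels by taking an element-wise maximum over channel pairs. The paper's exact formulation is not given here, so the following is a minimal NumPy sketch under assumed tensor shapes; `attentive_max_feature_map` and its weight `w` are purely illustrative, not the authors' implementation.

```python
import numpy as np

def max_feature_map(x):
    """Max Feature Map (MFM): split the channel axis in half and take the
    element-wise max of the two halves.
    x: array of shape (2C, H, W) -> returns array of shape (C, H, W)."""
    c = x.shape[0] // 2
    return np.maximum(x[:c], x[c:])

def attentive_max_feature_map(x, w):
    """Hypothetical sketch of an attentive MFM: a sigmoid attention weight
    gates each channel before the channel-wise max, so de-emphasized
    information is softly scaled rather than hard-discarded.
    w: illustrative per-channel logits of shape (2C,)."""
    att = 1.0 / (1.0 + np.exp(-w))            # sigmoid attention in (0, 1)
    return max_feature_map(x * att[:, None, None])

# toy usage: 4 input channels reduce to 2 output channels
x = np.arange(16, dtype=float).reshape(4, 2, 2)
print(max_feature_map(x).shape)               # (2, 2, 2)
```

In a trained network, `w` would come from a learned attention module; the sketch only shows how the gating composes with the channel-wise max.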
Related papers
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised
Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- IntenDD: A Unified Contrastive Learning Approach for Intent Detection
and Discovery [12.905097743551774]
We propose IntenDD, a unified approach leveraging a shared utterance encoding backbone.
IntenDD uses an entirely unsupervised contrastive learning strategy for representation learning.
We find that our approach consistently outperforms competitive baselines across all three tasks.
arXiv Detail & Related papers (2023-10-25T16:50:24Z)
- Slot Induction via Pre-trained Language Model Probing and Multi-level
Contrastive Learning [62.839109775887025]
The Slot Induction (SI) task aims to induce slot boundaries without explicit knowledge of token-level slot annotations.
We propose leveraging Unsupervised Pre-trained Language Model (PLM) Probing and Contrastive Learning mechanism to exploit unsupervised semantic knowledge extracted from PLM.
Our approach is shown to be effective in SI task and capable of bridging the gaps with token-level supervised models on two NLU benchmark datasets.
arXiv Detail & Related papers (2023-08-09T05:08:57Z)
- Joint Alignment of Multi-Task Feature and Label Spaces for Emotion Cause
Pair Extraction [36.123715709125015]
Emotion cause pair extraction (ECPE) is one of the derived subtasks of emotion cause analysis (ECA).
ECPE shares rich inter-related features with emotion extraction (EE) and cause extraction (CE).
arXiv Detail & Related papers (2022-09-09T04:06:27Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z)
- Weakly Supervised Semantic Segmentation via Alternative Self-Dual
Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z)
- Towards Joint Intent Detection and Slot Filling via Higher-order
Attention [47.78365472691051]
Intent detection (ID) and slot filling (SF) are two major tasks in spoken language understanding (SLU).
We propose a Bilinear attention block, which exploits both the contextual and channel-wise bilinear attention distributions.
We show that our approach yields improvements compared with the state-of-the-art approach.
arXiv Detail & Related papers (2021-09-18T09:50:23Z)
- Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and
Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only few samples given for each class.
Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z)
- Multi-Label Few-Shot Learning for Aspect Category Detection [23.92900196246631]
Aspect category detection (ACD) in sentiment analysis aims to identify the aspect categories mentioned in a sentence.
Existing few-shot learning approaches mainly focus on single-label predictions.
We propose a multi-label few-shot learning method based on the prototypical network.
arXiv Detail & Related papers (2021-05-29T01:56:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.