Scaled and Inter-token Relation Enhanced Transformer for Sample-restricted Residential NILM
- URL: http://arxiv.org/abs/2410.12861v2
- Date: Fri, 06 Dec 2024 19:24:54 GMT
- Title: Scaled and Inter-token Relation Enhanced Transformer for Sample-restricted Residential NILM
- Authors: Minhajur Rahman, Yasir Arafat
- Abstract summary: We propose a novel transformer architecture with two key innovations: inter-token relation enhancement and dynamic temperature tuning.
We validate our method on the REDD dataset and show that it outperforms the original transformer and state-of-the-art models by 10-15% in F1 score across various appliance types.
- Score: 0.0
- Abstract: Transformers have demonstrated exceptional performance across various domains due to their self-attention mechanism, which captures complex relationships in data. However, training on smaller datasets poses challenges, as standard attention mechanisms can over-smooth attention scores and overly prioritize intra-token relationships, reducing the capture of meaningful inter-token dependencies critical for tasks like Non-Intrusive Load Monitoring (NILM). To address this, we propose a novel transformer architecture with two key innovations: inter-token relation enhancement and dynamic temperature tuning. The inter-token relation enhancement mechanism removes diagonal entries in the similarity matrix to improve attention focus on inter-token relations. The dynamic temperature tuning mechanism, a learnable parameter, adapts attention sharpness during training, preventing over-smoothing and enhancing sensitivity to token relationships. We validate our method on the REDD dataset and show that it outperforms the original transformer and state-of-the-art models by 10-15% in F1 score across various appliance types, demonstrating its efficacy for training on smaller datasets.
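The two mechanisms named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function name `diag_masked_attention`, the choice to divide scores by a single `temperature` value (defaulting to the usual sqrt(d) scaling), and passing the temperature as a plain argument rather than a trained parameter are all assumptions made for illustration.

```python
import math

def diag_masked_attention(q, k, v, temperature=None):
    """Self-attention that ignores each token's similarity with itself.

    q, k, v: lists of float vectors, one per token (q and k share a width).
    temperature: positive scalar dividing the scores. The abstract describes it
    as learnable; here it is a fixed argument (assumption), defaulting to
    sqrt(d) as in standard scaled dot-product attention.
    Returns (outputs, attention_weights).
    """
    n, d = len(q), len(q[0])
    if n < 2:
        raise ValueError("need at least two tokens once the diagonal is masked")
    if temperature is None:
        temperature = math.sqrt(d)

    outputs, weights = [], []
    for i in range(n):
        # Similarity row for token i; the diagonal entry (j == i) is set to
        # -inf so it receives exactly zero weight after the softmax.
        scores = [
            -math.inf if j == i
            else sum(a * b for a, b in zip(q[i], k[j])) / temperature
            for j in range(n)
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Weighted sum of the value vectors of the *other* tokens.
        outputs.append(
            [sum(row[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
        )
    return outputs, weights


# Usage: three toy tokens, compared at a low and a high temperature.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, sharp = diag_masked_attention(tokens, tokens, tokens, temperature=0.1)
_, smooth = diag_masked_attention(tokens, tokens, tokens, temperature=10.0)
```

In this sketch a smaller temperature concentrates each row of weights on the most similar other token, while a larger one flattens the row toward uniform. That is the over-smoothing behaviour the learnable temperature is meant to counteract.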
Related papers
- CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction [77.8576094863446]
We propose a new deCoupled duAl-interactive lineaR attEntion (CARE) mechanism.
We first propose an asymmetrical feature decoupling strategy that asymmetrically decouples the learning process for local inductive bias and long-range dependencies.
By adopting a decoupled learning approach and fully exploiting complementarity across features, our method can achieve both high efficiency and accuracy.
arXiv Detail & Related papers (2024-11-25T07:56:13Z)
- Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators [83.48423407316713]
We present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.
Our model initiates the denoising process with a precise, non-ambiguous stage and gradually transitions to a phase enriched with detail.
Our method achieves a state-of-the-art FID score of 2.01 when integrated with the recent work SiT.
arXiv Detail & Related papers (2024-08-11T07:01:39Z)
- Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
The attention mechanism, Transformers' key feature, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
- Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition [48.84506301960988]
Cued Speech (CS) is a pure visual coding method used by hearing-impaired people.
Automatic CS recognition (ACSR) seeks to transcribe visual cues of speech into text.
arXiv Detail & Related papers (2024-01-31T05:20:29Z)
- Correlated Attention in Transformers for Multivariate Time Series [22.542109523780333]
We propose a novel correlated attention mechanism, which efficiently captures feature-wise dependencies, and can be seamlessly integrated within the encoder blocks of existing Transformers.
In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at the sub-series level.
This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation.
arXiv Detail & Related papers (2023-11-20T17:35:44Z)
- Associative Transformer [26.967506484952214]
We propose Associative Transformer (AiT) to enhance the association among sparsely attended input patches.
AiT requires significantly fewer parameters and attention layers while outperforming Vision Transformers and a broad range of sparse Transformers.
arXiv Detail & Related papers (2023-09-22T13:37:10Z)
- DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
We present Deformable Attention Transformer (DAT++), a vision backbone efficient and effective for visual recognition.
DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
arXiv Detail & Related papers (2023-09-04T08:26:47Z)
- Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence [17.854940064699985]
We propose a transformer architecture with a mitigatory self-attention mechanism.
Miti-DETR carries the inputs of each attention layer through to the outputs of that layer so that the "non-attention" information participates in attention propagation.
Miti-DETR significantly improves average detection precision and convergence speed compared with existing DETR-based models.
arXiv Detail & Related papers (2021-12-26T03:23:59Z)
- Relational Self-Attention: What's Missing in Attention for Video Understanding [52.38780998425556]
We introduce a relational feature transform, dubbed the relational self-attention (RSA).
Our experiments and ablation studies show that the RSA network substantially outperforms convolution and self-attention counterparts.
arXiv Detail & Related papers (2021-11-02T15:36:11Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations in multi-modal data for recognizing gestures.
Results show that our approach recovers performance with large gains, up to 12.91% in ACC and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.