Scaled and Inter-token Relation Enhanced Transformer for Sample-restricted Residential NILM
- URL: http://arxiv.org/abs/2410.12861v1
- Date: Sat, 12 Oct 2024 18:58:45 GMT
- Title: Scaled and Inter-token Relation Enhanced Transformer for Sample-restricted Residential NILM
- Authors: Minhajur Rahman, Yasir Arafat
- Abstract summary: We propose two novel mechanisms to enhance the attention mechanism of the original transformer to improve performance.
The first mechanism reduces the prioritization of intra-token relationships in the token similarity matrix during training, thereby increasing inter-token focus.
The second mechanism introduces a learnable temperature tuning for the token similarity matrix, mitigating the over-smoothing problem associated with fixed temperature values.
- Abstract: Recent advancements in transformer models have yielded impressive results in Non-Intrusive Load Monitoring (NILM). However, effectively training a transformer on small-scale datasets remains a challenge. This paper addresses this issue by enhancing the attention mechanism of the original transformer to improve performance. We propose two novel mechanisms: the inter-token relation enhancement mechanism and the dynamic temperature tuning mechanism. The first mechanism reduces the prioritization of intra-token relationships in the token similarity matrix during training, thereby increasing inter-token focus. The second mechanism introduces a learnable temperature tuning for the token similarity matrix, mitigating the over-smoothing problem associated with fixed temperature values. Both mechanisms are supported by rigorous mathematical foundations. We evaluate our approach using the REDD residential NILM dataset, a relatively small-scale dataset, and demonstrate that our methodology significantly enhances the performance of the original transformer model across multiple appliance types.
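The two mechanisms lend themselves to a compact illustration. The sketch below is a minimal, hypothetical PyTorch rendering, not the authors' code: it assumes the inter-token relation enhancement can be approximated by shrinking the diagonal (intra-token) entries of the token similarity matrix by a factor `alpha` during training, and that the dynamic temperature tuning replaces the fixed 1/sqrt(d_k) scaling with a learnable temperature; the module name, `alpha`, and the exact formulation are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnhancedSelfAttention(nn.Module):
    """Single-head self-attention sketch with (1) intra-token down-weighting
    and (2) a learnable temperature for the token similarity matrix.
    Illustrative approximation of the two mechanisms described in the
    abstract, not the authors' reference implementation."""

    def __init__(self, d_model: int, alpha: float = 0.5):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Factor (< 1) that reduces the prioritization of intra-token
        # (diagonal) similarities during training -- assumed form.
        self.alpha = alpha
        # Learnable temperature, initialised to the usual sqrt(d_k) scaling.
        self.log_temp = nn.Parameter(torch.tensor(float(d_model)).sqrt().log())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        sim = torch.matmul(q, k.transpose(-2, -1))  # token similarity matrix
        if self.training:
            # Mechanism 1: shrink diagonal (intra-token) similarities so the
            # softmax places more mass on inter-token relations.
            eye = torch.eye(sim.size(-1), device=sim.device)
            sim = sim - (1.0 - self.alpha) * sim * eye
        # Mechanism 2: learnable temperature instead of a fixed 1/sqrt(d_k).
        attn = F.softmax(sim / self.log_temp.exp(), dim=-1)
        return torch.matmul(attn, v)
```

A module of this shape could stand in for standard scaled dot-product attention in a transformer encoder; note that the diagonal adjustment is applied only in training mode, mirroring the abstract's statement that intra-token prioritization is reduced during training.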
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators [83.48423407316713]
We present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.
Our model initiates the denoising process with a precise, non-ambiguous stage and gradually transitions to a phase enriched with detail.
Our method achieves a state-of-the-art FID score of 2.01 when integrated with the recent work SiT.
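As a rough illustration of the mediator-token idea (queries and keys interacting through a small set of intermediate tokens rather than directly), the hypothetical sketch below routes attention through `num_mediators` learnable tokens; the class name and parameters are assumptions, and the paper's step-wise dynamic scheduling of mediators is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MediatorAttention(nn.Module):
    """Illustrative two-stage attention: keys/values are first summarised by a
    small set of learnable mediator tokens, and queries then attend to those
    mediators, giving O(n * m) cost for m << n mediators instead of O(n^2).
    Generic sketch, not the paper's exact formulation."""

    def __init__(self, d_model: int, num_mediators: int = 16):
        super().__init__()
        self.mediators = nn.Parameter(torch.randn(num_mediators, d_model) * 0.02)
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b = x.size(0)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        m = self.mediators.unsqueeze(0).expand(b, -1, -1)      # (b, m, d)
        # Stage 1: mediators engage with the keys/values.
        kv_attn = F.softmax(torch.matmul(m, k.transpose(-2, -1)) * self.scale, dim=-1)
        m_summary = torch.matmul(kv_attn, v)                    # (b, m, d)
        # Stage 2: queries engage with the mediator summaries.
        q_attn = F.softmax(torch.matmul(q, m.transpose(-2, -1)) * self.scale, dim=-1)
        return torch.matmul(q_attn, m_summary)                  # (b, seq_len, d)
```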
arXiv Detail & Related papers (2024-08-11T07:01:39Z)
- PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture [46.266960248570086]
This study tackles the quadratic complexity of the self-attention mechanism by introducing a linear-complexity local attention mechanism for effective feature aggregation.
We also introduce a parameter-free channel temperature adaptation mechanism that adaptively adjusts the attention weight distribution in each channel.
We show that PointMT achieves performance comparable to state-of-the-art methods while maintaining an optimal balance between accuracy and efficiency.
arXiv Detail & Related papers (2024-08-10T10:16:03Z)
- Associative Transformer [26.967506484952214]
We propose Associative Transformer (AiT) to enhance the association among sparsely attended input patches.
AiT requires significantly fewer parameters and attention layers while outperforming Vision Transformers and a broad range of sparse Transformers.
arXiv Detail & Related papers (2023-09-22T13:37:10Z)
- STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [50.064502884594376]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity, but their self-attention mechanism is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Automatic Rule Induction for Efficient Semi-Supervised Learning [56.91428251227253]
Semi-supervised learning has shown promise in allowing NLP models to generalize from small amounts of labeled data.
Pretrained transformer models act as black-box correlation engines that are difficult to explain and sometimes behave unreliably.
We propose tackling both of these challenges via Automatic Rule Induction (ARI), a simple and general-purpose framework.
arXiv Detail & Related papers (2022-05-18T16:50:20Z)
- Universal Transformer Hawkes Process with Adaptive Recursive Iteration [4.624987488467739]
Asynchronous event sequences are widespread in the natural world and in human activities, such as earthquake records and user activity on social media.
How to distill information from these seemingly disorganized data is a persistent research topic.
One of the most useful models is the point process model, on the basis of which researchers have obtained many notable results.
In recent years, point process models built on neural networks, especially recurrent neural networks (RNNs), have been proposed; compared with traditional models, their performance is greatly improved.
arXiv Detail & Related papers (2021-12-29T09:55:12Z)
- TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting [6.393659160890665]
We propose the concept of the tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures.
Our experiments on real-world datasets show that our TCCT architectures can greatly improve the performance of existing state-of-the-art Transformer models.
arXiv Detail & Related papers (2021-08-29T08:49:31Z)