PIDformer: Transformer Meets Control Theory
- URL: http://arxiv.org/abs/2402.15989v1
- Date: Sun, 25 Feb 2024 05:04:51 GMT
- Title: PIDformer: Transformer Meets Control Theory
- Authors: Tam Nguyen, César A. Uribe, Tan M. Nguyen, Richard G. Baraniuk
- Abstract summary: We unveil self-attention as an autonomous state-space model that inherently promotes smoothness in its solutions.
We incorporate a Proportional-Integral-Derivative (PID) closed-loop feedback control system with a reference point into the model to improve robustness and representation capacity.
Motivated by this control framework, we derive a novel class of transformers, the PID-controlled Transformer (PIDformer).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we address two main shortcomings of transformer architectures:
input corruption and rank collapse in their output representation. We unveil
self-attention as an autonomous state-space model that inherently promotes
smoothness in its solutions, leading to lower-rank outputs and diminished
representation capacity. Moreover, the steady-state solution of the model is
sensitive to input perturbations. We incorporate a
Proportional-Integral-Derivative (PID) closed-loop feedback control system with
a reference point into the model to improve robustness and representation
capacity. This integration aims to preserve high-frequency details while
bolstering model stability, rendering it more noise-resilient. The resulting
controlled state-space model is theoretically proven robust and adept at
addressing the rank collapse. Motivated by this control framework, we derive a
novel class of transformers, PID-controlled Transformer (PIDformer), aimed at
improving robustness and mitigating the rank-collapse issue inherent in softmax
transformers. We empirically evaluate the model's advantages and robustness
against baseline transformers across various practical tasks, including object
classification, image segmentation, and language modeling.
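For intuition, below is a minimal sketch of how such a PID feedback loop with a reference point could be wired into a softmax self-attention layer. It is not the authors' implementation: the class name PIDSelfAttention, the gains kp/ki/kd, the single-head attention, and the choice of the layer-0 token representations as the reference signal are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


class PIDSelfAttention(torch.nn.Module):
    """Softmax self-attention followed by a PID feedback correction (sketch)."""

    def __init__(self, dim: int, kp: float = 0.5, ki: float = 0.1, kd: float = 0.1):
        super().__init__()
        self.q = torch.nn.Linear(dim, dim)
        self.k = torch.nn.Linear(dim, dim)
        self.v = torch.nn.Linear(dim, dim)
        self.kp, self.ki, self.kd = kp, ki, kd  # illustrative gains, not paper values

    def forward(self, x, reference, integral, prev_error):
        # Standard softmax self-attention over the token sequence.
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        out = attn @ v

        # PID feedback against the reference signal (here: the layer-0 tokens).
        # P acts on the current error, I on its running sum across layers,
        # D on its change since the previous layer.
        error = reference - x
        integral = integral + error
        control = (self.kp * error
                   + self.ki * integral
                   + self.kd * (error - prev_error))

        # Residual update steered by the control term.
        return x + out + control, integral, error


# Usage: stack layers and carry the integral / previous error across depth.
tokens = torch.randn(2, 16, 64)                      # (batch, sequence, dim)
layers = [PIDSelfAttention(64) for _ in range(4)]
x = tokens
integral = torch.zeros_like(tokens)
prev_error = torch.zeros_like(tokens)
for layer in layers:
    x, integral, prev_error = layer(x, tokens, integral, prev_error)
print(x.shape)  # torch.Size([2, 16, 64])
```

In this sketch the proportional term pushes back on the current deviation from the reference, the integral term accumulates that deviation across layers, and the derivative term damps rapid changes; this is the feedback mechanism the abstract credits with preserving high-frequency details and making the model more noise-resilient.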
Related papers
- Function Approximation for Reinforcement Learning Controller for Energy from Spread Waves [69.9104427437916]
Multi-generator Wave Energy Converters (WECs) must handle multiple simultaneous waves arriving from different directions, known as spread waves.
These complex devices need controllers with multiple objectives of energy capture efficiency, reduction of structural stress to limit maintenance, and proactive protection against high waves.
In this paper, we explore different function approximations for the policy and critic networks in modeling the sequential nature of the system dynamics.
arXiv Detail & Related papers (2024-04-17T02:04:10Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
The focus in our experiments is on oil & gas data, namely, well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- How Crucial is Transformer in Decision Transformer? [29.228813063916206]
Decision Transformer (DT) is a recently proposed architecture for Reinforcement Learning that frames the decision-making process as an auto-regressive sequence modeling problem.
We analyze how crucial the Transformer model is in the complete DT architecture on continuous control tasks.
arXiv Detail & Related papers (2022-11-26T20:13:22Z)
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that combines the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Decision Transformer: Reinforcement Learning via Sequence Modeling [102.86873656751489]
We present a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem.
We present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks.
arXiv Detail & Related papers (2021-06-02T17:53:39Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- Multimodal VAE Active Inference Controller [0.0]
We present a novel active inference torque controller for industrial arms.
We include multimodal state representation learning using a linearly coupled multimodal variational autoencoder.
Results showed improved tracking and control in goal-directed reaching due to the increased representation power.
arXiv Detail & Related papers (2021-03-07T18:00:27Z)
- Transformer-based Conditional Variational Autoencoder for Controllable Story Generation [39.577220559911055]
We investigate large-scale latent variable models (LVMs) for neural story generation with objectives in two threads: generation effectiveness and controllability.
We advocate to revive latent variable modeling, essentially the power of representation learning, in the era of Transformers.
Specifically, we integrate latent representation vectors with a Transformer-based pre-trained architecture to build a conditional variational autoencoder (CVAE).
arXiv Detail & Related papers (2021-01-04T08:31:11Z)