Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
- URL: http://arxiv.org/abs/2505.11918v1
- Date: Sat, 17 May 2025 09:02:18 GMT
- Title: Transformers as Unsupervised Learning Algorithms: A study on Gaussian Mixtures
- Authors: Zhiheng Chen, Ruofan Wu, Guanhua Fang
- Abstract summary: This paper investigates the capabilities of transformers in solving unsupervised learning problems. We propose a transformer-based learning framework called TGMM that simultaneously learns to solve multiple GMM tasks. We prove that transformers can approximate both the EM algorithm and a core component of spectral methods.
- Score: 10.970776446566909
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The transformer architecture has demonstrated remarkable capabilities in modern artificial intelligence, among which the capability of implicitly learning an internal model during inference time is widely believed to play a key role in the understanding of pre-trained large language models. However, most recent works have focused on supervised learning topics such as in-context learning, leaving the field of unsupervised learning largely unexplored. This paper investigates the capabilities of transformers in solving Gaussian Mixture Models (GMMs), a fundamental unsupervised learning problem, through the lens of statistical estimation. We propose a transformer-based learning framework called TGMM that simultaneously learns to solve multiple GMM tasks using a shared transformer backbone. The learned models are empirically demonstrated to effectively mitigate the limitations of classical methods such as Expectation-Maximization (EM) or spectral algorithms, while exhibiting reasonable robustness to distribution shifts. Theoretically, we prove that transformers can approximate both the EM algorithm and a core component of spectral methods (cubic tensor power iterations). These results bridge the gap between practical success and theoretical understanding, positioning transformers as versatile tools for unsupervised learning.
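As a point of reference for the two classical routines named in the abstract, below is a minimal NumPy sketch of one EM iteration for a K-component GMM and of the cubic tensor power iteration used in spectral methods. This illustrates the baseline behaviour the paper proves transformers can approximate; it is not the TGMM architecture itself, and the function names and numerical details are illustrative assumptions.

```python
import numpy as np

def em_step(X, weights, means, covs):
    """One EM iteration for a K-component Gaussian mixture.
    X: (n, d) data; weights: (K,); means: (K, d); covs: (K, d, d)."""
    n, d = X.shape
    K = len(weights)
    # E-step: log responsibilities log p(z = k | x_i), up to a shared constant
    log_resp = np.zeros((n, K))
    for k in range(K):
        diff = X - means[k]
        quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[k]), diff)
        _, logdet = np.linalg.slogdet(covs[k])
        log_resp[:, k] = np.log(weights[k]) - 0.5 * (quad + logdet + d * np.log(2 * np.pi))
    resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixing weights, means, and covariances
    Nk = resp.sum(axis=0)
    new_weights = Nk / n
    new_means = (resp.T @ X) / Nk[:, None]
    new_covs = np.stack([
        (resp[:, k, None] * (X - new_means[k])).T @ (X - new_means[k]) / Nk[k]
        for k in range(K)
    ])
    return new_weights, new_means, new_covs

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Cubic tensor power iteration on a symmetric 3-way tensor T of shape (d, d, d):
    repeatedly apply u <- T(I, u, u) / ||T(I, u, u)|| to recover a robust eigenvector."""
    u = np.random.default_rng(seed).normal(size=T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = np.einsum("ijk,j,k->i", T, u, u)  # the cubic map T(I, u, u)
        u /= np.linalg.norm(u)
    return u
```

In spectral approaches to GMMs, T would be an estimated (whitened) third-order moment tensor whose robust eigenvectors encode the component means; EM instead alternates the two steps above until the log-likelihood stops improving.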
Related papers
- Transformers versus the EM Algorithm in Multi-class Clustering [18.828993597590856]
We study the learning guarantees of Transformers in performing multi-class clustering of the Gaussian Mixture Models. Our theory provides approximation bounds for the Expectation and Maximization steps. Our simulations empirically verified our theory by revealing the strong learning capacities of Transformers even beyond the assumptions in the theory.
arXiv Detail & Related papers (2025-02-09T19:51:58Z) - Learning Spectral Methods by Transformers [18.869174453242383]
We show that multi-layered Transformers, given a sufficiently large set of pre-training instances, are able to learn the algorithms themselves. This learning paradigm is distinct from the in-context learning setup and is similar to the learning procedure of human brains.
arXiv Detail & Related papers (2025-01-02T15:53:25Z) - One-Layer Transformer Provably Learns One-Nearest Neighbor In Context [48.4979348643494]
We study the capability of one-layer transformers in learning the one-nearest neighbor rule.
A single softmax attention layer can successfully learn to behave like a one-nearest neighbor classifier; a minimal numerical illustration of this effect follows below.
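The claim has a simple numerical reading: as the softmax temperature sharpens, the attention weights collapse onto the closest stored example. The sketch below (negative squared-distance scores with an inverse temperature beta) is a common kernel-smoother reading of attention used here for illustration, not necessarily the exact construction analysed in the paper.

```python
import numpy as np

def softmax_attention_predict(x_query, X_train, y_train, beta=100.0):
    """Softmax attention over stored (x, y) pairs; large beta approximates 1-NN."""
    scores = -beta * np.sum((X_train - x_query) ** 2, axis=1)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ y_train  # weighted label average; concentrates on the nearest point

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = (X[:, 0] > 0).astype(float)
xq = np.array([0.3, -0.1])
nn_label = y[np.argmin(np.sum((X - xq) ** 2, axis=1))]
print(softmax_attention_predict(xq, X, y), nn_label)  # nearly identical for large beta
```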
arXiv Detail & Related papers (2024-11-16T16:12:42Z) - Learning Linear Attention in Polynomial Time [115.68795790532289]
We provide the first results on learnability of single-layer Transformers with linear attention.
We show that linear attention may be viewed as a linear predictor in a suitably defined RKHS.
We show how to efficiently identify training datasets for which every empirical risk minimizer is equivalent to the linear Transformer.
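The "linear predictor in a suitably defined RKHS" view has an elementary finite-dimensional illustration: a single-head linear-attention readout is a linear functional of its weight matrix, with the prompt defining the feature map. The toy check below is a sketch under that simplified parameterization (one head, scalar labels), not the paper's full construction.

```python
import numpy as np

def linear_attention_readout(W, xs, ys, x_query):
    # single-head linear attention without softmax: yhat = sum_i <x_query, W x_i> * y_i
    return sum(float(x_query @ W @ x) * y for x, y in zip(xs, ys))

def prompt_feature_map(xs, ys, x_query):
    # the same prediction equals <W, Phi>_F with Phi = sum_i y_i * outer(x_query, x_i),
    # i.e. the model is linear in W over a feature space determined by the prompt
    return sum(y * np.outer(x_query, x) for x, y in zip(xs, ys))

rng = np.random.default_rng(0)
d = 3
W = rng.normal(size=(d, d))
xs, ys = [rng.normal(size=d) for _ in range(5)], [rng.normal() for _ in range(5)]
xq = rng.normal(size=d)
lhs = linear_attention_readout(W, xs, ys, xq)
rhs = float(np.sum(W * prompt_feature_map(xs, ys, xq)))
print(np.isclose(lhs, rhs))  # True: the attention output is a linear functional of W
```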
arXiv Detail & Related papers (2024-10-14T02:41:01Z) - On Limitation of Transformer for Learning HMMs [31.128172929754058]
This paper investigates the performance of Transformers in learning Hidden Markov Models (HMMs).
We show that Transformers consistently underperform Recurrent Neural Networks (RNNs) in both training speed and testing accuracy across all tested HMM models.
Our experiments further reveal the relation between the depth of a Transformer and the longest sequence length it can effectively learn, depending on the type and complexity of the HMM.
arXiv Detail & Related papers (2024-06-06T13:59:51Z) - Is Mamba Capable of In-Context Learning? [63.682741783013306]
State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL).
This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z) - Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining [25.669038513039357]
This paper provides a theoretical framework that analyzes supervised pretraining for in-context reinforcement learning.
We show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms.
arXiv Detail & Related papers (2023-10-12T17:55:02Z) - Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z) - Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - Transformers as Algorithms: Generalization and Implicit Model Selection in In-context Learning [23.677503557659705]
In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of examples and performs inference on-the-fly.
We treat the transformer model as a learning algorithm that can be specialized via training to implement, at inference time, another target algorithm.
We show that transformers can act as an adaptive learning algorithm and perform model selection across different hypothesis classes.
arXiv Detail & Related papers (2023-01-17T18:31:12Z) - UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.