Symmetry-Aware Transformer Training for Automated Planning
- URL: http://arxiv.org/abs/2508.07743v1
- Date: Mon, 11 Aug 2025 08:23:34 GMT
- Title: Symmetry-Aware Transformer Training for Automated Planning
- Authors: Markus Fritzsche, Elliot Gestrin, Jendrik Seipp
- Abstract summary: While transformers excel in many settings, their application in automated planning is limited. PlanGPT, a state-of-the-art decoder-only transformer, struggles to extrapolate from easy to hard planning problems. We propose a novel contrastive learning objective that makes transformers symmetry-aware and thereby compensates for their lack of inductive bias.
- Score: 6.206127662604578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While transformers excel in many settings, their application in the field of automated planning is limited. Prior work like PlanGPT, a state-of-the-art decoder-only transformer, struggles with extrapolation from easy to hard planning problems. This in turn stems from problem symmetries: planning tasks can be represented with arbitrary variable names that carry no meaning beyond being identifiers. This causes a combinatorial explosion of equivalent representations that pure transformers cannot efficiently learn from. We propose a novel contrastive learning objective to make transformers symmetry-aware and thereby compensate for their lack of inductive bias. Combining this with architectural improvements, we show that transformers can be efficiently trained for either plan generation or heuristic prediction. Our results across multiple planning domains demonstrate that our symmetry-aware training effectively and efficiently addresses the limitations of PlanGPT.
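The abstract does not spell out the training objective, but the general shape of a symmetry-aware contrastive loss can be sketched. Below is a minimal NumPy illustration under our own assumptions (not the paper's actual implementation): positive pairs are embeddings of the same planning task under a variable renaming (a symmetry), and an InfoNCE-style loss pulls matching pairs together while pushing apart embeddings of other tasks in the batch.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss. Row i of `anchors` and row i of
    `positives` embed the same task under a variable renaming; the other
    rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matching pairs sit on the diagonal

def rename_variables(task_tokens, mapping):
    """Apply a symmetry: consistently rename variable identifiers in a
    tokenised task (a hypothetical toy representation)."""
    return [mapping.get(tok, tok) for tok in task_tokens]
```

Minimising this loss drives the encoder toward producing the same embedding for all renamings of a task, which is one way to compensate for the missing inductive bias the abstract describes.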
Related papers
- Attention-based Adversarial Robust Distillation in Radio Signal Classifications for Low-Power IoT Devices [28.874452850832213]
We show that transformer-based radio signal classification is vulnerable to imperceptible, carefully crafted attacks called adversarial examples. We propose a defense system against adversarial examples in transformer-based modulation classification. The new method transfers the adversarial attention map from a robustly trained large transformer to a compact transformer.
arXiv Detail & Related papers (2025-06-13T15:39:01Z) - One-Layer Transformer Provably Learns One-Nearest Neighbor In Context [48.4979348643494]
We study the capability of one-layer transformers learning the one-nearest neighbor rule.
A single softmax attention layer can successfully learn to behave like a one-nearest neighbor.
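The claim that a softmax attention layer can behave like a one-nearest-neighbour rule has a simple mechanical intuition: with a large inverse temperature, the softmax concentrates almost all weight on the best-matching key. A toy NumPy sketch (our illustration, not the paper's construction):

```python
import numpy as np

def softmax_attention(query, keys, values, beta=50.0):
    """Single softmax attention head. With a large inverse temperature
    `beta`, the weights concentrate on the nearest key, so the output
    approaches the value associated with the query's nearest neighbour."""
    scores = beta * (keys @ query)   # similarity of the query to each key
    scores -= scores.max()           # numerical stability
    w = np.exp(scores) / np.exp(scores).sum()
    return w @ values                # near one-hot w -> near 1-NN lookup
```

For unit-norm keys, the largest dot product corresponds to the smallest Euclidean distance, so the limit of this head is exactly a 1-NN predictor.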
arXiv Detail & Related papers (2024-11-16T16:12:42Z) - Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens [23.737606860443705]
In this work, we investigate the adversarial robustness of in-context learning in transformers to hijacking attacks. We show that both linear transformers and transformers with GPT-2 architectures are vulnerable to such attacks. Adversarial robustness to these attacks can be significantly improved through adversarial training.
arXiv Detail & Related papers (2024-11-07T21:25:58Z) - Transformers to Predict the Applicability of Symbolic Integration Routines [0.0]
We consider how machine learning may be used to optimise this task in a Computer Algebra System.
We train transformers that predict whether a particular integration method will be successful, and compare against the existing human-made guards.
We find the transformer can outperform these guards, gaining up to 30% accuracy and 70% precision.
arXiv Detail & Related papers (2024-10-31T14:03:37Z) - Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference [53.419249906014194]
We study generative modeling for planning with datasets repurposed from offline reinforcement learning.
We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent variable to connect a Transformer-based trajectory generator and the final return.
arXiv Detail & Related papers (2024-02-07T08:18:09Z) - When can transformers reason with abstract symbols? [25.63285482210457]
We prove that for any relational reasoning task in a large family of tasks, transformers learn the abstract relations and generalize to the test set.
This is in contrast to classical fully-connected networks, which we prove fail to learn to reason.
arXiv Detail & Related papers (2023-10-15T06:45:38Z) - Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
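The mesa-optimization correspondence can be made concrete in a toy setting. Under simplifying assumptions of our own (weights initialised at zero, unnormalised linear attention, squared loss), the prediction after one gradient-descent step on in-context regression data coincides exactly with a linear-attention readout:

```python
import numpy as np

def gd_step_prediction(X, y, x_query, lr=0.1):
    """Prediction of a linear model after one GD step from w = 0 on the
    squared loss 0.5 * sum_i (w @ x_i - y_i)**2, i.e. w = lr * X.T @ y."""
    w = lr * X.T @ y
    return w @ x_query

def linear_attention_prediction(X, y, x_query, lr=0.1):
    """Unnormalised linear attention: sum_i y_i * <x_i, x_query>, scaled
    by lr. Algebraically identical to the one-step-GD prediction above."""
    return lr * np.sum(y * (X @ x_query))
```

Both functions compute `lr * sum_i y_i * (x_i . x_query)`, which is the sense in which a forward pass through an attention layer can implement a gradient step.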
arXiv Detail & Related papers (2022-12-15T09:21:21Z) - The Parallelism Tradeoff: Limitations of Log-Precision Transformers [29.716269397142973]
We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens can be simulated by constant-depth logspace-uniform threshold circuits.
This provides insight on the power of transformers using known results in complexity theory.
arXiv Detail & Related papers (2022-07-02T03:49:34Z) - Thinking Like Transformers [64.96770952820691]
We propose a computational model for the transformer-encoder in the form of a programming language.
We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer.
We provide RASP programs for histograms, sorting, and Dyck-languages.
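RASP expresses computations as selectors (attention patterns) followed by aggregation. The histogram program mentioned above can be sketched in NumPy (our approximation of the selector/aggregate idea, not actual RASP syntax): each position attends to all positions holding the same token and counts them.

```python
import numpy as np

def rasp_histogram(tokens):
    """RASP-style 'hist': a selector matches positions with equal tokens,
    and aggregation counts the matches, so each position receives the
    number of occurrences of its own token in the sequence."""
    toks = np.array(tokens)
    select = toks[:, None] == toks[None, :]   # boolean selector matrix
    return select.sum(axis=1).tolist()        # aggregate: count per position
```

For example, the token sequence `h e l l o` yields `[1, 1, 2, 2, 1]`, mirroring the attention-pattern-plus-aggregation structure a Transformer encoder could plausibly learn.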
arXiv Detail & Related papers (2021-06-13T13:04:46Z) - Motion Planning Transformers: One Model to Plan Them All [15.82728888674882]
We propose a transformer-based approach for efficiently solving complex motion planning problems.
Our approach first identifies regions on the map using transformers to provide attention to map areas likely to include the best path, and then applies local planners to generate the final collision-free path.
arXiv Detail & Related papers (2021-06-05T04:29:16Z) - Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation.
We propose novel scalable Transformers, which naturally contain sub-Transformers of different scales with shared parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the scalable Transformers.
arXiv Detail & Related papers (2021-06-04T04:04:10Z) - Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformers-based models.
We prove that eliminating the MASK token and computing the loss over the whole output are essential choices for improving performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z) - Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.