Analyzing the Nuances of Transformers' Polynomial Simplification Abilities
- URL: http://arxiv.org/abs/2104.14095v1
- Date: Thu, 29 Apr 2021 03:52:46 GMT
- Title: Analyzing the Nuances of Transformers' Polynomial Simplification Abilities
- Authors: Vishesh Agarwal, Somak Aditya, Navin Goyal
- Abstract summary: We show that Transformers consistently struggle with numeric multiplication.
We explore two ways to mitigate this: Curriculum Learning and a Symbolic Calculator approach.
Both approaches provide significant gains over the vanilla Transformers-based baseline.
- Score: 11.552059052724907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic Mathematical tasks such as integration often require multiple
well-defined steps and understanding of sub-tasks to reach a solution. To
understand Transformers' abilities in such tasks in a fine-grained manner, we
deviate from traditional end-to-end settings, and explore a step-wise
polynomial simplification task. Polynomials can be written in a simple normal
form as a sum of monomials which are ordered in a lexicographic order. For a
polynomial which is not necessarily in this normal form, a sequence of
simplification steps is applied to reach the fully simplified (i.e., in the
normal form) polynomial. We propose a synthetic Polynomial dataset generation
algorithm that generates polynomials with unique proof steps. Through varying
coefficient configurations, input representation, proof granularity, and
extensive hyper-parameter tuning, we observe that Transformers consistently
struggle with numeric multiplication. We explore two ways to mitigate this:
Curriculum Learning and a Symbolic Calculator approach (where the numeric
operations are offloaded to a calculator). Both approaches provide significant
gains over the vanilla Transformers-based baseline.
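
To make the setting concrete, below is a minimal Python sketch of the kind of step-wise simplification the abstract describes: like monomials are merged one pair at a time until the polynomial reaches its normal form (like terms combined, monomials ordered lexicographically by exponent tuple). The representation, step granularity, and function names here are illustrative assumptions, not the paper's dataset format; the coefficient addition inside `combine_one_pair` is the kind of numeric operation the Symbolic Calculator variant would offload to an external calculator.

```python
# Illustrative sketch (not the paper's exact dataset format): a polynomial is a
# list of (coefficient, exponent-tuple) monomials over variables x0, x1, ...
# Simplification proceeds one step at a time until the normal form is reached:
# like terms combined, monomials sorted lexicographically by exponent tuple.

from typing import Dict, List, Tuple

Monomial = Tuple[int, Tuple[int, ...]]   # (coefficient, exponents per variable)
Polynomial = List[Monomial]

def combine_one_pair(poly: Polynomial) -> Polynomial:
    """One proof step: merge the first pair of like monomials found."""
    seen: Dict[Tuple[int, ...], int] = {}
    for i, (coeff, exps) in enumerate(poly):
        if exps in seen:
            j = seen[exps]
            # Coefficient addition: the numeric operation a Symbolic Calculator
            # approach would offload rather than have the Transformer compute.
            merged = (poly[j][0] + coeff, exps)
            return poly[:j] + [merged] + poly[j + 1:i] + poly[i + 1:]
        seen[exps] = i
    return poly  # no like terms left

def normal_form_steps(poly: Polynomial) -> List[Polynomial]:
    """Return the sequence of intermediate polynomials ending in normal form."""
    steps = [poly]
    while True:
        nxt = combine_one_pair(steps[-1])
        if nxt == steps[-1]:
            break
        steps.append(nxt)
    # Final step: order the monomials lexicographically by exponent tuple.
    ordered = sorted(steps[-1], key=lambda m: m[1], reverse=True)
    if ordered != steps[-1]:
        steps.append(ordered)
    return steps

if __name__ == "__main__":
    # 3*x0*x1 + 2*x0^2 + 5*x0*x1  ->  2*x0^2 + 8*x0*x1
    p = [(3, (1, 1)), (2, (2, 0)), (5, (1, 1))]
    for step in normal_form_steps(p):
        print(step)
```

Each intermediate polynomial printed above corresponds to one "proof step" in the step-wise framing, as opposed to an end-to-end mapping from input polynomial directly to normal form.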
Related papers
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show fast convergence on the regression loss despite the non-convexity of the loss landscape.
This is the first theoretical analysis for multi-layer Transformer in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z) - In-Context Learning with Representations: Contextual Generalization of Trained Transformers [66.78052387054593]
In-context learning (ICL) refers to a capability of pretrained large language models, which can learn a new task given a few examples during inference.
This paper investigates the training dynamics of transformers by gradient descent through the lens of non-linear regression tasks.
arXiv Detail & Related papers (2024-08-19T16:47:46Z) - Complementary polynomials in quantum signal processing [0.0]
To implement a given polynomial $P$, one must first construct a corresponding complementary polynomial $Q$.
Existing approaches to this problem employ numerical methods that are not amenable to explicit error analysis.
We present a new approach to complementary polynomials using complex analysis.
arXiv Detail & Related papers (2024-06-06T16:47:11Z) - Transformers, parallel computation, and logarithmic depth [33.659870765923884]
We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation.
arXiv Detail & Related papers (2024-02-14T15:54:55Z) - Sumformer: Universal Approximation for Efficient Transformers [2.4832703558223725]
We introduce Sumformer, a novel and simple architecture capable of universally approximating sequence-to-sequence functions.
We derive a new proof for Transformers, showing that just one attention layer is sufficient for universal approximation.
arXiv Detail & Related papers (2023-07-05T13:59:35Z) - Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z) - Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$ (a toy parallel-prefix sketch of this idea appears after this list).
We further investigate the brittleness of these solutions and propose potential mitigations.
arXiv Detail & Related papers (2022-10-19T17:45:48Z) - Simplifying Polylogarithms with Machine Learning [0.0]
In many calculations relevant to particle physics, complicated combinations of polylogarithms often arise from Feynman integrals.
We consider both a reinforcement learning approach, where the identities are analogous to moves in a game, and a transformer network approach, where the problem is viewed analogously to a language-translation task.
arXiv Detail & Related papers (2022-06-08T18:20:21Z) - Iterative Decoding for Compositional Generalization in Transformers [5.269770493488338]
In sequence-to-sequence learning, transformers are often unable to predict correct outputs for even marginally longer examples.
This paper introduces iterative decoding, an alternative to seq2seq learning.
We show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset.
arXiv Detail & Related papers (2021-10-08T14:52:25Z) - Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z) - Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching [87.97265483696613]
We propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels.
Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks.
arXiv Detail & Related papers (2020-01-20T20:02:02Z)
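
As referenced in the "Transformers Learn Shortcuts to Automata" entry above, here is a toy Python sketch of the parallel-prefix idea behind such logarithmic-depth shortcut constructions: per-symbol transition functions are composed in a balanced tree, so only about $\log_2 T$ sequential rounds are needed for an input of length $T$. This is an illustrative analogy under that assumption, not the papers' actual Transformer construction.

```python
# Toy illustration (not the papers' construction): an automaton on an input of
# length T can be simulated in O(log T) sequential rounds by composing the
# per-symbol transition functions in a balanced tree (a parallel prefix idea).

from typing import Callable, List

State = int
Transition = Callable[[State], State]

def compose(f: Transition, g: Transition) -> Transition:
    """Apply f first, then g (function composition is associative)."""
    return lambda s: g(f(s))

def tree_compose(fs: List[Transition]) -> Transition:
    """Combine T transitions in ~log2(T) rounds of pairwise composition."""
    while len(fs) > 1:  # each loop iteration is one sequential "round"
        fs = [compose(fs[i], fs[i + 1]) if i + 1 < len(fs) else fs[i]
              for i in range(0, len(fs), 2)]
    return fs[0]

if __name__ == "__main__":
    # Parity automaton: the state flips on '1' and stays on '0'.
    delta = {"0": lambda s: s, "1": lambda s: 1 - s}
    word = "1101001"
    final_state = tree_compose([delta[c] for c in word])(0)
    print(final_state)  # parity of the number of 1s in `word` -> 0
```

Because function composition is associative, the tree grouping yields the same final state as scanning the input left to right, while the number of sequential composition rounds grows only logarithmically in the input length.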
This list is automatically generated from the titles and abstracts of the papers on this site.