Can Transformers Do Enumerative Geometry?
- URL: http://arxiv.org/abs/2408.14915v2
- Date: Fri, 03 Jan 2025 14:21:20 GMT
- Title: Can Transformers Do Enumerative Geometry?
- Authors: Baran Hashemi, Roderic G. Corominas, Alessandro Giacchetto
- Abstract summary: We introduce a Transformer-based approach to computational enumerative geometry.
We compute intersection numbers across a range from $10^{-45}$ to $10^{45}$.
We explore the enumerative "world-model" of Transformers.
- Score: 44.99833362998488
- Abstract: How can Transformers model and learn enumerative geometry? What is a robust procedure for using Transformers in abductive knowledge discovery within a mathematician-machine collaboration? In this work, we introduce a Transformer-based approach to computational enumerative geometry, specifically targeting the computation of $\psi$-class intersection numbers on the moduli space of curves. By reformulating the problem as a continuous optimization task, we compute intersection numbers across a wide value range from $10^{-45}$ to $10^{45}$. To capture the recursive nature inherent in these intersection numbers, we propose the Dynamic Range Activator (DRA), a new activation function that enhances the Transformer's ability to model recursive patterns and handle severe heteroscedasticity. Given precision requirements for computing the intersections, we quantify the uncertainty of the predictions using Conformal Prediction with a dynamic sliding window adaptive to the partitions of equivalent number of marked points. To the best of our knowledge, there has been no prior work on modeling recursive functions with such high variance and factorial growth. Beyond simply computing intersection numbers, we explore the enumerative "world-model" of Transformers. Our interpretability analysis reveals that the network is implicitly modeling the Virasoro constraints in a purely data-driven manner. Moreover, through abductive hypothesis testing, probing, and causal inference, we uncover evidence of an emergent internal representation of the large-genus asymptotic of $\psi$-class intersection numbers. These findings suggest that the network internalizes the parameters of the asymptotic closed-form and the polynomiality phenomenon of $\psi$-class intersection numbers in a non-linear manner.
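For orientation, the quantities the paper predicts are classical: in genus $0$ the $\psi$-class intersection numbers admit the closed form $\langle \tau_{d_1}\cdots\tau_{d_n}\rangle_0 = \binom{n-3}{d_1,\ldots,d_n}$ whenever $\sum_i d_i = n-3$. The sketch below computes only this special case as a reference point; it is not the paper's Transformer model, which learns the numbers (including higher genus, where recursions such as the Virasoro/DVV constraints govern the values) directly from data.

```python
from math import factorial

def psi_genus0(d):
    """Genus-0 psi-class intersection number <tau_{d_1} ... tau_{d_n}>_0.

    Classical closed form: (n-3)! / (d_1! ... d_n!) when sum(d) = n - 3
    and n >= 3; zero otherwise by the dimension constraint on M_{0,n}-bar.
    Higher-genus values require recursions and grow factorially, which is
    the high-variance regime targeted by the paper's Transformer.
    """
    n = len(d)
    if n < 3 or sum(d) != n - 3:
        return 0
    value = factorial(n - 3)
    for di in d:
        value //= factorial(di)   # exact: multinomial coefficients are integers
    return value

# Sanity checks against well-known values:
assert psi_genus0([0, 0, 0]) == 1        # <tau_0^3>_0 = 1
assert psi_genus0([1, 0, 0, 0]) == 1     # consistent with the string equation
assert psi_genus0([1, 1, 0, 0, 0]) == 2
```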
Related papers
- (How) Can Transformers Predict Pseudo-Random Numbers? [7.201095605457193]
We study the ability of Transformers to learn pseudo-random number sequences from linear congruential generators (LCGs)
Our analysis reveals that Transformers can perform in-context prediction of LCG sequences with unseen moduli ($m$) and parameters ($a,c$)
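For reference, an LCG is the recurrence $x_{k+1} = (a\,x_k + c) \bmod m$; a minimal sketch of such a generator follows (the constants shown are a common textbook choice, not necessarily those used in that paper):

```python
def lcg_sequence(seed: int, a: int, c: int, m: int, length: int) -> list[int]:
    """Linear congruential generator: x_{k+1} = (a * x_k + c) mod m."""
    xs, x = [], seed
    for _ in range(length):
        x = (a * x + c) % m
        xs.append(x)
    return xs

# Example with the classic Numerical Recipes constants:
print(lcg_sequence(seed=42, a=1664525, c=1013904223, m=2**32, length=5))
```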
arXiv Detail & Related papers (2025-02-14T18:59:40Z) - Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show fast convergence of gradient flow on the regression loss despite the non-convexity of the optimization landscape.
This is the first theoretical analysis for multi-layer Transformer in this setting.
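The baseline such a looped Transformer is argued to emulate is ordinary multi-step gradient descent on an in-context least-squares problem; a minimal NumPy sketch of that baseline (not the Transformer construction itself):

```python
import numpy as np

def multistep_gd(X: np.ndarray, y: np.ndarray, steps: int = 10, lr: float = 0.1) -> np.ndarray:
    """Multi-step gradient descent on the least-squares loss ||Xw - y||^2 / (2n),
    the iterative procedure a looped Transformer is argued to implement in-context."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return w
```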
arXiv Detail & Related papers (2024-10-10T18:29:05Z) - Towards Understanding Inductive Bias in Transformers: A View From Infinity [9.00214539845063]
We argue that transformers tend to be biased towards more permutation-symmetric functions in sequence space.
We show that the representation theory of the symmetric group can be used to give quantitative analytical predictions.
We show that the WikiText dataset does indeed possess a degree of permutation symmetry.
arXiv Detail & Related papers (2024-02-07T19:00:01Z) - Transolver: A Fast Transformer Solver for PDEs on General Geometries [66.82060415622871]
We present Transolver, which learns intrinsic physical states hidden behind discretized geometries.
By computing attention over physics-aware tokens encoded from slices, Transolver can effectively capture intricate physical correlations.
Transolver achieves consistent state-of-the-art with 22% relative gain across six standard benchmarks and also excels in large-scale industrial simulations.
arXiv Detail & Related papers (2024-02-04T06:37:38Z) - Efficient Nonparametric Tensor Decomposition for Binary and Count Data [27.02813234958821]
We propose ENTED, an Efficient Nonparametric TEnsor Decomposition for binary and count tensors.
arXiv Detail & Related papers (2024-01-15T14:27:03Z) - Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning [77.1421343649344]
We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run in time and memory cost linear to the number of nodes and edges.
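As a point of reference, the generic (Euclidean) kernelized-attention trick replaces $\mathrm{softmax}(QK^\top)V$ with $\phi(Q)\,(\phi(K)^\top V)$, reducing the cost from quadratic to linear in the number of tokens; the paper's non-Euclidean, curvature-aware variant differs, so the sketch below shows only the standard construction:

```python
import numpy as np

def linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized (linear) attention: phi(Q) @ (phi(K)^T @ V), normalized.
    Cost is O(n * d^2) in the number n of tokens instead of O(n^2 * d)."""
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                    # (d_k, d_v) summary of keys and values
    Z = Qf @ Kf.sum(axis=0)          # (n,) per-query normalizer
    return (Qf @ KV) / Z[:, None]
```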
arXiv Detail & Related papers (2023-09-08T02:44:37Z) - Scalable Transformer for PDE Surrogate Modeling [9.438207505148947]
Transformers have emerged as a promising tool for surrogate modeling of partial differential equations (PDEs).
We propose Factorized Transformer (FactFormer), which is based on an axial factorized kernel integral.
We showcase that the proposed model is able to simulate 2D Kolmogorov flow on a $256\times 256$ grid and 3D smoke buoyancy on a $64\times64\times64$ grid with good accuracy and efficiency.
arXiv Detail & Related papers (2023-05-27T19:23:00Z) - Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$.
We further investigate the brittleness of these solutions and propose potential mitigations.
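The "shortcut" behind the $O(\log T)$-layer result is that each input symbol induces a state-transition map, and these maps compose associatively, so the whole product can be evaluated in a balanced tree of logarithmic depth. A minimal illustration of the two views (sequential vs. composed), not the Transformer construction itself:

```python
from functools import reduce

def run_dfa(delta, q0, word):
    """Step-by-step simulation: T sequential transitions for a length-T word."""
    q = q0
    for s in word:
        q = delta[(q, s)]
    return q

def run_dfa_shortcut(delta, states, q0, word):
    """'Shortcut' view: each symbol s induces a map f_s: Q -> Q, and the result
    is (f_{s_T} o ... o f_{s_1})(q0). Composition is associative, so the maps
    can be combined in a balanced tree of depth O(log T) -- the parallel
    structure a low-depth Transformer can express."""
    maps = [{q: delta[(q, s)] for q in states} for s in word]
    compose = lambda f, g: {q: g[f[q]] for q in states}   # apply f, then g
    total = reduce(compose, maps, {q: q for q in states})
    return total[q0]

# Parity automaton over {0, 1}: the state flips on reading a 1.
states = {0, 1}
delta = {(q, s): q ^ s for q in states for s in (0, 1)}
word = [1, 0, 1, 1]
assert run_dfa(delta, 0, word) == run_dfa_shortcut(delta, states, 0, word) == 1
```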
arXiv Detail & Related papers (2022-10-19T17:45:48Z) - $O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers [71.31712741938837]
We show that sparse Transformers with only $O(n)$ connections per attention layer can approximate the same function class as the dense model with $n^2$ connections.
We also present experiments comparing different patterns/levels of sparsity on standard NLP tasks.
arXiv Detail & Related papers (2020-06-08T18:30:12Z) - Deep neural networks for inverse problems with pseudodifferential operators: an application to limited-angle tomography [0.4110409960377149]
We propose a novel convolutional neural network (CNN) designed for learning pseudodifferential operators ($\Psi$DOs) in the context of linear inverse problems.
We show that, under rather general assumptions on the forward operator, the unfolded iterations of ISTA can be interpreted as the successive layers of a CNN.
In particular, we prove that, in the case of LA-CT, the operations of upscaling, downscaling, and convolution can be exactly determined by combining the convolutional nature of the limited-angle X-ray transform and basic properties defining a wavelet system.
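For orientation, one ISTA iteration for the LASSO problem is a gradient step followed by soft-thresholding; unfolding a fixed number of such iterations, with learned linear operators, is what yields a CNN-like architecture. A minimal generic sketch (not specialized to the limited-angle tomography setting of that paper):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, step, iters=100):
    """ISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.
    Each unfolded iteration (linear map followed by a pointwise nonlinearity)
    is the kind of update that can be read as one layer of a neural network.
    The step size should satisfy step <= 1 / ||A^T A|| for convergence."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - step * A.T @ (A @ x - b), lam * step)
    return x
```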
arXiv Detail & Related papers (2020-06-02T14:03:41Z)