NeuRaLaTeX: A machine learning library written in pure LaTeX
- URL: http://arxiv.org/abs/2503.24187v2
- Date: Wed, 02 Apr 2025 10:46:42 GMT
- Title: NeuRaLaTeX: A machine learning library written in pure LaTeX
- Authors: James A. D. Gardner, Will Rowan, William A. P. Smith
- Abstract summary: We introduce NeuRaLaTeX, which we believe to be the first deep learning library written entirely in LaTeX. As part of your LaTeX document you can specify the architecture of a neural network and its loss functions. When the document is compiled, the LaTeX compiler will generate or load training data, train the network, run experiments, and generate figures. The paper took 48 hours to compile and the entire source code for NeuRaLaTeX is contained within the source code of the paper.
- Score: 15.978130916451295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce NeuRaLaTeX, which we believe to be the first deep learning library written entirely in LaTeX. As part of your LaTeX document you can specify the architecture of a neural network and its loss functions, define how to generate or load training data, and specify training hyperparameters and experiments. When the document is compiled, the LaTeX compiler will generate or load training data, train the network, run experiments, and generate figures. This paper generates a random 100 point spiral dataset, trains a two layer MLP on it, evaluates on a different random spiral dataset, produces plots and tables of results. The paper took 48 hours to compile and the entire source code for NeuRaLaTeX is contained within the source code of the paper. We propose two new metrics: the Written In Latex (WIL) metric measures the proportion of a machine learning library that is written in pure LaTeX, while the Source Code Of Method in Source Code of Paper (SCOMISCOP) metric measures the proportion of a paper's implementation that is contained within the paper source. We are state-of-the-art for both metrics, outperforming the ResNet and Transformer papers, as well as the PyTorch and Tensorflow libraries. Source code, documentation, videos, crypto scams and an invitation to invest in the commercialisation of NeuRaLaTeX are available at https://www.neuralatex.com
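The two metrics are defined only in words in the abstract; a natural formalization, assuming both are simple line-count ratios (our reading, not notation from the paper), is:

    \[
    \mathrm{WIL} = \frac{\text{lines of the library written in pure LaTeX}}{\text{total lines of the library}},
    \qquad
    \mathrm{SCOMISCOP} = \frac{\text{lines of the method implemented in the paper source}}{\text{total lines of the method's implementation}}
    \]

Under this reading, NeuRaLaTeX scores 1 on both by construction: the whole library is LaTeX, and all of it lives inside the paper's own source, whereas PyTorch has a WIL near 0 and papers whose code lives in a separate repository have a SCOMISCOP near 0.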
Related papers
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [57.09163579304332]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories.
PaperCoder operates in three stages: planning, where it constructs a roadmap, designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis, which interprets implementation-specific details; and generation, which produces modular, dependency-aware code.
We then evaluate PaperCoder on generating code implementations from machine learning papers based on both model-based and human evaluations.
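The three-stage structure can be sketched as a generic pipeline; everything below, including the function names and prompts, is our own illustration rather than PaperCoder's actual code:

    # A generic plan -> analyze -> generate skeleton for turning a paper into code.
    # Purely illustrative; names and prompts are our own, not PaperCoder's.

    def llm(prompt: str) -> str:
        """Placeholder for a language-model call (assumption: any chat LLM API)."""
        raise NotImplementedError("wire up your model of choice here")

    def plan(paper: str) -> str:
        # Stage 1: roadmap, architecture diagram, file dependency order, configs.
        return llm(f"Draft a repository plan (files, dependencies, configs) for:\n{paper}")

    def analyze(paper: str, plan_text: str) -> str:
        # Stage 2: per-file implementation notes grounded in the paper.
        return llm(f"Given this plan:\n{plan_text}\nExtract per-file notes from:\n{paper}")

    def generate(plan_text: str, notes: str) -> str:
        # Stage 3: emit code file by file, following the dependency order.
        return llm(f"Plan:\n{plan_text}\nNotes:\n{notes}\nWrite the code for each planned file.")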
arXiv Detail & Related papers (2025-04-24T01:57:01Z) - LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement [11.931911831112357]
LaTeX sources and their rendered PDF images look drastically different, especially for formulae and tables. Prior work generates LaTeX sources in a single iteration and struggles with complex formulae. This paper proposes LATTE, the first iterative refinement framework for LaTeX recognition.
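The iterative-refinement idea, as opposed to single-pass generation, can be sketched generically; all helpers below are hypothetical stubs of our own, not LATTE's components:

    # Generic iterative-refinement loop for image-to-LaTeX recognition.

    def render_latex(source: str):
        raise NotImplementedError("compile `source` and rasterize the PDF")

    def localize_errors(target_img, rendered_img):
        raise NotImplementedError("return mismatching regions, empty if identical")

    def refine(source: str, diff) -> str:
        raise NotImplementedError("ask the model to fix the flagged spans")

    def recognize(target_img, initial_guess: str, max_iters: int = 5) -> str:
        source = initial_guess                          # single-pass prediction
        for _ in range(max_iters):
            diff = localize_errors(target_img, render_latex(source))
            if not diff:                                # rendering matches the target
                return source
            source = refine(source, diff)               # repair and try again
        return source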
arXiv Detail & Related papers (2024-09-21T17:18:49Z) - TeXBLEU: Automatic Metric for Evaluate LaTeX Format [4.337656290539519]
We propose TeXBLEU, a metric for evaluating mathematical expressions in LaTeX format, built on the n-gram-based BLEU metric.
The proposed TeXBLEU consists of a tokenizer trained on the arXiv paper dataset and a fine-tuned embedding model with positional encoding.
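For context, the underlying n-gram BLEU computation that TeXBLEU builds on looks like this (a minimal sketch of plain BLEU over tokenized LaTeX; TeXBLEU itself additionally uses its trained tokenizer and embedding model):

    # Minimal n-gram BLEU over LaTeX token sequences.
    import math
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def bleu(candidate, reference, max_n=4):
        precisions = []
        for n in range(1, max_n + 1):
            cand, ref = ngrams(candidate, n), ngrams(reference, n)
            overlap = sum((cand & ref).values())           # clipped n-gram matches
            total = max(sum(cand.values()), 1)
            precisions.append(max(overlap, 1e-9) / total)  # smooth to avoid log(0)
        bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))  # brevity penalty
        return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

    # Example: token-level comparison of two LaTeX fragments.
    print(bleu(r"\frac { a } { b }".split(), r"\frac { a } { b }".split()))  # 1.0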
arXiv Detail & Related papers (2024-09-10T16:54:32Z) - Transformer In-Context Learning for Categorical Data [51.23121284812406]
We extend research on understanding Transformers through the lens of in-context learning with functional data by considering categorical outcomes, nonlinear underlying models, and nonlinear attention.
We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset.
arXiv Detail & Related papers (2024-05-27T15:03:21Z) - MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition [2.325171167252542]
First, we present an improved version of the benchmark dataset im2latex-100k, featuring 30 fonts instead of one.
Second, we introduce the real-world dataset realFormula, with MEs extracted from papers.
Third, we developed a MER model, MathNet, based on a convolutional vision transformer, with superior results on all four test sets.
arXiv Detail & Related papers (2024-04-21T14:03:34Z) - TopoX: A Suite of Python Packages for Machine Learning on Topological Domains [89.43928198132942]
TopoX is a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains. TopoX consists of three packages: TopoNetX, TopoEmbedX and TopoModelX.
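To give a flavor of this kind of building block, here is a toy simplicial-complex container of our own; it is a stand-in for illustration, not TopoNetX's actual API:

    # Toy simplicial complex: closed under taking faces of every simplex.
    from itertools import combinations

    class SimplicialComplex:
        def __init__(self):
            self.simplices = set()

        def add_simplex(self, vertices):
            s = frozenset(vertices)
            # A complex must contain every face (subset) of each simplex.
            for k in range(1, len(s) + 1):
                for face in combinations(sorted(s), k):
                    self.simplices.add(frozenset(face))

        def skeleton(self, dim):
            # All simplices of dimension `dim` (a k-simplex has k+1 vertices).
            return {s for s in self.simplices if len(s) == dim + 1}

    sc = SimplicialComplex()
    sc.add_simplex([1, 2, 3])    # a filled triangle
    print(len(sc.skeleton(1)))   # 3 edges: {1,2}, {1,3}, {2,3}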
arXiv Detail & Related papers (2024-02-04T10:41:40Z) - MathPile: A Billion-Token-Scale Pretraining Corpus for Math [45.163340937419214]
We introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens.
Our meticulous data collection and processing efforts included a complex suite of preprocessing, cleaning, filtering, and deduplication steps.
We aim for our MathPile to boost language models' mathematical reasoning abilities and open-source its different versions and processing scripts to advance the field.
arXiv Detail & Related papers (2023-12-28T16:55:40Z) - DocCoder: Generating Code by Retrieving and Reading Docs [87.88474546826913]
We introduce DocCoder, an approach that explicitly leverages code manuals and documentation.
Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model.
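The retrieve-then-read pattern behind this approach can be sketched generically; the lexical scorer and prompt below are simplistic stand-ins of our own, not DocCoder's components:

    # Generic retrieve-then-generate sketch: fetch relevant documentation,
    # then condition code generation on it.

    def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
        # Toy lexical scorer: rank docs by word overlap with the query.
        q = set(query.lower().split())
        scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
        return scored[:k]

    def generate_code(query: str, docs: list[str]) -> str:
        context = "\n".join(retrieve(query, docs))
        prompt = f"Documentation:\n{context}\n\nTask: {query}\nCode:"
        return prompt  # in practice, fed to a code-generation model

    manuals = ["grep searches files for lines matching a pattern",
               "awk processes text fields line by line",
               "sed edits streams with substitution commands"]
    print(generate_code("find lines containing ERROR in a log file", manuals))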
arXiv Detail & Related papers (2022-07-13T06:47:51Z) - Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$ [118.04625413322827]
$\texttt{t5x}$ and $\texttt{seqio}$ are open source software libraries for building and training language models.
These libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data.
arXiv Detail & Related papers (2022-03-31T17:12:13Z) - Efficient Graph Deep Learning in TensorFlow with tf_geometric [53.237754811019464]
We introduce tf_geometric, an efficient and friendly library for graph deep learning.
tf_geometric provides kernel libraries for building Graph Neural Networks (GNNs) as well as implementations of popular GNNs.
The kernel libraries consist of infrastructures for building efficient GNNs, including graph data structures, graph map-reduce framework, graph mini-batch strategy, etc.
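The graph map-reduce pattern named here, mapping a function over edges and reducing the resulting messages at each node, is the core of most GNN kernels; below is a minimal NumPy sketch of one propagation step (our illustration, not tf_geometric's implementation):

    # One message-passing step expressed as graph map-reduce:
    # map over edges (gather source features), reduce by sum at each target node.
    import numpy as np

    def propagate(x, edge_index, w):
        # x: (num_nodes, in_dim) node features
        # edge_index: (2, num_edges) rows = (source, target) node ids
        # w: (in_dim, out_dim) weight matrix
        src, dst = edge_index
        messages = x[src] @ w                      # map: one message per edge
        out = np.zeros((x.shape[0], w.shape[1]))
        np.add.at(out, dst, messages)              # reduce: sum messages per node
        return np.maximum(out, 0.0)                # ReLU nonlinearity

    x = np.random.randn(4, 8)                        # 4 nodes, 8 features
    edges = np.array([[0, 1, 2, 3], [1, 2, 3, 0]])   # a directed 4-cycle
    print(propagate(x, edges, np.random.randn(8, 16)).shape)  # (4, 16)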
arXiv Detail & Related papers (2021-01-27T17:16:36Z) - Machine Translation of Mathematical Text [0.0]
We have implemented a machine translation system, the PolyMath Translator, for LaTeX documents containing mathematical text.
The current implementation translates English LaTeX to French LaTeX, attaining a BLEU score of 53.5 on a held-out test corpus of mathematical sentences.
It produces LaTeX documents that can be compiled to PDF without further editing.
arXiv Detail & Related papers (2020-10-11T11:59:40Z) - Reproducible Science with LaTeX [4.09920839425892]
This paper proposes a procedure to execute external source code from a LaTeX document.
It automatically includes the calculation outputs in the resulting Portable Document Format (PDF) file.
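This kind of pattern typically relies on LaTeX's shell escape: compiled with pdflatex -shell-escape, a document can run \immediate\write18{python compute.py} and then \input{result.tex}. The external side might look like the following sketch (the filenames and the calculation are our own assumptions, not necessarily the paper's setup):

    # compute.py: performs a calculation and writes a LaTeX fragment that
    # the document pulls in via \input{result.tex}.
    import math

    value = math.pi ** 2 / 6  # stand-in for a real calculation
    with open("result.tex", "w") as f:
        f.write(f"$\\sum_{{n=1}}^{{\\infty}} 1/n^2 \\approx {value:.6f}$\n")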
arXiv Detail & Related papers (2020-10-04T04:04:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.