Related papers: A Makefile for Developing Containerized LaTeX Technical Documents

Related papers

LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination
We introduce LaTeXTrans, a collaborative multi-agent system designed to translate structured LaTeX-formatted documents. LaTeXTrans ensures format preservation, structural fidelity, and consistency through six specialized agents.
arXiv (2025-08-26T08:17:26Z)
TeXpert: A Multi-Level Benchmark for Evaluating LaTeX Code Generation by LLMs
Large Language Models (LLMs) present a promising opportunity for researchers to produce publication-ready material. TeXpert is a benchmark dataset of natural language prompts for generating LaTeX code focused on components of scientific documents. Our evaluation across open- and closed-source LLMs highlights multiple key findings.
arXiv (2025-06-20T13:39:16Z)
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages; in the first, planning, it designs the system architecture with diagrams, identifies file dependencies, and generates configuration files. We evaluate the code implementations PaperCoder generates from machine learning papers using both model-based and human evaluations.
arXiv (2025-04-24T01:57:01Z)
NeuRaLaTeX: A machine learning library written in pure LaTeX
We introduce NeuRaLaTeX, which we believe to be the first deep learning library written entirely in LaTeX. As part of your LaTeX document you can specify the architecture of a neural network and its loss functions. When the document is compiled, the LaTeX compiler will generate or load training data, train the network, run experiments, and generate figures. The paper took 48 hours to compile, and the entire source code for NeuRaLaTeX is contained within the source code of the paper.
arXiv (2025-03-31T15:05:19Z)
Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion
Docling is an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion. It can parse several types of popular document formats into a unified, richly structured representation. Docling is released as a Python package and can be used as a Python API or as a CLI tool.
arXiv (2025-01-27T19:40:00Z)
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
We introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We also introduce BigDocs-Bench, a benchmark suite with 10 novel tasks. Our experiments show that training with BigDocs-Bench improves average performance by up to 25.8% over closed-source GPT-4o.
arXiv (2024-12-05T21:41:20Z)
LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement
This paper proposes LATTE, the first iterative refinement framework for LaTeX recognition. LATTE improves the LaTeX source extraction accuracy of both formulae and tables, outperforming existing techniques as well as GPT-4V.
arXiv (2024-09-21T17:18:49Z)
Towards Semantic Markup of Mathematical Documents via User Interaction
We present an approach to semantic markup of formulas by (semi-)automatically generating grammars from existing sTeX macro definitions and parsing formulas with them. We also present a GUI-based tool for the disambiguation of parse results and showcase its potential using a grammar for parsing untyped $\lambda$-terms.
arXiv (2024-08-05T12:36:40Z)
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation
We introduce RepoAgent, an open-source framework powered by a large language model (LLM) and aimed at proactively generating, maintaining, and updating code documentation. We have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation.
arXiv (2024-02-26T15:39:52Z)
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
DocXChain is a powerful open-source toolchain for document parsing. It automatically converts the rich information embodied in unstructured documents, such as text, tables and charts, into structured representations.
arXiv (2023-10-19T02:49:09Z)
DocMAE: Document Image Rectification via Self-supervised Representation Learning
We present DocMAE, a novel self-supervised framework for document image rectification. We first mask random patches of the background-excluded document images and then reconstruct the missing pixels. With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents.
arXiv (2023-04-20T14:27:15Z)
Augraphy: A Data Augmentation Library for Document Images
Augraphy is a Python library for constructing data augmentation pipelines. It provides strategies to produce augmented versions of clean document images that appear to have been altered by standard office operations.
arXiv (2022-08-30T22:36:19Z)
DocCoder: Generating Code by Retrieving and Reading Docs
We introduce DocCoder, an approach that explicitly leverages code manuals and documentation. Our approach is general, can be applied to any programming language, and is agnostic to the underlying neural model.
arXiv (2022-07-13T06:47:51Z)
You Only Write Thrice: Creating Documents, Computational Notebooks and Presentations From a Single Source
Academic trade requires juggling multiple variants of the same content published in different formats. We propose to significantly reduce this burden by maintaining a single source document in a version-controlled environment. We offer a proof-of-concept workflow that composes Jupyter Book (an online document), Jupyter Notebook (a computational narrative) and reveal.js slides from a single markdown source file.
arXiv (2021-07-02T21:02:09Z)
Named Tensor Notation
We propose a notation for tensors with named axes. It relieves the author, reader, and future implementers from the burden of keeping track of the order of axes. It also makes it easy to extend operations on low-order tensors to higher order ones.
arXiv (2021-02-25T22:21:30Z)
Reproducible Science with LaTeX
This paper proposes a procedure to execute external source code from a LaTeX document. The procedure automatically includes the calculation outputs in the resulting Portable Document Format (PDF) file.
arXiv (2020-10-04T04:04:07Z)
DocBank: A Benchmark Dataset for Document Layout Analysis
We present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents.
arXiv (2020-06-01T16:04:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.