MMFormalizer: Multimodal Autoformalization in the Wild
- URL: http://arxiv.org/abs/2601.03017v1
- Date: Tue, 06 Jan 2026 13:42:51 GMT
- Title: MMFormalizer: Multimodal Autoformalization in the Wild
- Authors: Jing Xiong, Qi Han, Yunta Hsieh, Hui Shen, Huajian Xin, Chaofan Tao, Chenyang Zhao, Hengyuan Zhang, Taiqiang Wu, Zhen Zhang, Haochen Wang, Zhongwei Wan, Lingpeng Kong, Ngai Wong
- Abstract summary: MMFormalizer extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains. We evaluate MMFormalizer on a new benchmark, PhyX-AF, comprising 115 curated samples from MathVerse, PhyX, Synthetic Geometry, and Analytic Geometry. Results show GPT-5 and Gemini-3-Pro achieve the highest compile and semantic accuracy, with GPT-5 excelling in physical reasoning.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Autoformalization, which translates natural language mathematics into formal statements to enable machine reasoning, faces fundamental challenges in the wild due to the multimodal nature of the physical world, where physics requires inferring hidden constraints (e.g., mass or energy) from visual elements. To address this, we propose MMFormalizer, which extends autoformalization beyond text by integrating adaptive grounding with entities from real-world mathematical and physical domains. MMFormalizer constructs formal propositions from perceptually grounded primitives through recursive grounding and axiom composition, with adaptive recursive termination ensuring that every abstraction is supported by visual evidence and anchored in dimensional or axiomatic grounding. We evaluate MMFormalizer on a new benchmark, PhyX-AF, comprising 115 curated samples from MathVerse, PhyX, Synthetic Geometry, and Analytic Geometry, covering diverse multimodal autoformalization tasks. Results show that frontier models such as GPT-5 and Gemini-3-Pro achieve the highest compile and semantic accuracy, with GPT-5 excelling in physical reasoning, while geometry remains the most challenging domain. Overall, MMFormalizer provides a scalable framework for unified multimodal autoformalization, bridging perception and formal reasoning. To the best of our knowledge, this is the first multimodal autoformalization method capable of handling classical mechanics (derived from the Hamiltonian), as well as relativity, quantum mechanics, and thermodynamics. More details are available on our project page: MMFormalizer.github.io
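The recursive grounding and axiom-composition loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Proposition`, `is_anchored`, and `formalize`, the `axioms` table, and the `"visual"` evidence tag are all hypothetical stand-ins for the paper's actual machinery.

```python
# Hedged sketch of recursive grounding with adaptive termination.
# All names and the axiom table are illustrative assumptions,
# not the paper's actual API or data.

from dataclasses import dataclass, field


@dataclass
class Proposition:
    head: str                                      # symbol, e.g. "energy"
    support: list = field(default_factory=list)    # visual/axiomatic evidence
    children: list = field(default_factory=list)   # sub-propositions


def is_anchored(prop: Proposition) -> bool:
    """Adaptive termination: stop recursing once a proposition is
    backed by visual evidence or a dimensional/axiomatic ground."""
    return bool(prop.support)


def formalize(prop: Proposition, axioms: dict,
              depth: int = 0, max_depth: int = 5) -> Proposition:
    """Recursively expand a proposition into grounded primitives."""
    if is_anchored(prop) or depth >= max_depth:
        return prop
    # Axiom composition: rewrite the head using a known axiom,
    # e.g. "energy" -> ["mass", "velocity"] via E = (1/2) m v^2.
    for part in axioms.get(prop.head, []):
        child = formalize(Proposition(head=part, support=["visual"]),
                          axioms, depth + 1, max_depth)
        prop.children.append(child)
    return prop


axioms = {"energy": ["mass", "velocity"]}
root = formalize(Proposition(head="energy"), axioms)
print([c.head for c in root.children])  # each child is perceptually grounded
```

The key design point the sketch captures is that recursion bottoms out only at perceptually or axiomatically anchored primitives, so every abstraction in the final proposition tree traces back to evidence.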
Related papers
- Intrinsic-Metric Physics-Informed Neural Networks (IM-PINN) for Reaction-Diffusion Dynamics on Complex Riemannian Manifolds [0.0]
This study introduces the Intrinsic-Metric Physics-Informed Neural Network (IM-PINN). It is a mesh-free geometric deep learning framework that solves partial differential equations directly in the continuous parametric domain. The framework offers a memory-efficient, resolution-independent paradigm for simulating biological pattern formation on evolving surfaces.
arXiv Detail & Related papers (2025-12-26T12:41:05Z)
- Learning Geometry: A Framework for Building Adaptive Manifold Models through Metric Optimization [8.201374511929538]
This paper proposes a novel paradigm for machine learning that moves beyond traditional parameter optimization. We optimize the metric tensor field on a manifold with a predefined topology, thereby dynamically shaping the geometric structure of the model space. This work lays a solid foundation for constructing fully dynamic "meta-learners" capable of autonomously evolving their geometry and topology.
arXiv Detail & Related papers (2025-10-30T01:53:32Z)
- Automated Formalization via Conceptual Retrieval-Augmented LLMs [19.328918823576153]
CRAMF is a Concept-driven Retrieval-Augmented Mathematical Formalization framework. We introduce a framework for automatically constructing a concept-definition knowledge base from Mathlib4. Experiments on miniF2F, ProofNet, and our newly proposed AdvancedMath benchmark show that CRAMF can be seamlessly integrated into LLM-based autoformalizers.
arXiv Detail & Related papers (2025-08-09T10:54:25Z)
- MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem [11.81434494801394]
We formalize the task of LLM-powered real-world mathematical modeling. We propose MM-Agent, an expert-inspired framework that decomposes modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. MM-Agent significantly outperforms baseline agents, achieving an 11.88% improvement over human expert solutions.
arXiv Detail & Related papers (2025-05-20T09:55:31Z)
- GeoMFormer: A General Architecture for Geometric Molecular Representation Learning [84.02083170392764]
We introduce a novel Transformer-based molecular model called GeoMFormer to achieve this goal.
We show that GeoMFormer achieves strong performance on both invariant and equivariant tasks of different types and scales.
arXiv Detail & Related papers (2024-06-24T17:58:13Z)
- Autoformalizing Euclidean Geometry [74.72212706513318]
We introduce a neuro-symbolic framework for autoformalizing Euclidean geometry.
One challenge is that informal proofs rely on diagrams, leaving gaps in the text that are hard to formalize.
We provide automatic semantic evaluation for autoformalized theorem statements.
arXiv Detail & Related papers (2024-05-27T14:35:10Z)
- Tempered Calculus for ML: Application to Hyperbolic Model Embedding [70.61101116794549]
Most mathematical distortions used in ML are fundamentally integral in nature.
In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements.
We show how to apply it to a problem that has recently gained traction in ML: hyperbolic embeddings with a "cheap" and accurate encoding along the hyperbolic vs Euclidean scale.
arXiv Detail & Related papers (2024-02-06T17:21:06Z)
- TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models [68.65075559137608]
We propose TRIGO, an ATP benchmark that not only requires a model to reduce a trigonometric expression with step-by-step proofs but also evaluates a generative LM's reasoning ability on formulas.
We gather trigonometric expressions and their reduced forms from the web, annotate the simplification process manually, and translate it into the Lean formal language system.
We develop an automatic generator based on Lean-Gym to create dataset splits of varying difficulties and distributions in order to thoroughly analyze the model's generalization ability.
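A TRIGO-style target might look like the following Lean statement. This is an illustrative sketch of the kind of reduction the benchmark asks a model to formalize and prove, written in assumed Lean 4 / Mathlib syntax; it is not drawn from the actual dataset.

```lean
-- Illustrative only: a trigonometric identity stated and closed in Lean,
-- of the kind TRIGO asks a generative model to produce step by step.
import Mathlib

example (x : ℝ) : Real.sin x ^ 2 + Real.cos x ^ 2 = 1 :=
  Real.sin_sq_add_cos_sq x
```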
arXiv Detail & Related papers (2023-10-16T08:42:39Z)
- Generative Language Modeling for Automated Theorem Proving [94.01137612934842]
This work is motivated by the possibility that a major limitation of automated theorem provers compared to humans might be addressable via generation from language models.
We present an automated prover and proof assistant, GPT-f, for the Metamath formalization language, and analyze its performance.
arXiv Detail & Related papers (2020-09-07T19:50:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.