Related papers: LANS: A Layout-Aware Neural Solver for Plane Geometry Problem

URL: http://arxiv.org/abs/2311.16476v2
Date: Tue, 20 Feb 2024 03:35:46 GMT
Title: LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Authors: Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu
Abstract summary: We propose a layout-aware neural solver named LANS, integrated with two new modules: multimodal layout-aware pre-trained language module and layout-aware fusion attention (LA-FA). Experiments on datasets Geometry3K and PGPS9K validate the effectiveness of the layout-aware modules and superior problem-solving performance of our LANS solver, over existing symbolic and neural solvers.
Score: 43.192629815250285
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Geometry problem solving (GPS) is a challenging mathematical reasoning task requiring multi-modal understanding, fusion, and reasoning. Existing neural solvers take GPS as a vision-language task but are short in the representation of geometry diagrams that carry rich and complex layout information. In this paper, we propose a layout-aware neural solver named LANS, integrated with two new modules: multimodal layout-aware pre-trained language module (MLA-PLM) and layout-aware fusion attention (LA-FA). MLA-PLM adopts structural-semantic pre-training (SSP) to implement global relationship modeling, and point-match pre-training (PMP) to achieve alignment between visual points and textual points. LA-FA employs a layout-aware attention mask to realize point-guided cross-modal fusion for further boosting layout awareness of LANS. Extensive experiments on datasets Geometry3K and PGPS9K validate the effectiveness of the layout-aware modules and superior problem-solving performance of our LANS solver, over existing symbolic and neural solvers. The code will be made publicly available soon.

Related papers

Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration [57.95306827012784]
We propose GeoGen, a pipeline that can automatically generate step-wise reasoning paths for geometry diagrams. By leveraging precise symbolic reasoning, GeoGen produces large-scale, high-quality question-answer pairs. We train GeoLogic, a Large Language Model (LLM), using synthetic data generated by GeoGen.
arXiv Detail & Related papers (2025-04-17T09:13:46Z)
Aligning Multimodal LLM with Human Preference: A Survey [62.89722942008262]
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs) have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed.
arXiv Detail & Related papers (2025-03-18T17:59:56Z)
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information [25.13992124041851]
This paper presents Pi-GPS, a novel framework that unleashes the power of diagrammatic information to resolve textual ambiguities. We employ MLLMs to disambiguate text based on the diagrammatic context, while the verifier ensures that the rectified output adheres to geometric rules. Empirical results demonstrate that Pi-GPS surpasses state-of-the-art models, achieving a nearly 10% improvement on Geometry3K over prior neural-symbolic approaches.
arXiv Detail & Related papers (2025-03-07T16:15:00Z)
GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models [10.443672399225983]
Vision-language models (VLMs) have made significant progress in various multimodal tasks. They still struggle with geometry problems and are significantly limited by their inability to perform mathematical operations not seen during pre-training. We present GeoCoder, which leverages modular code-finetuning to generate and execute code using a predefined geometry function library.
arXiv Detail & Related papers (2024-10-17T12:56:52Z)
Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram [78.79651421493058]
We propose a neural-symbolic model for plane geometry problem solving (PGPS) with three key steps: modal fusion, reasoning process and knowledge verification. For reasoning, we design an explicable solution program to describe the geometric reasoning process, and employ a self-limited decoder to generate solution program autoregressively. We also construct a large-scale geometry problem dataset called PGPS9K, containing fine-grained annotations of textual clauses, solution program and involved knowledge solvers.
arXiv Detail & Related papers (2024-07-10T02:45:22Z)
GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation [52.65506307440127]
We propose GeoVLN, which learns Geometry-enhanced visual representation based on slot attention for robust Visual-and-Language Navigation. We employ V&L BERT to learn a cross-modal representation that incorporates both language and vision information.
arXiv Detail & Related papers (2023-05-26T17:15:22Z)
A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram [33.62866585222121]
We propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation. We build a new large-scale and fine-annotated GPS dataset named PGPS9K.
arXiv Detail & Related papers (2023-02-22T02:38:25Z)
Multi-Resource Allocation for On-Device Distributed Federated Learning Systems [79.02994855744848]
This work proposes a distributed multi-resource allocation scheme for minimizing the weighted sum of latency and energy consumption in the on-device distributed federated learning (FL) system. Each mobile device in the system engages in the model training process within the specified area and allocates its computation and communication resources for deriving and uploading parameters, respectively.
arXiv Detail & Related papers (2022-11-01T14:16:05Z)
AMS-Net: Adaptive Multiscale Sparse Neural Network with Interpretable Basis Expansion for Multiphase Flow Problems [8.991619150027267]
We propose an adaptive sparse learning algorithm that can be applied to learn the physical processes and obtain a sparse representation of the solution given a large snapshot space. The information of the basis functions is incorporated in the loss function, which minimizes the differences between the downscaled reduced order solutions and reference solutions at multiple time steps. More numerical tests are performed on two-phase multiscale flow problems to show the capability and interpretability of the proposed method on complicated applications.
arXiv Detail & Related papers (2022-07-24T13:12:43Z)
Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications. We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS). Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)
A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence [48.67755344239951]
We provide a comprehensive survey, and propose a new taxonomy for localization and mapping using deep learning. A wide range of topics are covered, from learning odometry estimation, mapping, to global localization and simultaneous localization and mapping. It is our hope that this work can connect emerging works from robotics, computer vision and machine learning communities.
arXiv Detail & Related papers (2020-06-22T19:01:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.