LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
- URL: http://arxiv.org/abs/2311.16476v2
- Date: Tue, 20 Feb 2024 03:35:46 GMT
- Title: LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
- Authors: Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu
- Abstract summary: We propose a layout-aware neural solver named LANS, integrated with two new modules: a multimodal layout-aware pre-trained language module (MLA-PLM) and layout-aware fusion attention (LA-FA).
Experiments on datasets Geometry3K and PGPS9K validate the effectiveness of the layout-aware modules and superior problem-solving performance of our LANS solver, over existing symbolic and neural solvers.
- Score: 43.192629815250285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Geometry problem solving (GPS) is a challenging mathematical reasoning task
requiring multi-modal understanding, fusion, and reasoning. Existing neural
solvers treat GPS as a vision-language task but fall short in the representation
of geometry diagrams that carry rich and complex layout information. In this
paper, we propose a layout-aware neural solver named LANS, integrated with two
new modules: multimodal layout-aware pre-trained language module (MLA-PLM) and
layout-aware fusion attention (LA-FA). MLA-PLM adopts structural-semantic
pre-training (SSP) to implement global relationship modeling, and point-match
pre-training (PMP) to achieve alignment between visual points and textual
points. LA-FA employs a layout-aware attention mask to realize point-guided
cross-modal fusion for further boosting layout awareness of LANS. Extensive
experiments on datasets Geometry3K and PGPS9K validate the effectiveness of the
layout-aware modules and superior problem-solving performance of our LANS
solver, over existing symbolic and neural solvers. The code will be made publicly
available soon.
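The abstract describes LA-FA only at a high level, so the following is a minimal sketch of the general idea of mask-restricted attention, not the authors' implementation. It assumes a joint sequence of diagram patch tokens followed by textual point tokens, and a hypothetical rule in which a patch and a point may attend to each other only when the point lies inside that patch. The names `build_layout_mask`, `masked_attention`, and `point_to_patch` are illustrative and do not come from the paper.

```python
import math


def build_layout_mask(num_patches, point_to_patch):
    """Build a boolean attention mask over [patch tokens..., point tokens...].

    point_to_patch[p] is the index of the patch assumed to contain textual
    point p (hypothetical layout information, not the paper's actual rule).
    """
    n = num_patches + len(point_to_patch)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_is_patch, j_is_patch = i < num_patches, j < num_patches
            if i_is_patch == j_is_patch:
                # Same modality: attention is left unrestricted.
                mask[i][j] = True
            else:
                # Cross-modal pair: allowed only if the point sits in the patch.
                patch, point = (i, j - num_patches) if i_is_patch else (j, i - num_patches)
                mask[i][j] = point_to_patch[point] == patch
    return mask


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention; mask[i][j] == False blocks query i from key j."""
    dim = len(keys[0])
    outputs = []
    for q, allowed in zip(queries, mask):
        # Blocked pairs get a score of -inf, so softmax assigns them weight 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) if ok else -math.inf
            for k, ok in zip(keys, allowed)
        ]
        peak = max(scores)  # every row allows at least its own token
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs
```

For example, `build_layout_mask(2, [0, 1])` lets patch 0 attend to point 0 but not to point 1. Passing that mask to `masked_attention` then gives patch 0 zero attention weight on point 1, while each row of weights still sums to one.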
Related papers
- GeoCoder: Solving Geometry Problems by Generating Modular Code through Vision-Language Models [10.443672399225983]
Vision-language models (VLMs) have made significant progress in various multimodal tasks.
They still struggle with geometry problems and are significantly limited by their inability to perform mathematical operations not seen during pre-training.
We present GeoCoder, which leverages modular code-finetuning to generate and execute code using a predefined geometry function library.
arXiv Detail & Related papers (2024-10-17T12:56:52Z) - Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram [78.79651421493058]
We propose a neural-symbolic model for plane geometry problem solving (PGPS) with three key steps: modal fusion, reasoning process and knowledge verification.
For reasoning, we design an explicable solution program to describe the geometric reasoning process, and employ a self-limited decoder to generate the solution program autoregressively.
We also construct a large-scale geometry problem dataset called PGPS9K, containing fine-grained annotations of textual clauses, solution program and involved knowledge solvers.
arXiv Detail & Related papers (2024-07-10T02:45:22Z) - GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation [52.65506307440127]
We propose GeoVLN, which learns Geometry-enhanced visual representation based on slot attention for robust Visual-and-Language Navigation.
We employ V&L BERT to learn a cross-modal representation that incorporates both language and vision information.
arXiv Detail & Related papers (2023-05-26T17:15:22Z) - A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram [33.62866585222121]
We propose a new neural solver called PGPSNet to fuse multi-modal information efficiently.
PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation.
We build a new large-scale and fine-annotated GPS dataset named PGPS9K.
arXiv Detail & Related papers (2023-02-22T02:38:25Z) - Multi-Resource Allocation for On-Device Distributed Federated Learning Systems [79.02994855744848]
This work proposes a distributed multi-resource allocation scheme for minimizing the weighted sum of latency and energy consumption in the on-device distributed federated learning (FL) system.
Each mobile device in the system engages the model training process within the specified area and allocates its computation and communication resources for deriving and uploading parameters, respectively.
arXiv Detail & Related papers (2022-11-01T14:16:05Z) - AMS-Net: Adaptive Multiscale Sparse Neural Network with Interpretable Basis Expansion for Multiphase Flow Problems [8.991619150027267]
We propose an adaptive sparse learning algorithm that can be applied to learn the physical processes and obtain a sparse representation of the solution given a large snapshot space.
The information of the basis functions is incorporated in the loss function, which minimizes the differences between the downscaled reduced order solutions and reference solutions at multiple time steps.
More numerical tests are performed on two-phase multiscale flow problems to show the capability and interpretability of the proposed method on complicated applications.
arXiv Detail & Related papers (2022-07-24T13:12:43Z) - Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z) - A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence [48.67755344239951]
We provide a comprehensive survey, and propose a new taxonomy for localization and mapping using deep learning.
A wide range of topics are covered, from learning odometry estimation, mapping, to global localization and simultaneous localization and mapping.
It is our hope that this work can connect emerging works from robotics, computer vision and machine learning communities.
arXiv Detail & Related papers (2020-06-22T19:01:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.