Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning
- URL: http://arxiv.org/abs/2506.16931v1
- Date: Fri, 20 Jun 2025 11:51:52 GMT
- Title: Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning
- Authors: Jiaqi Chen, Mingfeng Fan, Xuefeng Zhang, Jingsong Liang, Yuhong Cao, Guohua Wu, Guillaume Adrien Sartoretti,
- Abstract summary: We propose a Multimodal Fused Learning framework to solve the Generalized Traveling Salesman Problem (GTSP)<n>We first introduce a coordinate-based image builder that transforms GTSP instances into spatially informative representations.<n>We then design an adaptive resolution scaling strategy to enhance adaptability across different problem scales, and develop a multimodal fusion module.
- Score: 11.697279328699489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective and efficient task planning is essential for mobile robots, especially in applications like warehouse retrieval and environmental monitoring. These tasks often involve selecting one location from each of several target clusters, forming a Generalized Traveling Salesman Problem (GTSP) that remains challenging to solve both accurately and efficiently. To address this, we propose a Multimodal Fused Learning (MMFL) framework that leverages both graph and image-based representations to capture complementary aspects of the problem, and learns a policy capable of generating high-quality task planning schemes in real time. Specifically, we first introduce a coordinate-based image builder that transforms GTSP instances into spatially informative representations. We then design an adaptive resolution scaling strategy to enhance adaptability across different problem scales, and develop a multimodal fusion module with dedicated bottlenecks that enables effective integration of geometric and spatial features. Extensive experiments show that our MMFL approach significantly outperforms state-of-the-art methods across various GTSP instances while maintaining the computational efficiency required for real-time robotic applications. Physical robot tests further validate its practical effectiveness in real-world scenarios.
Related papers
- T-Rex: Task-Adaptive Spatial Representation Extraction for Robotic Manipulation with Vision-Language Models [35.83717913117858]
We introduce T-Rex, a Task-Adaptive Framework for Spatial Representation Extraction.<n>We show that our approach delivers significant advantages in spatial understanding, efficiency, and stability without additional training.
arXiv Detail & Related papers (2025-06-24T10:36:15Z) - Towards Unified Modeling in Federated Multi-Task Learning via Subspace Decoupling [23.642760378344335]
Federated Multi-Task Learning (FMTL) enables multiple clients performing heterogeneous tasks without exchanging their local data.<n>Most existing FMTL methods focus on building personalized models for each client and unable to support the aggregation of multiple heterogeneous tasks into a unified model.<n>We propose FedDEA, an update-structure-aware aggregation method specifically designed for multi-task model integration.
arXiv Detail & Related papers (2025-05-30T03:53:21Z) - Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.<n>However, they still struggle with problems requiring multi-step decision-making and environmental feedback.<n>We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z) - Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation [16.244250979166214]
Generalizable Planning-Guided Diffusion Policy Learning (GLIDE) is an approach that learns to solve contact-rich bimanual manipulation tasks.<n>We propose a set of essential design options in feature extraction, task representation, action prediction, and data augmentation.<n>Our approach can enable a bimanual robotic system to effectively manipulate objects of diverse geometries, dimensions, and physical properties.
arXiv Detail & Related papers (2024-12-03T18:51:39Z) - Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.<n>Our findings are synthesized in Flex (Fly lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.<n>We demonstrate the effectiveness of this approach on a quadrotor fly-to-target task, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z) - A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements [51.54559117314768]
Task And Motion Planning (TAMP) is the problem of finding a solution to an automated planning problem.
We propose a general and open-source framework for modeling and benchmarking TAMP problems.
We introduce an innovative meta-technique to solve TAMP problems involving moving agents and multiple task-state-dependent obstacles.
arXiv Detail & Related papers (2024-08-11T14:57:57Z) - Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP)
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and codes can be found in https://forrest-110.io/sparse_diffusion_policy/.
arXiv Detail & Related papers (2024-07-01T17:59:56Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and
Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - TOP-Former: A Multi-Agent Transformer Approach for the Team Orienteering Problem [47.40841984849682]
Route planning for a fleet of vehicles is an important task in applications such as package delivery, surveillance, or transportation.<n>We introduce TOP-Former, a multi-agent route planning neural network designed to efficiently and accurately solve the Team Orienteering Problem.
arXiv Detail & Related papers (2023-11-30T16:10:35Z) - A Transformer Framework for Data Fusion and Multi-Task Learning in Smart
Cities [99.56635097352628]
This paper proposes a Transformer-based AI system for emerging smart cities.
It supports virtually any input data and output task types present S&CCs.
It is demonstrated through learning diverse task sets representative of S&CC environments.
arXiv Detail & Related papers (2022-11-18T20:43:09Z) - Simultaneous Navigation and Construction Benchmarking Environments [73.0706832393065]
We need intelligent robots for mobile construction, the process of navigating in an environment and modifying its structure according to a geometric design.
In this task, a major robot vision and learning challenge is how to exactly achieve the design without GPS.
We benchmark the performance of a handcrafted policy with basic localization and planning, and state-of-the-art deep reinforcement learning methods.
arXiv Detail & Related papers (2021-03-31T00:05:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.