MuraNet: Multi-task Floor Plan Recognition with Relation Attention
- URL: http://arxiv.org/abs/2309.00348v1
- Date: Fri, 1 Sep 2023 09:10:04 GMT
- Title: MuraNet: Multi-task Floor Plan Recognition with Relation Attention
- Authors: Lingxiao Huang, Jung-Hsuan Wu, Chiching Wei, Wilson Li
- Abstract summary: We introduce MuraNet, an attention-based multi-task model for segmentation and detection tasks in floor plan data.
By jointly training the model on both detection and segmentation tasks, we believe MuraNet can effectively extract and utilize relevant features for both tasks.
- Score: 9.295218599901249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recognition of information in floor plan data requires the use of
detection and segmentation models. However, relying on several single-task
models can result in ineffective utilization of relevant information when there
are multiple tasks present simultaneously. To address this challenge, we
introduce MuraNet, an attention-based multi-task model for segmentation and
detection tasks in floor plan data. In MuraNet, we adopt a unified encoder
called MURA as the backbone with two separate branches: an enhanced
segmentation decoder branch and a decoupled detection head branch based on
YOLOX, for segmentation and detection tasks respectively. The architecture of
MuraNet is designed to leverage the fact that walls, doors, and windows usually
constitute the primary structure of a floor plan's architecture. By jointly
training the model on both detection and segmentation tasks, we believe MuraNet
can effectively extract and utilize relevant features for both tasks. Our
experiments on the CubiCasa5k public dataset show that MuraNet improves
convergence speed during training compared to single-task models like U-Net and
YOLOv3. Moreover, we observe improvements in the average AP and IoU in
detection and segmentation tasks, respectively. Our ablation experiments
demonstrate that the attention-based unified backbone of MuraNet achieves
better feature extraction in floor plan recognition tasks, and the use of
decoupled multi-head branches for different tasks further improves model
performance. We believe that our proposed MuraNet model can address the
disadvantages of single-task models and improve the accuracy and efficiency of
floor plan data recognition.
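The abstract reports detection AP and segmentation IoU on CubiCasa5k but does not spell out the evaluation protocol. As a rough illustration of what these two metrics measure, here is a minimal stdlib-only Python sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), per-pixel integer label maps, greedy matching at an IoU threshold of 0.5, and all-point interpolated AP. These are common conventions chosen for the example, not MuraNet's or CubiCasa5k's official evaluation code, and the class ids in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def mask_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]], cls: int) -> float:
    """IoU of one class between two equally sized label maps (NaN if absent from both)."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == cls, t == cls
            inter += int(in_pred and in_target)
            union += int(in_pred or in_target)
    return inter / union if union else float("nan")


def mean_iou(pred, target, classes: Sequence[int]) -> float:
    """Average the per-class IoU, skipping classes absent from both maps."""
    ious = [mask_iou(pred, target, c) for c in classes]
    valid = [v for v in ious if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Tuple[float, Box]], gts: List[Box], iou_thr: float = 0.5) -> float:
    """AP for one class in one image: dets are (score, box) pairs, gts are ground-truth boxes."""
    if not gts:
        return float("nan")
    matched = [False] * len(gts)
    hits = []
    # Visit detections from most to least confident; each may claim one unmatched ground truth.
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = box_iou(box, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)  # false positive: duplicate or poorly localized box
    # Precision and recall after each ranked detection.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # All-point interpolation: area under the monotone upper envelope of the PR curve.
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_recall) * max(precisions[i:])
        prev_recall = r
    return ap


# Hypothetical usage: 0 = background, 1 = wall, 2 = door.
pred_map = [[0, 1, 1], [0, 2, 2]]
true_map = [[0, 1, 0], [0, 2, 2]]
print(mean_iou(pred_map, true_map, classes=[1, 2]))  # 0.75

gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
print(average_precision(detections, gt_boxes))  # approx. 0.833 (= 5/6)
```

Benchmark protocols usually aggregate AP over classes and images (and, in the COCO style, over several IoU thresholds), so numbers from this sketch are not directly comparable to those reported in the paper.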
Related papers
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
Date: 2024-07-15T16:25:07Z
- A Point-Based Approach to Efficient LiDAR Multi-Task Perception [49.91741677556553]
PAttFormer is an efficient multi-task architecture for joint semantic segmentation and object detection in point clouds.
Unlike other LiDAR-based multi-task architectures, our proposed PAttFormer does not require separate feature encoders for task-specific point cloud representations.
Our evaluations show substantial gains from multi-task learning, improving LiDAR semantic segmentation by +1.7% in mIoU and 3D object detection by +1.7% in mAP.
Date: 2024-04-19T11:24:34Z
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking [55.13878429987136]
We propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets.
Our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
Date: 2023-11-17T08:17:49Z
- Body Segmentation Using Multi-task Learning [1.0832844764942349]
We present a novel multi-task model for human segmentation/parsing that involves three tasks.
The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks.
The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model.
Date: 2022-12-13T13:06:21Z
- Simultaneous Multiple Object Detection and Pose Estimation using 3D Model Infusion with Monocular Vision [21.710141497071373]
Multiple object detection and pose estimation are vital computer vision tasks.
We propose simultaneous neural modeling of both using monocular vision and 3D model infusion.
Our Simultaneous Multiple Object detection and Pose Estimation network (SMOPE-Net) is an end-to-end trainable multitasking network.
Date: 2022-11-21T05:18:56Z
- MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
Date: 2022-05-17T13:03:18Z
- SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation [7.544917072241684]
Video instance segmentation (VIS) is a new and critical task in computer vision.
We propose a one-stage spatial granularity network (SG-Net) for VIS.
We show that our method can achieve improved performance in both accuracy and inference speed.
Date: 2021-03-18T14:31:15Z
- Multi-object Tracking with a Hierarchical Single-branch Network [31.680667324595557]
We propose an online multi-object tracking framework based on a hierarchical single-branch network.
Our novel iHOIM loss function unifies the objectives of the two sub-tasks and encourages better detection performance.
Experimental results on MOT16 and MOT20 datasets show that we can achieve state-of-the-art tracking performance.
Date: 2021-01-06T12:14:58Z
- FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach, termed FairMOT, based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
Date: 2020-04-04T08:18:00Z
- A Unified Object Motion and Affinity Model for Online Multi-Object Tracking [127.5229859255719]
We propose a novel MOT framework, named UMA, that unifies the object motion and affinity models into a single network.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
Date: 2020-03-25T09:36:43Z