Related papers: GeoFormer: A Multi-Polygon Segmentation Transformer

URL: http://arxiv.org/abs/2411.16616v1
Date: Mon, 25 Nov 2024 17:54:44 GMT
Title: GeoFormer: A Multi-Polygon Segmentation Transformer
Authors: Maxim Khomiakov, Michael Riis Andersen, Jes Frellsen
Abstract summary: In remote sensing there exists a common need for learning scale invariant shapes of objects like buildings. We introduce the GeoFormer, a novel architecture which presents a remedy to the said challenges, learning to generate multipolygons end-to-end. By modeling keypoints as spatially dependent tokens in an auto-regressive manner, the GeoFormer outperforms existing works in delineating building objects from satellite imagery.
Score: 10.097953939411868
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In remote sensing there exists a common need for learning scale invariant shapes of objects like buildings. Prior work relies on tweaking multiple loss functions to convert segmentation maps into the final scale invariant representation, necessitating arduous design and optimization. To address this, we introduce the GeoFormer, a novel architecture which presents a remedy to these challenges, learning to generate multipolygons end-to-end. By modeling keypoints as spatially dependent tokens in an auto-regressive manner, the GeoFormer outperforms existing works in delineating building objects from satellite imagery. We evaluate the robustness of the GeoFormer against former methods through a variety of parameter ablations and highlight the advantages of optimizing a single likelihood function. Our study presents the first successful application of auto-regressive transformer models for multi-polygon predictions in remote sensing, suggesting a promising methodological alternative for building vectorization.
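
Illustrative sketch (not from the paper): the abstract describes modeling polygon keypoints as tokens that are generated auto-regressively. One plausible way to turn a multipolygon into such a token sequence is to quantize each vertex coordinate into a bin and flatten all polygons into one list, with a separator token between polygons and an end token at the close. The vocabulary layout, the constants NUM_BINS, SEP and EOS, and the function names below are assumptions made for illustration; the abstract does not specify GeoFormer's actual tokenization.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]

# Hypothetical vocabulary: one shared set of coordinate bins plus two
# special tokens. These values are assumptions, not taken from the paper.
NUM_BINS = 224        # assumed: one bin per pixel of a 224 x 224 image
SEP = NUM_BINS        # separates consecutive polygons
EOS = NUM_BINS + 1    # marks the end of the whole multipolygon


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    bin_index = int(value * num_bins / size)
    return min(max(bin_index, 0), num_bins - 1)


def encode_multipolygon(polygons: List[Polygon], size: float) -> List[int]:
    """Flatten polygons into x, y, x, y, ... tokens with SEP and EOS markers."""
    tokens: List[int] = []
    for index, polygon in enumerate(polygons):
        if index > 0:
            tokens.append(SEP)
        for x, y in polygon:
            tokens.append(quantize(x, size))
            tokens.append(quantize(y, size))
    tokens.append(EOS)
    return tokens


def decode_tokens(tokens: List[int], size: float,
                  num_bins: int = NUM_BINS) -> List[Polygon]:
    """Invert encode_multipolygon, returning bin centers as coordinates."""
    polygons: List[Polygon] = []
    current: Polygon = []
    pending: List[int] = []   # holds an x token until its y token arrives
    for token in tokens:
        if token in (SEP, EOS):
            if current:
                polygons.append(current)
            current, pending = [], []
            if token == EOS:
                break
            continue
        pending.append(token)
        if len(pending) == 2:
            x, y = ((t + 0.5) * size / num_bins for t in pending)
            current.append((x, y))
            pending = []
    return polygons
```

An auto-regressive decoder would then predict this sequence one token at a time, conditioned on image features and on the tokens emitted so far, which is what allows training with a single likelihood objective over the sequence.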

Related papers

Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation [19.117822086210513]
INKL-Pose is a novel category-level object pose estimation framework. It enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Experiments on CAMERA25, REAL275, and HouseCat6D demonstrate that INKL-Pose achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-04-21T14:37:37Z)
Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image [52.11275397911693]
We propose an end-to-end trainable, cross-category method for reconstructing multiple man-made articulated objects from a single RGBD image. We depart from previous works that rely on learning instance-level latent space, focusing on man-made articulated objects with predefined part counts. Our method successfully reconstructs variously structured multiple instances that previous works cannot handle, and outperforms prior works in shape reconstruction and kinematics estimation.
arXiv Detail & Related papers (2025-04-04T05:08:04Z)
Poly2Vec: Polymorphic Encoding of Geospatial Objects for Spatial Reasoning with Deep Neural Networks [6.1981153537308336]
Poly2Vec is an encoding framework that unifies the modeling of different geospatial objects. We leverage the power of the 2D Fourier transform to encode useful spatial properties, such as shape and location. This unified approach eliminates the need to develop and train separate models for each distinct spatial type.
arXiv Detail & Related papers (2024-08-27T06:28:35Z)
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Relational Priors Distillation (RPD) method to extract priors from well-trained transformers on massive images. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
Learning Geometric Invariant Features for Classification of Vector Polygons with Graph Message-passing Neural Network [3.804240190982697]
We propose a simple graph message-passing framework, PolyMP, to learn more expressive and robust latent representations of polygons. This framework hierarchically captures self-looped graph information and learns geometric-invariant features for polygon shape classification. Our findings indicate that PolyMP and PolyMP-DSC effectively capture expressive geometric features that remain invariant under common transformations.
arXiv Detail & Related papers (2024-07-05T08:19:36Z)
DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects [27.194896819729113]
DoughNet is a Transformer-based architecture for planning interactions with elastoplastic objects. It enables planning of robotic manipulation by selecting a suitable tool, its pose, and its opening width to recreate robot- or human-made goals.
arXiv Detail & Related papers (2024-04-18T21:55:23Z)
HiT: Building Mapping with Hierarchical Transformers [43.31497052507252]
We propose a simple and novel building mapping method with Hierarchical Transformers, called HiT. HiT builds on a two-stage detection architecture by adding a polygon head parallel to classification and bounding box regression heads. Our method achieves a new state of the art in instance segmentation and polygonal metrics compared with existing methods.
arXiv Detail & Related papers (2023-09-18T10:24:25Z)
Automatic Parameterization for Aerodynamic Shape Optimization via Deep Geometric Learning [60.69217130006758]
We propose two deep learning models that fully automate shape parameterization for aerodynamic shape optimization. Both models are optimized to parameterize via deep geometric learning to embed human prior knowledge into learned geometric patterns. We perform shape optimization experiments on 2D airfoils and discuss the applicable scenarios for the two models.
arXiv Detail & Related papers (2023-05-03T13:45:40Z)
Towards General-Purpose Representation Learning of Polygonal Geometries [62.34832826705641]
We develop a general-purpose polygon encoding model, which can encode a polygonal geometry into an embedding space. We conduct experiments on two tasks: 1) shape classification based on MNIST; 2) spatial relation prediction based on two new datasets - DBSR-46K and DBSR-cplx46K. Our results show that NUFTspec and ResNet1D outperform multiple existing baselines with significant margins.
arXiv Detail & Related papers (2022-09-29T15:59:23Z)
Learning to Complete Object Shapes for Object-level Mapping in Dynamic Scenes [30.500198859451434]
We propose a novel object-level mapping system that can simultaneously segment, track, and reconstruct objects in dynamic scenes. It can further predict and complete their full geometries by conditioning on reconstructions from depth inputs and a category-level shape prior. We evaluate its effectiveness by quantitatively and qualitatively testing it in both synthetic and real-world sequences.
arXiv Detail & Related papers (2022-08-09T22:56:33Z)
AutoPoly: Predicting a Polygonal Mesh Construction Sequence from a Silhouette Image [17.915067368873018]
AutoPoly is a hybrid method that generates a polygonal mesh construction sequence from a silhouette image. Our method can alter topology, whereas the recently proposed inverse shape estimation methods using differentiable rendering can only handle a fixed topology.
arXiv Detail & Related papers (2022-03-29T04:48:47Z)
PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result. Applied to a recent transformer-based image recognition model, the method shows consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation. To relax the geometric constraint, we give the analysis by reformulating it as a Markov Random Field and introduce a learnable unary term. For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles.
arXiv Detail & Related papers (2020-12-07T07:16:47Z)
Gated Path Selection Network for Semantic Segmentation [72.44994579325822]
We develop a novel network named Gated Path Selection Network (GPSNet), which aims to learn adaptive receptive fields. In GPSNet, we first design a two-dimensional multi-scale network - SuperNet, which densely incorporates features from growing receptive fields. To dynamically select desirable semantic context, a gate prediction module is further introduced.
arXiv Detail & Related papers (2020-01-19T12:32:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.