MeT: A Graph Transformer for Semantic Segmentation of 3D Meshes
- URL: http://arxiv.org/abs/2307.01115v1
- Date: Mon, 3 Jul 2023 15:45:14 GMT
- Title: MeT: A Graph Transformer for Semantic Segmentation of 3D Meshes
- Authors: Giuseppe Vecchio, Luca Prezzavento, Carmelo Pino, Francesco Rundo,
Simone Palazzo, Concetto Spampinato
- Abstract summary: We propose a transformer-based method for semantic segmentation of 3D meshes.
We perform positional encoding by means of the Laplacian eigenvectors of the adjacency matrix.
We show how the proposed approach yields state-of-the-art performance on semantic segmentation of 3D meshes.
- Score: 10.667492516216887
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Polygonal meshes have become the standard for discretely approximating 3D
shapes, thanks to their efficiency and high flexibility in capturing
non-uniform shapes. This non-uniformity, however, leads to irregularity in the
mesh structure, making tasks like segmentation of 3D meshes particularly
challenging. Semantic segmentation of 3D meshes has typically been addressed
through CNN-based approaches, achieving good accuracy. Recently, transformers
have gained momentum in both NLP and computer vision, achieving
performance at least on par with CNN models and supporting the long-sought
goal of architectural universality. Following this trend, we propose a
transformer-based method for semantic segmentation of 3D meshes, motivated by
better modeling of the graph structure of meshes by means of global attention
mechanisms. To address the limitations of standard transformer architectures in
modeling the relative positions of non-sequential data, as in the case of 3D
meshes, as well as in capturing local context, we perform positional
encoding by means of the Laplacian eigenvectors of the adjacency matrix,
replacing the traditional sinusoidal positional encodings, and introduce
clustering-based features into the self-attention and cross-attention
operators. Experimental results, carried out on three sets of the Shape COSEG
Dataset, on the human segmentation dataset proposed in Maron et al., 2017 and
on the ShapeNet benchmark, show how the proposed approach yields
state-of-the-art performance on semantic segmentation of 3D meshes.
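The Laplacian positional encoding described in the abstract can be illustrated with a small sketch: build the combinatorial graph Laplacian from the mesh adjacency matrix and use its leading non-trivial eigenvectors as per-node position features. This is a minimal illustration under assumed conventions (dense NumPy, a toy 4-node cycle graph), not the authors' implementation.

```python
import numpy as np

def laplacian_eigenvector_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Return k Laplacian eigenvectors (smallest non-zero eigenvalues)
    to use as positional encodings, one k-dim vector per graph node."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                         # combinatorial Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(lap)  # symmetric matrix; sorted ascending
    # Skip the first eigenvector: it is constant (eigenvalue 0 for a
    # connected graph) and carries no positional information.
    return eigvecs[:, 1:k + 1]

# Toy "mesh" graph: a 4-node cycle standing in for mesh connectivity.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
pe = laplacian_eigenvector_pe(adj, k=2)
print(pe.shape)  # one 2-dimensional positional encoding per node
```

In a transformer, these per-node vectors would be added to (or concatenated with) the input node features in place of sinusoidal encodings; note that eigenvector signs are arbitrary, which real implementations typically handle with random sign flipping during training.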
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from transformers well-trained on massive image collections.
Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art performance in unsupervised domain adaptation (UDA) for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and
Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z) - Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer with an efficient 3D structured perception of global geometric inconsistencies.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z) - Spherical Transformer: Adapting Spherical Signal to CNNs [53.18482213611481]
Spherical Transformer can transform spherical signals into vectors that can be directly processed by standard CNNs.
We evaluate our approach on the tasks of spherical MNIST recognition, 3D object classification and omnidirectional image semantic segmentation.
arXiv Detail & Related papers (2021-01-11T12:33:16Z) - Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene
Understanding [19.134536179555102]
We propose an alternative approach that overcomes the limitations of CNN-based approaches by encoding the spatial features of raw 3D point clouds into undirected graph models.
The proposed method achieves accuracy on par with the state of the art, with improved training time and model stability, indicating strong potential for further research.
arXiv Detail & Related papers (2020-11-29T12:56:19Z) - Primal-Dual Mesh Convolutional Neural Networks [62.165239866312334]
We apply a primal-dual framework drawn from the graph-neural-network literature to triangle meshes.
Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them.
We provide theoretical insights of our approach using tools from the mesh-simplification literature.
arXiv Detail & Related papers (2020-10-23T14:49:02Z) - Learning Local Neighboring Structure for Robust 3D Shape Representation [143.15904669246697]
Representation learning for 3D meshes is important in many computer vision and graphics applications.
We propose a local structure-aware anisotropic convolutional operation (LSA-Conv).
Our model produces significant improvement in 3D shape reconstruction compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-04-21T13:40:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.