Symmetria: A Synthetic Dataset for Learning in Point Clouds
- URL: http://arxiv.org/abs/2510.23414v1
- Date: Mon, 27 Oct 2025 15:18:26 GMT
- Title: Symmetria: A Synthetic Dataset for Learning in Point Clouds
- Authors: Ivan Sipiran, Gustavo Santelices, Lucas Oyarzún, Andrea Ranieri, Chiara Romanengo, Silvia Biasotti, Bianca Falcidieno
- Abstract summary: We present a formula-driven dataset that can be generated at any arbitrary scale. We create shapes with known structure and high variability, enabling neural networks to learn point cloud features effectively. Our results demonstrate that this dataset is highly effective for point cloud self-supervised pre-training.
- Score: 3.940178181041262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unlike image or text domains that benefit from an abundance of large-scale datasets, point cloud learning techniques frequently encounter limitations due to the scarcity of extensive datasets. To overcome this limitation, we present Symmetria, a formula-driven dataset that can be generated at any arbitrary scale. By construction, it ensures the absolute availability of precise ground truth, promotes data-efficient experimentation by requiring fewer samples, enables broad generalization across diverse geometric settings, and offers easy extensibility to new tasks and modalities. Using the concept of symmetry, we create shapes with known structure and high variability, enabling neural networks to learn point cloud features effectively. Our results demonstrate that this dataset is highly effective for point cloud self-supervised pre-training, yielding models with strong performance in downstream tasks such as classification and segmentation, which also show good few-shot learning capabilities. Additionally, our dataset can support fine-tuning models to classify real-world objects, highlighting our approach's practical utility and application. We also introduce a challenging task for symmetry detection and provide a benchmark for baseline comparisons. A significant advantage of our approach is the public availability of the dataset, the accompanying code, and the ability to generate very large collections, promoting further research and innovation in point cloud learning.
Related papers
- Efficient Long-Tail Learning in Latent Space by sampling Synthetic Data [1.9290392443571385]
Imbalanced classification datasets pose significant challenges in machine learning.
We propose a novel framework that leverages the rich semantic latent space of Vision Foundation Models to generate synthetic data and train a simple linear classifier.
Our method sets a new state-of-the-art for the CIFAR-100-LT benchmark and demonstrates strong performance on the Places-LT benchmark.
arXiv Detail & Related papers (2025-09-19T10:52:31Z)
- SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL).
We propose SPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
- Exploiting Local Features and Range Images for Small Data Real-Time Point Cloud Semantic Segmentation [4.02235104503587]
In this paper, we harness the information from the three-dimensional representation to proficiently capture local features.
A GPU-based KDTree allows for rapid building, querying, and enhancing projection with straightforward operations.
We show that a reduced version of our model not only demonstrates strong competitiveness against full-scale state-of-the-art models but also operates in real-time.
arXiv Detail & Related papers (2024-10-14T13:49:05Z)
- A Survey of Label-Efficient Deep Learning for 3D Point Clouds [109.07889215814589]
This paper presents the first comprehensive survey of label-efficient learning of point clouds.
We propose a taxonomy that organizes label-efficient learning methods based on the data prerequisites provided by different types of labels.
For each approach, we outline the problem setup and provide an extensive literature review that showcases relevant progress and challenges.
arXiv Detail & Related papers (2023-05-31T12:54:51Z)
- Effective Utilisation of Multiple Open-Source Datasets to Improve Generalisation Performance of Point Cloud Segmentation Models [0.0]
Semantic segmentation of aerial point cloud data can be utilised to differentiate which points belong to classes such as ground, buildings, or vegetation.
Point clouds generated from aerial sensors mounted to drones or planes can utilise LIDAR sensors or cameras along with photogrammetry.
We show that a naive combination of datasets produces a model with improved generalisation performance as expected.
arXiv Detail & Related papers (2022-11-29T02:31:01Z)
- Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- What Can be Seen is What You Get: Structure Aware Point Cloud Augmentation [0.966840768820136]
We present novel point cloud augmentation methods to artificially diversify a dataset.
Our sensor-centric methods keep the data structure consistent with the lidar sensor capabilities.
We show that our methods enable the use of very small datasets, saving annotation time, training time and the associated costs.
arXiv Detail & Related papers (2022-06-20T09:10:59Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Clustering augmented Self-Supervised Learning: An application to Land Cover Mapping [10.720852987343896]
We introduce a new method for land cover mapping by using a clustering based pretext task for self-supervised learning.
We demonstrate the effectiveness of the method on two societally relevant applications.
arXiv Detail & Related papers (2021-08-16T19:35:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.