Annotation Free Spacecraft Detection and Segmentation using Vision Language Models
- URL: http://arxiv.org/abs/2602.04699v1
- Date: Wed, 04 Feb 2026 16:07:29 GMT
- Title: Annotation Free Spacecraft Detection and Segmentation using Vision Language Models
- Authors: Samet Hicsonmez, Jose Sosa, Dan Pineau, Inder Pal Singh, Arunkumar Rathinam, Abd El Rahman Shabayek, Djamila Aouada,
- Abstract summary: Vision Language Models (VLMs) have demonstrated remarkable performance in open-world zero-shot visual recognition.<n>We propose an annotation-free detection and segmentation pipeline for space targets usingVLMs.<n>Our approach begins by automatically generating pseudo-labels for a small subset of unlabeled real data with a pre-trained VLM.<n>Despite the inherent noise in the pseudo-labels, the distillation process leads to substantial performance gains over direct zero-shot VLM inference.
- Score: 14.77089626655396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Language Models (VLMs) have demonstrated remarkable performance in open-world zero-shot visual recognition. However, their potential in space-related applications remains largely unexplored. In the space domain, accurate manual annotation is particularly challenging due to factors such as low visibility, illumination variations, and object blending with planetary backgrounds. Developing methods that can detect and segment spacecraft and orbital targets without requiring extensive manual labeling is therefore of critical importance. In this work, we propose an annotation-free detection and segmentation pipeline for space targets using VLMs. Our approach begins by automatically generating pseudo-labels for a small subset of unlabeled real data with a pre-trained VLM. These pseudo-labels are then leveraged in a teacher-student label distillation framework to train lightweight models. Despite the inherent noise in the pseudo-labels, the distillation process leads to substantial performance gains over direct zero-shot VLM inference. Experimental evaluations on the SPARK-2024, SPEED+, and TANGO datasets on segmentation tasks demonstrate consistent improvements in average precision (AP) by up to 10 points. Code and models are available at https://github.com/giddyyupp/annotation-free-spacecraft-segmentation.
Related papers
- Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels [28.96799571666756]
We introduce the novel task Domain Generalization for LiDAR under Noisy Labels (DGLSS-NL)<n>We find that existing noisy-label learning approaches adapt poorly to LiDAR data.<n>We propose DuNe, a dual-view framework with strong and weak branches that enforce feature-level consistency and apply cross-entropy loss based on confidence-aware filtering of predictions.
arXiv Detail & Related papers (2025-10-10T06:11:34Z) - VESPA: Towards un(Human)supervised Open-World Pointcloud Labeling for Autonomous Driving [1.623951368574041]
We introduce VESPA, a multimodal autolabeling pipeline that fuses the geometric precision of LiDAR with the semantic richness of camera images.<n> VESPA supports the discovery of novel categories and produces high-quality 3D pseudolabels without requiring ground-truth annotations or HD maps.<n>On Nuscenes dataset, VESPA achieves an AP of 52.95% for object discovery and up to 46.54% for multiclass object detection.
arXiv Detail & Related papers (2025-07-27T19:39:29Z) - DISCOVER: Data-driven Identification of Sub-activities via Clustering and Visualization for Enhanced Activity Recognition in Smart Homes [46.86909768552777]
We introduce DISCOVER, a method to discover fine-grained human sub-activities from unlabeled sensor data without relying on pre-segmentation.<n>We demonstrate its effectiveness through a re-annotation exercise on widely used HAR datasets.
arXiv Detail & Related papers (2025-02-11T20:02:24Z) - MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased prediction under the open vocabulary detection context.
Our observations lead to a simple yet effective paradigm, coded MarvelOVD, that generates significantly better training targets.
Our method outperforms the other state-of-the-arts by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z) - SSTD: Stripe-Like Space Target Detection Using Single-Point Weak Supervision [3.1531267517553587]
Stripe-like space target detection (SSTD) plays a key role in enhancing space situational awareness and assessing spacecraft behaviour.
We introduce AstroStripeSet', a pioneering dataset designed for SSTD, aiming to bridge the gap in academic resources and advance research in SSTD.
We propose a novel teacher-student label evolution framework with single-point weak supervision, providing a new solution to the challenges of manual labeling.
arXiv Detail & Related papers (2024-07-25T15:02:24Z) - MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D
Object Detection [59.1417156002086]
MixSup is a more practical paradigm simultaneously utilizing massive cheap coarse labels and a limited number of accurate labels for Mixed-grained Supervision.
MixSup achieves up to 97.31% of fully supervised performance, using cheap cluster annotations and only 10% box annotations.
arXiv Detail & Related papers (2024-01-29T17:05:19Z) - Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label
Regeneration and BEVMix [59.55173022987071]
We study the potential of semi-supervised learning for class-agnostic motion prediction.
Our framework adopts a consistency-based self-training paradigm, enabling the model to learn from unlabeled data.
Our method exhibits comparable performance to weakly and some fully supervised methods.
arXiv Detail & Related papers (2023-12-13T09:32:50Z) - Virtual Category Learning: A Semi-Supervised Learning Method for Dense
Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our intriguing findings highlight the usage of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z) - LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds [62.49198183539889]
We propose a label-efficient semantic segmentation pipeline for outdoor scenes with LiDAR point clouds.
Our method co-designs an efficient labeling process with semi/weakly supervised learning.
Our proposed method is even highly competitive compared to the fully supervised counterpart with 100% labels.
arXiv Detail & Related papers (2022-10-14T19:13:36Z) - ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework
for LiDAR Point Cloud Segmentation [111.56730703473411]
Training deep neural networks (DNNs) on LiDAR data requires large-scale point-wise annotations.
Simulation-to-real domain adaptation (SRDA) trains a DNN using unlimited synthetic data with automatically generated labels.
ePointDA consists of three modules: self-supervised dropout noise rendering, statistics-invariant and spatially-adaptive feature alignment, and transferable segmentation learning.
arXiv Detail & Related papers (2020-09-07T23:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.