Related papers: Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data

Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data

URL: http://arxiv.org/abs/2411.03082v1
Date: Tue, 05 Nov 2024 13:26:31 GMT
Title: Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data
Authors: Irum Mehboob, Li Sun, Alireza Astegarpanah, Rustam Stolkin,
Abstract summary: We show how an uncertainty-aware, deep neural network can be trained to detect, recognise and localise objects in 2D RGB images. Our method can be applied to many important industrial tasks, where labelled datasets are typically unavailable.
Score: 6.892494758401737
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper shows how an uncertainty-aware, deep neural network can be trained to detect, recognise and localise objects in 2D RGB images, in applications lacking annotated train-ng datasets. We propose a self-supervising teacher-student pipeline, in which a relatively simple teacher classifier, trained with only a few labelled 2D thumbnails, automatically processes a larger body of unlabelled RGB-D data to teach a student network based on a modified YOLOv3 architecture. Firstly, 3D object detection with back projection is used to automatically extract and teach 2D detection and localisation information to the student network. Secondly, a weakly supervised 2D thumbnail classifier, with minimal training on a small number of hand-labelled images, is used to teach object category recognition. Thirdly, we use a Gaussian Process GP to encode and teach a robust uncertainty estimation functionality, so that the student can output confidence scores with each categorization. The resulting student significantly outperforms the same YOLO architecture trained directly on the same amount of labelled data. Our GP-based approach yields robust and meaningful uncertainty estimations for complex industrial object classifications. The end-to-end network is also capable of real-time processing, needed for robotics applications. Our method can be applied to many important industrial tasks, where labelled datasets are typically unavailable. In this paper, we demonstrate an example of detection, localisation, and object category recognition of nuclear mixed-waste materials in highly cluttered and unstructured scenes. This is critical for robotic sorting and handling of legacy nuclear waste, which poses complex environmental remediation challenges in many nuclearised nations.

Related papers

Learning with Noisy Ground Truth: From 2D Classification to 3D Reconstruction [13.314778587751588]
We propose a formal definition unify the analysis of LNGT LNGT in the context of different machine learning tasks. We provide in-depth analysis on memorization effect and insightful discussion about potential future research opportunities.
arXiv Detail & Related papers (2024-06-23T02:21:48Z)
Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label [7.400926717561454]
This paper investigates a framework for weakly-supervised object localization. It aims to train a neural network capable of predicting both the object class and its location using only images and their image-level class labels.
arXiv Detail & Related papers (2024-04-15T06:02:09Z)
Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment [55.11291053011696]
This work presents a framework for dealing with 3D scene understanding when the labeled scenes are quite limited. To extract knowledge for novel categories from the pre-trained vision-language models, we propose a hierarchical feature-aligned pre-training and knowledge distillation strategy. In the limited reconstruction case, our proposed approach, termed WS3D++, ranks 1st on the large-scale ScanNet benchmark.
arXiv Detail & Related papers (2023-12-01T15:47:04Z)
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection [90.32180043449263]
State-of-the-art 3D object detectors are usually trained on large-scale datasets with high-quality 3D annotations. A natural remedy is to adopt semi-supervised learning (SSL) by leveraging a limited amount of labeled samples and abundant unlabeled samples. This paper introduces a novel approach of Hierarchical Supervision and Shuffle Data Augmentation (HSSDA), which is a simple yet effective teacher-student framework.
arXiv Detail & Related papers (2023-04-04T02:09:32Z)
The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection [62.1185839286255]
Low-cost monocular 3D object detection plays a fundamental role in autonomous driving. We introduce a Dynamic Feature Reflecting Network, named DFR-Net. We rank 1st among all the monocular 3D object detectors in the KITTI test set.
arXiv Detail & Related papers (2021-12-28T07:31:18Z)
Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues. No human annotations are involved in our framework during the whole training process. Our framework reports significant performance compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
Lifelong 3D Object Recognition and Grasp Synthesis Using Dual Memory Recurrent Self-Organization Networks [0.0]
Humans learn to recognize and manipulate new objects in lifelong settings without forgetting the previously gained knowledge. In most conventional deep neural networks, this is not possible due to the problem of catastrophic forgetting. We propose a hybrid model architecture consisting of a dual-memory recurrent neural network and an autoencoder to tackle object recognition and grasping simultaneously.
arXiv Detail & Related papers (2021-09-23T11:14:13Z)
3D Spatial Recognition without Spatially Labeled 3D [127.6254240158249]
We introduce WyPR, a Weakly-supervised framework for Point cloud Recognition. We show that WyPR can detect and segment objects in point cloud data without access to any spatial labels at training time.
arXiv Detail & Related papers (2021-05-13T17:58:07Z)
Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z)
Building Robust Industrial Applicable Object Detection Models Using Transfer Learning and Single Pass Deep Learning Architectures [1.1816942730023883]
We explore how deep convolutional neural networks dedicated to the task of object detection can improve our industrial-oriented object detection pipelines. By using a deep learning architecture that integrates region proposals, classification and probability estimation in a single run, we aim at obtaining real-time performance. We apply these algorithms to two industrially relevant applications, one being the detection of promotion boards in eye tracking data and the other detecting and recognizing packages of warehouse products for augmented advertisements.
arXiv Detail & Related papers (2020-07-09T09:50:45Z)
SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data. Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method by using only 50% labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.