Co-training for Deep Object Detection: Comparing Single-modal and
Multi-modal Approaches
- URL: http://arxiv.org/abs/2104.11619v1
- Date: Fri, 23 Apr 2021 14:13:59 GMT
- Title: Co-training for Deep Object Detection: Comparing Single-modal and
Multi-modal Approaches
- Authors: Jose L. Gómez, Gabriel Villalonga, Antonio M. López
- Abstract summary: We focus on the use of co-training, a semi-supervised learning (SSL) method, for obtaining self-labeled object bounding boxes (BBs). In particular, we assess the goodness of multi-modal co-training by relying on two different views of an image, namely, appearance (RGB) and estimated depth (D). Our results suggest that in a standard SSL setting (no domain shift, a few human-labeled data) and under virtual-to-real domain shift (many virtual-world labeled data, no human-labeled data), multi-modal co-training outperforms single-modal.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Top-performing computer vision models are powered by convolutional neural
networks (CNNs). Training an accurate CNN highly depends on both the raw sensor
data and their associated ground truth (GT). Collecting such GT is usually done
through human labeling, which is time-consuming and does not scale as we wish.
This data labeling bottleneck may be intensified due to domain shifts among
image sensors, which could force per-sensor data labeling. In this paper, we
focus on the use of co-training, a semi-supervised learning (SSL) method, for
obtaining self-labeled object bounding boxes (BBs), i.e., the GT to train deep
object detectors. In particular, we assess the goodness of multi-modal
co-training by relying on two different views of an image, namely, appearance
(RGB) and estimated depth (D). Moreover, we compare appearance-based
single-modal co-training with multi-modal. Our results suggest that in a
standard SSL setting (no domain shift, a few human-labeled data) and under
virtual-to-real domain shift (many virtual-world labeled data, no human-labeled
data) multi-modal co-training outperforms single-modal. In the latter case, by
performing GAN-based domain translation both co-training modalities are on
par; at least, when using an off-the-shelf depth estimation model not
specifically trained on the translated images.
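The two-view co-training loop described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: real co-training here uses deep object detectors and bounding boxes, whereas this toy uses 1-D threshold classifiers, and the function names (`train`, `predict`, `co_train`) and the dictionary-based "rgb"/"depth" features are invented for the sketch. The key idea it shows is that each view's model pseudo-labels its most confident unlabeled samples, which are then added to the *other* view's training set.

```python
# Toy sketch of two-view (RGB + depth) co-training for self-labeling.
# Each "sample" is a dict with a value per view; classifiers are simple
# 1-D thresholds. All names and data here are illustrative assumptions.

def train(examples, view):
    """Fit a 1-D threshold: the midpoint between the class means on this view."""
    pos = [x[view] for x, y in examples if y == 1]
    neg = [x[view] for x, y in examples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def predict(threshold, x, view):
    """Return (label, confidence); confidence grows with the margin."""
    margin = x[view] - threshold
    return (1 if margin > 0 else 0), min(1.0, abs(margin))

def co_train(labeled, unlabeled, rounds=3, tau=0.5):
    """Alternate views: confident pseudo-labels from one view feed the other."""
    sets = {"rgb": list(labeled), "depth": list(labeled)}
    pool = list(unlabeled)
    for _ in range(rounds):
        for src, dst in (("rgb", "depth"), ("depth", "rgb")):
            thr = train(sets[src], src)
            keep = []
            for x in pool:
                y, conf = predict(thr, x, src)
                if conf >= tau:
                    sets[dst].append((x, y))  # self-labeled example
                else:
                    keep.append(x)            # stays unlabeled for now
            pool = keep
    return sets, pool

# Toy data: class 1 has large values in both views, class 0 small ones.
labeled = [({"rgb": 2.0, "depth": 2.2}, 1), ({"rgb": -2.0, "depth": -1.8}, 0)]
unlabeled = [{"rgb": 1.5, "depth": 1.4}, {"rgb": -1.6, "depth": -1.5}]
sets, leftovers = co_train(labeled, unlabeled)
```

The disagreement between views is what makes co-training more than self-training: a sample that is ambiguous in RGB may be confidently labeled via depth, and vice versa, which is the intuition behind the multi-modal advantage the paper reports.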
Related papers
- Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline [56.790045049514326]
Two major forms of deception dominate: human-crafted misinformation and AI-generated content. We propose Unified Multimodal Fake Content Detection (UMFDet), a framework designed to handle both forms of deception. UMFDet achieves robust and consistent performance across both misinformation types, outperforming specialized baselines.
arXiv Detail & Related papers (2025-09-30T09:26:32Z) - Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data. We introduce a novel principled method for DP synthetic image embedding generation. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z) - VLMine: Long-Tail Data Mining with Vision Language Models [18.412533708652102]
This work focuses on the problem of identifying rare examples within a corpus of unlabeled data.
We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision language model (VLM)
Our experiments consistently show large improvements (between 10% and 50%) over the baseline techniques.
arXiv Detail & Related papers (2024-09-23T19:13:51Z) - XTrack: Multimodal Training Boosts RGB-X Video Object Trackers [88.72203975896558]
It is crucial to ensure that knowledge gained from multimodal sensing is effectively shared.
Similar samples across different modalities have more knowledge to share than otherwise.
We propose a method for RGB-X tracking at inference time, yielding an average +3% precision improvement over the current SOTA.
arXiv Detail & Related papers (2024-05-28T03:00:58Z) - Rethinking Transformers Pre-training for Multi-Spectral Satellite
Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z) - MoCo-Transfer: Investigating out-of-distribution contrastive learning
for limited-data domains [52.612507614610244]
We analyze the benefit of transferring self-supervised contrastive representations from moment contrast (MoCo) pretraining to settings with limited data.
We find that depending on quantity of labeled and unlabeled data, contrastive pretraining on larger out-of-distribution datasets can perform nearly as well or better than MoCo pretraining in-domain.
arXiv Detail & Related papers (2023-11-15T21:56:47Z) - Transfer Learning between Motor Imagery Datasets using Deep Learning --
Validation of Framework and Comparison of Datasets [0.0]
We present a simple deep learning-based framework commonly used in computer vision.
We demonstrate its effectiveness for cross-dataset transfer learning in mental imagery decoding tasks.
arXiv Detail & Related papers (2023-09-04T20:58:57Z) - Semantic-aware Dense Representation Learning for Remote Sensing Image
Change Detection [20.761672725633936]
Training a deep learning-based change detection model heavily depends on labeled data.
A recent trend is to use remote sensing (RS) data to obtain in-domain representations via supervised or self-supervised learning (SSL).
We propose dense semantic-aware pre-training for RS image CD via sampling multiple class-balanced points.
arXiv Detail & Related papers (2022-05-27T06:08:33Z) - Multi-Domain Joint Training for Person Re-Identification [51.73921349603597]
Deep learning-based person Re-IDentification (ReID) often requires a large amount of training data to achieve good performance.
It appears that collecting more training data from diverse environments tends to improve the ReID performance.
We propose an approach called Domain-Camera-Sample Dynamic network (DCSD) whose parameters can be adaptive to various factors.
arXiv Detail & Related papers (2022-01-06T09:20:59Z) - Towards Optimal Strategies for Training Self-Driving Perception Models
in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person
Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - MultiSiam: Self-supervised Multi-instance Siamese Representation
Learning for Autonomous Driving [45.23708547617418]
Self-supervised learning might be a promising way to improve model performance.
Existing SSL methods usually rely on the single-centric-object guarantee.
We propose Multi-instance Siamese Network (MultiSiam) to improve generalization ability and achieve state-of-the-art transfer performance.
arXiv Detail & Related papers (2021-08-27T08:47:01Z) - Self domain adapted network [6.040230864736051]
Domain shift is a major problem for deploying deep networks in clinical practice.
We propose a novel self domain adapted network (SDA-Net) that can rapidly adapt itself to a single test subject.
arXiv Detail & Related papers (2020-07-07T01:41:34Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.