Knowledge Distillation for Object Detection: from generic to remote
sensing datasets
- URL: http://arxiv.org/abs/2307.09264v1
- Date: Tue, 18 Jul 2023 13:49:00 GMT
- Title: Knowledge Distillation for Object Detection: from generic to remote
sensing datasets
- Authors: Hoàng-Ân Lê and Minh-Tan Pham
- Abstract summary: We evaluate, in a remote sensing context, various off-the-shelf object detection knowledge distillation methods originally developed on generic computer vision datasets.
In particular, methods covering both logit mimicking and feature imitation approaches are applied to vehicle detection on the well-known xView and VEDAI benchmarks.
- Score: 7.872075562968697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation, a well-known model compression technique, is an
active research area in both computer vision and remote sensing communities. In
this paper, we evaluate in a remote sensing context various off-the-shelf
object detection knowledge distillation methods which have been originally
developed on generic computer vision datasets such as Pascal VOC. In
particular, methods covering both logit mimicking and feature imitation
approaches are applied for vehicle detection using the well-known benchmarks
such as xView and VEDAI datasets. Extensive experiments are performed to
compare the relative performance and interrelationships of the methods.
Experimental results show high variations and confirm the importance of result
aggregation and cross validation on remote sensing datasets.
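The two distillation families compared in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function names, the temperature value, and the foreground mask are illustrative assumptions. Logit mimicking matches the teacher's temperature-softened class distribution via KL divergence, while feature imitation matches intermediate feature maps, typically restricted to object regions with a mask.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature softening."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_mimicking_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student class
    distributions, scaled by T^2 as in standard knowledge distillation."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()

def feature_imitation_loss(student_feat, teacher_feat, mask=None):
    """MSE between feature maps; an optional binary mask restricts the
    loss to foreground (object) regions, as many detection KD methods do."""
    diff = (student_feat - teacher_feat) ** 2
    if mask is not None:
        diff = diff * mask
        return diff.sum() / max(mask.sum(), 1.0)
    return diff.mean()
```

In practice both terms are added, with tuning weights, to the student detector's normal detection loss; the high variance reported above suggests those weights may need re-tuning per remote sensing dataset.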
Related papers
- UAV-Based Human Body Detector Selection and Fusion for Geolocated Saliency Map Generation [0.2499907423888049]
The problem of reliably detecting and geolocating objects of different classes in soft real-time is essential in many application areas, such as Search and Rescue performed using Unmanned Aerial Vehicles (UAVs).
This research addresses the complementary problems of system contextual vision-based detector selection, allocation, and execution.
The detection results are fused using a method for building maps of salient locations which takes advantage of a novel sensor model for vision-based detections for both positive and negative observations.
arXiv Detail & Related papers (2024-08-29T13:00:37Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (mAP) of approximately 45.7%, a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - Object-centric Cross-modal Feature Distillation for Event-based Object
Detection [87.50272918262361]
RGB detectors still outperform event-based detectors due to the sparsity of event data and missing visual details.
We develop a novel knowledge distillation approach to shrink the performance gap between these two modalities.
We show that object-centric distillation significantly improves the performance of the event-based student object detector.
arXiv Detail & Related papers (2023-11-09T16:33:08Z) - CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF).
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
arXiv Detail & Related papers (2023-09-30T12:30:25Z) - A Functional Data Perspective and Baseline On Multi-Layer
Out-of-Distribution Detection [30.499548939422194]
Existing methods that explore the multiple layers require either a special architecture or a supervised objective to do so.
This work adopts an original approach based on a functional view of the network that exploits the sample's trajectories through the various layers and their statistical dependencies.
We validate our method and empirically demonstrate its effectiveness in OOD detection compared to strong state-of-the-art baselines on computer vision benchmarks.
arXiv Detail & Related papers (2023-06-06T09:14:05Z) - Target-aware Dual Adversarial Learning and a Multi-scenario
Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in the common space, either by iterative optimization or by deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls to a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z) - Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z) - Towards an efficient framework for Data Extraction from Chart Images [27.114170963444074]
We adopt state-of-the-art computer vision techniques for the data extraction stage in a data mining system.
For building a robust point detector, a fully convolutional network with a feature fusion module is adopted.
For data conversion, we translate the detected element into data with semantic value.
arXiv Detail & Related papers (2021-05-05T13:18:53Z) - Visual Relationship Detection with Visual-Linguistic Knowledge from
Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Visual-Linguistic Representations from Transformers (RVL-BERT).
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z) - Sensor Data for Human Activity Recognition: Feature Representation and
Benchmarking [27.061240686613182]
The field of Human Activity Recognition (HAR) focuses on obtaining and analysing data captured from monitoring devices (e.g. sensors).
We address the issue of accurately recognising human activities using different Machine Learning (ML) techniques.
arXiv Detail & Related papers (2020-05-15T00:46:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.