Source Feature Compression for Object Classification in Vision-Based
Underwater Robotics
- URL: http://arxiv.org/abs/2112.13953v1
- Date: Tue, 28 Dec 2021 00:45:35 GMT
- Authors: Xueyuan Zhao, Mehdi Rahmati, Dario Pompili
- Abstract summary: The proposal is based on a two-stage Walsh-Hadamard Transform (WHT) for Convolutional Neural Network (CNN)-based object classification in underwater robotics.
It is demonstrated and verified that the proposals effectively reduce training time for the learning-based underwater object classification task.
- Score: 11.328151008009257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: New efficient source feature compression solutions are proposed based on a
two-stage Walsh-Hadamard Transform (WHT) for Convolutional Neural Network
(CNN)-based object classification in underwater robotics. The object images are
first transformed by the WHT in a two-stage process. The resulting
transform-domain tensors have their large values concentrated in the upper-left
corner of the matrices in the RGB channels. Exploiting this property, the
transform-domain matrix is partitioned into inner and outer regions, and two
novel partitioning methods are proposed in this work: (i) fixing the sizes of
the inner and outer regions; and (ii) adjusting the sizes of the inner and
outer regions adaptively per image. The proposals are evaluated on an
underwater object dataset captured from the Raritan River in New Jersey, USA.
It is demonstrated and verified that the proposals effectively reduce training
time for the learning-based underwater object classification task and increase
accuracy compared with the competing methods. Object classification is an
essential part of a vision-based underwater robot that must sense its
environment and navigate autonomously. The proposed method is therefore well
suited to efficient computer-vision tasks in underwater robotics applications.
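The compression pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the block size (32), inner-region size (8), energy threshold (0.95), and the reading of "two-stage" as row-wise then column-wise 1D transforms are all assumptions made here for concreteness.

```python
import numpy as np
from scipy.linalg import hadamard

def wht2(block):
    """2D Walsh-Hadamard Transform: a 1D WHT along rows, then along
    columns (two stages). Block side length must be a power of two."""
    n = block.shape[0]
    H = hadamard(n).astype(float)
    return H @ block @ H / n  # normalized so total energy is preserved

def compress_channel(channel, inner=8):
    """Fixed partitioning (proposal i): keep only the upper-left inner
    region of the transform domain, where the large values concentrate,
    and zero out the outer region."""
    coeffs = wht2(channel)
    kept = np.zeros_like(coeffs)
    kept[:inner, :inner] = coeffs[:inner, :inner]
    return kept

def adaptive_inner_size(coeffs, energy_frac=0.95):
    """Adaptive partitioning (proposal ii): grow the inner region per
    image until it captures a target fraction of the energy."""
    total = np.sum(coeffs ** 2)
    for k in range(1, coeffs.shape[0] + 1):
        if np.sum(coeffs[:k, :k] ** 2) >= energy_frac * total:
            return k
    return coeffs.shape[0]

# Example on one 32x32 channel of a synthetic image
rng = np.random.default_rng(0)
channel = rng.random((32, 32))
compressed = compress_channel(channel, inner=8)  # fixed partition
k = adaptive_inner_size(wht2(channel))           # adaptive partition
```

In a full pipeline, this compression would be applied to each RGB channel before feeding the reduced transform-domain tensors to the CNN classifier.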
Related papers
- On Vision Transformers for Classification Tasks in Side-Scan Sonar Imagery [0.0]
Side-scan sonar (SSS) imagery presents unique challenges in the classification of man-made objects on the seafloor.
This paper rigorously compares the performance of ViT models alongside commonly used CNN architectures for binary classification tasks in SSS imagery.
ViT-based models exhibit superior classification performance across F1-score, precision, recall, and accuracy metrics.
arXiv Detail & Related papers (2024-09-18T14:36:50Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from transformers well-trained on massive 2D images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Depth-Adapted CNNs for RGB-D Semantic Segmentation [2.341385717236931]
We propose a novel framework to incorporate depth information into the RGB convolutional neural network (CNN).
Specifically, our Z-ACN generates a 2D depth-adapted offset which is fully constrained by low-level features to guide the feature extraction on RGB images.
With the generated offset, we introduce two intuitive and effective operations to replace basic CNN operators.
arXiv Detail & Related papers (2022-06-08T14:59:40Z)
- IFOR: Iterative Flow Minimization for Robotic Object Rearrangement [92.97142696891727]
IFOR, Iterative Flow Minimization for Robotic Object Rearrangement, is an end-to-end method for rearranging unknown objects.
We show that our method applies to cluttered scenes and to the real world while training only on synthetic data.
arXiv Detail & Related papers (2022-02-01T20:03:56Z)
- Scale Normalized Image Pyramids with AutoFocus for Object Detection [75.71320993452372]
A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales.
We propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects.
The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.
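The fixed-size sub-region idea behind the speed-up can be sketched as follows. This is an illustrative fragment only: the chip size and the candidate centers are assumptions made here, and how AutoFocus actually selects the likely object regions is the method's contribution and is not modeled.

```python
import numpy as np

def extract_chips(image, centers, chip=64):
    """Crop fixed-size square chips around candidate object centers,
    clipping to the image bounds so every chip has the same shape.
    Processing only these chips, rather than the full image at every
    pyramid scale, is what yields the inference speed-up."""
    H, W = image.shape[:2]
    chips = []
    for (cy, cx) in centers:
        y0 = int(np.clip(cy - chip // 2, 0, H - chip))
        x0 = int(np.clip(cx - chip // 2, 0, W - chip))
        chips.append(image[y0:y0 + chip, x0:x0 + chip])
    return chips

img = np.zeros((480, 640, 3))
chips = extract_chips(img, [(100, 200), (300, 500)], chip=64)
```

Because every chip has the same fixed shape, the downstream detector operates on uniform, batched inputs regardless of where candidates fall in the image.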
arXiv Detail & Related papers (2021-02-10T18:57:53Z)
- Underwater object detection using Invert Multi-Class Adaboost with deep learning [37.14538666012363]
We propose a novel neural network architecture, namely Sample-WeIghted hyPEr Network (SWIPENet), for small object detection.
We show that the proposed SWIPENet+IMA framework achieves better detection accuracy than several state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-23T15:30:38Z)
- When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition [10.796613905980609]
We propose a novel framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks.
To cope with the high dimensionality of CNN activations, a random weighted pooling scheme has been proposed.
Experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative features.
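A random weighted pooling step of this kind can be sketched as below. The shapes, the Gaussian weight choice, and the scaling are assumptions made for illustration, not the paper's exact formulation: a fixed, untrained random projection combines high-dimensional CNN activation channels into a smaller feature tensor.

```python
import numpy as np

def random_weighted_pool(activations, out_dim, seed=0):
    """Reduce (C, H, W) CNN activation maps to (out_dim, H, W) using a
    fixed random weight matrix. The weights are never trained; the
    1/sqrt(C) scaling keeps output magnitudes comparable to the input."""
    C = activations.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((out_dim, C)) / np.sqrt(C)
    # Contract the channel axis of the activations against W
    return np.tensordot(W, activations, axes=([1], [0]))

feats = np.random.rand(256, 7, 7)   # e.g. a late CNN feature map
pooled = random_weighted_pool(feats, out_dim=64)
```

Because the projection is fixed, the same random weights are reused for every image, so the pooled features remain comparable across the dataset.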
arXiv Detail & Related papers (2020-04-26T10:58:27Z)
- A New Dataset, Poisson GAN and AquaNet for Underwater Object Grabbing [33.580474181751676]
We propose a new dataset (UDD) consisting of three categories (sea cucumber, sea urchin, and scallop) with 2,227 images.
We also propose a novel Poisson-blending Generative Adversarial Network (Poisson GAN) and an efficient object detection network (AquaNet) to address two common issues within related datasets.
arXiv Detail & Related papers (2020-03-03T10:57:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.