Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection
- URL: http://arxiv.org/abs/2205.03346v1
- Date: Fri, 6 May 2022 16:27:14 GMT
- Title: Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection
- Authors: Ziteng Cui, Guo-Jun Qi, Lin Gu, Shaodi You, Zenghui Zhang, Tatsuya Harada
- Abstract summary: We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervised manner, the MAET learns the intrinsic visual structure by encoding and decoding a realistic illumination-degrading transformation.
The model achieves state-of-the-art performance on both synthetic and real-world datasets.
- Score: 84.52197307286681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dark environments pose a challenge for computer vision algorithms owing to
insufficient photons and undesirable noise. To enhance object detection in a
dark environment, we propose a novel multitask auto-encoding transformation
(MAET) model that explores the intrinsic pattern behind illumination
translation. In a self-supervised manner, the MAET learns the intrinsic visual
structure by encoding and decoding a realistic illumination-degrading
transformation that accounts for the physical noise model and the image signal
processing (ISP) pipeline.
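To make the illumination-degrading transformation concrete, here is a minimal NumPy sketch of this style of low-light synthesis: invert the display gamma to approximate linear RGB, scale the exposure down, inject signal-dependent shot noise plus read noise, and re-apply gamma as a stand-in ISP. The function name, parameter values, and the gamma-only ISP are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def degrade_illumination(img, darkness=0.2, shot=0.01, read=5e-4, rng=None):
    """Synthesize a low-light version of an sRGB image in [0, 1].

    Sketch of an unprocess -> degrade -> reprocess pipeline; parameter
    values are illustrative, not the paper's calibrated ones.
    """
    if rng is None:
        rng = np.random.default_rng()
    linear = np.clip(img, 0.0, 1.0) ** 2.2        # undo display gamma (approx. linear RGB)
    linear *= darkness                            # reduce exposure to darken the scene
    var = shot * linear + read                    # heteroscedastic shot + read noise
    noisy = linear + rng.normal(0.0, np.sqrt(var))
    return np.clip(noisy, 0.0, 1.0) ** (1 / 2.2)  # minimal ISP: gamma re-encode
```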
Based on this representation, we perform object detection by decoding bounding-box
coordinates and classes. To avoid over-entanglement of the two tasks, MAET
disentangles the object features from the degradation features by imposing an
orthogonal tangent regularity. This forms a parametric manifold along which the
multitask predictions can be geometrically formulated by maximizing the
orthogonality between the tangents along the outputs of the respective tasks.
Our framework can be built on mainstream object detection architectures and
trained end-to-end directly on standard detection datasets such as VOC and
COCO. We achieve state-of-the-art performance on both synthetic and real-world
datasets. Code is available at https://github.com/cuiziteng/MAET.
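One plausible reading of the orthogonal tangent regularity, sketched in PyTorch below: treat the tangent of each task as the gradient of its scalar output with respect to the shared encoding, and penalize their cosine alignment so the detection and degradation branches stay disentangled. The function name and the cosine-squared penalty are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def tangent_orthogonality_penalty(shared_feat, det_loss, deg_loss):
    """Penalize alignment between the tangents of two task heads.

    shared_feat: shared encoding with requires_grad=True.
    det_loss, deg_loss: scalar outputs of the detection and
    degradation-decoding branches. The penalty is zero exactly when
    the two tangents are orthogonal. Illustrative sketch only.
    """
    t_det = torch.autograd.grad(det_loss, shared_feat, create_graph=True)[0]
    t_deg = torch.autograd.grad(deg_loss, shared_feat, create_graph=True)[0]
    cos = F.cosine_similarity(t_det.flatten(1), t_deg.flatten(1), dim=1)
    return cos.pow(2).mean()
```

In training, a penalty of this kind would be added with a small weight to the detection and self-supervised degradation losses.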
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art performance for unsupervised domain adaptation (UDA) on point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm outperforms the state-of-the-art model-free actor-critic algorithm in a visually complex 3D robotic environment and in a 2D environment with compositional structure.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- ObjFormer: Learning Land-Cover Changes From Paired OSM Data and Optical High-Resolution Imagery via Object-Guided Transformer [31.46969412692045]
This paper pioneers the direct detection of land-cover changes utilizing paired OSM data and optical imagery.
We propose an object-guided Transformer (ObjFormer) that naturally combines the object-based image analysis (OBIA) technique with the advanced vision Transformer architecture.
A large-scale benchmark dataset called OpenMapCD is constructed to conduct detailed experiments.
arXiv Detail & Related papers (2023-10-04T09:26:44Z)
- Feature Shrinkage Pyramid for Camouflaged Object Detection with Transformers [34.42710399235461]
Vision transformers have recently shown strong global context modeling capabilities in camouflaged object detection.
However, they suffer from two major limitations: less effective locality modeling and insufficient feature aggregation in decoders.
We propose a novel transformer-based Feature Shrinkage Pyramid Network (FSPNet), which aims to hierarchically decode locality-enhanced neighboring transformer features.
arXiv Detail & Related papers (2023-03-26T20:50:58Z)
- Self-Supervised Object Detection via Generative Image Synthesis [106.65384648377349]
We present the first end-to-end analysis-by-synthesis framework with controllable GANs for the task of self-supervised object detection.
We use collections of real world images without bounding box annotations to learn to synthesize and detect objects.
Our work advances the field of self-supervised object detection by introducing a successful new paradigm of controllable GAN-based image synthesis for this task.
arXiv Detail & Related papers (2021-10-19T11:04:05Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover the orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.