Geometry Constrained Weakly Supervised Object Localization
- URL: http://arxiv.org/abs/2007.09727v1
- Date: Sun, 19 Jul 2020 17:33:42 GMT
- Title: Geometry Constrained Weakly Supervised Object Localization
- Authors: Weizeng Lu, Xi Jia, Weicheng Xie, Linlin Shen, Yicong Zhou, Jinming Duan
- Abstract summary: We propose a geometry constrained network, termed GC-Net, for weakly supervised object localization.
The detector predicts the object location defined by a set of coefficients describing a geometric shape.
The classifier takes the resulting masked images as input and performs two complementary classification tasks for the object and background.
In contrast to previous approaches, GC-Net is trained end-to-end and predicts object location without any post-processing.
- Score: 55.17224813345206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a geometry constrained network, termed GC-Net, for weakly
supervised object localization (WSOL). GC-Net consists of three modules: a
detector, a generator and a classifier. The detector predicts the object
location defined by a set of coefficients describing a geometric shape (i.e.
ellipse or rectangle), which is geometrically constrained by the mask produced
by the generator. The classifier takes the resulting masked images as input and
performs two complementary classification tasks for the object and background.
To make the mask more compact and more complete, we propose a novel multi-task
loss function that takes into account the area of the geometric shape, the
categorical cross-entropy and the negative entropy. In contrast to previous
approaches, GC-Net is trained end-to-end and predicts object location without
any post-processing (e.g. thresholding) that may require additional tuning.
Extensive experiments on the CUB-200-2011 and ILSVRC2012 datasets show that
GC-Net outperforms state-of-the-art methods by a large margin. Our source code
is available at https://github.com/lwzeng/GC-Net.
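The loss described in the abstract combines three terms: the area of the predicted shape, a categorical cross-entropy, and a negative entropy. The following is a minimal NumPy sketch of one plausible reading of that description, not the authors' implementation. It assumes a rectangle parameterised by a normalised centre and size, a sigmoid-based soft mask, cross-entropy on the object-masked prediction, and negative entropy on the background-masked prediction. All function names, the `sharpness` parameter, and the weights `alpha`, `beta`, `gamma` are hypothetical.

```python
import numpy as np


def soft_rectangle_mask(cx, cy, w, h, size=32, sharpness=20.0):
    """Soft mask for an axis-aligned rectangle with normalised coefficients.

    Hypothetical parameterisation: centre (cx, cy) and extent (w, h) in [0, 1].
    Sigmoid edges keep the mask smooth in the coefficients.
    """
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-sharpness * z))

    inside_x = sigmoid(xs - (cx - w / 2)) * sigmoid((cx + w / 2) - xs)
    inside_y = sigmoid(ys - (cy - h / 2)) * sigmoid((cy + h / 2) - ys)
    return inside_x * inside_y


def cross_entropy(probs, label, eps=1e-12):
    """Categorical cross-entropy for a single example."""
    return float(-np.log(probs[label] + eps))


def negative_entropy(probs, eps=1e-12):
    """Negative Shannon entropy; lowest when the distribution is uniform."""
    return float(np.sum(probs * np.log(probs + eps)))


def multi_task_loss(obj_probs, bg_probs, label, w, h,
                    alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three terms (the weights are assumptions).

    - area term: favours a compact shape
    - cross-entropy on the object-masked image: the shape must keep the object
    - negative entropy on the background-masked image: what is left outside
      the shape should carry no class evidence
    """
    area = w * h
    return (alpha * area
            + beta * cross_entropy(obj_probs, label)
            + gamma * negative_entropy(bg_probs))
```

Under these assumptions, shrinking the rectangle lowers the area term, while the two classification terms penalise a shape that cuts off the object or leaves recognisable parts of it in the background.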
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well trained on massive image collections.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- ReFit: A Framework for Refinement of Weakly Supervised Semantic Segmentation using Object Border Fitting for Medical Images [4.945138408504987]
Weakly Supervised Semantic Segmentation (WSSS) relying only on image-level supervision is a promising approach to deal with the need for segmentation networks.
We propose our novel ReFit framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques.
By applying our method to WSSS predictions, we achieved up to 10% improvement over the current state-of-the-art WSSS methods for medical imaging.
arXiv Detail & Related papers (2023-03-14T12:46:52Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- RBGNet: Ray-based Grouping for 3D Object Detection [104.98776095895641]
We propose the RBGNet framework, a voting-based 3D detector for accurate 3D object detection from point clouds.
We propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays.
Our model achieves state-of-the-art 3D detection performance on ScanNet V2 and SUN RGB-D with remarkable performance gains.
arXiv Detail & Related papers (2022-04-05T14:42:57Z)
- GaTector: A Unified Framework for Gaze Object Prediction [11.456242421204298]
We build a novel framework named GaTector to tackle the gaze object prediction problem in a unified way.
To better consider the specificity of inputs and tasks, GaTector introduces two input-specific blocks before the shared backbone and three task-specific blocks after the shared backbone.
In the end, we propose a novel wUoC metric that can reveal the difference between boxes even when they share no overlapping area.
arXiv Detail & Related papers (2021-12-07T07:50:03Z)
- Learnable Triangulation for Deep Learning-based 3D Reconstruction of Objects of Arbitrary Topology from Single RGB Images [12.693545159861857]
We propose a novel deep reinforcement learning-based approach for 3D object reconstruction from monocular images.
The proposed method outperforms the state-of-the-art in terms of visual quality, reconstruction accuracy, and computational time.
arXiv Detail & Related papers (2021-09-24T09:44:22Z)
- PIG-Net: Inception based Deep Learning Architecture for 3D Point Cloud Segmentation [0.9137554315375922]
We propose an inception-based deep network architecture called PIG-Net that effectively characterizes the local and global geometric details of the point clouds.
We perform an exhaustive experimental analysis of the PIG-Net architecture on two state-of-the-art datasets.
arXiv Detail & Related papers (2021-01-28T13:27:55Z)
- Boundary-Aware Geometric Encoding for Semantic Segmentation of Point Clouds [45.270215729464056]
Boundary information plays a significant role in 2D image segmentation, while usually being ignored in 3D point cloud segmentation.
We propose a Boundary Prediction Module (BPM) to predict boundary points.
Based on the predicted boundary, a boundary-aware Geometric Encoding Module (GEM) is designed to encode geometric information and aggregate features with discrimination in a neighborhood.
arXiv Detail & Related papers (2021-01-07T05:38:19Z)
- Learning Geometry-Disentangled Representation for Complementary Understanding of 3D Object Point Cloud [50.56461318879761]
We propose Geometry-Disentangled Attention Network (GDANet) for 3D image processing.
GDANet disentangles point clouds into contour and flat part of 3D objects, respectively denoted by sharp and gentle variation components.
Experiments on 3D object classification and segmentation benchmarks demonstrate that GDANet achieves state-of-the-art performance with fewer parameters.
arXiv Detail & Related papers (2020-12-20T13:35:00Z)
- GFPNet: A Deep Network for Learning Shape Completion in Generic Fitted Primitives [68.8204255655161]
We propose an object reconstruction apparatus that uses the so-called Generic Primitives (GP) to complete shapes.
We show that GFPNet competes with state-of-the-art shape completion methods by providing performance results on the ModelNet and KITTI benchmarking datasets.
arXiv Detail & Related papers (2020-06-03T08:29:27Z)