GeoGraph: Learning graph-based multi-view object detection with
geometric cues end-to-end
- URL: http://arxiv.org/abs/2003.10151v2
- Date: Tue, 24 Mar 2020 14:38:07 GMT
- Title: GeoGraph: Learning graph-based multi-view object detection with
geometric cues end-to-end
- Authors: Ahmed Samy Nassar, Stefano D'Aronco, Sébastien Lefèvre, and Jan D.
Wegner
- Abstract summary: We propose an end-to-end learnable approach that detects static urban objects from multiple views.
Our method relies on a Graph Neural Network (GNN) to detect all objects and output their geographic positions.
Our GNN simultaneously models relative pose and image evidence, and is further able to deal with an arbitrary number of input views.
- Score: 10.349116753411742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose an end-to-end learnable approach that detects static
urban objects from multiple views, re-identifies instances, and finally assigns
a geographic position per object. Our method relies on a Graph Neural Network
(GNN) to detect all objects and output their geographic positions given images
and approximate camera poses as input. Our GNN simultaneously models relative
pose and image evidence, and is further able to deal with an arbitrary number
of input views. Our method is robust to occlusion, similar appearance of
neighboring objects, and severe changes in viewpoint by jointly reasoning
about visual image appearance and relative pose. Experimental evaluation on two
challenging, large-scale datasets and comparison with state-of-the-art methods
show significant and systematic improvements both in accuracy and efficiency,
with 2-6% gain in detection and re-ID average precision as well as 8x reduction
of training time.
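The abstract's idea of a graph that combines image evidence with relative pose, for any number of views, can be illustrated with a minimal sketch. This is not the authors' architecture: the `Detection` class, the `relative_pose` helper, and the `message_passing_step` function are hypothetical names, the relative pose is reduced to a 2D camera offset, and there are no learned weights. It only shows why mean aggregation over neighbors yields a fixed-size node update regardless of how many views contribute.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One candidate object in one image (hypothetical structure)."""
    view_id: int                    # which image the detection came from
    appearance: List[float]         # stand-in for an image feature vector
    camera_xy: Tuple[float, float]  # approximate camera position


def relative_pose(a: Detection, b: Detection) -> Tuple[float, float]:
    """Assumed edge feature: offset from a's camera to b's camera."""
    return (b.camera_xy[0] - a.camera_xy[0], b.camera_xy[1] - a.camera_xy[1])


def message_passing_step(dets: List[Detection]) -> List[List[float]]:
    """One round of mean-aggregated message passing across views.

    Each node receives, from every detection in a different view, that
    neighbor's appearance concatenated with the relative pose. Messages
    are averaged, so the output size does not depend on the view count.
    """
    updated = []
    for d in dets:
        neighbors = [n for n in dets if n.view_id != d.view_id]
        agg = [0.0] * (len(d.appearance) + 2)
        for n in neighbors:
            msg = list(n.appearance) + list(relative_pose(d, n))
            agg = [a + m for a, m in zip(agg, msg)]
        if neighbors:
            agg = [a / len(neighbors) for a in agg]
        # New node state: own appearance followed by the aggregated message.
        updated.append(list(d.appearance) + agg)
    return updated


if __name__ == "__main__":
    dets = [
        Detection(0, [1.0, 0.0], (0.0, 0.0)),
        Detection(1, [0.0, 1.0], (2.0, 0.0)),
        Detection(2, [1.0, 1.0], (0.0, 4.0)),
    ]
    for state in message_passing_step(dets):
        print(state)
```

In a learned model the concatenation and averaging would be replaced by trainable functions, and the node states would feed detection, re-identification, and position outputs. The sketch keeps only the graph structure.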
Related papers
- UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with
Geometric Topology Guidance [6.577227592760559]
UnsMOT is a novel framework that combines appearance and motion features of objects with geometric information to provide more accurate tracking.
Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2023-09-03T04:58:12Z)
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Masked Contrastive Graph Representation Learning for Age Estimation [44.96502862249276]
This paper utilizes the property of graph representation learning in dealing with image redundancy information.
We propose a novel Masked Contrastive Graph Representation Learning (MCGRL) method for age estimation.
Experimental results on real-world face image datasets demonstrate the superiority of our proposed method over other state-of-the-art age estimation approaches.
arXiv Detail & Related papers (2023-06-16T15:53:21Z)
- LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D
Signals [9.201550006194994]
Learnable matchers often underperform when there exist only small regions of co-visibility between image pairs.
We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks.
We show that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs.
arXiv Detail & Related papers (2023-03-22T17:46:27Z)
- Object Detection in Aerial Images with Uncertainty-Aware Graph Network [61.02591506040606]
We propose a novel uncertainty-aware object detection framework with a structured-graph, where nodes and edges are denoted by objects.
We refer to our model as Uncertainty-Aware Graph network for object DETection (UAGDet).
arXiv Detail & Related papers (2022-08-23T07:29:03Z)
- PoserNet: Refining Relative Camera Poses Exploiting Object Detections [14.611595909419297]
We use objectness regions to guide the pose estimation problem rather than explicit semantic object detections.
We propose Pose Refiner Network (PoserNet), a lightweight Graph Network that refines the approximate pair-wise relative camera poses.
We evaluate on the 7-Scenes dataset across varied sizes of graphs and show how this process can be beneficial to optimisation-based Motion Averaging algorithms.
arXiv Detail & Related papers (2022-07-19T17:58:33Z)
- End-to-end learning of keypoint detection and matching for relative pose
estimation [1.8352113484137624]
We propose a new method for estimating the relative pose between two images.
We jointly learn keypoint detection, description extraction, matching and robust pose estimation.
We demonstrate our method for the task of visual localization of a query image within a database of images with known pose.
arXiv Detail & Related papers (2021-04-02T15:16:17Z)
- Joint Deep Multi-Graph Matching and 3D Geometry Learning from
Inhomogeneous 2D Image Collections [57.60094385551773]
We propose a trainable framework for learning a deformable 3D geometry model from inhomogeneous image collections.
We in addition obtain the underlying 3D geometry of the objects depicted in the 2D images.
arXiv Detail & Related papers (2021-03-31T17:25:36Z)
- DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z)
- Structured Landmark Detection via Topology-Adapting Deep Graph Learning [75.20602712947016]
We present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical landmark detection.
The proposed method constructs graph signals leveraging both local image features and global shape features.
Experiments are conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand, and Pelvis).
arXiv Detail & Related papers (2020-04-17T11:55:03Z)
- High-Order Information Matters: Learning Relation and Topology for
Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)