Epipolar Transformers
- URL: http://arxiv.org/abs/2005.04551v1
- Date: Sun, 10 May 2020 02:22:54 GMT
- Title: Epipolar Transformers
- Authors: Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu
- Abstract summary: A common approach to localize 3D human joints in a synchronized and calibrated multi-view setup consists of two steps.
The 2D detector must resolve challenging cases, such as occlusions, purely in 2D, even though these could potentially be better resolved in 3D.
We propose the differentiable "epipolar transformer", which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation.
- Score: 39.98487207625999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common approach to localize 3D human joints in a synchronized and
calibrated multi-view setup consists of two steps: (1) apply a 2D detector
separately on each view to localize joints in 2D, and (2) perform robust
triangulation on 2D detections from each view to acquire the 3D joint
locations. However, in step 1, the 2D detector must resolve challenging cases,
such as occlusions and oblique viewing angles, purely in 2D without leveraging
any 3D information, even though these cases could potentially be better
resolved in 3D. Therefore, we propose the differentiable "epipolar transformer",
which enables the 2D detector to leverage 3D-aware features to improve 2D pose
estimation. The intuition is: given a 2D location p in the current view, we
would like to first find its corresponding point p' in a neighboring view, and
then combine the features at p' with the features at p, thus leading to a
3D-aware feature at p. Inspired by stereo matching, the epipolar transformer
leverages epipolar constraints and feature matching to approximate the features
at p'. Experiments on InterHand and Human3.6M show that our approach has
consistent improvements over the baselines. Specifically, in the condition
where no external data is used, our Human3.6M model trained with ResNet-50
backbone and image size 256 x 256 outperforms state-of-the-art by 4.23 mm and
achieves MPJPE 26.9 mm.
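To make the intuition above concrete, below is a minimal sketch of epipolar feature fusion in PyTorch. It only illustrates the idea stated in the abstract and is not the authors' implementation: the function `epipolar_fuse`, the fundamental matrix `F_mat`, the sample count `num_samples`, the dot-product matching, and the additive fusion are all assumptions made for this example.

```python
# A minimal sketch (PyTorch) of the epipolar feature fusion described above.
# Illustrative only: `epipolar_fuse`, `F_mat`, `num_samples`, the dot-product
# matching and the additive fusion are assumptions, not the authors' code.
import torch
import torch.nn.functional as nnf

def epipolar_fuse(feat_ref, feat_src, p, F_mat, num_samples=64):
    """Approximate the feature at the correspondence p' of pixel p and fuse it.

    feat_ref, feat_src: (C, H, W) feature maps of the reference / neighboring view.
    p:                  (x, y) pixel location in the reference view.
    F_mat:              (3, 3) fundamental matrix mapping reference pixels to
                        epipolar lines in the neighboring view (from calibration).
    num_samples:        number of locations sampled along the epipolar line.
    """
    C, H, W = feat_src.shape
    # Epipolar line l = F p_h in the neighboring view: a*x + b*y + c = 0.
    p_h = torch.tensor([float(p[0]), float(p[1]), 1.0])
    a, b, c = F_mat @ p_h
    # Sample x uniformly across the image width and solve the line for y
    # (near-vertical lines would need the symmetric treatment; omitted here).
    xs = torch.linspace(0.0, W - 1.0, num_samples)
    ys = -(a * xs + c) / (b + 1e-8)
    # Gather neighboring-view features along the line with bilinear sampling.
    grid = torch.stack([xs / (W - 1) * 2 - 1, ys / (H - 1) * 2 - 1], dim=-1)
    samples = nnf.grid_sample(feat_src[None], grid[None, None],
                              align_corners=True)[0, :, 0]          # (C, K)
    # Feature matching: similarity of each sample to the feature at p;
    # the softmax-weighted sum approximates the feature at p'.
    f_p = feat_ref[:, int(p[1]), int(p[0])]                          # (C,)
    attn = torch.softmax(samples.T @ f_p / C ** 0.5, dim=0)          # (K,)
    f_p_prime = samples @ attn                                       # (C,)
    # Combine into a 3D-aware feature at p (plain sum for illustration).
    return f_p + f_p_prime

# Example with random inputs; in practice F_mat comes from the calibrated cameras.
fused = epipolar_fuse(torch.randn(256, 64, 64), torch.randn(256, 64, 64),
                      p=(20, 31), F_mat=torch.randn(3, 3))
```

The sampling resolution, the similarity function, and the fusion step (e.g., a learned 1x1 convolution instead of a plain sum) are design choices; the point of the sketch is that the whole operation is differentiable, so the 2D detector can be trained end-to-end with the 3D-aware features.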
Related papers
- Rigid Single-Slice-in-Volume registration via rotation-equivariant 2D/3D feature matching [3.041742847777409]
We propose a self-supervised 2D/3D registration approach to match a single 2D slice to the corresponding 3D volume.
Results demonstrate the robustness of the proposed slice-in-volume registration on the NSCLC-Radiomics CT and KIRBY21 MRI datasets.
arXiv Detail & Related papers (2024-10-24T12:24:27Z) - Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding [83.63231467746598]
We introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding.
We propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality.
arXiv Detail & Related papers (2024-04-11T17:59:45Z) - NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized
Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.
We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance in the 3D convolution across different depth levels.
We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2D feature map to the Normalized Device Coordinates space rather than to the world space.
arXiv Detail & Related papers (2023-09-26T02:09:52Z) - EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale
Visual Localization [44.05930316729542]
We propose EP2P-Loc, a novel large-scale visual localization method for 3D point clouds.
To increase the number of inliers, we propose a simple algorithm to remove invisible 3D points in the image.
For the first time in this task, we employ a differentiable PnP for end-to-end training.
arXiv Detail & Related papers (2023-09-14T07:06:36Z) - Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud
Pre-training [65.75399500494343]
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
arXiv Detail & Related papers (2023-02-27T17:56:18Z) - TransFusion: Cross-view Fusion with Transformer for 3D Human Pose
Estimation [21.37032015978738]
We introduce a transformer framework for multi-view 3D pose estimation.
Inspired by previous multi-modal transformers, we design a unified transformer architecture, named TransFusion.
We propose the concept of epipolar field to encode 3D positional information into the transformer model.
arXiv Detail & Related papers (2021-10-18T18:08:18Z) - Weakly-supervised Cross-view 3D Human Pose Estimation [16.045255544594625]
We propose a simple yet effective pipeline for weakly-supervised cross-view 3D human pose estimation.
Our method can achieve state-of-the-art performance in a weakly-supervised manner.
We evaluate our method on the standard benchmark dataset, Human3.6M.
arXiv Detail & Related papers (2021-05-23T08:16:25Z) - FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to work in this 3D task.
In this technical report, we study this problem with an approach built on a fully convolutional single-stage detector.
Our solution achieves 1st place out of all the vision-only methods in the nuScenes 3D detection challenge of NeurIPS 2020.
arXiv Detail & Related papers (2021-04-22T09:35:35Z) - 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that enables us to leverage 3D features extracted from large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences of its use.