Related papers: Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation

Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation

URL: http://arxiv.org/abs/2305.12704v3
Date: Wed, 15 Nov 2023 09:02:23 GMT
Title: Rotation-Constrained Cross-View Feature Fusion for Multi-View Appearance-based Gaze Estimation
Authors: Yoichiro Hisadome, Tianyi Wu, Jiawei Qin, Yusuke Sugano
Abstract summary: This work proposes a generalizable multi-view gaze estimation task and a cross-view feature fusion method to address this issue. In addition to paired images, our method takes the relative rotation matrix between two cameras as additional input. The proposed network learns to extract rotatable feature representation by using relative rotation as a constraint.
Score: 16.43119580796718
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Appearance-based gaze estimation has been actively studied in recent years. However, its generalization performance for unseen head poses is still a significant limitation for existing methods. This work proposes a generalizable multi-view gaze estimation task and a cross-view feature fusion method to address this issue. In addition to paired images, our method takes the relative rotation matrix between two cameras as additional input. The proposed network learns to extract rotatable feature representation by using relative rotation as a constraint and adaptively fuses the rotatable features via stacked fusion modules. This simple yet efficient approach significantly improves generalization performance under unseen head poses without significantly increasing computational cost. The model can be trained with random combinations of cameras without fixing the positioning and can generalize to unseen camera pairs during inference. Through experiments using multiple datasets, we demonstrate the advantage of the proposed method over baseline methods, including state-of-the-art domain generalization approaches. The code will be available at https://github.com/ut-vision/Rot-MVGaze.

Related papers

One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding. It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps. OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Sets [20.767590006724117]
We introduce and validate a view-combination score to indicate the effectiveness of the input view combination. To achieve this, we apply cross-view matching transformers to model interactions between source images and build correlation frustums. Our proposed framework significantly outperforms previous methods in terms of view-combination generalizability.
arXiv Detail & Related papers (2024-03-08T06:27:13Z)
PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment [21.98302129015761]
We propose to formulate the Structure from Motion (SfM) problem inside a probabilistic diffusion framework. We show that our method PoseDiffusion significantly improves over the classic SfM pipelines. It is observed that our method can generalize across datasets without further training.
arXiv Detail & Related papers (2023-06-27T17:59:07Z)
Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present Adaptive Rotated Convolution (ARC) module to handle rotated object detection problem. In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images. The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object. We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z)
Unsupervised Image Fusion Method based on Feature Mutual Mapping [16.64607158983448]
We propose an unsupervised adaptive image fusion method to address the above issues. We construct a global map to measure the connections of pixels between the input source images. Our method achieves superior performance in both visual perception and objective evaluation.
arXiv Detail & Related papers (2022-01-25T07:50:14Z)
Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training. We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
Deep Keypoint-Based Camera Pose Estimation with Geometric Constraints [80.60538408386016]
Estimating relative camera poses from consecutive frames is a fundamental problem in visual odometry. We propose an end-to-end trainable framework consisting of learnable modules for detection, feature extraction, matching and outlier rejection.
arXiv Detail & Related papers (2020-07-29T21:41:31Z)
Object-Centric Multi-View Aggregation [86.94544275235454]
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid. Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation. We show that computing a symmetry-aware mapping from pixels to the canonical coordinate system allows us to better propagate information to unseen regions.
arXiv Detail & Related papers (2020-07-20T17:38:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.