Related papers: COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation

URL: http://arxiv.org/abs/2601.09698v1
Date: Wed, 14 Jan 2026 18:50:17 GMT
Title: COMPOSE: Hypergraph Cover Optimization for Multi-view 3D Human Pose Estimation
Authors: Tony Danjun Wang, Tolga Birdal, Nassir Navab, Lennart Bastian
Abstract summary: 3D pose estimation from sparse multi-views is a critical task for action recognition, sports analysis, and human-robot interaction. We propose COMPOSE, a novel framework that formulates multi-view pose correspondence matching as a hypergraph problem. COMPOSE achieves improvements of up to 23% in average precision over previous optimization-based methods and up to 11% over self-supervised end-to-end learned methods.
Score: 58.47973015036709
License: http://creativecommons.org/licenses/by/4.0/
Abstract: 3D pose estimation from sparse multi-views is a critical task for numerous applications, including action recognition, sports analysis, and human-robot interaction. Optimization-based methods typically follow a two-stage pipeline, first detecting 2D keypoints in each view and then associating these detections across views to triangulate the 3D pose. Existing methods rely on mere pairwise associations to model this correspondence problem, treating global consistency between views (i.e., cycle consistency) as a soft constraint. Yet, reconciling these constraints for multiple views becomes brittle when spurious associations propagate errors. We thus propose COMPOSE, a novel framework that formulates multi-view pose correspondence matching as a hypergraph partitioning problem rather than through pairwise association. While the complexity of the resulting integer linear program grows exponentially in theory, we introduce an efficient geometric pruning strategy to substantially reduce the search space. COMPOSE achieves improvements of up to 23% in average precision over previous optimization-based methods and up to 11% over self-supervised end-to-end learned methods, offering a promising solution to a widely studied problem.
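To make the contrast with pairwise association concrete, the following is a minimal toy sketch of correspondence as a partition over hyperedges, where each hyperedge groups detections from several views that are hypothesized to show the same person. It is not COMPOSE's integer linear program or its geometric pruning: the function name `best_partition`, the detection labels, and all costs are hypothetical, and a brute-force recursive search stands in for an ILP solver.

```python
# Toy illustration only: names, detections, and costs are hypothetical.
# Each detection is (view, index). A hyperedge is a set of detections assumed
# to belong to one person; its cost stands in for a geometric consistency error.

def best_partition(detections, hyperedges):
    """Return (total_cost, chosen_edges) for the cheapest set of disjoint
    hyperedges that covers every detection exactly once.
    Returns (inf, None) if no such partition exists."""

    def solve(remaining):
        if not remaining:
            return 0.0, []
        pivot = min(remaining)  # branch on one uncovered detection
        best = (float("inf"), None)
        for edge, cost in hyperedges.items():
            # The edge must contain the pivot and use only uncovered detections.
            if pivot in edge and edge <= remaining:
                sub_cost, sub_edges = solve(remaining - edge)
                if sub_edges is not None and cost + sub_cost < best[0]:
                    best = (cost + sub_cost, [edge] + sub_edges)
        return best

    return solve(frozenset(detections))


A0, A1 = ("A", 0), ("A", 1)
B0, B1 = ("B", 0), ("B", 1)
C0 = ("C", 0)
detections = [A0, A1, B0, B1, C0]

# Candidate hyperedges that survived some (assumed) pruning step.
hyperedges = {
    frozenset({A0, B0, C0}): 0.3,  # consistent three-view group
    frozenset({A1, B1}): 0.2,      # consistent two-view group
    frozenset({A0, B1}): 2.0,      # inconsistent pairing
    frozenset({A1, B0, C0}): 2.5,  # inconsistent three-view group
}
# Singleton hyperedges let a detection stay unmatched at a fixed penalty.
for d in detections:
    hyperedges[frozenset({d})] = 1.0

cost, chosen = best_partition(detections, hyperedges)
print(cost, [sorted(e) for e in chosen])
```

In this toy instance the search selects the three-view group and the two-view group, since every alternative partition pays either an inconsistent-group cost or unmatched-detection penalties. Because the whole group is scored at once, consistency across views is built into each candidate rather than reconciled afterwards from pairwise matches.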

Related papers

H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction [39.22287224290769]
H3R is a hybrid framework that integrates latent fusion with attention-based feature aggregation. By integrating both paradigms, our approach enhances generalization while converging 2$\times$ faster than existing methods. Our method supports variable-number and high-resolution input views while demonstrating robust cross-dataset generalization.
arXiv Detail & Related papers (2025-08-05T05:56:30Z)
A Framework for Reducing the Complexity of Geometric Vision Problems and its Application to Two-View Triangulation with Approximation Bounds [14.419727000332717]
Triangulation is the task of estimating a 3D point from noisy 2D projections across multiple images. We present a new framework for reducing the computational complexity of geometric vision problems through targeted reweighting of the cost functions used to minimize reprojection errors. Although this work focuses on two-view triangulation, the framework generalizes to other geometric vision problems.
arXiv Detail & Related papers (2025-03-11T08:00:51Z)
Occ$^2$Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions [14.217367037250296]
Occ$^2$Net is an image matching method that models occlusion relations using 3D occupancy and infers matching points in occluded regions. We evaluate our method on both real-world and simulated datasets and demonstrate its superior performance over state-of-the-art methods on several metrics.
arXiv Detail & Related papers (2023-08-14T13:09:41Z)
Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e., person localization and pose estimation, and propose three task-specific graph neural networks for effective message passing. Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception (SGP) is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels. We show that SGP achieves state-of-the-art performance that is on par with or superior to the supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z)
Isometric Multi-Shape Matching [50.86135294068138]
Finding correspondences between shapes is a fundamental problem in computer vision and graphics. While isometries are often studied in shape correspondence problems, they have not been considered explicitly in the multi-matching setting. We present a suitable optimisation algorithm for solving our formulation and provide a convergence and complexity analysis.
arXiv Detail & Related papers (2020-12-04T15:58:34Z)
Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization [44.85008070868851]
Blind Perspective-n-Point is the problem of estimating the position of a camera relative to a scene. We propose the first fully end-to-end trainable network for solving the blind geometric problem efficiently and globally.
arXiv Detail & Related papers (2020-07-29T06:35:45Z)
Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods. Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances. In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
PnP-Net: A hybrid Perspective-n-Point Network [2.66512000865131]
We consider the robust Perspective-n-Point problem using a hybrid approach that combines deep learning with model-based algorithms. We demonstrate the approach on both synthetic and real-world data, with low computational requirements.
arXiv Detail & Related papers (2020-03-10T10:43:14Z)
Learning multiview 3D point cloud registration [74.39499501822682]
We present a novel, end-to-end learnable, multiview 3D point cloud registration algorithm. Our approach outperforms the state-of-the-art by a significant margin, while being end-to-end trainable and computationally less costly.
arXiv Detail & Related papers (2020-01-15T03:42:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.