Orient Anything V2: Unifying Orientation and Rotation Understanding
- URL: http://arxiv.org/abs/2601.05573v1
- Date: Fri, 09 Jan 2026 06:43:59 GMT
- Title: Orient Anything V2: Unifying Orientation and Rotation Understanding
- Authors: Zehan Wang, Ziang Zhang, Jiayang Xu, Jialei Wang, Tianyu Pang, Chao Du, HengShuang Zhao, Zhou Zhao
- Abstract summary: Orient Anything V2 is an enhanced model for unified understanding of object 3D orientation and rotation from single or paired images. V2 extends this capability to handle objects with diverse rotational symmetries and directly estimate relative rotations. It achieves state-of-the-art zero-shot performance on orientation estimation, 6DoF pose estimation, and object symmetry recognition across 11 widely used benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents Orient Anything V2, an enhanced foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Building upon Orient Anything V1, which defines orientation via a single unique front face, V2 extends this capability to handle objects with diverse rotational symmetries and directly estimate relative rotations. These improvements are enabled by four key innovations: 1) Scalable 3D assets synthesized by generative models, ensuring broad category coverage and balanced data distribution; 2) An efficient, model-in-the-loop annotation system that robustly identifies 0 to N valid front faces for each object; 3) A symmetry-aware, periodic distribution fitting objective that captures all plausible front-facing orientations, effectively modeling object rotational symmetry; 4) A multi-frame architecture that directly predicts relative object rotations. Extensive experiments show that Orient Anything V2 achieves state-of-the-art zero-shot performance on orientation estimation, 6DoF pose estimation, and object symmetry recognition across 11 widely used benchmarks. The model demonstrates strong generalization, significantly broadening the applicability of orientation estimation in diverse downstream tasks.
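The symmetry-aware, periodic distribution objective (innovation 3) can be illustrated with a minimal sketch: for an object with N-fold rotational symmetry, every rotation of the front face by 2π/N is equally valid, so the target over azimuth becomes a periodic mixture with N equal modes. The function name, the von Mises form, and all parameters below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def symmetry_aware_target(theta0, n_sym, kappa=20.0, n_bins=360):
    """Build a periodic target distribution over discretized azimuth bins.

    For an object with n_sym-fold rotational symmetry whose first valid
    front face points at theta0 (radians), every rotation by 2*pi/n_sym
    is an equally valid front face. The target is a normalized mixture
    of von Mises bumps, one centered on each valid front face.
    """
    bins = np.linspace(0.0, 2.0 * np.pi, n_bins, endpoint=False)
    density = np.zeros(n_bins)
    for k in range(n_sym):
        mu = theta0 + 2.0 * np.pi * k / n_sym
        density += np.exp(kappa * np.cos(bins - mu))  # unnormalized von Mises bump
    return density / density.sum()  # normalize to a probability distribution
```

A model trained against such a target (e.g. with cross-entropy over the bins) is not penalized for predicting any of the N equivalent front faces, which is the point of modeling rotational symmetry explicitly.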
Related papers
- Orientation Matters: Making 3D Generative Models Orientation-Aligned [39.941774172257105]
Existing 3D generative models often produce misaligned results due to inconsistent training data. We introduce the task of orientation-aligned 3D object generation, producing 3D objects with consistent orientations across categories. We fine-tune two representative 3D generative models, based on multi-view diffusion and 3D variational autoencoder frameworks, to produce intuitively aligned objects.
arXiv Detail & Related papers (2025-06-10T09:54:37Z)
- Right Side Up? Disentangling Orientation Understanding in MLLMs with Fine-grained Multi-axis Perception Tasks [17.357441373079382]
We introduce DORI (Discriminative Orientation Reasoning Intelligence), a benchmark establishing object orientation perception as a primary evaluation target. DORI assesses four dimensions of orientation comprehension: frontal alignment, rotational transformations, relative directional relationships, and canonical orientation understanding. Our evaluation of 15 state-of-the-art vision-language models reveals critical limitations. DORI offers implications for improving robotic control, 3D scene reconstruction, and human-AI interaction in physical environments.
arXiv Detail & Related papers (2025-05-27T18:22:44Z)
- Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection [48.11373832295736]
This paper focuses on rotation symmetry, where objects remain unchanged when rotated around a central axis. Traditional methods relied on hand-crafted feature matching, while recent segmentation models based on convolutional neural networks detect rotation centers but struggle with 3D geometric consistency. We propose a model that directly predicts rotation centers and vertices in 3D space and projects the results back to 2D while preserving structural integrity.
arXiv Detail & Related papers (2025-03-26T05:02:16Z)
- Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models [79.96917782423219]
Orient Anything is the first expert and foundational model designed to estimate object orientation in a single image. By developing a pipeline to annotate the front face of 3D objects, we collect 2M images with precise orientation annotations. Our model achieves state-of-the-art orientation estimation accuracy in both rendered and real images.
arXiv Detail & Related papers (2024-12-24T18:58:43Z)
- GRA: Detecting Oriented Objects through Group-wise Rotating and Attention [64.21917568525764]
A Group-wise Rotating and Attention (GRA) module is proposed to replace the convolution operations in backbone networks for oriented object detection.
GRA adaptively captures fine-grained features of objects with diverse orientations through two key components: Group-wise Rotating and Group-wise Attention.
GRA achieves a new state-of-the-art (SOTA) on the DOTA-v2.0 benchmark while reducing parameters by nearly 50% compared to the previous SOTA method.
arXiv Detail & Related papers (2024-03-17T07:29:32Z)
- VI-Net: Boosting Category-level 6D Object Pose Estimation via Learning Decoupled Rotations on the Spherical Representations [55.25238503204253]
We propose a novel rotation estimation network, termed VI-Net, to make the task easier.
To process the spherical signals, a Spherical Feature Pyramid Network is constructed based on a novel design of SPAtial Spherical Convolution.
Experiments on the benchmarking datasets confirm the efficacy of our method, which outperforms existing ones by a large margin in the high-precision regime.
arXiv Detail & Related papers (2023-08-19T05:47:53Z)
- Category-Level 6D Object Pose Estimation with Flexible Vector-Based Rotation Representation [51.67545893892129]
We propose a novel 3D graph convolution based pipeline for category-level 6D pose and size estimation from monocular RGB-D images.
We first design an orientation-aware autoencoder with 3D graph convolution for latent feature learning.
Then, to efficiently decode the rotation information from the latent feature, we design a novel flexible vector-based decomposable rotation representation.
arXiv Detail & Related papers (2022-12-09T02:13:43Z)
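Several papers above, and V2's multi-frame architecture itself, concern relative rotations between two object poses; the standard way to score such predictions is the geodesic angle of the relative rotation. A minimal sketch (the helper names below are hypothetical, not from any of these papers):

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def geodesic_angle(R1, R2):
    """Geodesic distance (radians) between two rotation matrices.

    The relative rotation R_rel = R2 @ R1.T maps the first pose onto the
    second; its rotation angle, recovered from the trace via
    trace(R_rel) = 1 + 2*cos(angle), is the usual error metric for
    relative-rotation estimation.
    """
    R_rel = R2 @ R1.T
    cos_angle = (np.trace(R_rel) - 1.0) / 2.0
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```

Comparing a predicted relative rotation against ground truth with this angle gives a single scalar in [0, π] that is invariant to the choice of reference frame.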
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.