Learning and Optimization with 3D Orientations
- URL: http://arxiv.org/abs/2509.17274v1
- Date: Sun, 21 Sep 2025 23:11:03 GMT
- Title: Learning and Optimization with 3D Orientations
- Authors: Alexandros Ntagkas, Constantinos Tsakonas, Chairi Kiourt, Konstantinos Chatzilygeroudis,
- Abstract summary: This paper presents clearly, concisely and with unified notation all available representations and "tricks" related to 3D orientations. We benchmark them in representative scenarios. We provide guidelines depending on the scenario, and make available a reference implementation of all the orientation math described.
- Score: 39.146761527401424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There exist numerous ways of representing 3D orientations. Each representation has both limitations and unique features. Choosing the best representation for a task is often a difficult chore, and there exist conflicting opinions on which representation is better suited for a family of tasks. Even worse, when dealing with scenarios where we need to learn or optimize functions with orientations as inputs and/or outputs, the set of possibilities (representations, loss functions, etc.) is even larger and it is not easy to decide what is best for each scenario. In this paper, we attempt to a) present clearly, concisely and with unified notation all available representations, and "tricks" related to 3D orientations (including Lie group algebra), and b) benchmark them in representative scenarios. The first part feels like it is missing from the robotics literature, as one has to read many different textbooks and papers in order to have a concise and clear understanding of all possibilities, while the benchmark is necessary in order to come up with recommendations based on empirical evidence. More precisely, we experiment with the following settings that attempt to cover the most widely used scenarios in robotics: 1) direct optimization, 2) imitation/supervised learning with a neural network controller, 3) reinforcement learning, and 4) trajectory optimization using differential dynamic programming. We finally provide guidelines depending on the scenario, and make available a reference implementation of all the orientation math described.
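The abstract mentions Lie group algebra for 3D orientations; the core tools there are the exponential and logarithm maps between axis-angle (tangent-space) vectors and rotation matrices in SO(3). A minimal numpy sketch of both maps via Rodrigues' formula (function names are illustrative, not taken from the paper's reference implementation):

```python
import numpy as np

def so3_exp(omega):
    """Exponential map: axis-angle vector (3,) -> rotation matrix (3,3), Rodrigues' formula."""
    theta = np.linalg.norm(omega)
    if theta < 1e-8:          # near the identity, fall back to first order
        return np.eye(3)
    k = omega / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])   # skew-symmetric "hat" of the unit axis
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def so3_log(R):
    """Logarithm map: rotation matrix (3,3) -> axis-angle vector (3,)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-8:
        return np.zeros(3)
    w = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * w

# Round trip: 90 degrees about the z axis
omega = np.array([0.0, 0.0, np.pi / 2])
R = so3_exp(omega)
```

The round trip `so3_log(so3_exp(omega)) == omega` holds away from the singularity at rotations of pi, which is one of the representation pitfalls the paper's benchmark covers.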
Related papers
- Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation [90.90219129619344]
This paper presents a novel R-prior-S, Recurrent Geometric-priormodal Policy with Spiking features. To ground high-level reasoning in physical reality, we leverage lightweight 2D geometric inductive biases. For the data efficiency issue in robotic action generation, we introduce a Recursive Adaptive Spiking Network.
arXiv Detail & Related papers (2026-01-13T23:36:30Z) - Learning from Streaming Video with Orthogonal Gradients [62.51504086522027]
We address the challenge of representation learning from a continuous stream of video as input, in a self-supervised manner. This differs from the standard approaches to video learning where videos are chopped and shuffled during training in order to create a non-redundant batch. We demonstrate the drop in performance when moving from shuffled to sequential learning on three tasks.
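The entry's title refers to orthogonal gradients for streaming (highly correlated) data. The paper's exact method is not given here, but the basic idea of decorrelating consecutive updates can be sketched as projecting the current gradient onto the complement of the previous one (a minimal illustration, assuming plain vector gradients):

```python
import numpy as np

def orthogonalize(g, g_prev, eps=1e-12):
    """Remove the component of the current gradient that lies along the previous gradient."""
    denom = np.dot(g_prev, g_prev)
    if denom < eps:           # previous gradient is (near) zero: nothing to project out
        return g
    return g - (np.dot(g, g_prev) / denom) * g_prev

g_prev = np.array([1.0, 0.0])
g = np.array([1.0, 1.0])
g_perp = orthogonalize(g, g_prev)
```

The projected gradient `g_perp` is orthogonal to `g_prev` by construction, so redundant information shared between consecutive frames does not dominate successive updates.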
arXiv Detail & Related papers (2025-04-02T17:59:57Z) - Aligning Multimodal LLM with Human Preference: A Survey [62.89722942008262]
Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs) have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment with human preference remain insufficiently addressed.
arXiv Detail & Related papers (2025-03-18T17:59:56Z) - Reinforcement Learning with Lie Group Orientations for Robotics [4.342261315851938]
We propose a simple modification of the network's input and output that adheres to the Lie group structure of orientations.
As a result, we obtain an easy and efficient implementation that is directly usable with existing learning libraries.
We briefly introduce Lie theory specifically for orientations in robotics to motivate and outline our approach.
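The "Lie group structure" trick described in this entry is typically realized by letting the network output a tangent-space increment and composing it with the current orientation through the exponential map, with a geodesic distance as the error metric. A minimal sketch using SciPy's real `Rotation` API (the function names and the right-multiplication/body-frame convention are illustrative assumptions, not taken from the paper):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def apply_tangent_update(R_current, delta):
    """Compose a tangent-space increment (rotation vector, body frame) with the current orientation."""
    return R_current @ Rotation.from_rotvec(delta).as_matrix()

def geodesic_distance(R_a, R_b):
    """Angle of the relative rotation, i.e. the norm of log(R_a^T R_b)."""
    return np.linalg.norm(Rotation.from_matrix(R_a.T @ R_b).as_rotvec())

R0 = np.eye(3)
delta = np.array([0.0, 0.0, 0.3])   # e.g. a network's raw 3-vector output
R1 = apply_tangent_update(R0, delta)
```

Because the increment lives in the tangent space (an unconstrained 3-vector), any off-the-shelf learning library can produce it directly, which is why this scheme integrates easily with existing code.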
arXiv Detail & Related papers (2024-09-18T12:50:28Z) - DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning [27.705230758809094]
Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots.
We propose a universal unified objective that can simultaneously extract meaningful task progression information from image sequences.
DecisionNCE provides an embodied representation learning framework that elegantly extracts both local and global task progression features.
arXiv Detail & Related papers (2024-02-28T07:58:24Z) - MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation [73.78984332354636]
CorNav is a novel zero-shot framework for vision-and-language navigation.
It incorporates environmental feedback for refining future plans and adjusting its actions.
It consistently outperforms all baselines in a zero-shot multi-task setting.
arXiv Detail & Related papers (2023-06-17T11:44:04Z) - Active Representation Learning for General Task Space with Applications in Robotics [44.36398212117328]
We propose an algorithmic framework for active representation learning, where the learner optimally chooses which source tasks to sample from.
We provide several instantiations under this framework, from bilinear and feature-based nonlinear to general nonlinear cases.
Our algorithms outperform baselines by 20%-70% on average.
arXiv Detail & Related papers (2023-06-15T08:27:50Z) - Learning with a Mole: Transferable latent spatial representations for navigation without reconstruction [12.845774297648736]
In most end-to-end learning approaches the representation is latent and usually does not have a clearly defined interpretation.
In this work we propose to learn an actionable representation of the scene independently of the targeted downstream task.
The learned representation is optimized by a blind auxiliary agent trained to navigate with it on multiple short sub-episodes branching out from a waypoint.
arXiv Detail & Related papers (2023-06-06T16:51:43Z) - CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn a transferable 3D point cloud representation in realistic scenarios.
Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios, and build well-aligned and instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z) - Learning Downstream Task by Selectively Capturing Complementary
Knowledge from Multiple Self-supervisedly Learning Pretexts [20.764378638979704]
We propose a novel solution by leveraging the attention mechanism to adaptively squeeze suitable representations for the tasks.
Our scheme significantly exceeds current popular pretext-matching based methods in gathering knowledge.
arXiv Detail & Related papers (2022-04-11T16:46:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.