Dynamics-Aligned Shared Hypernetworks for Zero-Shot Actuator Inversion
- URL: http://arxiv.org/abs/2602.06550v1
- Date: Fri, 06 Feb 2026 09:55:05 GMT
- Title: Dynamics-Aligned Shared Hypernetworks for Zero-Shot Actuator Inversion
- Authors: Jan Benad, Pradeep Kr. Banerjee, Frank Röder, Nihat Ay, Martin V. Butz, Manfred Eppe
- Abstract summary: We propose DMA*-SH, a framework where a single hypernetwork, trained solely via dynamics prediction, generates a small set of adapter weights. This shared modulation imparts an inductive bias matched to actuator inversion, while input/output normalization and random input masking stabilize context inference. For evaluation, we introduce the Actuator Inversion Benchmark (AIB), a suite of environments designed to isolate discontinuous context-to-dynamics interactions.
- Score: 3.335249027791264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot generalization in contextual reinforcement learning remains a core challenge, particularly when the context is latent and must be inferred from data. A canonical failure mode is actuator inversion, where identical actions produce opposite physical effects under a latent binary context. We propose DMA*-SH, a framework where a single hypernetwork, trained solely via dynamics prediction, generates a small set of adapter weights shared across the dynamics model, policy, and action-value function. This shared modulation imparts an inductive bias matched to actuator inversion, while input/output normalization and random input masking stabilize context inference, promoting directionally concentrated representations. We provide theoretical support via an expressivity separation result for hypernetwork modulation, and a variance decomposition with policy-gradient variance bounds that formalize how within-mode compression improves learning under actuator inversion. For evaluation, we introduce the Actuator Inversion Benchmark (AIB), a suite of environments designed to isolate discontinuous context-to-dynamics interactions. On AIB's held-out actuator-inversion tasks, DMA*-SH achieves zero-shot generalization, outperforming domain randomization by 111.8% and surpassing a standard context-aware baseline by 16.1%.
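The shared-adapter mechanism described in the abstract can be sketched as follows. This is a minimal illustrative reading, not the paper's actual architecture: the class name, shapes, and the linear hypernetwork map are all assumptions. The point it illustrates is that when one hypernetwork generates a single adapter reused by the dynamics model, policy, and action-value function, flipping the inferred context flips the modulation for all heads together, which matches the sign-flip structure of actuator inversion.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedHyperAdapter:
    """Sketch (hypothetical shapes and names): one hypernetwork maps an
    inferred context embedding to a small low-rank adapter matrix that is
    shared across the dynamics model, policy, and action-value function."""

    def __init__(self, ctx_dim, feat_dim, adapter_rank):
        # hypernetwork: a linear map from context to flattened adapter weights
        self.W_hyper = rng.normal(0.0, 0.1, (ctx_dim, feat_dim * adapter_rank))
        self.feat_dim, self.rank = feat_dim, adapter_rank

    def adapter(self, ctx):
        # generate the shared adapter from the context embedding
        return (ctx @ self.W_hyper).reshape(self.feat_dim, self.rank)

    def modulate(self, features, ctx):
        # the same generated adapter modulates features for every head
        return features @ self.adapter(ctx)
```

Because the adapter here is linear in the context, negating the context negates the modulation: `modulate(f, -ctx) == -modulate(f, ctx)`, the discontinuous sign flip a fixed context-agnostic network cannot express.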
Related papers
- ROAST: Rollout-based On-distribution Activation Steering Technique [16.632201561391366]
Activation steering provides parameter-efficient control over large language models at inference time. We propose ROAST (Rollout-based On-distribution Activation Steering Technique), which estimates steering directions from the model's own on-distribution rollouts via ROC. Our empirical analysis reveals that while activation magnitude correlates moderately with directional consistency, the variance in magnitude is significant and often disproportionate to semantic quality.
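One plausible reading of "estimates steering directions from on-distribution rollouts via ROC" can be sketched as below. The function name and the per-dimension AUC scoring are assumptions for illustration; the paper's actual estimator may differ.

```python
import numpy as np

def roc_direction(pos_acts, neg_acts):
    """Hypothetical sketch: score each activation dimension by how well it
    separates rollouts exhibiting the target behavior (pos) from rollouts
    that do not (neg), then steer along the normalized score vector."""
    n_dims = pos_acts.shape[1]
    scores = np.empty(n_dims)
    for j in range(n_dims):
        p, n = pos_acts[:, j], neg_acts[:, j]
        # AUC estimated by pairwise comparison: P(pos > neg), ties count half
        wins = (p[:, None] > n[None, :]).astype(float)
        ties = (p[:, None] == n[None, :]).astype(float)
        scores[j] = (wins + 0.5 * ties).mean()
    direction = scores - 0.5  # 0.5 means a dimension carries no signal
    return direction / (np.linalg.norm(direction) + 1e-8)
```

A dimension with AUC near 1.0 or 0.0 dominates the direction; dimensions whose activations are indistinguishable across rollouts contribute nothing.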
arXiv Detail & Related papers (2026-02-15T13:30:26Z)
- Spherical Steering: Geometry-Aware Activation Rotation for Language Models [15.078810641141295]
Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. In this work, we explore Spherical Steering, a training-free primitive that resolves this trade-off through activation rotation. Our method rotates activations along a geodesic toward a target direction, guiding the activation toward the target concept while preserving the integrity of the signal.
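Rotating an activation along a geodesic toward a target direction while preserving its norm is exactly spherical linear interpolation (slerp) on the unit sphere. A minimal sketch, assuming the method amounts to norm-preserving slerp (the function name and interface are illustrative):

```python
import numpy as np

def spherical_steer(activation, target_dir, alpha):
    """Rotate `activation` a fraction `alpha` of the way along the geodesic
    toward `target_dir`, preserving the activation's original norm."""
    norm = np.linalg.norm(activation)
    a = activation / norm
    t = target_dir / np.linalg.norm(target_dir)
    # angle between the normalized activation and the target direction
    theta = np.arccos(np.clip(a @ t, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return activation.copy()  # already aligned with the target
    # spherical linear interpolation on the unit sphere
    rotated = (np.sin((1 - alpha) * theta) * a + np.sin(alpha * theta) * t) / np.sin(theta)
    return norm * rotated
```

Unlike additive steering, the output's magnitude equals the input's for every `alpha`, which is one way to "preserve the integrity of the signal" while moving toward the target concept.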
arXiv Detail & Related papers (2026-02-09T00:15:47Z) - Joint Embedding Variational Bayes [0.08594140167290097]
Variational Joint Embedding (VJE) is a framework that synthesizes joint embedding and variational inference. VJE enables self-supervised learning of probabilistic representations in a reconstruction-free, non-contrastive setting.
arXiv Detail & Related papers (2026-02-05T13:18:53Z) - UniRoute: Unified Routing Mixture-of-Experts for Modality-Adaptive Remote Sensing Change Detection [6.323154336421137]
UniRoute is a unified framework for modality-adaptive learning. We introduce an Adaptive Receptive Field Routing MoE module to disentangle local spatial details from global semantic context. We also propose a Consistency-Aware Self-Distillation strategy that stabilizes unified training under data-scarce heterogeneous settings.
arXiv Detail & Related papers (2026-01-21T09:21:25Z) - $\mathcal{E}_0$: Enhancing Generalization and Fine-Grained Control in VLA Models via Continuized Discrete Diffusion [65.77755100137728]
We introduce E0, a continuized discrete diffusion framework that formulates action generation as iterative denoising over quantized action tokens. E0 achieves state-of-the-art performance across 14 diverse environments, outperforming strong baselines by 10.7% on average.
arXiv Detail & Related papers (2025-11-26T16:14:20Z) - Balance Equation-based Distributionally Robust Offline Imitation Learning [8.607736795429638]
Imitation Learning (IL) has proven highly effective for robotic and control tasks where manually designing reward functions or explicit controllers is infeasible. Standard IL methods implicitly assume that the environment dynamics remain fixed between training and deployment. We address this challenge through Balance Equation-based Distributionally Robust Offline Learning. We formulate the problem as a distributionally robust optimization over an uncertainty set of transition models, seeking a policy that minimizes the imitation loss under the worst-case transition distribution.
arXiv Detail & Related papers (2025-11-11T07:48:09Z) - Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features. Central to our approach is integrating reinforcement learning and causal inference. Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z) - Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection [54.10252086842123]
Multimodal Sentiment Analysis (MSA) aims to predict sentiment from language, acoustic, and visual data in videos. This paper proposes a modality optimization and dynamic primary modality selection framework (MODS). Experiments on four benchmark datasets demonstrate that MODS outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-11-09T11:13:32Z) - Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - A Unified Approach for Learning the Dynamics of Power System Generators and Inverter-based Resources [12.723995633698514]
Inverter-based resources (IBRs) for renewable energy integration and electrification greatly challenge power system dynamic analysis.
To account for both synchronous generators (SGs) and IBRs, this work presents an approach for learning the model of an individual dynamic component.
arXiv Detail & Related papers (2024-09-22T14:07:10Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
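The inference scheme described here, a learned policy as generator and a learned energy as a trajectory-level quality score, can be sketched as follows. The function, its interface, and the select-by-minimum-energy step are illustrative assumptions; the paper's actual sampling procedure may differ.

```python
import numpy as np

def generate_and_rank(policy, energy, x0, horizon, n_samples):
    """Hypothetical sketch: roll out the learned transition policy to
    propose trajectories, then use the learned energy as a trajectory-level
    measure of sample quality (lower energy = better sample)."""
    trajectories = []
    for _ in range(n_samples):
        x, traj = x0, [x0]
        for _ in range(horizon):
            x = policy(x)  # one forward-looking transition step
            traj.append(x)
        trajectories.append(np.array(traj))
    # return the proposal the energy model rates highest (lowest energy)
    return min(trajectories, key=energy)
```

The division of labor mirrors the abstract: the policy handles step-by-step generation (mitigating compounding error locally), while the energy evaluates whole trajectories at once.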
arXiv Detail & Related papers (2023-11-02T16:45:25Z)