Learning Emergent Gaits with Decentralized Phase Oscillators: on the
role of Observations, Rewards, and Feedback
- URL: http://arxiv.org/abs/2402.08662v2
- Date: Sat, 17 Feb 2024 17:50:28 GMT
- Title: Learning Emergent Gaits with Decentralized Phase Oscillators: on the
role of Observations, Rewards, and Feedback
- Authors: Jenny Zhang, Steve Heim, Se Hwan Jeon, Sangbae Kim
- Abstract summary: We present a minimal phase oscillator model for learning quadrupedal locomotion.
We show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences.
- Score: 16.290816894141003
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We present a minimal phase oscillator model for learning quadrupedal
locomotion. Each of the four oscillators is coupled only to itself and its
corresponding leg through local feedback of the ground reaction force, which
can be interpreted as an observer feedback gain. We interpret the oscillator
itself as a latent contact state-estimator. Through a systematic ablation
study, we show that the combination of phase observations, simple phase-based
rewards, and the local feedback dynamics induces policies that exhibit emergent
gait preferences, while using a reduced set of simple rewards, and without
prescribing a specific gait. The code is open-source, and a video synopsis is
available at https://youtu.be/1NKQ0rSV3jU.
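The decentralized structure described in the abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' model: the abstract does not give the oscillator equations, so the cosine feedback term, the function name `step_oscillators`, the parameters `omega`, `gain`, and `contact_force`, and the convention that phases in [0, pi) correspond to stance are all assumptions. The only property taken from the abstract is that each oscillator is updated from its own phase and its own leg's ground reaction force, with no coupling between legs.

```python
import math

TWO_PI = 2.0 * math.pi


def step_oscillators(phases, grf, dt, omega=TWO_PI, gain=2.0, contact_force=1.0):
    """Advance four independent leg oscillators by one Euler step.

    Hypothetical feedback law (an assumption, not taken from the paper):
        d(theta)/dt = omega + gain * s * cos(theta)
    where s = +1 if the leg's ground reaction force exceeds contact_force
    and s = -1 otherwise. With contact the phase is drawn toward pi/2
    (assumed mid-stance); without contact toward 3*pi/2 (assumed mid-swing).
    """
    next_phases = []
    for theta, force in zip(phases, grf):
        # Local feedback: uses only this leg's force, never another leg's state.
        sign = 1.0 if force > contact_force else -1.0
        rate = omega + gain * sign * math.cos(theta)
        # Wrap the phase back into [0, 2*pi).
        next_phases.append((theta + rate * dt) % TWO_PI)
    return next_phases


# Usage: one leg loaded, three legs unloaded, all starting at phase 0.
phases = step_oscillators([0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0], dt=0.01)
```

In this sketch the feedback gain plays the role of an observer-style correction: the measured contact nudges the phase toward the value consistent with that contact, while `gain < omega` keeps every phase advancing monotonically.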
Related papers
- Motion-induced directionality of collective emission in a non-chiral waveguide [0.0]
Raman-induced two-level emitters with spatially oscillating phases of the transition dipole enable thermally induced, but controllable directionality of the collective emission. We employ numerical simulations based on the Truncated Wigner Approximation for spins and find good agreement. Our results will enable studies of collective, nonreciprocal interactions in non-chiral systems.
arXiv Detail & Related papers (2026-03-03T14:21:45Z) - Self-Refining Video Sampling [91.0784344916165]
We present self-refining video sampling, a simple method that uses a pre-trained video generator trained on large-scale datasets as its own self-refiner. Experiments on state-of-the-art video generators demonstrate significant improvements in motion coherence and physics alignment.
arXiv Detail & Related papers (2026-01-26T15:22:27Z) - ReViP: Reducing False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance [50.05984919728878]
We present ReViP, a novel VLA framework with Vision-Proprioception Rebalance to enhance visual grounding and robustness under perturbations. Specifically, we use an external VLM as a task-stage observer to extract real-time task-centric visual cues from visual observations. To evaluate false completion, we propose the first False-Completion Benchmark Suite built on LIBERO with controlled settings such as Object-Drop.
arXiv Detail & Related papers (2026-01-23T11:31:07Z) - Phasor Agents: Oscillatory Graphs with Three-Factor Plasticity and Sleep-Staged Learning [0.0]
Phasor Agents are dynamical systems whose internal state is a Phasor Graph.
arXiv Detail & Related papers (2026-01-07T19:57:02Z) - Autonomous Learning of Attractors for Neuromorphic Computing with Wien Bridge Oscillator Networks [0.0]
We present an energy-based neuromorphic primitive with tunable resistive couplings. We show that learned phase patterns form attractor states and validate this behavior in simulation and hardware.
arXiv Detail & Related papers (2025-12-16T19:33:28Z) - Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
Context drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns. Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics. We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z) - Generative Model Inversion Through the Lens of the Manifold Hypothesis [98.37040155914595]
Model inversion attacks (MIAs) aim to reconstruct class-representative samples from trained models. Recent generative MIAs utilize generative adversarial networks to learn image priors that guide the inversion process.
arXiv Detail & Related papers (2025-09-24T14:39:25Z) - Unveiling the Self-Orthogonality at Exceptional Points in Driven $\mathcal{PT}$-Symmetric Systems [79.16635054977068]
We explore the effect of self-orthogonality at exceptional points (EPs) in non-Hermitian Parity-Time-symmetric systems. Using a driven three-band lattice model, we show that the Rabi frequency diverges as the system approaches an EP due to the coalescence of eigenstates.
arXiv Detail & Related papers (2025-07-14T12:53:10Z) - Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? [58.80794196076336]
Distilling large language models (LLMs) typically involves transferring the teacher model's responses through supervised fine-tuning (SFT).
We propose a novel distillation pipeline that transfers both responses and rewards.
Our method generates pseudo-rewards through a self-supervised mechanism that leverages the inherent structure of both teacher and student responses.
arXiv Detail & Related papers (2025-02-26T20:50:11Z) - Reinforcement Learning with Segment Feedback [56.54271464134885]
We consider a model named RL with segment feedback, which offers a general paradigm filling the gap between per-state-action feedback and trajectory feedback.
Under binary feedback, increasing the number of segments $m$ decreases the regret at an exponential rate; in contrast, surprisingly, under sum feedback, increasing $m$ does not reduce the regret significantly.
arXiv Detail & Related papers (2025-02-03T23:08:42Z) - Probing entanglement of a continuous basis system [0.0]
We propose a method to probe entanglement in a non-accessible continuous basis quantum system.
The method is based on our observations about the conservation of entanglement found in a 4 partite system set up constituted by a (qubit-oscillator)-(qubit-oscillator) sub-systems.
arXiv Detail & Related papers (2024-09-12T19:59:06Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Evolution of many-body systems under ancilla quantum measurements [58.720142291102135]
We study the concept of implementing quantum measurements by coupling a many-body lattice system to an ancillary degree of freedom.
We find evidence of a disentangling-entangling measurement-induced transition as was previously observed in more abstract models.
arXiv Detail & Related papers (2023-03-13T13:06:40Z) - Orbit quantization in a retarded harmonic oscillator [0.0]
We analytically predict the value of the first Hopf bifurcation, unleashing a self-oscillatory motion.
When the system is driven very far from equilibrium, a multiscale strange attractor displaying intrinsic and robust intermittency is uncovered.
arXiv Detail & Related papers (2023-01-25T04:47:06Z) - Chiral state transfer under dephasing [0.228438857884398]
We study the effects of dephasing on the encircling dynamics, adopting the full Lindblad master equation.
We show that gaps emerge in the eigenspectra landscape of the corresponding Liouvillian superoperator.
While our results are applicable to several recent experiments, we examine a recent cold-atom experiment in particular, and show that the observed long-time chirality is but limited to the special encircling path therein.
arXiv Detail & Related papers (2022-12-25T07:18:33Z) - Isolation and Impartial Aggregation: A Paradigm of Incremental Learning without Interference [61.11137714507445]
This paper focuses on the prevalent performance imbalance in the stages of incremental learning.
We propose a stage-isolation based incremental learning framework.
We evaluate the proposed method on four large benchmarks.
arXiv Detail & Related papers (2022-11-29T06:57:48Z) - CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion [4.56877715768796]
We present a method for integrating central pattern generators (CPGs) into the deep reinforcement learning framework to produce robust quadruped locomotion.
We train our policies in simulation and perform a sim-to-real transfer to the Unitree A1 quadruped, where we observe robust behavior to disturbances unseen during training.
arXiv Detail & Related papers (2022-11-01T13:41:13Z) - Critically slow operator dynamics in constrained many-body systems [0.0]
We show that in certain constrained many-body systems the structure of conservation laws can cause a drastic modification of this universal behavior.
We identify a critical point with sub-ballistically moving OTOC front, that separates a ballistic from a dynamically frozen phase.
arXiv Detail & Related papers (2021-06-09T18:00:04Z) - Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning [107.70165026669308]
In offline reinforcement learning (RL) an optimal policy is learned solely from a priori collected observational data.
We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form.
We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of the conditional moment restriction.
arXiv Detail & Related papers (2021-02-19T13:01:40Z) - Reinforcement Learning with Trajectory Feedback [76.94405309609552]
In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as trajectory feedback.
Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory.
We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret.
arXiv Detail & Related papers (2020-08-13T17:49:18Z) - Feedback-induced instabilities and dynamics in the Jaynes-Cummings model [62.997667081978825]
We investigate the coherence and steady-state properties of the Jaynes-Cummings model subjected to time-delayed coherent feedback.
The introduced feedback qualitatively modifies the dynamical response and steady-state quantum properties of the system.
arXiv Detail & Related papers (2020-06-20T10:07:01Z) - Cavityless self-organization of ultracold atoms due to the feedback-induced phase transition [0.0]
We propose and theoretically investigate a system possessing such a feedback-induced phase transition.
The system contains a Bose-Einstein condensate placed in an optical potential with the depth that is feedback-controlled according to the intensity of the Bragg-reflected probe light.
We show that there is a critical value of the feedback gain where the uniform gas distribution loses its stability and the ordered periodic density distribution emerges.
arXiv Detail & Related papers (2020-02-29T06:42:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.