Goal inference with Rao-Blackwellized Particle Filters
- URL: http://arxiv.org/abs/2512.09269v1
- Date: Wed, 10 Dec 2025 02:48:55 GMT
- Title: Goal inference with Rao-Blackwellized Particle Filters
- Authors: Yixuan Wang, Dan P. Guralnik, Warren E. Dixon
- Abstract summary: Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We study such intent inference using a variant of a Rao-Blackwellized Particle Filter (RBPF). We quantify how well the adversary can recover the agent's intent using information-theoretic leakage metrics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We initiate the study of such intent inference using a variant of a Rao-Blackwellized Particle Filter (RBPF), subject to the assumption that the agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property. Leveraging the assumed closed-form agent dynamics, the RBPF analytically marginalizes the linear-Gaussian substructure and updates particle weights only, improving sample efficiency over a standard particle filter. Two different estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version confining the mixture to the effective sample. We quantify how well the adversary can recover the agent's intent using information-theoretic leakage metrics and provide computable lower bounds on the Kullback-Leibler (KL) divergence between the true intent distribution and RBPF estimates via Gaussian-mixture KL bounds. We also provide a bound on the difference in performance between the two estimators, highlighting the fact that the reduced estimator performs almost as well as the complete one. Experiments illustrate fast and accurate intent recovery for compliant agents, motivating future work on designing intent-obfuscating controllers.
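The Rao-Blackwellized structure described above can be illustrated with a minimal sketch: one particle per candidate goal, whose linear-Gaussian state is marginalized analytically by a Kalman filter, so only the discrete goal weights are updated from the innovation likelihood. The 1-D dynamics, gains, noise levels, and goal set below are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): a 1-D agent is driven toward one
# of several candidate goals by linear-Gaussian closed-loop dynamics
#   x_{t+1} = x_t + k*(g - x_t) + w_t,   y_t = x_t + v_t.
# One particle per goal hypothesis g; the continuous state is marginalized
# analytically with a Kalman filter, so only the weights are updated
# (the Rao-Blackwellized step).

goals = np.array([-5.0, 0.0, 5.0])   # candidate goals (assumed)
k, q, r = 0.3, 0.05, 0.2             # gain, process var, obs var (assumed)

def rbpf(ys, goals, k, q, r, x0=0.0, p0=1.0):
    n = len(goals)
    mu = np.full(n, x0)     # Kalman means, one per goal hypothesis
    P = np.full(n, p0)      # Kalman variances
    logw = np.zeros(n)      # log particle weights
    for y in ys:
        # Predict under each goal hypothesis.
        mu = mu + k * (goals - mu)
        P = (1 - k) ** 2 * P + q
        # Update: the innovation likelihood reweights the hypotheses.
        S = P + r
        logw += -0.5 * (np.log(2 * np.pi * S) + (y - mu) ** 2 / S)
        K = P / S
        mu = mu + K * (y - mu)
        P = (1 - K) * P
    w = np.exp(logw - logw.max())
    return w / w.sum()

# Simulate a trajectory heading to goal +5 and infer the goal posterior.
x, true_goal = 0.0, 5.0
ys = []
for _ in range(40):
    x = x + k * (true_goal - x) + rng.normal(0, np.sqrt(q))
    ys.append(x + rng.normal(0, np.sqrt(r)))

w = rbpf(np.array(ys), goals, k, q, r)
print(goals[np.argmax(w)])  # most probable goal under the RBPF posterior
```

The returned weights would play the role of the mixture weights in the paper's Gaussian-mixture intent estimator; the reduced estimator would keep only the particles in the effective sample.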
Related papers
- Sharp Convergence Rates for Masked Diffusion Models [53.117058231393834]
We develop a total-variation based analysis for the Euler method that overcomes prior limitations. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
arXiv Detail & Related papers (2026-02-26T00:47:51Z) - Distributional Reinforcement Learning with Diffusion Bridge Critics [57.70134665595571]
We propose a novel distributional reinforcement learning method with Diffusion Bridge Critics (DBC). DBC directly models the inverse cumulative distribution function (CDF) of the Q value. We derive an analytic integral formula to address discretization errors in DBC.
arXiv Detail & Related papers (2026-02-05T15:40:14Z) - Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies [4.249024052507976]
We propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample. We show that existing noise-expectation and gradient-expectation methods are two specific instances within this broader class.
arXiv Detail & Related papers (2026-01-13T01:58:24Z) - Chicken Swarm Kernel Particle Filter: A Structured Rejuvenation Approach with KLD-Efficient Sampling [0.0]
Particle filters (PFs) are often combined with swarm intelligence (SI) algorithms, such as Chicken Swarm Optimization (CSO). This paper investigates the theoretical interaction between SI-based rejuvenation kernels and Kullback-Leibler divergence (KLD) sampling.
arXiv Detail & Related papers (2025-11-15T13:55:29Z) - G$^2$RPO: Granular GRPO for Precise Reward in Flow Models [74.21206048155669]
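The KLD-sampling criterion referenced in the entry above adapts the particle count so that, with a chosen confidence, the KL divergence between the particle approximation and the true posterior stays below a threshold. A minimal sketch of the standard bound (Fox's formula, using the Wilson-Hilferty chi-square approximation) is below; the parameter values are illustrative defaults, not from the paper.

```python
import math

def kld_particle_count(k_bins, epsilon=0.05, z_quantile=1.645):
    """Minimum particle count under KLD-sampling.

    k_bins: number of histogram bins with support in the particle set.
    epsilon: KL-divergence error bound.
    z_quantile: upper (1 - delta) standard-normal quantile
                (1.645 corresponds to delta = 0.05).
    """
    if k_bins < 2:
        return 1
    # Wilson-Hilferty approximation of the chi-square quantile with
    # k_bins - 1 degrees of freedom.
    a = 2.0 / (9.0 * (k_bins - 1))
    n = (k_bins - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z_quantile) ** 3
    return math.ceil(n)

# More occupied bins (a more spread-out posterior) or a tighter error
# bound both demand more particles.
print(kld_particle_count(10), kld_particle_count(10, epsilon=0.01))
```

In a filter loop, `k_bins` is recomputed each step from the bins the current particles occupy, so concentrated posteriors automatically use fewer particles.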
We propose a novel Granular-GRPO (G$^2$RPO) framework that achieves precise and comprehensive reward assessments of sampling directions. We introduce a Multi-Granularity Advantage Integration module that aggregates advantages computed at multiple diffusion scales. Our G$^2$RPO significantly outperforms existing flow-based GRPO baselines.
arXiv Detail & Related papers (2025-10-02T12:57:12Z) - Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling [70.8832906871441]
We study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity.
arXiv Detail & Related papers (2025-07-11T08:00:47Z) - Training-Free Stein Diffusion Guidance: Posterior Correction for Sampling Beyond High-Density Regions [46.59494117137471]
Training-free diffusion guidance provides a flexible way to leverage off-the-shelf classifiers without additional training. We introduce Stein Diffusion Guidance (SDG), a novel training-free framework grounded in a surrogate SOC objective. Experiments on molecular low-density sampling tasks suggest that SDG consistently surpasses standard training-free guidance methods.
arXiv Detail & Related papers (2025-07-07T21:14:27Z) - Rectified Diffusion Guidance for Conditional Generation [94.83538269086613]
We revisit the theory behind CFG and rigorously confirm that improper combination coefficients bring about an expectation shift in the generative distribution. We show that our approach enjoys a closed-form solution given the guidance strength. Empirical evidence on real-world data demonstrates the compatibility of our design with existing state-of-the-art diffusion models.
arXiv Detail & Related papers (2024-10-24T13:41:32Z) - Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective [65.10019978876863]
Diffusion-Based Purification (DBP) has emerged as an effective defense mechanism against adversarial attacks. In this paper, we propose that the intrinsic stochasticity in the DBP process is the primary factor driving robustness.
arXiv Detail & Related papers (2024-04-22T16:10:38Z) - An analysis of the noise schedule for score-based generative models [7.180235086275926]
Score-based generative models (SGMs) aim at estimating a target data distribution by learning score functions using only noise-perturbed samples from the target. Recent literature has focused extensively on assessing the error between the target and estimated distributions, gauging the generative quality through the Kullback-Leibler (KL) divergence and Wasserstein distances. We establish an upper bound for the KL divergence between the target and the estimated distributions, explicitly depending on any time-dependent noise schedule.
arXiv Detail & Related papers (2024-02-07T08:24:35Z) - A Bayesian Semiparametric Method For Estimating Causal Quantile Effects [1.1118668841431563]
We propose a semiparametric conditional distribution regression model that allows inference on any functionals of counterfactual distributions.
We show via simulations that the use of double balancing score for confounding adjustment improves performance over adjusting for any single score alone.
We apply the proposed method to the North Carolina birth weight dataset to analyze the effect of maternal smoking on infant's birth weight.
arXiv Detail & Related papers (2022-11-03T05:15:18Z) - On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.