Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for UAV-Based Wildlife Protection
- URL: http://arxiv.org/abs/2509.02579v2
- Date: Fri, 10 Oct 2025 05:26:40 GMT
- Title: Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for UAV-Based Wildlife Protection
- Authors: Mazyar Taghavi, Rahman Farnoosh
- Abstract summary: This paper introduces a novel Expectation-Maximization based latent variable modeling approach in the context of wildlife protection. By modeling hidden environmental factors and inter-agent dynamics through latent variables, our method enhances exploration and coordination under uncertainty. We implement and evaluate our EM-MARL framework using a custom simulation involving 10 UAVs tasked with patrolling protected habitats of the endangered Iranian leopard.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protecting endangered wildlife from illegal poaching presents a critical challenge, particularly in vast and partially observable environments where real-time response is essential. This paper introduces a novel Expectation-Maximization (EM) based latent variable modeling approach in the context of Multi-Agent Reinforcement Learning (MARL) for Unmanned Aerial Vehicle (UAV) coordination in wildlife protection. By modeling hidden environmental factors and inter-agent dynamics through latent variables, our method enhances exploration and coordination under uncertainty. We implement and evaluate our EM-MARL framework using a custom simulation involving 10 UAVs tasked with patrolling protected habitats of the endangered Iranian leopard. Extensive experimental results demonstrate superior performance in detection accuracy, adaptability, and policy convergence when compared to standard algorithms such as Proximal Policy Optimization (PPO) and Deep Deterministic Policy Gradient (DDPG). Our findings underscore the potential of combining EM inference with MARL to improve decentralized decision-making in complex, high-stakes conservation scenarios. The full implementation, simulation environment, and training scripts are publicly available on GitHub.
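For readers unfamiliar with the EM step the abstract refers to, the following is a minimal, generic sketch of Expectation-Maximization on a two-component 1-D Gaussian mixture. It is not the authors' EM-MARL implementation: the function names (`normal_pdf`, `em_two_gaussians`), the synthetic data, and the idea of two hidden "regimes" are assumptions made purely for illustration. The latent variable here is the unobserved component that produced each observation.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, stds). Hypothetical helper for illustration.
    """
    # Crude initialisation: place the two means at the data extremes.
    means = [min(data), max(data)]
    stds = [1.0, 1.0]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([j / total for j in joint])

        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor prevents variance collapse

    return weights, means, stds


random.seed(0)
# Synthetic observations drawn from two hidden regimes (made-up data).
data = [random.gauss(0.0, 1.0) for _ in range(300)]
data += [random.gauss(5.0, 1.0) for _ in range(300)]
weights, means, stds = em_two_gaussians(data)
```

The E-step computes soft assignments of observations to latent components and the M-step refits the parameters given those assignments. How the paper couples such an inference loop with multi-agent policy learning is described in the paper itself, not in this sketch.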
Related papers
- BATIS: Bayesian Approaches for Targeted Improvement of Species Distribution Models [15.029163153558533]
Species distribution models (SDMs) aim to predict species occurrence based on environmental variables. Recent deep learning advances for SDMs have been shown to perform well on complex and heterogeneous datasets. We introduce BATIS, a novel and practical framework wherein prior predictions are updated iteratively using limited observational data.
arXiv Detail & Related papers (2025-10-22T16:42:46Z)
- Vision-driven River Following of UAV via Safe Reinforcement Learning using Semantic Dynamics Model [11.29011178752037]
Vision-driven autonomous river following by Unmanned Aerial Vehicles is critical for applications such as rescue, surveillance, and environmental monitoring. We formalize river following as a coverage control problem in which the reward function is submodular, yielding diminishing returns as more unique river segments are visited. We present the Constrained Actor Dynamics Estimator architecture, which integrates the actor, the cost estimator, and SDM for cost advantage estimation to form a model-based SafeRL framework.
arXiv Detail & Related papers (2025-08-13T17:39:09Z)
- LLM Meets the Sky: Heuristic Multi-Agent Reinforcement Learning for Secure Heterogeneous UAV Networks [57.27815890269697]
This work focuses on maximizing the secrecy rate in heterogeneous UAV networks (HetUAVNs) under energy constraints. We introduce a Large Language Model (LLM)-guided multi-agent learning approach. Results show that our method outperforms existing baselines in secrecy and energy efficiency.
arXiv Detail & Related papers (2025-07-23T04:22:57Z)
- Online Planning for Multi-UAV Pursuit-Evasion in Unknown Environments Using Deep Reinforcement Learning [16.676389371667284]
Multi-UAV pursuit-evasion poses a key challenge for UAV swarm intelligence. We introduce an evader prediction-enhanced network to tackle partial observability in cooperative strategy learning. We derive a feasible policy via a two-stage reward refinement and deploy the policy on real quadrotors in a zero-shot manner.
arXiv Detail & Related papers (2024-09-24T08:40:04Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Bayesian Risk-Averse Q-Learning with Streaming Observations [7.330349128557128]
We consider a robust reinforcement learning problem, where a learning agent learns from a simulated training environment.
Observations from the real environment that is out of the agent's control arrive periodically.
We develop a multi-stage Bayesian risk-averse Q-learning algorithm to solve BRMDP with streaming observations from the real environment.
arXiv Detail & Related papers (2023-05-18T20:48:50Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Latent State Marginalization as a Low-cost Approach for Improving Exploration [79.12247903178934]
We propose the adoption of latent variable policies within the MaxEnt framework.
We show that latent variable policies naturally emerge under the use of world models with a latent belief state.
We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training.
arXiv Detail & Related papers (2022-10-03T15:09:12Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model [42.28001762749647]
In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent.
We consider policy learning for Robust Markov Decision Processes (RMDP), where the agent tries to seek a robust policy with respect to unexpected perturbations on the environments.
Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties.
arXiv Detail & Related papers (2022-03-13T06:37:25Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDP).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on conditional variational inference to model multimodality and stochasticity.
We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration.
Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z)