An Explainable Equity-Aware P2P Energy Trading Framework for Socio-Economically Diverse Microgrid
- URL: http://arxiv.org/abs/2507.18738v1
- Date: Thu, 24 Jul 2025 18:38:51 GMT
- Title: An Explainable Equity-Aware P2P Energy Trading Framework for Socio-Economically Diverse Microgrid
- Authors: Abhijan Theja, Mayukha Pal,
- Abstract summary: This paper proposes a novel framework that integrates multi-objective mixed-integer linear programming (MILP), cooperative game theory, and a dynamic equity-adjustment mechanism driven by reinforcement learning (RL)<n>The framework demonstrates peak demand reductions of up to 72.6%, and significant cooperative gains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Fair and dynamic energy allocation in community microgrids remains a critical challenge, particularly when serving socio-economically diverse participants. Static optimization and cost-sharing methods often fail to adapt to evolving inequities, leading to participant dissatisfaction and unsustainable cooperation. This paper proposes a novel framework that integrates multi-objective mixed-integer linear programming (MILP), cooperative game theory, and a dynamic equity-adjustment mechanism driven by reinforcement learning (RL). At its core, the framework utilizes a bi-level optimization model grounded in Equity-regarding Welfare Maximization (EqWM) principles, which incorporate Rawlsian fairness to prioritize the welfare of the least advantaged participants. We introduce a Proximal Policy Optimization (PPO) agent that dynamically adjusts socio-economic weights in the optimization objective based on observed inequities in cost and renewable energy access. This RL-powered feedback loop enables the system to learn and adapt, continuously striving for a more equitable state. To ensure transparency, Explainable AI (XAI) is used to interpret the benefit allocations derived from a weighted Shapley value. Validated across six realistic scenarios, the framework demonstrates peak demand reductions of up to 72.6%, and significant cooperative gains. The adaptive RL mechanism further reduces the Gini coefficient over time, showcasing a pathway to truly sustainable and fair energy communities.
Related papers
- Provable and Practical In-Context Policy Optimization for Self-Improvement [49.670847804409874]
We study test-time scaling, where a model improves its answer through multi-round self-reflection at inference.<n>We introduce In-Context Policy Optimization (ICPO), in which an agent optimize its response in context using self-assessed or externally observed rewards without modifying its parameters.<n>We propose Minimum-Entropy ICPO (ME-ICPO), a practical algorithm that iteratively uses its response and self-assessed reward to refine its response in-context at inference time.
arXiv Detail & Related papers (2026-03-02T00:21:50Z) - Fairness Aware Reward Optimization [78.85867531002346]
We introduce Fairness Aware Reward Optimization (Faro), an in-processing framework that trains reward models under demographic parity, equalized odds, or counterfactual fairness constraints.<n>We provide the first theoretical analysis of reward-level fairness in LLM alignment.<n>Faro significantly reduces bias and harmful generations while maintaining or improving model quality.
arXiv Detail & Related papers (2026-02-08T03:35:49Z) - Toward a Sustainable Federated Learning Ecosystem: A Practical Least Core Mechanism for Payoff Allocation [71.86087908416255]
We introduce a payoff allocation framework based on the least core (LC) concept.<n>Unlike traditional methods, the LC prioritizes the cohesion of the federation by minimizing the maximum dissatisfaction.<n>Case studies in federated intrusion detection demonstrate that our mechanism correctly identifies pivotal contributors and strategic alliances.
arXiv Detail & Related papers (2026-02-03T11:10:50Z) - MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs)<n>We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck.<n>We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z) - TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making [75.29820290660065]
This paper proposes Thought-Centric Preference Optimization ( TCPO) for effective embodied decision-making.<n>It emphasizes the alignment of the model's intermediate reasoning process, mitigating the problem of model degradation.<n>Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, achieving a 6% improvement over RL4VLM.
arXiv Detail & Related papers (2025-09-10T11:16:21Z) - Scalable Fairness Shaping with LLM-Guided Multi-Agent Reinforcement Learning for Peer-to-Peer Electricity Markets [3.7321070110102075]
Peer-to-peer (P2P) energy trading is becoming central to modern distribution systems.<n>A fairness-aware multiagent reinforcement learning framework, FairMarket-RL, is proposed.<n>The framework shifts exchanges toward local P2P trades, lowers consumer costs relative to grid-only procurement, sustains strong fairness across participants, and preserves utility viability.
arXiv Detail & Related papers (2025-08-26T02:25:17Z) - MOHAF: A Multi-Objective Hierarchical Auction Framework for Scalable and Fair Resource Allocation in IoT Ecosystems [0.565395466029518]
This paper proposes a distributed resource allocation mechanism that jointly optimize cost, Quality of Service (QoS), energy efficiency, and fairness.<n>Experiments on the Google Cluster Data trace, comprising 3,553 requests and 888 resources, demonstrate MOHAF's superior allocation efficiency (0.263 compared to Greedy (0.185), First-Price (0.138), and Random (0.101) auctions, while achieving perfect fairness (Jain's index = 1.000).
arXiv Detail & Related papers (2025-08-20T16:25:37Z) - VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets [3.498661956610689]
This paper introduces a model for coordinating prosumers with heterogeneous distributed energy resources (DERs) in a local energy market (LEM)<n>The proposed LEM scheme utilizes a data-driven, model-free reinforcement learning approach based on the multi-agent deep deterministic policy gradient (MADDPG)<n>We investigate a price manipulation strategy using a variational auto encoder-generative adversarial network (VAE-GAN) model, which allows utilities to adjust price signals in a way that induces financial losses for the prosumers.
arXiv Detail & Related papers (2025-07-26T07:38:27Z) - DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization [55.06360285372418]
Group Relative Policy Optimization is a reinforcement learning method for large reasoning models (LRMs)<n>In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent limitation of question-level difficulty bias.<n>We introduce a new Discriminative Constrained Optimization framework for reinforcing LRMs, grounded in the principle of discriminative learning.
arXiv Detail & Related papers (2025-05-18T11:08:32Z) - Integrated Optimization and Game Theory Framework for Fair Cost Allocation in Community Microgrids [0.0]
This paper presents a novel framework integrating multi-objective optimization with cooperative game theory for fair and efficient microgrid operation and cost allocation.<n>Results show peak demand reductions ranging from 7.8% to 62.6%, solar utilization rates reaching 114.8% through effective storage integration, and cooperative gains of up to $1,801.01 per day.
arXiv Detail & Related papers (2025-02-13T04:28:17Z) - Demand Response Optimization MILP Framework for Microgrids with DERs [0.0]
This paper presents a framework for optimizing demand response in a microgrid with solar generation and battery storage systems.<n>The framework incorporates load classification, dynamic price thresholding, and multi-period coordination for optimal DR event scheduling.
arXiv Detail & Related papers (2025-02-12T20:10:51Z) - AlphaDPO: Adaptive Reward Margin for Direct Preference Optimization [45.46582930202524]
$alpha$-DPO is an adaptive preference optimization algorithm for large language models.<n>It balances the policy model and the reference model to achieve personalized reward margins.<n>It consistently outperforms DPO and SimPO across various model settings.
arXiv Detail & Related papers (2024-10-14T04:29:57Z) - Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association ins.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - Evaluation of Prosumer Networks for Peak Load Management in Iran: A Distributed Contextual Stochastic Optimization Approach [0.0]
This paper introduces a novel prosumers network framework aimed at mitigating peak loads in Iran.
A cost-oriented integrated prediction and optimization approach is proposed, empowering prosumers to make informed decisions.
Numerical results highlights that integrating prediction with optimization and implementing a contextual information-sharing network among prosumers significantly reduces peak loads as well as total costs.
arXiv Detail & Related papers (2024-08-31T16:09:38Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.<n>To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.<n>Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Self-Supervised Learning for Large-Scale Preventive Security Constrained DC Optimal Power Flow [20.078717680640214]
Security-Constrained Optimal Power Flow (SCOPF) plays a crucial role in power grid stability but becomes increasingly complex as systems grow.
This paper introduces PDL-SCOPF, a self-supervised end-to-end primal-dual learning framework for producing near-optimal solutions to large-scale SCOPF problems.
arXiv Detail & Related papers (2023-11-29T20:36:35Z) - Model-based Causal Bayesian Optimization [74.78486244786083]
We introduce the first algorithm for Causal Bayesian Optimization with Multiplicative Weights (CBO-MW)
We derive regret bounds for CBO-MW that naturally depend on graph-related quantities.
Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system.
arXiv Detail & Related papers (2023-07-31T13:02:36Z) - Faster Last-iterate Convergence of Policy Optimization in Zero-Sum
Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z) - Decentralized Reinforcement Learning: Global Decision-Making via Local
Economic Transactions [80.49176924360499]
We establish a framework for directing a society of simple, specialized, self-interested agents to solve sequential decision problems.
We derive a class of decentralized reinforcement learning algorithms.
We demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
arXiv Detail & Related papers (2020-07-05T16:41:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.