An LLM-Based Digital Twin for Optimizing Human-in-the-Loop Systems
- URL: http://arxiv.org/abs/2403.16809v1
- Date: Mon, 25 Mar 2024 14:32:28 GMT
- Title: An LLM-Based Digital Twin for Optimizing Human-in-the-Loop Systems
- Authors: Hanqing Yang, Marie Siew, Carlee Joe-Wong
- Abstract summary: We present a case study that employs large language models (LLMs) to mimic the behaviors and thermal preferences of various population groups in a shopping mall.
The aggregated thermal preferences are integrated into an agent-in-the-loop reinforcement learning algorithm, AitL-RL.
Our results show that LLMs are capable of simulating complex population movements within large open spaces.
- Score: 13.388869442538399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing prevalence of Cyber-Physical Systems and the Internet of Things (CPS-IoT) applications and Foundation Models is enabling new applications that leverage real-time control of the environment. For example, real-time control of Heating, Ventilation and Air-Conditioning (HVAC) systems can scale back their usage when occupant comfort does not require it, thereby reducing energy consumption. Collecting real-time feedback on human preferences in such human-in-the-loop (HITL) systems, however, is difficult in practice. We propose the use of large language models (LLMs) to deal with the challenges of dynamic environments and difficult-to-obtain data in CPS optimization. In this paper, we present a case study that employs LLM agents to mimic the behaviors and thermal preferences of various population groups (e.g., young families, the elderly) in a shopping mall. The aggregated thermal preferences are integrated into an agent-in-the-loop reinforcement learning algorithm, AitL-RL, which employs the LLM as a dynamic simulation of the physical environment to learn how to balance energy savings against occupant comfort. Our results show that LLMs are capable of simulating complex population movements within large open spaces. Moreover, AitL-RL outperforms the popular existing policy of set point control, suggesting that adaptive and personalized decision-making is critical for efficient optimization in CPS-IoT applications. Through this case study, we demonstrate the potential of integrating advanced Foundation Models like LLMs into CPS-IoT to enhance system adaptability and efficiency. The project's code can be found in our GitHub repository.
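As a rough illustration of the agent-in-the-loop idea, the sketch below couples a stubbed "LLM" preference oracle with a simple bandit-style learner over HVAC setpoints. All group preferences, cost constants, and parameter names are hypothetical stand-ins, not the paper's implementation.

```python
import random

# Hypothetical stand-in for the paper's LLM agents: each population group
# "votes" on comfort at the current setpoint. A real agent-in-the-loop
# system would prompt an LLM with a persona here instead.
def simulate_group_votes(setpoint_c):
    preferred = {"young_families": 21.0, "elderly": 24.0}  # assumed preferences
    return {g: -abs(setpoint_c - p) for g, p in preferred.items()}

SETPOINTS = [19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
q = {s: 0.0 for s in SETPOINTS}      # one value estimate per candidate setpoint
alpha, eps, lam = 0.1, 0.2, 0.5      # learning rate, exploration, comfort weight

for _ in range(2000):
    s = random.choice(SETPOINTS) if random.random() < eps else max(q, key=q.get)
    energy = 0.1 * (30.0 - s)        # assumed: deeper cooling from 30C ambient costs more
    discomfort = -sum(simulate_group_votes(s).values())
    reward = -(energy + lam * discomfort)
    q[s] += alpha * (reward - q[s])  # bandit-style incremental update

print("learned setpoint:", max(q, key=q.get))
```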
Related papers
- Constrained Reinforcement Learning for Safe Heat Pump Control [24.6591923448048]
We propose a novel building simulator, I4B, which provides interfaces for different use cases.
We apply a model-free constrained RL algorithm named constrained Soft Actor-Critic with Linear Smoothed Log Barrier function (CSAC-LB) to the heating optimization problem.
Benchmarking against baseline algorithms demonstrates CSAC-LB's efficiency in data exploration, constraint satisfaction, and overall performance.
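For intuition, here is one common way to linearly smooth a log barrier so its gradient stays finite near and beyond the constraint boundary; the constants and the C1 linear extension below are a generic construction, not necessarily CSAC-LB's exact definition.

```python
import math

# Generic linear smoothed log barrier for a constraint g(s, a) <= 0:
# standard log barrier while safely inside the feasible set, linearly
# extended past -eps so gradients never blow up at the boundary.
def smoothed_log_barrier(g, mu=1.0, eps=0.1):
    if g <= -eps:
        return -mu * math.log(-g)
    # C1 extension: matches the barrier's value and slope at g = -eps.
    return -mu * math.log(eps) + (mu / eps) * (g + eps)

# Penalized actor objective sketch: maximize Q(s, a) - smoothed_log_barrier(g(s, a)).
print(smoothed_log_barrier(-2.0), smoothed_log_barrier(0.5))
```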
arXiv Detail & Related papers (2024-09-29T14:15:13Z)
- Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization [3.1944843830667766]
Large language models (LLMs) have revolutionized how we interact with technology, but their personalization to individual user preferences remains a significant challenge.
We present Adaptive Self-Supervised Learning Strategies (ASLS), which utilize self-supervised learning techniques to personalize LLMs dynamically.
arXiv Detail & Related papers (2024-09-25T14:35:06Z)
- Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single-stage approach named Alignment with Integrated Human Feedback (AIHF) to jointly train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
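A toy rendering of what a single-stage objective mixing demonstrations and preferences could look like: a behavior-cloning term plus a Bradley-Terry preference term. The weighting and exact structure are assumptions for illustration, not the AIHF formulation itself.

```python
import math

# Illustrative single-stage loss: demonstration likelihood plus a
# Bradley-Terry preference term over reward-model scores. The beta
# weighting and inputs are made-up assumptions.
def joint_alignment_loss(logp_demo, r_chosen, r_rejected, beta=1.0):
    bc_term = -logp_demo                       # behavior cloning on demonstrations
    pref_term = -math.log(                     # Bradley-Terry: P(chosen > rejected)
        1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))
    )
    return bc_term + beta * pref_term

print(joint_alignment_loss(logp_demo=-2.3, r_chosen=1.2, r_rejected=0.4))
```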
arXiv Detail & Related papers (2024-06-11T01:20:53Z)
- Value Augmented Sampling for Language Model Alignment and Personalization [39.070662999014836]
We present a new framework for reward optimization, Value Augmented Sampling (VAS).
VAS solves for the optimal reward-maximizing policy without co-training the policy and the value function.
Our algorithm unlocks the new capability of composing several rewards and controlling the extent of each one during deployment time.
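The sketch below illustrates the decoding-time flavor of value-augmented sampling: a frozen base policy's token log-probabilities are shifted by one or more value estimates, so rewards can be composed and re-weighted at deployment without retraining. All numbers and the two value heads are toy assumptions.

```python
import math

# Reweight a frozen policy's token distribution by weighted value estimates,
# then renormalize. No policy or value co-training happens at decode time.
def value_augmented_distribution(base_logprobs, value_heads, weights):
    scores = [
        lp + sum(w * vh[i] for w, vh in zip(weights, value_heads))
        for i, lp in enumerate(base_logprobs)
    ]
    z = math.log(sum(math.exp(s) for s in scores))   # log-normalizer
    return [math.exp(s - z) for s in scores]

base = [math.log(0.5), math.log(0.3), math.log(0.2)]  # frozen policy over 3 tokens
helpful = [0.2, 0.9, 0.1]                              # toy value head for reward 1
concise = [0.8, 0.1, 0.3]                              # toy value head for reward 2
# Composing rewards at deployment: adjust weights without retraining anything.
print(value_augmented_distribution(base, [helpful, concise], weights=[1.0, 0.5]))
```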
arXiv Detail & Related papers (2024-05-10T17:59:04Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z)
- SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
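To make "instructable reward model" concrete, one plausible shape is a scoring prompt that embeds human-written principles alongside the response being judged. The principle text and prompt format below are illustrative guesses, not SALMON's actual prompts or principles.

```python
# Hypothetical reward prompt: the reward model scores a response against
# explicitly listed principles instead of implicit learned preferences.
PRINCIPLES = [
    "1. Be honest: do not fabricate facts.",
    "2. Be harmless: refuse dangerous requests.",
]

def build_reward_prompt(response):
    return (
        "Rate the response below from 1-10 against these principles:\n"
        + "\n".join(PRINCIPLES)
        + f"\n\nResponse:\n{response}\n\nScore:"
    )

print(build_reward_prompt("The capital of France is Paris."))
```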
arXiv Detail & Related papers (2023-10-09T17:56:53Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures a certain "fairness" across different data samples.
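As a minimal reading of that min-max idea: instead of minimizing the average loss over data samples from different episodes, the learner minimizes the worst-case one, so newly seen environments cannot crowd out old ones. The loss values below are made up for illustration.

```python
# Toy contrast between average-loss and worst-case (min-max) objectives
# over per-episode losses; minimizing the max is what enforces "fairness"
# across samples drawn from different episodes.
per_episode_losses = {"episode_1": 0.8, "episode_2": 0.3, "episode_3": 0.5}

avg_objective = sum(per_episode_losses.values()) / len(per_episode_losses)
minmax_objective = max(per_episode_losses.values())  # minimized over model parameters

print(f"average loss: {avg_objective:.2f}, worst-case loss: {minmax_objective:.2f}")
```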
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
- Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration [17.365135977882215]
This study fills important gaps by introducing innovative features into the policy iteration algorithm.
We show system-level performance, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system.
arXiv Detail & Related papers (2020-06-16T09:09:48Z)
- Data-Driven Learning and Load Ensemble Control [1.647866856596524]
This study aims to engage distributed small-scale flexible loads, such as thermostatically controllable loads (TCLs), to provide grid support services.
The efficiency of this data-driven learning is demonstrated through simulations on Heating, Cooling & Ventilation units in a testbed neighborhood of residential houses.
arXiv Detail & Related papers (2020-04-20T23:32:10Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
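A compact MPPI-style sketch of the information-theoretic MPC side of that connection: sample action sequences, weight them by exponentiated return (the entropy-regularized, soft-RL quantity), and average. The toy dynamics and cost stand in for the learned, possibly biased model that the paper's Q-learning variant would leverage.

```python
import math
import random

# Toy rollout: linear dynamics with a quadratic tracking cost. In the paper's
# setting a learned (biased) model would supply this return estimate.
def rollout_return(action_seq, x0=0.0, target=1.0):
    x, total = x0, 0.0
    for a in action_seq:
        x += a                       # toy dynamics
        total -= (x - target) ** 2   # toy cost (negated into a return)
    return total

def mppi_first_action(n_samples=256, horizon=5, lam=1.0):
    seqs = [[random.gauss(0, 0.5) for _ in range(horizon)] for _ in range(n_samples)]
    rets = [rollout_return(s) for s in seqs]
    m = max(rets)                                        # for numerical stability
    weights = [math.exp((r - m) / lam) for r in rets]    # softmax over returns
    z = sum(weights)
    return sum(w * s[0] for w, s in zip(weights, seqs)) / z  # weighted first action

print("first control:", mppi_first_action())
```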
arXiv Detail & Related papers (2019-12-31T00:29:22Z)