LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
- URL: http://arxiv.org/abs/2506.09373v2
- Date: Sun, 15 Jun 2025 07:25:21 GMT
- Title: LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization
- Authors: Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, Shiyin Lu, Qifeng Chen
- Abstract summary: Location Preference Optimization (LPO) is a novel approach that leverages locational data to optimize interaction preferences. LPO uses information entropy to predict interaction positions by focusing on zones rich in information. Our code will be made publicly available soon at https://github.com/AIDC-AI/LPO.
- Score: 58.65395773049273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of autonomous agents is transforming interactions with Graphical User Interfaces (GUIs) by employing natural language as a powerful intermediary. Despite the predominance of Supervised Fine-Tuning (SFT) methods in current GUI agents for achieving spatial localization, these methods face substantial challenges due to their limited capacity to accurately perceive positional data. Existing strategies, such as reinforcement learning, often fail to assess positional accuracy effectively, thereby restricting their utility. In response, we introduce Location Preference Optimization (LPO), a novel approach that leverages locational data to optimize interaction preferences. LPO uses information entropy to predict interaction positions by focusing on zones rich in information. In addition, it introduces a dynamic location reward function based on physical distance, reflecting the varying importance of interaction positions. Supported by Group Relative Preference Optimization (GRPO), LPO facilitates an extensive exploration of GUI environments and significantly enhances interaction precision. Comprehensive experiments demonstrate LPO's superior performance, achieving SOTA results across both offline benchmarks and real-world online evaluations. Our code will be made publicly available soon at https://github.com/AIDC-AI/LPO.
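The abstract does not spell out the reward formulation, but a minimal sketch of a distance-based location reward combined with group-relative advantage normalization of the kind GRPO-style training uses might look like the following. The function names, the Gaussian decay, and the size-scaled tolerance are illustrative assumptions, not the paper's definitions.

```python
import math

def location_reward(pred_xy, target_xy, target_size, sigma_scale=0.5):
    """Illustrative distance-based location reward (not the paper's exact formula).

    Predictions closer to the ground-truth interaction point score higher, and the
    tolerance scales with the target element's size, so small widgets demand more
    precise clicks than large ones.
    """
    dist = math.hypot(pred_xy[0] - target_xy[0], pred_xy[1] - target_xy[1])
    sigma = sigma_scale * math.hypot(target_size[0], target_size[1])
    return math.exp(-(dist ** 2) / (2 * sigma ** 2))

def group_relative_advantages(rewards):
    """Normalize rewards within a group of sampled predictions (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards)) + 1e-8
    return [(r - mean) / std for r in rewards]

# Example: four sampled click positions for a 120x40 px button centered at (300, 200).
preds = [(305, 198), (330, 210), (290, 260), (600, 400)]
rewards = [location_reward(p, (300, 200), (120, 40)) for p in preds]
advantages = group_relative_advantages(rewards)
```

Under such a scheme, the group-normalized advantages rather than the raw rewards would weight the policy update for each sampled action.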
Related papers
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay [88.74638385288773]
Agentic Replay Policy Optimization improves performance on complex, long-horizon computer tasks. We propose a task selection strategy that filters tasks based on baseline agent performance. Experiments on the OSWorld benchmark demonstrate that ARPO achieves competitive results.
arXiv Detail & Related papers (2025-05-22T06:24:32Z)
- ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search [53.40810298627443]
ReGUIDE is a framework for web grounding that enables MLLMs to learn data efficiently through self-generated reasoning and spatial-aware criticism. Our experiments demonstrate that ReGUIDE significantly advances web grounding performance across multiple benchmarks.
arXiv Detail & Related papers (2025-05-21T08:36:18Z)
- A Survey of Direct Preference Optimization [103.59317151002693]
Large Language Models (LLMs) have demonstrated unprecedented generative capabilities, but their alignment with human values remains critical for ensuring helpful and harmless deployments. Direct Preference Optimization (DPO) has recently gained prominence as a streamlined alternative (a minimal sketch of the standard DPO objective follows this list).
arXiv Detail & Related papers (2025-03-12T08:45:15Z)
- Robust LLM Alignment via Distributionally Robust Direct Preference Optimization [15.328510632723505]
A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. We develop two novel distributionally robust direct preference optimization (DPO) algorithms, namely Wasserstein DPO (WDPO) and Kullback-Leibler DPO (KLDPO). We demonstrate the superior performance of WDPO and KLDPO in substantially improving alignment under preference distribution shift.
arXiv Detail & Related papers (2025-02-04T02:03:19Z)
- SDPO: Segment-Level Direct Preference Optimization for Social Agents [56.970902914217156]
Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex social dialogues. We propose Segment-Level Direct Preference Optimization (SDPO), which dynamically selects key segments within interactions to optimize multi-turn agent behavior.
arXiv Detail & Related papers (2025-01-03T14:09:46Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data. Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
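Several of the related papers above (the DPO survey, WDPO/KLDPO, SDPO, SAPO) build on the standard Direct Preference Optimization objective. For reference, a minimal PyTorch-style sketch of that loss over per-sequence log-probabilities, assuming the usual sigmoid formulation with temperature beta:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss on sequence log-probabilities (reference sketch).

    Widens the policy's margin between chosen and rejected responses relative
    to a frozen reference model; beta controls how far the policy may drift
    from the reference.
    """
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```

The variants listed above modify pieces of this objective: WDPO/KLDPO robustify it against preference distribution shift, SDPO applies it at the segment level of multi-turn interactions, and SAPO generates its own negative responses rather than relying on existing paired data.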