Related papers: UI-Venus-1.5 Technical Report

URL: http://arxiv.org/abs/2602.09082v1
Date: Mon, 09 Feb 2026 18:43:40 GMT
Title: UI-Venus-1.5 Technical Report
Authors: Venus-Team: Changlong Gao, Zhangxuan Gu, Yulin Liu, Xinyu Qiu, Shuheng Shen, Yue Wen, Tianyu Xia, Zhenyu Xu, Zhengwen Zeng, Beitong Zhou, Xingran Zhou, Weizhi Chen, Sunhao Dai, Jingya Dou, Yichen Gong, Yuan Guo, Zhenlin Guo, Feng Li, Qian Li, Jinzhen Lin, Yuqi Zhou, Linchao Zhu, Liang Chen, Zhenyu Guo, Changhua Meng, Weiqiang Wang
Abstract summary: We present UI-Venus-1.5, a unified, end-to-end GUI Agent. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B). In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps.
Score: 64.4832043785725
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to meet various downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus

Related papers

Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents [56.72789202127874]
The paper introduces GUI-Owl-1.5, the latest native GUI agent model. It supports a range of platforms (desktop, mobile, browser, and more) to enable cloud-edge collaboration and real-time interaction. It achieves state-of-the-art results among open-source models on 20+ GUI benchmarks.
arXiv Detail & Related papers (2026-02-15T01:52:19Z)
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution [32.992104943415995]
OmegaUse is a general-purpose GUI agent model for autonomous task execution on both mobile and desktop platforms. It is highly competitive across established GUI benchmarks, achieving a state-of-the-art (SOTA) score of 96.3% on ScreenSpot-V2. It also performs strongly on OS-Nav, reaching 74.24% step success on ChiM-Nav and 55.9% average success on Ubu-Nav.
arXiv Detail & Related papers (2026-01-28T08:45:17Z)
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning [155.51875080423883]
The development of autonomous agents for graphical user interfaces presents major challenges in artificial intelligence. We present UI-TARS-2, a native GUI-centered agent model that addresses these challenges through a systematic training methodology. Empirical evaluation demonstrates that UI-TARS-2 achieves significant improvements over its predecessor UI-TARS-1.5.
arXiv Detail & Related papers (2025-09-02T17:44:45Z)
Mobile-Agent-v3: Fundamental Agents for GUI Automation [59.775510710011325]
This paper introduces a foundational GUI agent model that achieves state-of-the-art performance among open-source end-to-end models. We propose Mobile-Agent-v3, a general-purpose GUI agent framework that further improves performance to 73.3 on AndroidWorld and 37.7 on OSWorld.
arXiv Detail & Related papers (2025-08-21T00:39:12Z)
UI-Venus Technical Report: Building High-performance UI Agents with RFT [43.28453678270454]
We present UI-Venus, a native UI agent that takes only screenshots as input based on a multimodal large language model. It achieves SOTA performance on both UI grounding and navigation tasks using only several hundred thousand high-quality training samples.
arXiv Detail & Related papers (2025-08-14T16:58:07Z)
UI-TARS: Pioneering Automated GUI Interaction with Native Agents [58.18100825673032]
This paper introduces UI-TARS, a native GUI agent model that perceives only screenshots as input and performs human-like interactions. In the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude (22.0 and 14.9, respectively).
arXiv Detail & Related papers (2025-01-21T17:48:10Z)