Human-Centric Open-Future Task Discovery: Formulation, Benchmark, and Scalable Tree-Based Search
- URL: http://arxiv.org/abs/2511.18929v2
- Date: Tue, 25 Nov 2025 04:59:26 GMT
- Title: Human-Centric Open-Future Task Discovery: Formulation, Benchmark, and Scalable Tree-Based Search
- Authors: Zijian Song, Xiaoxin Lin, Tao Pu, Zhenlong Yuan, Guangrun Wang, Liang Lin
- Abstract summary: We formalize the problem of Human-centric Open-future Task Discovery (HOTD), focusing on identifying tasks that reduce human effort across multiple plausible futures. To facilitate this study, we propose HOTD-Bench, which features over 2K real-world videos, a semi-automated annotation pipeline, and a simulation-based protocol tailored for open-set future evaluation. We also propose the Collaborative Multi-Agent Search Tree (CMAST), which decomposes complex reasoning through a multi-agent system and structures the reasoning process through a scalable search tree module.
- Score: 55.96277616578607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in robotics and embodied AI is largely driven by Large Multimodal Models (LMMs). However, a key challenge remains underexplored: how can we advance LMMs to discover tasks that directly assist humans in open-future scenarios, where human intentions are highly concurrent and dynamic? In this work, we formalize the problem of Human-centric Open-future Task Discovery (HOTD), focusing particularly on identifying tasks that reduce human effort across multiple plausible futures. To facilitate this study, we propose HOTD-Bench, which features over 2K real-world videos, a semi-automated annotation pipeline, and a simulation-based protocol tailored for open-set future evaluation. Additionally, we propose the Collaborative Multi-Agent Search Tree (CMAST) framework, which decomposes complex reasoning through a multi-agent system and structures the reasoning process through a scalable search tree module. In our experiments, CMAST achieves the best performance on the HOTD-Bench, significantly surpassing existing LMMs. It also integrates well with existing LMMs, consistently improving performance.
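The abstract does not detail CMAST's internals, but the core idea it names, multiple agents collaborating over a scalable search tree of candidate tasks, can be illustrated with a minimal sketch. The `Node` and `expand` names, the proposer/critic split, and the beam cutoff below are all hypothetical illustrations, not the paper's API:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One candidate assistive task in the search tree."""
    task: str
    score: float = 0.0
    children: list = field(default_factory=list)

def expand(node, propose, critics, beam=2):
    """Expand a node with proposed follow-up tasks; keep the top-`beam`
    children ranked by the mean score across all critic agents."""
    for task in propose(node.task):
        score = sum(critic(task) for critic in critics) / len(critics)
        node.children.append(Node(task, score))
    node.children.sort(key=lambda n: n.score, reverse=True)
    del node.children[beam:]  # prune to keep the tree scalable
    return node.children
```

Repeatedly expanding the best leaves yields a tree whose highest-scoring path is the discovered task plan; the actual framework presumably uses LMM agents as the proposer and critics.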
Related papers
- MADD: Multi-Agent Drug Discovery Orchestra [27.45459097009959]
We present MADD, a multi-agent system that builds and executes customized hit identification pipelines from natural language queries. We pioneer the application of AI-first drug design to five biological targets and release the identified hit molecules.
arXiv Detail & Related papers (2025-11-11T13:20:35Z) - SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.26456498576181]
This paper focuses on the development of native Autonomous Single-Agent models for Deep Research. Our best variant SFR-DR-20B achieves up to 28.7% on Humanity's Last Exam benchmark.
arXiv Detail & Related papers (2025-09-08T02:07:09Z) - MMSearch-R1: Incentivizing LMMs to Search [49.889749277236376]
We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables on-demand, multi-turn search in real-world Internet environments. Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them guided by an outcome-based reward with a search penalty.
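The outcome-based reward with a search penalty mentioned above admits a one-line sketch. The function name and the 0.1 penalty weight are assumptions for illustration, not values from the paper:

```python
def search_reward(correct: bool, n_searches: int, penalty: float = 0.1) -> float:
    """Binary outcome reward minus a per-call search penalty, which
    discourages the model from invoking search tools unnecessarily."""
    return (1.0 if correct else 0.0) - penalty * n_searches
```

Under such a reward, the policy earns the most by answering correctly with as few tool calls as possible, which is what "on-demand" search means in practice.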
arXiv Detail & Related papers (2025-06-25T17:59:42Z) - MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability [106.35604230971396]
Recent advancements in Agent techniques enable Large Language Models (LLMs) to autonomously utilize tools for retrieval, planning, and reasoning. To further enhance the universal search capability of agents, we propose a novel pre-training framework, MaskSearch. In the pre-training stage, we introduce the Retrieval Augmented Mask Prediction (RAMP) task, where the model learns to leverage search tools to fill masked spans. After that, the model is trained on downstream tasks to achieve further improvement.
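A minimal sketch of the span-masking side of RAMP (the retrieval step that fills the span is omitted); the function name, `[MASK]` token, and defaults are assumptions, not MaskSearch's implementation:

```python
import random

def mask_span(tokens, span_len=2, mask_token="[MASK]", rng=None):
    """Replace one random contiguous span with mask tokens.
    Returns the masked token sequence and the hidden target span,
    which the agent must recover by querying a search tool."""
    rng = rng or random.Random(0)
    tokens = list(tokens)
    start = rng.randrange(0, len(tokens) - span_len + 1)
    target = tokens[start:start + span_len]
    tokens[start:start + span_len] = [mask_token] * span_len
    return tokens, target
```

The training signal is then whether the agent's retrieval-augmented prediction matches `target`.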
arXiv Detail & Related papers (2025-05-26T17:58:50Z) - Human-Robot Collaborative Minimum Time Search through Sub-priors in Ant Colony Optimization [3.04478108783992]
This paper presents an extension of the Ant Colony Optimization (ACO) meta-heuristic to solve the Minimum Time Search (MTS) task.
The proposed model consists of two main blocks. The first is a convolutional neural network (CNN) that, from a segmented image, provides prior probabilities of where an object may be located.
The second is the Sub-prior MTS-ACO algorithm (SP-MTS-ACO), which takes as inputs the prior probabilities and the agents' particular search preferences, encoded as sub-priors, to generate search plans for all agents.
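One natural way such a method could fold the CNN priors into the ant colony search is the standard ACO transition rule, where pheromone and heuristic desirability (here, the prior) are combined multiplicatively. The exponents and function name below follow generic ACO convention and are not the paper's exact formulation:

```python
def transition_probs(pheromone, prior, alpha=1.0, beta=2.0):
    """Standard ACO move probabilities over candidate cells:
    tau^alpha * eta^beta, normalized, with the CNN prior playing
    the role of the heuristic desirability eta."""
    weights = [(t ** alpha) * (p ** beta) for t, p in zip(pheromone, prior)]
    total = sum(weights)
    return [w / total for w in weights]
```

With uniform pheromone, the ants initially follow the prior; as trails accumulate, the pheromone term steers the colony toward cells that earlier ants found promising.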
arXiv Detail & Related papers (2024-10-01T08:57:28Z) - MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines [95.48317207225378]
Large Multimodal Models (LMMs) have made impressive strides in AI search engines. However, whether they can function as AI search engines remains under-explored. We first design a delicate pipeline, MMSearch-Engine, to empower any LMMs with multimodal search capabilities.
arXiv Detail & Related papers (2024-09-19T17:59:45Z) - VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents [50.12414817737912]
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents.
Existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs in complex, real-world environments.
VisualAgentBench (VAB) is a pioneering benchmark specifically designed to train and evaluate LMMs as visual foundation agents.
arXiv Detail & Related papers (2024-08-12T17:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.