MIRA: Empowering One-Touch AI Services on Smartphones with MLLM-based Instruction Recommendation
- URL: http://arxiv.org/abs/2509.13773v1
- Date: Wed, 17 Sep 2025 07:43:14 GMT
- Title: MIRA: Empowering One-Touch AI Services on Smartphones with MLLM-based Instruction Recommendation
- Authors: Zhipeng Bian, Jieming Zhu, Xuyang Xie, Quanyu Dai, Zhou Zhao, Zhenhua Dong,
- Abstract summary: This paper introduces MIRA, a pioneering framework for task instruction recommendation. With MIRA, users can long-press on images or text objects to receive contextually relevant instruction recommendations for executing AI tasks. MIRA has demonstrated substantial improvements in the accuracy of instruction recommendation.
- Score: 61.19099947706954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of generative AI technologies is driving the integration of diverse AI-powered services into smartphones, transforming how users interact with their devices. To simplify access to predefined AI services, this paper introduces MIRA, a pioneering framework for task instruction recommendation that enables intuitive one-touch AI tasking on smartphones. With MIRA, users can long-press on images or text objects to receive contextually relevant instruction recommendations for executing AI tasks. Our work introduces three key innovations: 1) A multimodal large language model (MLLM)-based recommendation pipeline with structured reasoning to extract key entities, infer user intent, and generate precise instructions; 2) A template-augmented reasoning mechanism that integrates high-level reasoning templates, enhancing task inference accuracy; 3) A prefix-tree-based constrained decoding strategy that restricts outputs to predefined instruction candidates, ensuring coherent and intent-aligned suggestions. Through evaluation using a real-world annotated dataset and a user study, MIRA has demonstrated substantial improvements in the accuracy of instruction recommendation. The encouraging results highlight MIRA's potential to revolutionize the way users engage with AI services on their smartphones, offering a more seamless and efficient experience.
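The prefix-tree-based constrained decoding described in the abstract is a known general technique: candidate instruction token sequences are stored in a trie, and at each decoding step the model is only allowed to emit tokens that extend some valid candidate. The sketch below illustrates the trie side of this idea; it is not MIRA's implementation, and the word-level candidate instructions are purely illustrative.

```python
# Minimal sketch of prefix-tree (trie) constrained decoding support.
# At each generation step, only tokens that extend some predefined
# candidate sequence are permitted; in a real decoder, this allowed
# set would be used to mask the language model's logits.

END = object()  # sentinel marking the end of a complete candidate


def build_trie(candidates):
    """Build a nested-dict trie from tokenized candidate instructions."""
    root = {}
    for tokens in candidates:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = {}  # mark a complete candidate ending here
    return root


def allowed_next_tokens(trie, prefix):
    """Tokens that may follow `prefix`; empty set if prefix is invalid."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()
        node = node[tok]
    return {t for t in node if t is not END}


def is_complete(trie, prefix):
    """True if `prefix` is exactly one of the predefined candidates."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return False
        node = node[tok]
    return END in node


# Hypothetical tokenized instruction candidates (word-level for clarity;
# a real system would use the model's subword tokens).
candidates = [
    ["translate", "this", "text"],
    ["translate", "this", "image"],
    ["summarize", "this", "text"],
]
trie = build_trie(candidates)

print(allowed_next_tokens(trie, []))                     # {'translate', 'summarize'} (order may vary)
print(allowed_next_tokens(trie, ["translate", "this"]))  # {'text', 'image'} (order may vary)
print(is_complete(trie, ["summarize", "this", "text"]))  # True
```

Plugging this into a decoder amounts to setting the logits of all disallowed tokens to negative infinity before sampling, which guarantees every generated suggestion is one of the predefined instruction candidates.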
Related papers
- AIAP: A No-Code Workflow Builder for Non-Experts with Natural Language and Multi-Agent Collaboration [12.74618436015574]
We introduce AIAP, a no-code platform that integrates natural language input with visual system complexity. A user study involving 32 participants showed that AIAP's AI-generated suggestions, modular design, and automatic identification of data, actions, and context significantly improved participants' ability to develop services intuitively.
arXiv Detail & Related papers (2025-08-04T14:36:31Z)
- Multi-Agent Actor-Critic Generative AI for Query Resolution and Analysis [1.0124625066746598]
We introduce MASQRAD, a transformative framework for query resolution based on the actor-critic model. MASQRAD excels at translating imprecise or ambiguous user inquiries into precise and actionable requests. MASQRAD functions as a sophisticated multi-agent system but "masquerades" to users as a single AI entity.
arXiv Detail & Related papers (2025-02-17T04:03:15Z)
- Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning [55.641299901038316]
Collaborative Mobile AIGC Service Providers (MASPs) can be organized at network edges to provide ubiquitous and customized AI-generated content for resource-constrained users. Such a paradigm faces two significant challenges: 1) raw prompts often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources. We develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation.
arXiv Detail & Related papers (2025-02-17T03:05:20Z)
- Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval [19.87084105344227]
Composed Image Retrieval (CIR) aims to retrieve target images from a candidate set using a hybrid-modality query consisting of a reference image and a relative caption that describes the user intent. We propose CIR-LVLM, a novel framework that leverages a large vision-language model (LVLM) as a powerful user intent-aware encoder.
arXiv Detail & Related papers (2024-12-15T07:09:02Z)
- MaestroMotif: Skill Design from Artificial Intelligence Feedback [67.17724089381056]
We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents.
arXiv Detail & Related papers (2024-12-11T16:59:31Z)
- MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting [97.52388851329667]
We introduce Marking Open-world Keypoint Affordances (MOKA) to solve robotic manipulation tasks specified by free-form language instructions.
Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world.
We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
- How to Build an Adaptive AI Tutor for Any Course Using Knowledge Graph-Enhanced Retrieval-Augmented Generation (KG-RAG) [5.305156933641317]
The integration of Large Language Models (LLMs) into Intelligent Tutoring Systems (ITS) presents transformative opportunities for personalized education. Current implementations face two critical challenges: maintaining factual accuracy and delivering coherent, context-aware instruction. This paper introduces Knowledge Graph-enhanced Retrieval-Augmented Generation (KG-RAG), a novel framework that integrates structured knowledge representation with context-aware retrieval.
arXiv Detail & Related papers (2023-11-29T15:02:46Z)
- New Interaction Paradigm for Complex EDA Software Leveraging GPT [5.386974905314838]
We present SmartonAI, an AI-assisted interaction system that integrates large language models into the EDA workflow. SmartonAI consists of two main components: ChatCommand, which breaks down user instructions into subtasks, and OneLine, which retrieves tailored documentation.
arXiv Detail & Related papers (2023-07-27T09:53:02Z)
- AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
arXiv Detail & Related papers (2023-05-04T02:09:43Z)
- MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images [49.664220687980006]
The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models.
We present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models.
arXiv Detail & Related papers (2022-03-23T12:33:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.