Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
- URL: http://arxiv.org/abs/2501.15727v1
- Date: Mon, 27 Jan 2025 01:47:57 GMT
- Authors: Michael Xieyang Liu, Savvas Petridis, Vivian Tsai, Alexander J. Fiannaca, Alex Olwal, Michael Terry, Carrie J. Cai
- Abstract summary: Gensors is a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs.
In a user study, participants reported a significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors.
- Score: 61.17099595835263
- Abstract: Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "alert if my toddler is getting into mischief"), with the MLLM analyzing the camera feed and responding within seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically-generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users "stress test" sensors on potentially unforeseen scenarios. In a user study, participants reported significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users' "blind spots" by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we discuss how unique characteristics of MLLMs--such as hallucinations and inconsistent responses--can impact the sensor-creation process. These findings contribute to the design of future intelligent sensing systems that are intuitive and customizable by everyday users.
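The criteria-based approach described in the abstract, where a sensor is broken into individual criteria that can be isolated and tested in parallel, can be illustrated with a minimal sketch. This is not the Gensors implementation: the `Sensor` class, the `keyword_judge` stub (which stands in for an MLLM call on a camera frame), and the "all criteria must hold" aggregation rule are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical judge signature: (criterion text, scene description) -> bool.
# A real system would send an image and the criterion to an MLLM instead.
Judge = Callable[[str, str], bool]


def keyword_judge(criterion: str, scene: str) -> bool:
    """Stub judge: a criterion holds if its text appears in the scene description."""
    return criterion.lower() in scene.lower()


@dataclass
class Sensor:
    name: str
    criteria: List[str]

    def evaluate(self, scene: str, judge: Judge) -> Dict[str, bool]:
        """Check every criterion independently and in parallel,
        so each one can be inspected and debugged on its own."""
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(lambda c: judge(c, scene), self.criteria))
        return dict(zip(self.criteria, results))

    def fires(self, scene: str, judge: Judge) -> bool:
        """Assumed aggregation rule: the sensor fires only if all criteria hold."""
        return all(self.evaluate(scene, judge).values())


if __name__ == "__main__":
    sensor = Sensor("mischief", ["toddler present", "drawer open"])
    scene = "Toddler present near the counter, drawer closed"
    # Per-criterion results show which requirement blocked the alert.
    print(sensor.evaluate(scene, keyword_judge))
    print(sensor.fires(scene, keyword_judge))
```

Returning a per-criterion result map, rather than a single yes/no answer, is what lets a user see which requirement failed on a given frame.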
Related papers
- SensorChat: Answering Qualitative and Quantitative Questions during Long-Term Multimodal Sensor Interactions [7.549011805153971]
We introduce SensorChat, the first end-to-end QA system designed for long-term sensor monitoring.
SensorChat effectively answers both qualitative (requiring high-level reasoning) and quantitative (requiring accurate responses from sensor data) questions in real-world scenarios.
We implement SensorChat and demonstrate its capability for real-time interactions on a cloud server while also being able to run entirely on edge platforms after quantization.
arXiv Detail & Related papers (2025-02-05T04:41:59Z)
- Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input [54.81155589931697]
We propose a new task, Collaborative Instance Navigation (CoIN), with dynamic agent-human interaction during navigation.
To address CoIN, we propose a novel method, Agent-user Interaction with UncerTainty Awareness (AIUTA).
AIUTA achieves competitive performance in instance navigation against state-of-the-art methods, demonstrating great flexibility in handling user inputs.
arXiv Detail & Related papers (2024-12-02T08:16:38Z)
- ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs [72.13489820420726]
ProSA is a framework designed to evaluate and comprehend prompt sensitivity in large language models.
Our study uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness.
arXiv Detail & Related papers (2024-10-16T09:38:13Z)
- SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition [9.072495000412943]
We bridge the gap between wearable sensor technology and personalized AI assistants by enabling Large Language Models (LLMs) to understand time-series tasks like human activity recognition (HAR).
We introduce SensorLLM, a two-stage framework to unlock LLMs' potential for sensor data tasks.
We show that SensorLLM evolves into an effective sensor learner, reasoner, and classifier, enabling it to generalize across diverse datasets for HAR tasks.
arXiv Detail & Related papers (2024-10-14T15:30:41Z)
- Towards Empathetic Conversational Recommender Systems [77.53167131692]
We propose an empathetic conversational recommender (ECR) framework.
ECR contains two main modules: emotion-aware item recommendation and emotion-aligned response generation.
Our experiments on the ReDial dataset validate the efficacy of our framework in enhancing recommendation accuracy and improving user satisfaction.
arXiv Detail & Related papers (2024-08-30T15:43:07Z)
- LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces [1.1137304094345333]
We design an effective prompting framework for Large Language Models (LLMs) on high-level reasoning tasks.
We also design two strategies to enhance performance with long sensor traces, including summarization before reasoning and selective inclusion of historical traces.
Our framework can be implemented in an edge-cloud setup, running small LLMs on the edge for data summarization and performing high-level reasoning on the cloud for privacy preservation.
arXiv Detail & Related papers (2024-03-28T22:06:04Z)
- How are Prompts Different in Terms of Sensitivity? [50.67313477651395]
We present a comprehensive prompt analysis based on the sensitivity of a function.
We use gradient-based saliency scores to empirically demonstrate how different prompts affect the relevance of input tokens to the output.
We introduce sensitivity-aware decoding which incorporates sensitivity estimation as a penalty term in the standard greedy decoding.
arXiv Detail & Related papers (2023-11-13T10:52:01Z)
- Datasheets for Machine Learning Sensors: Towards Transparency, Auditability, and Responsibility for Intelligent Sensing [9.686781507805113]
Machine learning (ML) sensors are enabling intelligence at the edge by empowering end-users with greater control over their data.
We introduce a standard template for these ML sensors and discuss and evaluate the design and motivation for each section of the datasheet.
To provide a case study of the application of our template, we also designed and developed two examples for ML sensors performing computer vision-based person detection.
arXiv Detail & Related papers (2023-06-15T04:24:13Z)
- A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation [52.743311026230714]
Persona Exploration and Exploitation (PEE) is able to extend the predefined user persona description with semantically correlated content.
PEE consists of two main modules: persona exploration and persona exploitation.
Our approach outperforms state-of-the-art baselines in terms of both automatic and human evaluations.
arXiv Detail & Related papers (2020-02-06T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.