USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions
- URL: http://arxiv.org/abs/2502.10636v2
- Date: Fri, 28 Feb 2025 09:38:19 GMT
- Title: USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions
- Authors: Hamed Rahimi, Adil Bahaj, Mouad Abrini, Mahdi Khoramshahi, Mounir Ghogho, Mohamed Chetouani
- Abstract summary: We propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360° socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The integration of vision-language models into robotic systems constitutes a significant advancement in enabling machines to interact with their surroundings in a more intuitive manner. While VLMs offer rich multimodal reasoning, existing approaches lack user-specific adaptability, often relying on generic interaction paradigms that fail to account for individual behavioral, contextual, or socio-emotional nuances. When customization is attempted, ethical concerns arise from unmitigated biases in user data, risking exclusion or unfair treatment. To address these dual challenges, we propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360° socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata. Evaluations across eight benchmarks demonstrate state-of-the-art results: +35.3% F1 in personalized VQA, +47.5% F1 in facial features understanding, 15% bias reduction, and 30X speedup over baselines. Ablation studies confirm component efficacy, and deployment on the Pepper robot validates real-time adaptability across diverse users. We open-source parameter-efficient 3B/10B models and an ethical verification framework for responsible adaptation.
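The abstract names preference optimization as the bias-mitigation mechanism but gives no objective. A minimal sketch of a DPO-style pairwise preference loss, a common choice for this kind of alignment (an assumption here; the paper may use a different objective, and `beta` and the log-probabilities below are purely illustrative):

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for one preference pair.

    Pushes the policy to rank the chosen (e.g. less biased) response
    above the rejected one, relative to a frozen reference model.
    """
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy already agrees
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A policy that already prefers the chosen response incurs a loss below log(2).
loss = dpo_loss(-1.0, -2.0, -1.5, -1.5)
```

Averaged over a preference dataset such as the 360° socio-emotive pairs described above, minimizing this loss nudges the model toward the preferred (bias-mitigated) responses without drifting far from the reference model.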
Related papers
- Learning Personalized Agents from Human Feedback [36.47803872623135]
We introduce Personalized Agents from Human Feedback (PAHF), a framework for continual personalization. PAHF learns online from live interaction using explicit per-user memory. Benchmarks quantify an agent's ability to learn initial preferences from scratch and subsequently adapt to persona shifts.
arXiv Detail & Related papers (2026-02-18T04:18:47Z) - Synthetic Interaction Data for Scalable Personalization in Large Language Models [67.31884245564086]
We introduce a high-fidelity synthetic data generation framework called PersonaGym. Unlike prior work that treats personalization as static persona-preference pairs, PersonaGym models a dynamic preference process. We release PersonaAtlas, a large-scale, high-quality, and diverse synthetic dataset of multi-turn personalized interaction trajectories.
arXiv Detail & Related papers (2026-02-12T20:41:22Z) - P-GenRM: Personalized Generative Reward Model with Test-time User-based Scaling [66.55381105691818]
We propose P-GenRM, the first Personalized Generative Reward Model with test-time user-based scaling. P-GenRM transforms preference signals into structured evaluation chains that derive adaptive personas and scoring rubrics. It further clusters users into User Prototypes and introduces a dual-granularity scaling mechanism.
arXiv Detail & Related papers (2026-02-12T16:07:22Z) - A Cloud-Based Cross-Modal Transformer for Emotion Recognition and Adaptive Human-Computer Interaction [4.6927139685668315]
We present a Cloud-Based Cross-Modal Transformer (CMT) framework for multimodal emotion recognition and adaptive human-computer interaction. The model integrates visual, auditory, and textual signals using pretrained encoders. The system enables scalable, low-latency emotion recognition for large-scale user interactions.
arXiv Detail & Related papers (2025-11-21T17:29:16Z) - Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It [81.50711040539566]
Current large language model (LLM) development treats task-solving and preference alignment as separate challenges. We introduce PREFDISCO, an evaluation methodology that transforms static benchmarks into interactive personalization tasks. Our framework creates scenarios where identical questions require different reasoning chains depending on user context.
arXiv Detail & Related papers (2025-09-30T18:55:28Z) - RoboView-Bias: Benchmarking Visual Bias in Embodied Agents for Robotic Manipulation [67.38036090822982]
We propose RoboView-Bias, the first benchmark specifically designed to quantify visual bias in robotic manipulation. We create 2,127 task instances that enable robust measurement of biases induced by individual visual factors and their interactions. Our results highlight that systematic analysis of visual bias is a prerequisite for developing safe and reliable general-purpose embodied agents.
arXiv Detail & Related papers (2025-09-26T13:53:25Z) - HumAIne-Chatbot: Real-Time Personalized Conversational AI via Reinforcement Learning [0.4931504898146351]
HumAIne-Chatbot is an AI-driven conversational agent that personalizes responses through a novel user profiling framework. During live interactions, an online reinforcement learning agent refines per-user models by combining implicit signals. Results show consistent improvements in user satisfaction, personalization accuracy, and task achievement when personalization features were enabled.
arXiv Detail & Related papers (2025-09-04T15:16:38Z) - Robust Relevance Feedback for Interactive Known-Item Video Search [30.382002857815497]
We introduce a pairwise relative judgment feedback that improves the stability of top-k selections. We decompose user perception into multiple sub-perceptions, each represented as an independent embedding space. We develop a predictive user model that estimates the combination of sub-perceptions based on each user feedback instance.
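The summary describes pairwise relative judgments but not the update rule. One simple way such feedback can stabilize a ranking is a Bradley-Terry/Elo-style score update (an illustrative stand-in, not this paper's actual model; the item names and learning rate are hypothetical):

```python
import math

def pairwise_update(scores: dict, preferred: str, other: str,
                    lr: float = 0.5) -> dict:
    """Update item scores from one 'preferred > other' user judgment.

    Bradley-Terry style: the step is large when the current scores
    disagree with the judgment and shrinks once they agree.
    """
    # Probability the model currently assigns to the observed preference
    p_pref = 1.0 / (1.0 + math.exp(scores[other] - scores[preferred]))
    scores[preferred] += lr * (1.0 - p_pref)
    scores[other] -= lr * (1.0 - p_pref)
    return scores

scores = {"clip_a": 0.0, "clip_b": 0.0}
pairwise_update(scores, preferred="clip_a", other="clip_b")
```

Because each judgment is relative rather than absolute, repeated updates converge to a stable ordering of the top-k candidates even when individual feedback instances are noisy.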
arXiv Detail & Related papers (2025-05-21T05:31:49Z) - EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions [0.6650227510403052]
This paper introduces a novel dataset designed to assess and improve small language models deployable on edge devices. At the core of the dataset are structured user profiles, each defined by a set of routines. A large language model (LLM) generates corresponding interaction sessions that simulate realistic, diverse, and context-aware dialogues.
arXiv Detail & Related papers (2025-05-16T16:29:21Z) - Anyprefer: An Agentic Framework for Preference Data Synthesis [62.3856754548222]
We propose Anyprefer, a framework designed to synthesize high-quality preference data for aligning the target model.
external tools are introduced to assist the judge model in accurately rewarding the target model's responses.
The synthesized data is compiled into a new preference dataset, Anyprefer-V1, consisting of 58K high-quality preference pairs.
arXiv Detail & Related papers (2025-04-27T15:21:59Z) - Reasoning LLMs for User-Aware Multimodal Conversational Agents [3.533721662684487]
Personalization in social robotics is critical for fostering effective human-robot interactions.
This paper proposes a novel framework called USER-LLM R1 for a user-aware conversational agent.
Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models.
arXiv Detail & Related papers (2025-04-02T13:00:17Z) - Mind the Gap! Static and Interactive Evaluations of Large Audio Models [55.87220295533817]
Large Audio Models (LAMs) are designed to power voice-native experiences.
This study introduces an interactive approach to evaluate LAMs and collect 7,500 LAM interactions from 484 participants.
arXiv Detail & Related papers (2025-02-21T20:29:02Z) - Uncertain Multi-Objective Recommendation via Orthogonal Meta-Learning Enhanced Bayesian Optimization [30.031396809114625]
We introduce a novel framework that categorizes RS autonomy into five distinct levels, ranging from basic rule-based accuracy-driven systems to behavior-aware, uncertain multi-objective RSs.
We propose an approach that dynamically identifies and optimizes multiple objectives based on individual user preferences, fostering more ethical and intelligent user-centric recommendations.
arXiv Detail & Related papers (2025-02-18T08:10:09Z) - DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling [38.18345641589625]
We propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization. Experiments on dynamic persona modeling involving 4,800 users across 10 domains highlight the superior persona optimization capabilities of DEEPER.
arXiv Detail & Related papers (2025-02-16T11:02:37Z) - Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony [55.26315526382004]
We propose a novel framework, Combo, for co-speech holistic 3D human motion generation.
In particular, we identify one fundamental challenge as the multiple-input-multiple-output nature of the generative model of interest.
Combo is not only highly effective in generating high-quality motions but also efficient in transferring identity and emotion.
arXiv Detail & Related papers (2024-08-18T07:48:49Z) - DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z) - Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings [0.5461938536945723]
We present a multimodal deep neural network with a Siamese extension for apparent personality trait prediction trained on short video recordings.
Due to the highly centralized target distribution of the analyzed dataset, even changes in the third decimal place are relevant.
arXiv Detail & Related papers (2024-05-06T20:51:28Z) - Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take source image, user guidance and previously predicted mask as the input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z) - Parameter-free Dynamic Graph Embedding for Link Prediction [18.104685554457394]
FreeGEM is a parameter-free dynamic graph embedding method for link prediction.
We show that FreeGEM can outperform the state-of-the-art methods in accuracy while achieving over 36X improvement in efficiency.
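FreeGEM's algorithm is not detailed in the snippet above. As a reference point for what "parameter-free" link prediction can look like, here is a classic common-neighbors scorer (a standard textbook baseline, not FreeGEM itself; the toy graph is invented for illustration):

```python
def common_neighbors(adj: dict, u: str, v: str) -> int:
    """Parameter-free link score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

# Toy undirected graph with edges a-b, a-c, b-c, c-d
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

# Candidate edge (a, d): the pair shares one neighbor, c,
# so the heuristic predicts this link with score 1.
score_ad = common_neighbors(adj, "a", "d")
```

Heuristics like this need no trained parameters at all, which is what makes the efficiency comparison against learned embedding methods meaningful.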
arXiv Detail & Related papers (2022-10-15T04:17:09Z) - Improving Personality Consistency in Conversation by Persona Extending [22.124187337032946]
We propose a novel retrieval-to-prediction paradigm consisting of two subcomponents, namely, the Persona Retrieval Model (PRM) and the Posterior-scored Transformer (PS-Transformer).
Our proposed model yields considerable improvements in both automatic metrics and human evaluations.
arXiv Detail & Related papers (2022-08-23T09:00:58Z) - Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z) - Personalization in Human-AI Teams: Improving the Compatibility-Accuracy Tradeoff [0.0]
We study the trade-off between improving the system's accuracy following an update and the compatibility of the updated system with prior user experience.
We show that by personalizing the loss function to specific users, in some cases it is possible to improve the compatibility-accuracy trade-off with respect to these users.
arXiv Detail & Related papers (2020-04-05T19:35:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.