Related papers: USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions

URL: http://arxiv.org/abs/2502.10636v1
Date: Sat, 15 Feb 2025 02:25:49 GMT
Title: USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions
Authors: Hamed Rahimi, Adil Bahaj, Mouad Abrini, Mahdi Khoramshahi, Mounir Ghogho, Mohamed Chetouani
Abstract summary: We propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360° socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata.
Score: 6.2486440301992605
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The integration of vision-language models into robotic systems constitutes a significant advancement in enabling machines to interact with their surroundings in a more intuitive manner. While VLMs offer rich multimodal reasoning, existing approaches lack user-specific adaptability, often relying on generic interaction paradigms that fail to account for individual behavioral, contextual, or socio-emotional nuances. When customization is attempted, ethical concerns arise from unmitigated biases in user data, risking exclusion or unfair treatment. To address these dual challenges, we propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360° socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata. Evaluations across eight benchmarks demonstrate state-of-the-art results: +35.3% F1 in personalized VQA, +47.5% F1 in facial features understanding, 15% bias reduction, and 30× speedup over baselines. Ablation studies confirm component efficacy, and deployment on the Pepper robot validates real-time adaptability across diverse users. We open-source parameter-efficient 3B/10B models and an ethical verification framework for responsible adaptation.
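The abstract names preference optimization as the bias-mitigation mechanism but does not say which algorithm is used. As a minimal sketch, assuming Direct Preference Optimization (DPO) as the objective (an assumption, not confirmed by the summary), the per-pair loss rewards the tuned model for raising the log-probability of a preferred (for example, less biased) response relative to a frozen reference model, compared with a rejected response. The function names and the `beta=0.1` default are illustrative, not taken from the paper.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_pair_loss(policy_chosen: float, policy_rejected: float,
                  ref_chosen: float, ref_rejected: float,
                  beta: float = 0.1) -> float:
    """DPO loss for one preference pair (assumed objective, see lead-in).

    Each argument is a summed sequence log-probability log p(y | x):
    `chosen` is the preferred response, `rejected` the dispreferred one,
    `policy` the model being tuned, `ref` the frozen reference model.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin))
    return softplus(-margin)


def dpo_batch_loss(pairs, beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    losses = [dpo_pair_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log 2.
    print(dpo_pair_loss(-12.0, -12.0, -12.0, -12.0))
    # Policy shifted probability toward the chosen response: loss drops below log 2.
    print(dpo_pair_loss(-10.0, -12.0, -11.0, -11.0))
```

Because only log-ratios against the reference appear, the tuned model is pulled toward preferred responses without being trained on absolute likelihoods, which is one reason DPO-style objectives are a common choice for this kind of alignment step.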

Related papers

Robust Relevance Feedback for Interactive Known-Item Video Search [30.382002857815497]
We introduce a pairwise relative judgment feedback that improves the stability of top-k selections. We decompose user perception into multiple sub-perceptions, each represented as an independent embedding space. We develop a predictive user model that estimates the combination of sub-perceptions based on each user feedback instance.
arXiv (2025-05-21T05:31:49Z)
EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions [0.6650227510403052]
This paper introduces a novel dataset designed to assess and improve small language models deployable on edge devices. At the core of the dataset are structured user profiles, each defined by a set of routines. A large language model (LLM) generates corresponding interaction sessions that simulate realistic, diverse, and context-aware dialogues.
arXiv (2025-05-16T16:29:21Z)
Anyprefer: An Agentic Framework for Preference Data Synthesis [62.3856754548222]
We propose Anyprefer, a framework designed to synthesize high-quality preference data for aligning the target model. External tools are introduced to assist the judge model in accurately rewarding the target model's responses. The synthesized data is compiled into a new preference dataset, Anyprefer-V1, consisting of 58K high-quality preference pairs.
arXiv (2025-04-27T15:21:59Z)
Reasoning LLMs for User-Aware Multimodal Conversational Agents [3.533721662684487]
Personalization in social robotics is critical for fostering effective human-robot interactions. This paper proposes a novel framework called USER-LLM R1 for a user-aware conversational agent. Our approach integrates chain-of-thought (CoT) reasoning models to iteratively infer user preferences and vision-language models.
arXiv (2025-04-02T13:00:17Z)
Mind the Gap! Static and Interactive Evaluations of Large Audio Models [55.87220295533817]
Large Audio Models (LAMs) are designed to power voice-native experiences. This study introduces an interactive approach to evaluate LAMs and collects 7,500 LAM interactions from 484 participants.
arXiv (2025-02-21T20:29:02Z)
Uncertain Multi-Objective Recommendation via Orthogonal Meta-Learning Enhanced Bayesian Optimization [30.031396809114625]
We introduce a novel framework that categorizes recommender system (RS) autonomy into five distinct levels, ranging from basic rule-based accuracy-driven systems to behavior-aware, uncertain multi-objective RSs. We propose an approach that dynamically identifies and optimizes multiple objectives based on individual user preferences, fostering more ethical and intelligent user-centric recommendations.
arXiv (2025-02-18T08:10:09Z)
DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling [38.18345641589625]
We propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization. Experiments on dynamic persona modeling involving 4800 users across 10 domains highlight the superior persona optimization capabilities of DEEPER.
arXiv (2025-02-16T11:02:37Z)
Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony [55.26315526382004]
We propose a novel framework, Combo, for co-speech holistic 3D human motion generation. In particular, we identify one fundamental challenge as the multiple-input-multiple-output nature of the generative model of interest. Combo is not only highly effective in generating high-quality motions but also efficient in transferring identity and emotion.
arXiv (2024-08-18T07:48:49Z)
DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout. DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv (2024-08-09T14:04:21Z)
Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings [0.5461938536945723]
We present a multimodal deep neural network with a Siamese extension for apparent personality trait prediction trained on short video recordings. Due to the highly centralized target distribution of the analyzed dataset, the changes in the third digit are relevant.
arXiv (2024-05-06T20:51:28Z)
Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take source image, user guidance and previously predicted mask as the input. We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv (2023-08-07T12:26:34Z)
Parameter-free Dynamic Graph Embedding for Link Prediction [18.104685554457394]
FreeGEM is a parameter-free dynamic graph embedding method for link prediction. We show that FreeGEM can outperform the state-of-the-art methods in accuracy while achieving over 36× improvement in efficiency.
arXiv (2022-10-15T04:17:09Z)
Improving Personality Consistency in Conversation by Persona Extending [22.124187337032946]
We propose a novel retrieval-to-prediction paradigm consisting of two subcomponents, namely, Persona Retrieval Model (PRM) and Posterior-scored Transformer (PS-Transformer). Our proposed model yields considerable improvements in both automatic metrics and human evaluations.
arXiv (2022-08-23T09:00:58Z)
Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions. These inferences can be made regardless of prior knowledge and across different types of user behavior. We introduce Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv (2020-06-19T20:49:14Z)
Personalization in Human-AI Teams: Improving the Compatibility-Accuracy Tradeoff [0.0]
We study the trade-off between improving the system's accuracy following an update and the compatibility of the updated system with prior user experience. We show that by personalizing the loss function to specific users, in some cases it is possible to improve the compatibility-accuracy trade-off with respect to these users.
arXiv (2020-04-05T19:35:18Z)
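The entry above weighs an update's accuracy against its compatibility with what a user has already experienced. As a minimal sketch, assuming the usual backward-compatibility definition (the share of instances the previous model answered correctly that the updated model still answers correctly; the entry itself does not define it), both sides of the trade-off can be tracked per user from per-instance correctness flags. The record layout and the function names `compatibility`, `accuracy`, and `per_user_tradeoff` are hypothetical.

```python
from collections import defaultdict


def compatibility(old_correct, new_correct):
    """Share of instances the old model got right that the updated model also gets right."""
    kept = [new_ok for old_ok, new_ok in zip(old_correct, new_correct) if old_ok]
    if not kept:
        raise ValueError("old model has no correct predictions; compatibility is undefined")
    return sum(kept) / len(kept)


def accuracy(correct):
    """Fraction of True flags."""
    return sum(correct) / len(correct)


def per_user_tradeoff(records):
    """records: iterable of (user_id, old_correct, new_correct) with boolean flags.

    Returns {user_id: (accuracy of the updated model, compatibility)}.
    """
    by_user = defaultdict(lambda: ([], []))
    for user, old_ok, new_ok in records:
        by_user[user][0].append(old_ok)
        by_user[user][1].append(new_ok)
    return {user: (accuracy(new), compatibility(old, new))
            for user, (old, new) in by_user.items()}


if __name__ == "__main__":
    records = [
        ("a", True, True), ("a", True, False), ("a", False, True),
        ("b", True, True), ("b", False, False),
    ]
    # User "a": the update fixes one error but breaks one previously correct answer.
    print(per_user_tradeoff(records))
```

Reporting the two numbers per user, rather than pooled, makes visible the case the entry describes: an update can raise overall accuracy while still breaking answers that a particular user had come to rely on.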

This list is automatically generated from the titles and abstracts of the papers on this site.