USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions
- URL: http://arxiv.org/abs/2502.10636v1
- Date: Sat, 15 Feb 2025 02:25:49 GMT
- Title: USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions
- Authors: Hamed Rahimi, Adil Bahaj, Mouad Abrini, Mahdi Khoramshahi, Mounir Ghogho, Mohamed Chetouani,
- Abstract summary: We propose User-VLM 360deg, a holistic framework integrating multimodal user modeling with bias-aware optimization.
Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360deg socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata.
- Score: 6.2486440301992605
- License:
- Abstract: The integration of vision-language models into robotic systems constitutes a significant advancement in enabling machines to interact with their surroundings in a more intuitive manner. While VLMs offer rich multimodal reasoning, existing approaches lack user-specific adaptability, often relying on generic interaction paradigms that fail to account for individual behavioral, contextual, or socio-emotional nuances. When customization is attempted, ethical concerns arise from unmitigated biases in user data, risking exclusion or unfair treatment. To address these dual challenges, we propose User-VLM 360{\deg}, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360{\deg} socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata. Evaluations across eight benchmarks demonstrate state-of-the-art results: +35.3% F1 in personalized VQA, +47.5% F1 in facial features understanding, 15% bias reduction, and 30X speedup over baselines. Ablation studies confirm component efficacy, and deployment on the Pepper robot validates real-time adaptability across diverse users. We open-source parameter-efficient 3B/10B models and an ethical verification framework for responsible adaptation.
Related papers
- Uncertain Multi-Objective Recommendation via Orthogonal Meta-Learning Enhanced Bayesian Optimization [30.031396809114625]
We introduce a novel framework that categorizes RS autonomy into five distinct levels, ranging from basic rule-based accuracy-driven systems to behavior-aware, uncertain multi-objective RSs.
We propose an approach that dynamically identifies and optimize multiple objectives based on individual user preferences, fostering more ethical and intelligent user-centric recommendations.
arXiv Detail & Related papers (2025-02-18T08:10:09Z) - DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling [38.18345641589625]
We propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization.
Experiments on dynamic persona modeling involving 4800 users across 10 domains highlight the superior persona optimization capabilities of DEEPER.
arXiv Detail & Related papers (2025-02-16T11:02:37Z) - Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony [55.26315526382004]
We propose a novel framework, Combo, for co-speech holistic 3D human motion generation.
In particular, we identify that one fundamental challenge as the multiple-input-multiple-output nature of the generative model of interest.
Combo is highly effective in generating high-quality motions but also efficient in transferring identity and emotion.
arXiv Detail & Related papers (2024-08-18T07:48:49Z) - DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z) - Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings [0.5461938536945723]
We present a multimodal deep neural network with a Siamese extension for apparent personality trait prediction trained on short video recordings.
Due to the highly centralized target distribution of the analyzed dataset, the changes in the third digit are relevant.
arXiv Detail & Related papers (2024-05-06T20:51:28Z) - Interactive Hyperparameter Optimization in Multi-Objective Problems via
Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML)
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z) - Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take source image, user guidance and previously predicted mask as the input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z) - Parameter-free Dynamic Graph Embedding for Link Prediction [18.104685554457394]
FreeGEM is a parameter-free dynamic graph embedding method for link prediction.
We show that FreeGEM can outperform the state-of-the-art methods in accuracy while achieving over 36X improvement in efficiency.
arXiv Detail & Related papers (2022-10-15T04:17:09Z) - Improving Personality Consistency in Conversation by Persona Extending [22.124187337032946]
We propose a novel retrieval-to-prediction paradigm consisting of two subcomponents, namely, Persona Retrieval Model (PRM) and Posterior-scored Transformer (PS-Transformer)
Our proposed model yields considerable improvements in both automatic metrics and human evaluations.
arXiv Detail & Related papers (2022-08-23T09:00:58Z) - Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z) - Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce Interactive System (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.