Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations
- URL: http://arxiv.org/abs/2507.19947v2
- Date: Wed, 30 Jul 2025 13:52:22 GMT
- Title: Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations
- Authors: Supawich Sitdhipol, Waritwong Sukprasongdee, Ekapol Chuangsuwanich, Rina Tse
- Abstract summary: An uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements.
- Score: 4.008130792416869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.
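As a concrete illustration of the fusion idea described in the abstract, the following is a minimal grid-based sketch in Python. The Gaussian language and sensor likelihoods are hypothetical stand-ins invented for illustration; in the paper, the language likelihood would be produced by the trained FP-LGN rather than a hand-written function.

```python
import numpy as np

# Minimal sketch of uncertainty-aware Bayesian fusion on a 2-D location grid.
# Both likelihood functions below are hypothetical stand-ins; a grounded model
# such as FP-LGN would supply the language likelihood from the map and phrase.

GRID = 50
xs, ys = np.meshgrid(np.linspace(0, 10, GRID), np.linspace(0, 10, GRID))

def language_likelihood(xs, ys):
    """Stand-in for P(utterance | target location), e.g. 'near the table'."""
    table = np.array([3.0, 7.0])              # assumed landmark position
    d = np.hypot(xs - table[0], ys - table[1])
    return np.exp(-0.5 * (d / 1.5) ** 2)

def sensor_likelihood(xs, ys):
    """Stand-in for P(robot measurement | target location), e.g. a noisy
    4 m range reading from a sensor at the origin."""
    r = np.hypot(xs, ys)
    return np.exp(-0.5 * ((r - 4.0) / 0.8) ** 2)

prior = np.full((GRID, GRID), 1.0 / GRID**2)   # uniform prior over the grid
posterior = prior * language_likelihood(xs, ys) * sensor_likelihood(xs, ys)
posterior /= posterior.sum()                   # normalize to a distribution

idx = np.unravel_index(posterior.argmax(), posterior.shape)
print(f"MAP estimate: x={xs[idx]:.2f}, y={ys[idx]:.2f}")
```

The product form assumes the human utterance and the robot measurement are conditionally independent given the target location, which is the standard assumption underlying this kind of Bayesian fusion.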
Related papers
- Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing [66.04926909181653]
We argue for anthropomimetic uncertainty, meaning that intuitive and trustworthy uncertainty communication requires a degree of linguistic authenticity and personalization to the user. We conclude by pointing out unique factors in human-machine communication of uncertainty and by deconstructing the data biases that influence machine uncertainty communication.
arXiv Detail & Related papers (2025-07-11T14:07:22Z)
- Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media [12.479554210753664]
This study adopts a generative approach, where stance predictions include explicit, interpretable rationales. We find that incorporating reasoning into stance detection enables the smaller model (FlanT5) to outperform GPT-3.5's zero-shot performance.
arXiv Detail & Related papers (2024-12-13T16:34:39Z)
- Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects.
This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z)
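As a toy illustration of the mixture-model ingredient in the affordance-learning entry above, the sketch below fits a GMM to synthetic (object size, applied force, displacement) triples so that each component acts as an affordance-like mode. The data and variable choices are invented; the paper additionally couples such mixtures with a Bayesian network, which is not shown here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy sketch: cluster (object size, applied force, resulting displacement)
# triples into affordance-like modes with a GMM, using invented synthetic data.
rng = np.random.default_rng(0)
light = np.column_stack([rng.normal(2, 0.3, 100),    # small objects
                         rng.normal(5, 1.0, 100),    # moderate push
                         rng.normal(4, 0.8, 100)])   # large displacement
heavy = np.column_stack([rng.normal(8, 0.5, 100),    # large objects
                         rng.normal(5, 1.0, 100),    # same push
                         rng.normal(0.5, 0.2, 100)]) # barely moves
data = np.vstack([light, heavy])

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print("component means (size, force, displacement):")
print(gmm.means_.round(2))  # two modes: "pushable" vs. "immovable"
```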
- Offline Risk-sensitive RL with Partial Observability to Enhance Performance in Human-Robot Teaming [1.3980986259786223]
We propose a method to incorporate model uncertainty, thus enabling risk-sensitive sequential decision-making.
Experiments were conducted with a group of twenty-six human participants within a simulated robot teleoperation environment.
arXiv Detail & Related papers (2024-02-08T14:27:34Z)
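The entry above centers on risk-sensitive decision-making under model uncertainty. One common way to make a choice risk-sensitive, shown as a hedged sketch below, is to score actions by the Conditional Value at Risk (CVaR) of returns sampled from an ensemble of models instead of by the mean return. The numbers and action names are invented, and this is an illustration of the general idea rather than the paper's exact formulation.

```python
import numpy as np

def cvar(samples, alpha=0.2):
    """Mean of the worst alpha-fraction of sampled returns."""
    cut = max(1, int(alpha * len(samples)))
    return np.sort(samples)[:cut].mean()

rng = np.random.default_rng(1)
# Hypothetical return samples per action, as if drawn from a model ensemble.
returns = {
    "autonomous": rng.normal(10.0, 8.0, 500),  # higher mean, high variance
    "ask_human":  rng.normal(7.0, 1.5, 500),   # lower mean, reliable
}
for action, r in returns.items():
    print(f"{action}: mean={r.mean():.2f}, CVaR@0.2={cvar(r):.2f}")

best = max(returns, key=lambda a: cvar(returns[a]))
print("risk-sensitive choice:", best)  # prefers the reliable action
```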
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- BOSS: A Benchmark for Human Belief Prediction in Object-context Scenarios [14.23697277904244]
This paper uses the combined knowledge of Theory of Mind (ToM) and Object-Context Relations to investigate methods for enhancing collaboration between humans and autonomous systems.
We propose a novel and challenging multimodal video dataset for assessing the capability of artificial intelligence (AI) systems in predicting human belief states in an object-context scenario.
arXiv Detail & Related papers (2022-06-21T18:29:17Z)
- Data-driven emotional body language generation for social robotics [58.88028813371423]
In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration.
We implement a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions.
The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones.
arXiv Detail & Related papers (2022-05-02T09:21:39Z)
- HARPS: An Online POMDP Framework for Human-Assisted Robotic Planning and Sensing [1.3678064890824186]
The Human Assisted Robotic Planning and Sensing (HARPS) framework is presented for active semantic sensing and planning in human-robot teams.
This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments.
Simulations of a UAV-enabled target search application in a large-scale partially structured environment show significant improvements in time and belief state estimates.
arXiv Detail & Related papers (2021-10-20T00:41:57Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for object blocking relationship (OBR) detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
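To make the POMDP integration in the INVIGORATE entry above concrete, here is a minimal sketch of the belief update behind asking a disambiguating question: a belief over candidate target objects is reweighted by a noisy model of the human's yes/no answer. The likelihood values here are invented; in the actual system they would come from the learned neural modules.

```python
import numpy as np

def update(belief, likelihood_yes, answer):
    """Bayesian belief update from a noisy yes/no answer."""
    like = likelihood_yes if answer == "yes" else 1.0 - likelihood_yes
    posterior = belief * like
    return posterior / posterior.sum()

belief = np.full(3, 1 / 3)                    # three candidate objects
# P(answer "yes" to "Do you mean the red cup?" | true target = object i),
# invented values standing in for a learned answer model.
likelihood_yes = np.array([0.9, 0.2, 0.1])
belief = update(belief, likelihood_yes, "yes")
print("belief after answer:", belief.round(3))  # mass shifts to object 0
```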
- Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs [90.20235972293801]
Aiming to understand how human (false-)beliefs, a core socio-cognitive ability, would affect human interactions with robots, this paper proposes a graphical model that unifies the representation of object states, robot knowledge, and human (false-)beliefs.
An inference algorithm is derived to fuse the individual parse graphs (pg) from all robots across multiple views into a joint pg, which affords more effective reasoning and inference capability and overcomes the errors originating from a single view.
arXiv Detail & Related papers (2020-04-25T23:02:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.