Acquiring Grounded Representations of Words with Situated Interactive Instruction
- URL: http://arxiv.org/abs/2502.20754v1
- Date: Fri, 28 Feb 2025 06:04:52 GMT
- Title: Acquiring Grounded Representations of Words with Situated Interactive Instruction
- Authors: Shiwali Mohan, Aaron H. Mininger, James R. Kirk, John E. Laird
- Abstract summary: We present an approach for acquiring grounded representations of words from mixed-initiative, situated interactions with a human instructor. The work focuses on the acquisition of diverse types of knowledge including perceptual, semantic, and procedural knowledge along with learning grounded meanings.
- Score: 4.049850026698638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present an approach for acquiring grounded representations of words from mixed-initiative, situated interactions with a human instructor. The work focuses on the acquisition of diverse types of knowledge including perceptual, semantic, and procedural knowledge along with learning grounded meanings. Interactive learning allows the agent to control its learning by requesting instructions about unknown concepts, making learning efficient. Our approach has been instantiated in Soar and has been evaluated on a table-top robotic arm capable of manipulating small objects.
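The abstract describes a mixed-initiative loop in which the agent requests instruction whenever it encounters an unknown concept. The Python sketch below illustrates only that control flow under assumed names (Lexicon, InteractiveLearner); it is not the authors' Soar implementation or their robotic-arm interface.

```python
# Minimal sketch, assuming hypothetical names; not the authors' Soar agent.
from dataclasses import dataclass, field

@dataclass
class Lexicon:
    """Maps words to grounded meanings (perceptual, semantic, or procedural)."""
    entries: dict = field(default_factory=dict)

    def knows(self, word: str) -> bool:
        return word in self.entries

    def add(self, word: str, grounding: dict) -> None:
        self.entries[word] = grounding

class InteractiveLearner:
    """Agent that asks the instructor about unknown concepts rather than guessing."""
    def __init__(self, lexicon: Lexicon):
        self.lexicon = lexicon

    def process_instruction(self, utterance: str, ask) -> None:
        for word in utterance.lower().split():
            if not self.lexicon.knows(word):
                # Mixed initiative: the agent steers its own learning by
                # requesting a definition or demonstration for the unknown word.
                reply = ask(f"What does '{word}' mean?")
                self.lexicon.add(word, {"definition": reply})

# Usage: the 'ask' callback stands in for dialogue with the human instructor.
learner = InteractiveLearner(Lexicon())
learner.process_instruction("move the red cylinder", ask=lambda q: "a tall round object")
```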
Related papers
- Prosody as a Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications [2.8243597585456017]
This paper advocates for the integration of prosody as a teaching signal to enhance agent learning from human teachers.
Our findings suggest that prosodic features, when coupled with explicit feedback, can enhance reinforcement learning outcomes.
arXiv Detail & Related papers (2024-10-31T01:51:23Z) - Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms representative models on both objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation [6.1400257928108575]
This research explores acquiring non-verbal communication skills through learning from demonstrations.
In particular, we focus on imitation learning for artificial agents, exemplified by teaching a simulated humanoid American Sign Language.
We use computer vision and deep learning to extract information from videos, and reinforcement learning to enable the agent to replicate observed actions.
arXiv Detail & Related papers (2024-06-14T13:50:29Z) - Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language captioning and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z) - AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents [58.807802111818994]
We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions.
Our approach begins by developing a set of atomic actions through a low-level controller trained via imitation learning.
An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering.
arXiv Detail & Related papers (2024-03-19T15:41:39Z) - LiFT: Unsupervised Reinforcement Learning with Foundation Models as
- LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers [59.69716962256727]
We propose a framework that guides a reinforcement learning agent to acquire semantically meaningful behavior without human feedback.
In our framework, the agent receives task instructions grounded in a training environment from large language models.
We demonstrate that our method can learn semantically meaningful skills in a challenging open-ended MineDojo environment.
arXiv Detail & Related papers (2023-12-14T14:07:41Z) - Solving Dialogue Grounding Embodied Task in a Simulated Environment
- Solving Dialogue Grounding Embodied Task in a Simulated Environment using Further Masked Language Modeling [0.0]
Our proposed method employs further masked language modeling to enhance task understanding, building on state-of-the-art (SOTA) language models.
Our experimental results provide compelling evidence of the superiority of our proposed method.
arXiv Detail & Related papers (2023-06-21T17:17:09Z) - From Interactive to Co-Constructive Task Learning [13.493719155524404]
We will review current proposals for interactive task learning and discuss their main contributions.
We then discuss our notion of co-construction and summarize research insights from adult-child and human-robot interactions.
arXiv Detail & Related papers (2023-05-24T19:45:30Z) - Interpreting Neural Policies with Disentangled Tree Representations [58.769048492254555]
We study interpretability of compact neural policies through the lens of disentangled representation.
We leverage decision trees to obtain factors of variation for disentanglement in robot learning.
We introduce interpretability metrics that measure disentanglement of learned neural dynamics.
arXiv Detail & Related papers (2022-10-13T01:10:41Z) - Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
- Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z) - Language-Conditioned Imitation Learning for Robot Manipulation Tasks [39.40937105264774]
We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
arXiv Detail & Related papers (2020-10-22T21:49:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.