Yes, this Way! Learning to Ground Referring Expressions into Actions
with Intra-episodic Feedback from Supportive Teachers
- URL: http://arxiv.org/abs/2305.12880v1
- Date: Mon, 22 May 2023 10:01:15 GMT
- Title: Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers
- Authors: Philipp Sadler, Sherzod Hakimov and David Schlangen
- Abstract summary: We present an initial study that evaluates intra-episodic feedback given in a collaborative setting.
Our results show that intra-episodic feedback allows the follower to generalize on aspects of scene complexity.
- Score: 15.211628096103475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to pick up on language signals in an ongoing interaction is
crucial for future machine learning models to collaborate and interact with
humans naturally. In this paper, we present an initial study that evaluates
intra-episodic feedback given in a collaborative setting. We use a referential
language game as a controllable example of a task-oriented collaborative joint
activity. A teacher utters a referring expression generated by a well-known
symbolic algorithm (the "Incremental Algorithm") as an initial instruction and
then monitors the follower's actions to possibly intervene with intra-episodic
feedback (which does not explicitly have to be requested). We frame this task
as a reinforcement learning problem with sparse rewards and learn a follower
policy for a heuristic teacher. Our results show that intra-episodic feedback
allows the follower to generalize on aspects of scene complexity and performs
better than providing only the initial statement.
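The interaction pattern the abstract describes (an initial instruction, intra-episodic feedback from a heuristic teacher, and a sparse terminal reward) can be sketched as a minimal toy loop. Everything below is an illustrative assumption — the 1-D world, the function names, and the directional cue are not the paper's actual environment or teacher heuristic:

```python
def teacher_feedback(pos, target):
    """Hypothetical heuristic teacher: intervene with a directional cue
    whenever the follower is off target (the feedback is not requested)."""
    if pos == target:
        return 0                        # silence: the follower has arrived
    return 1 if target > pos else -1    # "yes, this way!"

def run_episode(target, policy, max_steps=20):
    """Run one episode in a toy 1-D world with a sparse terminal reward."""
    pos, reward, trajectory = 0, 0.0, []
    cue = teacher_feedback(pos, target)      # stands in for the initial instruction
    for _ in range(max_steps):
        pos += policy(cue)                   # follower acts on the latest cue
        trajectory.append(pos)
        cue = teacher_feedback(pos, target)  # intra-episodic feedback
        if pos == target:
            reward = 1.0                     # sparse reward: only terminal success pays
            break
    return reward, trajectory

# A follower policy that grounds the teacher's cue walks straight to the target.
attentive = lambda cue: cue
```

In this toy setting, `run_episode(3, attentive)` succeeds because the follower exploits the feedback at every step; a policy that ignores the cue would have to find the sparse reward by chance.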
Related papers
- Prosody as a Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications [2.8243597585456017]
This paper advocates for the integration of prosody as a teaching signal to enhance agent learning from human teachers.
Our findings suggest that prosodic features, when coupled with explicit feedback, can enhance reinforcement learning outcomes.
arXiv Detail & Related papers (2024-10-31T01:51:23Z)
- AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents [58.807802111818994]
We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions.
Our approach begins by developing a set of atomic actions via a low-level controller trained via imitation learning.
An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering.
arXiv Detail & Related papers (2024-03-19T15:41:39Z)
- YODA: Teacher-Student Progressive Learning for Language Models [82.0172215948963]
This paper introduces YODA, a teacher-student progressive learning framework.
It emulates the teacher-student education process to improve the efficacy of model fine-tuning.
Experiments show that training LLaMA2 with data from YODA yields a significant performance gain over standard SFT.
arXiv Detail & Related papers (2024-01-28T14:32:15Z)
- PapagAI: Automated Feedback for Reflective Essays [48.4434976446053]
We present the first open-source automated feedback tool based on didactic theory and implemented as a hybrid AI system.
The main objective of our work is to enable better learning outcomes for students and to complement the teaching activities of lecturers.
arXiv Detail & Related papers (2023-07-10T11:05:51Z)
- "You might think about slightly revising the title": identifying hedges in peer-tutoring interactions [1.0466434989449724]
Hedges play an important role in the management of conversational interaction.
We use a multimodal peer-tutoring dataset to construct a computational framework for identifying hedges.
We employ a model explainability tool to explore the features that characterize hedges in peer-tutoring conversations.
arXiv Detail & Related papers (2023-06-18T12:47:54Z)
- Learning Intuitive Policies Using Action Features [7.260481131198059]
We investigate the effect of network architecture on the propensity of learning algorithms to exploit semantic relationships.
We find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for learning intuitive policies.
arXiv Detail & Related papers (2022-01-29T20:54:52Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Interaction-Grounded Learning [24.472306647094253]
We propose Interaction-Grounded Learning, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.
We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction.
arXiv Detail & Related papers (2021-06-09T08:13:29Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
arXiv Detail & Related papers (2020-10-26T21:34:39Z)
- Learning Rewards from Linguistic Feedback [30.30912759796109]
We explore unconstrained natural language feedback as a learning signal for artificial agents.
We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards.
arXiv Detail & Related papers (2020-09-30T14:51:00Z)
- On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.