Spoken Language Understanding for Conversational AI: Recent Advances and
Future Direction
- URL: http://arxiv.org/abs/2212.10728v1
- Date: Wed, 21 Dec 2022 02:47:52 GMT
- Title: Spoken Language Understanding for Conversational AI: Recent Advances and
Future Direction
- Authors: Soyeon Caren Han, Siqu Long, Henry Weld, Josiah Poon
- Abstract summary: This tutorial will discuss how the joint task is set up and introduce Spoken Language Understanding/Natural Language Understanding (SLU/NLU) with Deep Learning techniques.
We will describe how the machine uses the latest NLP and Deep Learning techniques to address the joint task.
- Score: 5.829344935864271
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When a human communicates with a machine using natural language on the web,
how can the machine understand the human's intention and the semantic context of
what they say? This is an important AI task, as it enables the machine to construct
a sensible answer or perform a useful action for the human. Meaning is
represented at the sentence level, identification of which is known as intent
detection, and at the word level, a labelling task called slot filling. This
dual-level joint task requires innovative thinking about natural language and
deep learning network design, and as a result, many approaches and models have
been proposed and applied.
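To make the dual-level setup concrete, here is a small, hypothetical annotation example in Python: one sentence-level intent label plus one BIO-style slot tag per word. The utterance and label names are illustrative only (in the spirit of common benchmarks such as ATIS and SNIPS), not taken from the paper.

    # Hypothetical dual-level annotation: one intent for the whole utterance,
    # one BIO slot tag per word. Label names are illustrative placeholders.
    utterance = ["book", "a", "table", "for", "two", "in", "rome"]
    intent = "BookRestaurant"                                    # intent detection (sentence level)
    slots = ["O", "O", "O", "O", "B-party_size", "O", "B-city"]  # slot filling (word level)
    assert len(slots) == len(utterance)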
This tutorial will discuss how the joint task is set up and introduce Spoken
Language Understanding/Natural Language Understanding (SLU/NLU) with Deep
Learning techniques. We will cover the datasets, experiments and metrics used
in the field. We will describe how the machine uses the latest NLP and Deep
Learning techniques to address the joint task, including recurrent and
attention-based Transformer networks and pre-trained models (e.g. BERT). We
will then look in detail at a network that allows the two levels of the task,
intent classification and slot filling, to interact explicitly to boost
performance. We will give a code demonstration of this model in a Python
notebook, and attendees will have an opportunity to follow the coding demo on
this joint NLU task to further their understanding.
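As a rough illustration of the pre-trained-model approach mentioned above (not the tutorial's notebook), the sketch below attaches two heads to a shared BERT encoder: a sentence-level head for intent classification and a token-level head for slot filling. It assumes PyTorch and the Hugging Face transformers library; the model name and label counts are placeholders, and it omits any explicit intent-slot interaction mechanism.

    # Minimal sketch of a joint intent-detection / slot-filling model on BERT.
    # Hypothetical example with placeholder label counts; assumes PyTorch and
    # the Hugging Face `transformers` library.
    import torch.nn as nn
    from transformers import BertModel, BertTokenizerFast

    class JointBert(nn.Module):
        def __init__(self, num_intents, num_slots, model_name="bert-base-uncased"):
            super().__init__()
            self.bert = BertModel.from_pretrained(model_name)
            hidden = self.bert.config.hidden_size
            self.intent_head = nn.Linear(hidden, num_intents)  # sentence-level head
            self.slot_head = nn.Linear(hidden, num_slots)      # token-level head

        def forward(self, input_ids, attention_mask):
            out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
            intent_logits = self.intent_head(out.pooler_output)   # [CLS]-based sentence summary
            slot_logits = self.slot_head(out.last_hidden_state)   # one prediction per token
            return intent_logits, slot_logits

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    batch = tokenizer(["book a table for two in rome"], return_tensors="pt")
    model = JointBert(num_intents=7, num_slots=21)  # placeholder label counts
    intent_logits, slot_logits = model(batch["input_ids"], batch["attention_mask"])

In practice the two heads would be trained jointly with a summed cross-entropy loss, and many of the models covered in the tutorial add an explicit interaction between the intent and slot representations rather than sharing only the encoder.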
Related papers
- ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding [67.63933036920012]
Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location.
This study presents ClawMachine, offering a new methodology that notates an entity directly using the visual tokens.
ClawMachine unifies visual referring and grounding into an auto-regressive format and learns with a decoder-only architecture.
arXiv Detail & Related papers (2024-06-17T08:39:16Z)
- Object-Centric Instruction Augmentation for Robotic Manipulation [29.491990994901666]
We introduce the Object-Centric Instruction Augmentation (OCI) framework to augment highly semantic and information-dense language instructions with position cues.
We utilize a Multi-modal Large Language Model (MLLM) to weave knowledge of object locations into natural language instruction.
We demonstrate that robotic manipulator imitation policies trained with our enhanced instructions outperform those relying solely on traditional language instructions.
arXiv Detail & Related papers (2024-01-05T13:54:45Z)
- Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models [70.82705830137708]
We introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL)
We utilize semi-language labels, leveraging the semantic understanding of CLIP, to propagate knowledge onto large datasets of unlabelled demonstration data.
DIAL enables imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.
arXiv Detail & Related papers (2022-11-21T18:56:00Z)
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [119.29555551279155]
Large language models can encode a wealth of semantic knowledge about the world.
Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language.
We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions.
arXiv Detail & Related papers (2022-04-04T17:57:11Z)
- Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation [86.26522210882699]
We propose Unified multimodal pre-training for both Vision-Language understanding and generation.
The proposed UniVL is capable of handling both understanding tasks and generative tasks.
Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model.
arXiv Detail & Related papers (2021-12-10T14:59:06Z)
- Few-Shot Bot: Prompt-Based Learning for Dialogue Systems [58.27337673451943]
Learning to converse using only a few examples is a great challenge in conversational AI.
The current best conversational models are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL).
We propose prompt-based few-shot learning which does not require gradient-based fine-tuning but instead uses a few examples as the only source of learning.
arXiv Detail & Related papers (2021-10-15T14:36:45Z)
- Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sourced Annotation [80.29069988090912]
We study the problem of learning a range of vision-based manipulation tasks from a large offline dataset of robot interaction.
We propose to leverage offline robot datasets with crowd-sourced natural language labels.
We find that our approach outperforms both goal-image specifications and language conditioned imitation techniques by more than 25%.
arXiv Detail & Related papers (2021-09-02T17:42:13Z)
- AttViz: Online exploration of self-attention for transparent neural language modeling [7.574392147428978]
We propose AttViz, an online toolkit for the exploration of self-attention, that is, the real values associated with individual text tokens.
We show how existing deep learning pipelines can produce outputs suitable for AttViz, offering novel visualizations of the attention heads and their aggregations with minimal effort, online.
arXiv Detail & Related papers (2020-05-12T12:21:40Z)
- From text saliency to linguistic objects: learning linguistic interpretable markers with a multi-channels convolutional architecture [2.064612766965483]
We propose a novel approach to inspect the hidden layers of a fitted CNN in order to extract interpretable linguistic objects from texts by exploiting the classification process.
We empirically demonstrate the efficiency of our approach on corpora from two different languages: English and French.
arXiv Detail & Related papers (2020-04-07T10:46:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.