GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation
Understanding
- URL: http://arxiv.org/abs/2305.09360v3
- Date: Tue, 18 Jul 2023 02:01:14 GMT
- Title: GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation
Understanding
- Authors: Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu
- Abstract summary: GIFT can adapt various Transformer-based pre-trained language models for universal MPC understanding.
Four types of edges are designed to integrate graph-induced signals into attention mechanisms.
- Score: 51.37738394062851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Addressing the issue of who says what to whom in multi-party conversations
(MPCs) has recently attracted a lot of research attention. However, existing
methods for MPC understanding typically embed interlocutors and utterances into
sequential information flows, or exploit the inherent graph structures in MPCs
only superficially. To address this, we present a plug-and-play and lightweight
method named graph-induced fine-tuning (GIFT) which can adapt various
Transformer-based pre-trained language models (PLMs) for universal MPC
understanding. In detail, the full and equivalent connections among utterances
in a regular Transformer ignore the sparse but distinctive dependencies of one
utterance on another in MPCs. To distinguish the different relationships between
utterances, four types of edges are designed to integrate graph-induced signals
into attention mechanisms to refine PLMs originally designed for processing
sequential texts. We evaluate GIFT by implementing it in three PLMs and testing
the performance on three downstream tasks including addressee recognition,
speaker identification and response selection. Experimental results show that
GIFT can significantly improve the performance of three PLMs on three
downstream tasks and two benchmarks with only 4 additional parameters per
encoding layer, achieving new state-of-the-art performance on MPC
understanding.
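The abstract specifies only that four edge types inject graph-induced signals into the attention mechanism at a cost of 4 extra parameters per encoding layer. As a rough illustration of how such a scheme could look, here is a minimal PyTorch sketch: each utterance is collapsed to a single position, the reply and speaker structure of the MPC assigns an edge type to every utterance pair, and one learnable scalar per edge type is added to the attention logits. The concrete edge definitions (reply-to, replied-by, same-speaker, other), the additive form of the bias, and all names below are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn as nn

# Hypothetical edge-type ids; the paper's exact four edge definitions
# are not given in the abstract, so these four are an assumption.
REPLY_TO, REPLIED_BY, SAME_SPEAKER, OTHER = 0, 1, 2, 3

def edge_type_matrix(reply_to, speakers):
    """Build an [n, n] matrix of edge-type ids from MPC structure.

    reply_to: reply_to[i] is the index of the utterance that
              utterance i replies to, or None for thread roots.
    speakers: one speaker id per utterance.
    """
    n = len(speakers)
    edges = torch.full((n, n), OTHER, dtype=torch.long)
    # Same-speaker edges first; reply edges below take precedence.
    for i in range(n):
        for j in range(n):
            if i != j and speakers[i] == speakers[j]:
                edges[i, j] = SAME_SPEAKER
    for i, parent in enumerate(reply_to):
        if parent is not None:
            edges[i, parent] = REPLY_TO    # i replies to parent
            edges[parent, i] = REPLIED_BY  # parent is replied to by i
    return edges

class GraphInducedAttention(nn.Module):
    """Self-attention whose logits are shifted by one learnable scalar
    per edge type -- matching the abstract's "4 additional parameters
    per encoding layer". Whether GIFT adds or multiplies these signals
    is not stated in the abstract; an additive bias is assumed here.
    """
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.edge_bias = nn.Parameter(torch.zeros(4))  # the 4 extra parameters

    def forward(self, x, edges):
        # x: [batch, n, d_model]; edges: [n, n] edge-type ids.
        # A float attn_mask is added to the attention logits, so each
        # utterance pair is biased by its edge type's scalar.
        bias = self.edge_bias[edges]
        out, _ = self.attn(x, x, x, attn_mask=bias)
        return out

# Toy MPC: u1 replies to u0; u2 replies to u1 and shares u0's speaker.
edges = edge_type_matrix(reply_to=[None, 0, 1],
                         speakers=["alice", "bob", "alice"])
layer = GraphInducedAttention(d_model=768)
out = layer(torch.randn(1, 3, 768), edges)  # -> [1, 3, 768]
```

In the actual models each utterance spans many tokens, so a bias like this would presumably be broadcast over all token pairs belonging to a given pair of utterances; the sketch collapses each utterance to one vector purely to keep the example short.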
Related papers
- Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation [13.802788788420175] (2024-03-08)
Correspondence matching plays a crucial role in numerous robotics applications.
This paper addresses the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach.
Our proposed method achieves an overall performance in terms of mean matching accuracy of 0.68, 0.92, and 0.95 with respect to the tolerances of 1, 3, and 5 pixels, respectively.
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645] (2023-10-30)
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of the tabular modality.
Pretrained language models (PLMs) have given rise to another paradigm, which takes as inputs the sentences of the textual modality.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338] (2023-07-19)
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
- An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition [27.96711773593048] (2022-09-20)
We propose the multi-modal end-to-end transformer (ME2ET), which can effectively model tri-modal feature interactions.
At the low level, we propose progressive tri-modal attention, which can model tri-modal feature interactions by adopting a two-pass strategy.
At the high level, we introduce a tri-modal feature fusion layer to explicitly aggregate the semantic representations of the three modalities.
- MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding [58.95156916558384] (2021-06-03)
We present MPC-BERT, a pre-trained model for MPC understanding.
We evaluate MPC-BERT on three downstream tasks including addressee recognition, speaker identification and response selection.
- Video-aided Unsupervised Grammar Induction [108.53765268059425] (2021-04-09)
We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.
Video provides even richer information, including not only static objects but also actions and state changes useful for inducing verb phrases.
We propose a Multi-Modal Compound PCFG model (MMC-PCFG) to effectively aggregate these rich features from different modalities.
- Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704] (2020-10-01)
Referring image segmentation aims at segmenting the foreground masks of the entities that match the description given in a natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address this challenging task.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.