VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
- URL: http://arxiv.org/abs/2112.04195v1
- Date: Wed, 8 Dec 2021 09:49:28 GMT
- Title: VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
- Authors: Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Xu, Wei Wu, Enhong Chen
- Abstract summary: We propose a novel Virtual InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models.
VIRT asks representation-based encoders to conduct virtual interactions that mimic the behavior of interaction-based models.
- Score: 50.986371459817256
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: With the rise of pre-trained transformers, remarkable progress has been made on textual pair modeling to support relevant natural language applications. Two lines of approaches have been developed for text matching: interaction-based models, which perform full interactions over the textual pair, and representation-based models, which encode the pair independently with siamese encoders. The former achieves compelling performance thanks to its deep interaction modeling, but at the cost of inference latency. The latter is efficient and widely adopted in practice, yet suffers from severe performance degradation due to the lack of interactions. Although some prior works attempt to integrate interactive knowledge into representation-based models, to limit the computational cost they only perform late interaction or knowledge transfer at the top layers. Interactive information in the lower layers is still missing, which limits the performance of representation-based solutions. To remedy this, we propose a novel \textit{Virtual} InteRacTion mechanism, termed VIRT, to enable full and deep interaction modeling in representation-based models without \textit{actual} inference computations. Concretely, VIRT asks representation-based encoders to conduct virtual interactions that mimic the behavior of interaction-based models. In addition, the knowledge distilled from interaction-based encoders is used as a supervision signal to ensure the effectiveness of the virtual interactions. Since virtual interactions only happen at the training stage, VIRT does not increase the inference cost. Furthermore, we design a VIRT-adapted late interaction strategy to fully exploit the learned virtual interactive knowledge.
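The training-time idea can be made concrete with a short PyTorch sketch. This is a minimal illustration, not the authors' released implementation: the function names (`attention_map`, `virt_loss`), the single-layer view, and the MSE alignment between the teacher's cross-attention map and the attention induced by the siamese encoders' token states are assumptions made for exposition.

```python
import torch
import torch.nn.functional as F


def attention_map(q_states: torch.Tensor, k_states: torch.Tensor) -> torch.Tensor:
    """Row-stochastic attention of query tokens over key tokens.

    q_states: (n_q, d) and k_states: (n_k, d) token hidden states.
    """
    d = q_states.size(-1)
    scores = q_states @ k_states.transpose(-1, -2) / d ** 0.5  # (n_q, n_k)
    return scores.softmax(dim=-1)


def virt_loss(teacher_attn_xy: torch.Tensor,
              hx: torch.Tensor, hy: torch.Tensor) -> torch.Tensor:
    """Align a 'virtual' interaction with a cross-encoder teacher.

    teacher_attn_xy: (n_x, n_y) cross-attention of x-tokens over y-tokens
    taken from an interaction-based teacher. hx, hy: the siamese student's
    token states, produced WITHOUT any attention across the pair. The loss
    pushes the independently encoded states to induce the attention the
    teacher actually computed, so interaction is mimicked, never executed
    at inference time.
    """
    virtual_attn_xy = attention_map(hx, hy)  # never computed at inference
    return F.mse_loss(virtual_attn_xy, teacher_attn_xy)
```

Because the loss term exists only during training, the deployed bi-encoder is byte-for-byte the usual siamese model; the paper applies such supervision across layers rather than the single layer shown here.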
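For the inference side, here is a generic late-interaction scorer in the spirit of the VIRT-adapted strategy, reusing `attention_map` from the sketch above; the pooling-then-cosine form and all shapes are illustrative assumptions rather than the paper's exact operator.

```python
def late_interaction_score(hx: torch.Tensor, hy: torch.Tensor) -> torch.Tensor:
    """Score a pair from independently encoded token states.

    For each x-token, pool y-tokens with the same attention form used
    during virtual-interaction training, then average token-level
    cosine similarities. Since both sides are encoded independently,
    document states can be pre-computed offline and only this cheap
    matching runs per query.
    """
    attn = attention_map(hx, hy)                       # (n_x, n_y)
    y_pooled = attn @ hy                               # (n_x, d) x-conditioned view of y
    sims = F.cosine_similarity(hx, y_pooled, dim=-1)   # (n_x,)
    return sims.mean()


# Usage with illustrative shapes (12 query tokens, 80 document tokens):
hx = torch.randn(12, 768)   # query tokens, encoded alone
hy = torch.randn(80, 768)   # document tokens, pre-encodable offline
score = late_interaction_score(hx, hy)
```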
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI yields remarkable improvements of 10% to 64% over previous state-of-the-art methods on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction [27.10256777126629]
This paper showcases the potential of generating human-object interactions without direct training on text-interaction pair data.
We introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion.
By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences in a zero-shot manner.
arXiv Detail & Related papers (2024-03-28T17:59:30Z)
- Improving Adversarial Transferability of Vision-Language Pre-training Models through Collaborative Multimodal Interaction [22.393624206051925]
Existing work rarely studies the transferability of attacks on Vision-Language Pre-training models.
We propose a novel attack, called Collaborative Multimodal Interaction Attack (CMI-Attack).
CMI-Attack raises the transfer success rates from ALBEF to TCL, $\text{CLIP}_{\text{ViT}}$ and $\text{CLIP}_{\text{CNN}}$ by 8.11%-16.75% over state-of-the-art methods.
arXiv Detail & Related papers (2024-03-16T10:32:24Z)
- Dyadic Interaction Modeling for Social Behavior Generation [6.626277726145613]
We present an effective framework for creating 3D facial motions in dyadic interactions.
The heart of our framework is Dyadic Interaction Modeling (DIM), a pre-training approach.
Experiments demonstrate the superiority of our framework in generating listener motions.
arXiv Detail & Related papers (2024-03-14T03:21:33Z)
- Disentangled Neural Relational Inference for Interpretable Motion Prediction [38.40799770648501]
We develop a variational auto-encoder framework that integrates graph-based representations and time-sequence models.
Our model infers dynamic interaction graphs augmented with interpretable edge features that characterize the interactions.
We validate our approach through extensive experiments on both simulated and real-world datasets.
arXiv Detail & Related papers (2024-01-07T22:49:24Z)
- Semantic Interactive Learning for Text Classification: A Constructive Approach for Contextual Interactions [0.0]
We propose a novel interaction framework called Semantic Interactive Learning for the text domain.
We frame the problem of incorporating constructive and contextual feedback into the learner as a task to find an architecture that enables more semantic alignment between humans and machines.
We introduce a technique called SemanticPush that is effective for translating conceptual corrections of humans to non-extrapolating training examples.
arXiv Detail & Related papers (2022-09-07T08:13:45Z)
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods, and comparable performance to single-stream methods while being 10,800X faster in inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding a new state-of-the-art on the widely-used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z)
- TCL: Transformer-based Dynamic Graph Modelling via Contrastive Learning [87.38675639186405]
We propose a novel graph neural network approach, called TCL, which deals with the dynamically-evolving graph in a continuous-time fashion.
To the best of our knowledge, this is the first attempt to apply contrastive learning to representation learning on dynamic graphs.
arXiv Detail & Related papers (2021-05-17T15:33:25Z)
- Cloth Interactive Transformer for Virtual Try-On [106.21605249649957]
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information.
In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask.
arXiv Detail & Related papers (2021-04-12T14:45:32Z)
- Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)