Retrieval-based Disentangled Representation Learning with Natural
Language Supervision
- URL: http://arxiv.org/abs/2212.07699v2
- Date: Sat, 10 Feb 2024 10:12:22 GMT
- Title: Retrieval-based Disentangled Representation Learning with Natural
Language Supervision
- Authors: Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Lei Chen
- Abstract summary: We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies for the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish dimensions that capture intrinsic characteristics within data through its natural language counterpart, thus facilitating disentanglement.
- Score: 61.75109410513864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Disentangled representation learning remains challenging as the underlying
factors of variation in the data do not naturally exist. The inherent
complexity of real-world data makes it infeasible to exhaustively enumerate and
encapsulate all its variations within a finite set of factors. However, it is
worth noting that most real-world data have linguistic equivalents, typically
in the form of textual descriptions. These linguistic counterparts can
represent the data and be effortlessly decomposed into distinct tokens. In light
of this, we present Vocabulary Disentangled Retrieval (VDR), a retrieval-based
framework that harnesses natural language as proxies for the underlying data
variation to drive disentangled representation learning. Our approach employs a
bi-encoder model to represent both data and natural language in a vocabulary
space, enabling the model to distinguish dimensions that capture intrinsic
characteristics within data through its natural language counterpart, thus
facilitating disentanglement. We extensively assess the performance of VDR
across 15 retrieval benchmark datasets, covering text-to-text and cross-modal
retrieval scenarios, as well as human evaluation. Our experimental results
compellingly demonstrate the superiority of VDR over previous bi-encoder
retrievers with comparable model size and training costs, achieving an
impressive 8.7% improvement in NDCG@10 on the BEIR benchmark, a 5.3% increase
on MS COCO, and a 6.0% increase on Flickr30k in terms of mean recall in the
zero-shot setting. Moreover, the results from human evaluation indicate that
the interpretability of our method is on par with that of SOTA captioning models.
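To make the vocabulary-space idea concrete, the sketch below shows how a bi-encoder can score every vocabulary token for an input and pool those scores into a sparse, non-negative vector whose dimensions are readable as words. This is a minimal illustration of the general technique, not the authors' implementation: the bert-base-uncased backbone, the max-pooling, and the log1p(relu) activation are all assumptions made for the example.

```python
# Minimal sketch of a vocabulary-space bi-encoder (illustrative only).
# Assumes a masked-LM backbone whose output head scores every vocabulary token.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL = "bert-base-uncased"  # assumed backbone, chosen for the example
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)

def encode_to_vocab_space(texts):
    """Map a batch of texts to sparse, non-negative vectors over the vocabulary.

    Each dimension corresponds to one vocabulary token, so high-weight
    dimensions can be read off directly as the factors the encoder captured.
    """
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits                # (B, L, |V|) per-token scores
    # Exclude padding positions from the pooling step.
    pad = batch["attention_mask"].unsqueeze(-1) == 0  # (B, L, 1)
    logits = logits.masked_fill(pad, float("-inf"))
    pooled, _ = logits.max(dim=1)                     # max-pool over the sequence
    # Squash to non-negative weights so the vector is sparse and interpretable
    # (one common choice for lexical representations, not the paper's spec).
    return torch.log1p(torch.relu(pooled))

# Retrieval reduces to a dot product in the shared vocabulary space.
queries = encode_to_vocab_space(["a dog catching a frisbee"])
docs = encode_to_vocab_space(["photo of a puppy leaping for a flying disc"])
print(queries @ docs.T)
```

For the cross-modal setting described in the abstract, the text tower above would be paired with a second encoder (e.g., for images) projecting into the same vocabulary space, so that both sides remain comparable and interpretable.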
Related papers
- Towards Robustness of Text-to-Visualization Translation against Lexical and Phrasal Variability [27.16741353384065]
Text-to-vis models often rely on lexical matching between words in the questions and tokens in data schemas.
In this study, we examine the robustness of current text-to-vis models, an area that has not previously been explored.
We propose a novel framework based on the Retrieval-Augmented Generation (RAG) technique, named GRED, specifically designed to address two variants of input perturbation: lexical and phrasal variability.
arXiv Detail & Related papers (2024-04-10T16:12:50Z) - Efficient data selection employing Semantic Similarity-based Graph
Structures for model training [1.5845679507219355]
This paper introduces Semantics for data SAliency in Model performance Estimation (SeSaME), an efficient data sampling mechanism based solely on textual information, without passing the data through a compute-heavy model.
The application of this approach is demonstrated in the use case of low-resource automated speech recognition (ASR) models.
arXiv Detail & Related papers (2024-02-22T09:43:53Z) - Dissecting vocabulary biases datasets through statistical testing and
automated data augmentation for artifact mitigation in Natural Language
Inference [3.154631846975021]
We focus on investigating dataset artifacts and developing strategies to address these issues.
We propose several automatic data augmentation strategies spanning character to word levels.
Experiments demonstrate that the proposed approaches effectively enhance model accuracy and reduce biases by up to 0.66% and 1.14%, respectively.
arXiv Detail & Related papers (2023-12-14T08:46:26Z) - BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer [1.911678487931003]
Retrieval-based language models are increasingly employed in question-answering tasks.
We develop the first Norwegian retrieval-based model by adapting the REALM framework.
We show that this type of training improves the reader's performance on extractive question-answering.
arXiv Detail & Related papers (2023-04-19T13:40:47Z) - On-the-fly Text Retrieval for End-to-End ASR Adaptation [9.304386210911822]
We propose augmenting a transducer-based ASR model with a retrieval language model, which retrieves from an external text corpus plausible completions for a partial ASR hypothesis.
Our experiments show that the proposed model significantly improves the performance of a transducer baseline on a pair of question-answering datasets.
arXiv Detail & Related papers (2023-03-20T08:54:40Z) - Learning to Decompose Visual Features with Latent Textual Prompts [140.2117637223449]
We propose Decomposed Feature Prompting (DeFo) to improve vision-language models.
Our empirical study shows the significance of DeFo in improving vision-language models.
arXiv Detail & Related papers (2022-10-09T15:40:13Z) - Improving Classifier Training Efficiency for Automatic Cyberbullying
Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z) - TextFlint: Unified Multilingual Robustness Evaluation Toolkit for
Natural Language Processing [73.16475763422446]
We propose TextFlint, a multilingual robustness evaluation platform for NLP tasks.
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z) - Probing Linguistic Features of Sentence-Level Representations in Neural
Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 combinations of encoder architectures and linguistic features, trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z) - Parameter Space Factorization for Zero-Shot Learning across Tasks and
Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields results comparable to or better than state-of-the-art zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)