Pixel Sentence Representation Learning
- URL: http://arxiv.org/abs/2402.08183v1
- Date: Tue, 13 Feb 2024 02:46:45 GMT
- Title: Pixel Sentence Representation Learning
- Authors: Chenghao Xiao, Zhuoxu Huang, Danlu Chen, G Thomas Hudson, Yizhi Li,
Haoran Duan, Chenghua Lin, Jie Fu, Jungong Han, Noura Al Moubayed
- Abstract summary: In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods such as typos and word-order shuffling, which resonate with human cognitive patterns and enable perturbations to texts to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
- Score: 67.4775296225521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models have long been known to be subpar in capturing
sentence and document-level semantics. Though heavily investigated, transferring
perturbation-based methods from unsupervised visual representation learning to
NLP remains an unsolved problem. This is largely due to the discreteness of
subword units brought by tokenization of language models, limiting small
perturbations of inputs to form semantics-preserved positive pairs. In this
work, we conceptualize the learning of sentence-level textual semantics as a
visual representation learning process. Drawing from cognitive and linguistic
sciences, we introduce an unsupervised visual sentence representation learning
framework, employing visually-grounded text perturbation methods like typos and
word order shuffling, resonating with human cognitive patterns, and enabling
perturbation to texts to be perceived as continuous. Our approach is further
bolstered by large-scale unsupervised topical alignment training and natural
language inference supervision, achieving comparable performance in semantic
textual similarity (STS) to existing state-of-the-art NLP methods.
Additionally, we unveil our method's inherent zero-shot cross-lingual
transferability and a unique leapfrogging pattern across languages during
iterative training. To our knowledge, this is the first representation learning
method devoid of traditional language models for understanding sentence and
document semantics, marking a stride closer to human-like textual
comprehension. Our code is available at
https://github.com/gowitheflow-1998/Pixel-Linguist
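To make the positive-pair construction concrete, here is a minimal sketch, assuming sentences are perturbed with typos and word-order swaps and then rendered to images before contrastive training; all function names and rendering parameters are illustrative, not the released implementation:

```python
import random
from PIL import Image, ImageDraw  # pip install pillow

def perturb(sentence: str, typo_rate: float = 0.05) -> str:
    """Visually-grounded perturbations: adjacent-word swaps and character typos."""
    words = sentence.split()
    for _ in range(max(1, len(words) // 8)):      # swap a few adjacent word pairs
        if len(words) < 2:
            break
        i = random.randrange(len(words) - 1)
        words[i], words[i + 1] = words[i + 1], words[i]
    noisy = []
    for w in words:                               # transpose characters at a low rate
        chars = list(w)
        if len(chars) > 2 and random.random() < typo_rate * len(chars):
            j = random.randrange(len(chars) - 1)
            chars[j], chars[j + 1] = chars[j + 1], chars[j]
        noisy.append("".join(chars))
    return " ".join(noisy)

def render(sentence: str, size=(448, 64)) -> Image.Image:
    """Render a sentence as a grayscale image: the 'pixel' view of the text."""
    img = Image.new("L", size, color=255)
    ImageDraw.Draw(img).text((4, 4), sentence, fill=0)
    return img

anchor = render("Tokenization makes small text perturbations discrete.")
positive = render(perturb("Tokenization makes small text perturbations discrete."))
# A vision encoder trained contrastively would pull this pair of views together.
```

Because the encoder sees pixels rather than subword ids, a typo shifts the input only slightly, which is what lets such perturbations form semantics-preserved positive pairs.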
Related papers
- DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
We propose a novel denoising objective that takes a complementary, intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form (a minimal sketch of such noise injection follows this entry).
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
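A minimal sketch of the discrete-plus-continuous noise idea above, assuming token deletion as the discrete noise and Gaussian embedding noise as the continuous noise; names and rates are illustrative, not DenoSent's implementation:

```python
import random
import torch

def discrete_noise(sentence: str, drop_rate: float = 0.1) -> str:
    """Discrete noise: randomly delete tokens from the input sentence."""
    kept = [w for w in sentence.split() if random.random() > drop_rate]
    return " ".join(kept) if kept else sentence

def continuous_noise(embeddings: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    """Continuous noise: add small Gaussian noise to token embeddings."""
    return embeddings + sigma * torch.randn_like(embeddings)

noisy_text = discrete_noise("both discrete and continuous noise are used")
noisy_embs = continuous_noise(torch.randn(8, 128))  # toy (seq_len, dim) embeddings
# The model is then trained to restore the original sentence from these noisy views.
```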
- TExplain: Explaining Learned Visual Features via Pre-trained (Frozen) Language Models [14.019349267520541]
We propose a novel method that leverages the capabilities of language models to interpret the learned features of pre-trained image classifiers.
Our approach generates a vast number of sentences to explain the features learned by the classifier for a given image, and extracts the words that recur most frequently across them.
Our method, for the first time, uses these frequent words corresponding to a visual representation to provide insights into the decision-making process (a small sketch of the frequent-word step follows this entry).
arXiv Detail & Related papers (2023-09-01T20:59:46Z)
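A small sketch of that frequent-word step, assuming we already have explanation sentences generated by the frozen language model for one image; the generation itself is elided:

```python
from collections import Counter

def frequent_words(sentences: list[str], top_k: int = 10) -> list[str]:
    """Count words across all generated explanations; keep the most frequent."""
    counts = Counter(
        word.lower().strip(".,")
        for sentence in sentences
        for word in sentence.split()
    )
    return [word for word, _ in counts.most_common(top_k)]

generated = ["a striped cat on a mat", "the cat sits on a striped mat"]
print(frequent_words(generated, top_k=3))  # frequent words hint at what the feature encodes
```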
- Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation [15.32063273544696]
We discretize the latent space of multilingual models by assigning encoder states to entries in a codebook (see the sketch after this entry).
We validate our approach on large-scale experiments with realistic data volumes and domains.
We also use the learned artificial language to analyze model behavior, and discover that using a similar bridge language increases knowledge-sharing among the remaining languages.
arXiv Detail & Related papers (2022-11-02T17:14:42Z)
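A minimal sketch of nearest-neighbor codebook assignment as described above (standard vector quantization); dimensions and names are illustrative, not the paper's implementation:

```python
import torch

def quantize(states: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Assign each encoder state to its nearest codebook entry (L2 distance)."""
    dists = torch.cdist(states, codebook)   # (seq_len, num_codes) pairwise distances
    codes = dists.argmin(dim=-1)            # discrete ids: the 'artificial language'
    return codebook[codes]                  # quantized states passed downstream

states = torch.randn(16, 512)        # toy multilingual encoder output
codebook = torch.randn(1024, 512)    # toy codebook with 1024 entries
quantized = quantize(states, codebook)
```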
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning achieves strong performance improvements and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Text Transformations in Contrastive Self-Supervised Learning: A Review [27.25193476131943]
We formalize the contrastive learning framework in the domain of natural language processing.
We describe some challenges and potential directions for learning better text representations using contrastive methods.
arXiv Detail & Related papers (2022-03-22T19:02:43Z)
- Adversarial Training with Contrastive Learning in NLP [0.0]
We propose adversarial training with contrastive learning (ATCL) to adversarially train models for language processing tasks.
The core idea is to make linear perturbations in the embedding space of the input via the fast gradient method (FGM) and train the model to keep the original and perturbed representations close via contrastive learning (a minimal sketch follows this entry).
The results show not only an improvement in quantitative scores (perplexity and BLEU) over the baselines, but also good qualitative results at the semantic level for both tasks.
arXiv Detail & Related papers (2021-09-19T07:23:45Z)
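A minimal sketch of the FGM-style embedding perturbation above, paired with a contrastive pull between clean and perturbed views; the toy loss, epsilon, and temperature are illustrative, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F

def fgm_perturb(emb: torch.Tensor, loss: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    """Fast gradient method: one linear step along the loss gradient in embedding space."""
    grad, = torch.autograd.grad(loss, emb, retain_graph=True)
    return emb + eps * grad / (grad.norm() + 1e-12)

def contrastive_pull(clean: torch.Tensor, perturbed: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """InfoNCE-style loss keeping each clean/perturbed pair close within the batch."""
    clean, perturbed = F.normalize(clean, dim=-1), F.normalize(perturbed, dim=-1)
    logits = clean @ perturbed.T / tau            # (batch, batch) cosine similarities
    labels = torch.arange(clean.size(0))          # matching pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

emb = torch.randn(4, 64, requires_grad=True)      # toy sentence embeddings
task_loss = emb.pow(2).mean()                     # stand-in for the task loss
adv = fgm_perturb(emb, task_loss)                 # adversarially perturbed view
loss = contrastive_pull(emb, adv)
```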
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse-level language representation by restoring the original order of shuffled sentences (a minimal sketch follows this entry).
We show that this objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
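A minimal sketch of a sentence-unshuffling objective as described above, assuming the model emits one logit per candidate original position for each shuffled sentence; all names are illustrative:

```python
import random
import torch
import torch.nn.functional as F

def shuffle_sentences(doc: list[str]) -> tuple[list[str], torch.Tensor]:
    """Shuffle a document's sentences; return the shuffled doc and target positions."""
    order = list(range(len(doc)))
    random.shuffle(order)
    shuffled = [doc[i] for i in order]
    targets = torch.tensor(order)   # original index of each shuffled sentence
    return shuffled, targets

def unshuffle_loss(position_logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over predicted original positions, one row per sentence."""
    return F.cross_entropy(position_logits, targets)

doc = ["First sentence.", "Second sentence.", "Third sentence."]
shuffled, targets = shuffle_sentences(doc)
logits = torch.randn(len(doc), len(doc), requires_grad=True)  # stand-in for model output
loss = unshuffle_loss(logits, targets)
```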
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
"vokenization" is trained on relatively small image captioning datasets and we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
arXiv Detail & Related papers (2020-10-14T02:11:51Z)
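A minimal sketch of contextual token-to-image retrieval as described above, assuming precomputed image embeddings and contextualized token embeddings from a language model; all components are illustrative stand-ins:

```python
import torch
import torch.nn.functional as F

def vokenize(token_embeddings: torch.Tensor, image_embeddings: torch.Tensor) -> torch.Tensor:
    """Map each contextualized token to its most related image (its 'voken')."""
    tokens = F.normalize(token_embeddings, dim=-1)   # (seq_len, d)
    images = F.normalize(image_embeddings, dim=-1)   # (num_images, d)
    similarity = tokens @ images.T                   # (seq_len, num_images)
    return similarity.argmax(dim=-1)                 # one voken id per token

tokens = torch.randn(12, 256)     # toy contextualized token embeddings
images = torch.randn(5000, 256)   # toy precomputed image embeddings
voken_ids = vokenize(tokens, images)   # supervision targets for voken prediction
```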