Robust Open-Vocabulary Translation from Visual Text Representations
- URL: http://arxiv.org/abs/2104.08211v1
- Date: Fri, 16 Apr 2021 16:37:13 GMT
- Title: Robust Open-Vocabulary Translation from Visual Text Representations
- Authors: Elizabeth Salesky, David Etter, Matt Post
- Abstract summary: Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an 'open vocabulary.'
This approach relies on consistent and correct underlying Unicode sequences.
Motivated by the robustness of human language processing, we propose the use of visual text representations.
- Score: 15.646399508495133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine translation models have discrete vocabularies and commonly use
subword segmentation techniques to achieve an 'open vocabulary.' This approach
relies on consistent and correct underlying Unicode sequences, and makes models
susceptible to degradation from common types of noise and variation. Motivated
by the robustness of human language processing, we propose the use of visual
text representations, which dispense with a finite set of text embeddings in
favor of continuous vocabularies created by processing visually rendered text.
We show that models using visual text representations approach or match
performance of text baselines on clean TED datasets. More importantly, models
with visual embeddings demonstrate significant robustness to varied types of
noise, achieving, e.g., 25.9 BLEU on a character-permuted German-English task
where subword models degrade to 1.9 BLEU.
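The idea of replacing a finite embedding table with embeddings computed from rendered text can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_glyph` is a hypothetical stand-in for a real font rasterizer, the window/stride values are arbitrary, a fixed random linear map stands in for whatever learned layers the paper actually uses, and `permute_inner` (shuffling a word's inner characters, first and last fixed) is one assumed variant of character-permutation noise. The point it shows is that any input string, noisy or not, yields a sequence of continuous vectors, with no out-of-vocabulary lookup step.

```python
import random

GLYPH_H, GLYPH_W = 8, 6  # toy glyph size in pixels (assumption)


def toy_glyph(ch):
    """Hypothetical deterministic bitmap per character; a real system would render with a font."""
    rng = random.Random(ord(ch))
    return [[rng.randint(0, 1) for _ in range(GLYPH_W)] for _ in range(GLYPH_H)]


def render(text):
    """Concatenate glyphs horizontally into a GLYPH_H x (GLYPH_W * len(text)) image."""
    glyphs = [toy_glyph(c) for c in text]
    return [sum((g[r] for g in glyphs), []) for r in range(GLYPH_H)]


def slices(image, window, stride):
    """Cut the image into overlapping vertical slices and flatten each one."""
    width = len(image[0])
    out = []
    for start in range(0, max(width - window, 0) + stride, stride):
        patch = [row[start:start + window] for row in image]
        # Zero-pad slices that run past the right edge of the image.
        out.append([px for row in patch for px in row + [0] * (window - len(row))])
    return out


def make_projection(in_dim, out_dim, seed=0):
    """Fixed random linear map standing in for learned layers (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def embed(text, proj, window=12, stride=6):
    """One continuous embedding per slice: no vocabulary, so no <unk> tokens."""
    return [[sum(w * x for w, x in zip(row, patch)) for row in proj]
            for patch in slices(render(text), window, stride)]


def permute_inner(word, rng):
    """Shuffle a word's inner characters, keeping the first and last fixed."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


if __name__ == "__main__":
    proj = make_projection(GLYPH_H * 12, 16)
    clean = "Gesundheit"
    noisy = permute_inner(clean, random.Random(1))
    # Both strings map to the same number of 16-dimensional vectors.
    print(noisy, len(embed(clean, proj)), len(embed(noisy, proj)))
```

A subword model would instead segment the permuted word into pieces it may rarely or never have seen, whereas here the noisy word still produces the same number of vectors, each computed from pixels that largely overlap with the clean rendering.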
Related papers
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling [16.562349140796115]
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
Date: 2024-10-19T15:27:11Z
- ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models [52.23899502520261]
We introduce a new framework named ARTIST to focus on the learning of text structures.
We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model.
Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.
Date: 2024-06-17T19:31:24Z
- Fine-tuning CLIP Text Encoders with Two-step Paraphrasing [83.3736789315201]
We introduce a straightforward fine-tuning approach to enhance the representations of CLIP models for paraphrases.
Our model, which we call ParaCLIP, exhibits significant improvements over baseline CLIP models across various tasks.
Date: 2024-02-23T06:11:50Z
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
Date: 2024-02-13T02:46:45Z
- WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for styled text-to-text-content-image generation at the word level.
Our proposed method is able to generate realistic word image samples from different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to those of real data.
Date: 2023-03-29T10:19:26Z
- Text Generation with Text-Editing Models [78.03750739936956]
This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches.
We discuss challenges related to productionization and how these models can be used to mitigate hallucination and bias.
Date: 2022-06-14T17:58:17Z
- Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost [5.672132510411465]
State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary words.
We follow the principle of mimick-like models to generate vectors for unseen words, by learning the behavior of pre-trained embeddings using only the surface form of words.
We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT) and makes it robust to OOV with few additional parameters.
Date: 2022-03-15T13:11:07Z
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
Date: 2020-12-30T09:11:50Z
- Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to be able to learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
Date: 2020-10-12T17:02:50Z
- Contextualized Spoken Word Representations from Convolutional Autoencoders [2.28438857884398]
This paper proposes a Convolutional Autoencoder based neural architecture to model syntactically and semantically adequate contextualized representations of varying length spoken words.
The proposed model was able to demonstrate its robustness when compared to the other two language-based models.
Date: 2020-07-06T16:48:11Z