Robust Open-Vocabulary Translation from Visual Text Representations
- URL: http://arxiv.org/abs/2104.08211v1
- Date: Fri, 16 Apr 2021 16:37:13 GMT
- Title: Robust Open-Vocabulary Translation from Visual Text Representations
- Authors: Elizabeth Salesky, David Etter, Matt Post
- Abstract summary: Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an 'open-vocabulary.'
This approach relies on consistent and correct underlying unicode sequences.
Motivated by human language processing, we propose the use of visual text representations.
- Score: 15.646399508495133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine translation models have discrete vocabularies and commonly use
subword segmentation techniques to achieve an 'open-vocabulary.' This approach
relies on consistent and correct underlying unicode sequences, and makes models
susceptible to degradation from common types of noise and variation. Motivated
by the robustness of human language processing, we propose the use of visual
text representations, which dispense with a finite set of text embeddings in
favor of continuous vocabularies created by processing visually rendered text.
We show that models using visual text representations approach or match
performance of text baselines on clean TED datasets. More importantly, models
with visual embeddings demonstrate significant robustness to varied types of
noise, achieving, e.g., 25.9 BLEU on a character-permuted German--English task
where subword models degrade to 1.9.
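To make the approach concrete, here is a minimal sketch of the rendering-and-slicing idea: a sentence is drawn onto a pixel canvas and cut into overlapping frames that stand in for subword embeddings. The font path, image height, and window and stride sizes below are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of visual text representations (illustrative values only).
# A sentence is rendered to a grayscale image, then sliced into overlapping
# frames; each frame would be fed to a small convolutional encoder to yield
# one "visual embedding", replacing the subword embedding lookup.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_text(text, height=32, font_path="DejaVuSans.ttf", font_size=24):
    """Render `text` onto a white canvas; returns a (height, width) array in [0, 1]."""
    font = ImageFont.truetype(font_path, font_size)   # assumes this font is installed
    width = int(font.getlength(text)) + 8             # small horizontal padding
    img = Image.new("L", (width, height), color=255)
    ImageDraw.Draw(img).text((4, 2), text, fill=0, font=font)
    return np.asarray(img, dtype=np.float32) / 255.0

def slice_frames(pixels, window=30, stride=10):
    """Cut a rendered line into overlapping frames: the 'continuous vocabulary'."""
    h, w = pixels.shape
    if w < window:                                    # pad very short renderings
        pixels = np.pad(pixels, ((0, 0), (0, window - w)), constant_values=1.0)
        w = window
    starts = range(0, w - window + 1, stride)
    return np.stack([pixels[:, s:s + window] for s in starts])  # (n_frames, h, window)

frames = slice_frames(render_text("Ein Beispielsatz ."))
print(frames.shape)  # e.g. (n_frames, 32, 30)
```

Because the embeddings are computed from pixels rather than from unicode codepoints, a character-permuted input still produces visually similar frames, which is the intuition behind the robustness gap reported above.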
Related papers
- Survey on Abstractive Text Summarization: Dataset, Models, and Metrics [0.8184895397419141]
Transformer models are distinguished by their attention mechanisms, pretraining on general knowledge, and fine-tuning for downstream tasks.
This survey examines the state of the art in text summarization models, with a specific focus on the abstractive summarization approach.
arXiv Detail & Related papers (2024-12-22T21:18:40Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models and Large Language Models [52.23899502520261]
We introduce a novel framework named ARTIST, which incorporates a dedicated textual diffusion model to focus specifically on learning text structures.
We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model.
This disentangled architecture design and training strategy significantly enhance the text rendering ability of the diffusion models for text-rich image generation.
arXiv Detail & Related papers (2024-06-17T19:31:24Z) - Fine-tuning CLIP Text Encoders with Two-step Paraphrasing [83.3736789315201]
We introduce a straightforward fine-tuning approach to enhance the representations of CLIP models for paraphrases.
Our model, which we call ParaCLIP, exhibits significant improvements over baseline CLIP models across various tasks.
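The summary above is terse, so the following is only a schematic of the generic recipe it implies: pulling a CLIP text encoder's embeddings of paraphrase pairs together with a symmetric InfoNCE loss. The model checkpoint, loss, and temperature are assumptions; the paper's actual two-step procedure is not reproduced here.

```python
# Schematic sketch: fine-tune a CLIP text encoder so paraphrases embed closely.
import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    return F.normalize(encoder(**batch).pooler_output, dim=-1)

def paraphrase_infonce(sources, paraphrases, temperature=0.05):
    """Symmetric InfoNCE: each sentence should match its own paraphrase."""
    a, b = embed(sources), embed(paraphrases)
    logits = a @ b.T / temperature                    # (batch, batch) similarities
    targets = torch.arange(len(sources))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

loss = paraphrase_infonce(
    ["a man rides a horse", "the sky is clear"],
    ["someone is horseback riding", "there are no clouds today"],
)
loss.backward()  # during training, step an optimizer over encoder.parameters()
```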
arXiv Detail & Related papers (2024-02-23T06:11:50Z) - Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually grounded text perturbations such as typos and word-order shuffling, which resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
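As a rough illustration of the perturbation families named above, here is a sketch of typo injection and word-order shuffling; the paper's exact augmentation set may differ.

```python
# Illustrative stand-ins for visually grounded perturbations: character-level
# typos (adjacent-character swaps) and word-order shuffling.
import random

def inject_typos(text, rate=0.1, rng=random):
    """Swap adjacent non-space characters with probability `rate` per position."""
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i] != " " and chars[i + 1] != " " and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def shuffle_words(text, rng=random):
    """Randomly reorder the words of a sentence."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)

sentence = "the quick brown fox jumps over the lazy dog"
print(inject_typos(sentence))   # e.g. "the qiuck brown fox jupms over the lazy dog"
print(shuffle_words(sentence))  # e.g. "fox the lazy jumps quick dog brown over the"
```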
arXiv Detail & Related papers (2024-02-13T02:46:45Z) - WordStylist: Styled Verbatim Handwritten Text Generation with Latent
Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for word-level generation of handwritten text images with verbatim content and a specified writer style.
The proposed method can generate realistic word-image samples in different writer styles.
We show that it produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to those of real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z) - Text Generation with Text-Editing Models [78.03750739936956]
This tutorial provides a comprehensive overview of text-editing models and current state-of-the-art approaches.
We discuss challenges related to productionization and how these models can be used to mitigate hallucination and bias.
arXiv Detail & Related papers (2022-06-14T17:58:17Z) - Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models
Robust with Little Cost [5.672132510411465]
State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with out-of-vocabulary (OOV) words.
We follow the principle of mimick-like models to generate vectors for unseen words by learning the behavior of pre-trained embeddings using only the surface form of words.
We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT) and makes it robust to OOV with few additional parameters.
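To give a feel for the mimick-like idea, here is a minimal sketch in which a character-level encoder learns to reproduce a frozen pre-trained embedding from a word's surface form alone. The architecture sizes and the plain MSE objective are assumptions; LOVE itself uses a contrastive objective and plugs into a model such as BERT.

```python
# Mimick-style sketch: map a word's characters to the vector a frozen
# pre-trained table assigns it, so unseen (OOV) words can be imputed later.
import torch
import torch.nn as nn

class CharToVec(nn.Module):
    def __init__(self, n_chars=128, char_dim=32, out_dim=300):
        super().__init__()
        self.chars = nn.Embedding(n_chars, char_dim)
        self.encoder = nn.GRU(char_dim, out_dim, batch_first=True)

    def forward(self, char_ids):              # char_ids: (batch, word_len)
        _, h = self.encoder(self.chars(char_ids))
        return h.squeeze(0)                   # (batch, out_dim)

def word_to_ids(word, max_len=20):
    ids = [min(ord(c), 127) for c in word[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids))).unsqueeze(0)

model = CharToVec()
pretrained = {"language": torch.randn(300)}   # stand-in for frozen embeddings
pred = model(word_to_ids("language"))
loss = nn.functional.mse_loss(pred, pretrained["language"].unsqueeze(0))
loss.backward()
# At inference time, an OOV surface form like "lnaguage" gets a vector the same way.
```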
arXiv Detail & Related papers (2022-03-15T13:11:07Z) - Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method that explicitly enhances conventional word embeddings with multi-aspect senses derived from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z) - Contextualized Spoken Word Representations from Convolutional
Autoencoders [2.28438857884398]
This paper proposes a convolutional autoencoder-based neural architecture to model syntactically and semantically adequate contextualized representations of variable-length spoken words.
The proposed model demonstrated greater robustness than the two language-based comparison models.
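Since the summary names only the architecture family, the following is a generic 1D convolutional autoencoder over acoustic feature frames; the feature type, layer sizes, and time pooling are assumptions rather than the paper's configuration.

```python
# Generic sketch: convolutional autoencoder over acoustic frames (e.g. MFCCs),
# with mean pooling over time to obtain a fixed-size word representation.
import torch
import torch.nn as nn

class ConvWordAutoencoder(nn.Module):
    def __init__(self, n_feats=40, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_feats, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(64, code_dim, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(code_dim, 64, kernel_size=5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(64, n_feats, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, x):                      # x: (batch, n_feats, n_frames)
        z = self.encoder(x)
        word_vec = z.mean(dim=-1)              # pool over time -> fixed-size code
        return self.decoder(z), word_vec

model = ConvWordAutoencoder()
mfccs = torch.randn(2, 40, 64)                 # two words, 40 features, 64 frames
recon, codes = model(mfccs)
loss = nn.functional.mse_loss(recon, mfccs)
loss.backward()
```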
arXiv Detail & Related papers (2020-07-06T16:48:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.