Sent2Matrix: Folding Character Sequences in Serpentine Manifolds for
Two-Dimensional Sentence Representation
- URL: http://arxiv.org/abs/2103.08387v1
- Date: Mon, 15 Mar 2021 13:52:47 GMT
- Title: Sent2Matrix: Folding Character Sequences in Serpentine Manifolds for
Two-Dimensional Sentence Representation
- Authors: Hongyang Gao, Yi Liu, Xuan Zhang, Shuiwang Ji
- Abstract summary: We propose to convert texts into 2-D representations and develop the Sent2Matrix method.
Our method allows for the explicit incorporation of both word morphologies and boundaries.
Notably, our method is the first attempt to represent texts in 2-D formats.
- Score: 54.6266741821988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study text representation methods using deep models. Current methods, such
as word-level embedding and character-level embedding schemes, treat texts as
either a sequence of atomic words or a sequence of characters. These methods
either ignore word morphologies or word boundaries. To overcome these
limitations, we propose to convert texts into 2-D representations and develop
the Sent2Matrix method. Our method allows for the explicit incorporation of
both word morphologies and boundaries. When coupled with a novel serpentine
padding method, our Sent2Matrix method leads to an interesting visualization in
which 1-D character sequences are folded into 2-D serpentine manifolds.
Notably, our method is the first attempt to represent texts in 2-D formats.
Experimental results on text classification tasks show that our method
consistently outperforms prior embedding methods.
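The serpentine folding described in the abstract can be sketched as follows. This is an illustrative reconstruction of the folding idea only, not the authors' implementation; the function name, matrix width, and padding character are assumptions.

```python
def serpentine_fold(text: str, width: int, pad: str = " ") -> list[list[str]]:
    """Fold a 1-D character sequence into a 2-D matrix in serpentine
    (boustrophedon) order: rows alternate left-to-right and right-to-left,
    so consecutive characters remain adjacent across row boundaries."""
    # Pad the sequence so it fills a whole number of rows.
    rows = -(-len(text) // width)  # ceiling division
    text = text.ljust(rows * width, pad)
    matrix = []
    for r in range(rows):
        row = list(text[r * width:(r + 1) * width])
        if r % 2 == 1:
            # Reverse every other row to trace the serpentine path.
            row.reverse()
        matrix.append(row)
    return matrix
```

For example, with width 4 the string "sent2matrix" folds into the rows `sent`, `tam2`, and `rix ` (with one padding space), so the character sequence traces a continuous back-and-forth path through the matrix.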
Related papers
- Greed is All You Need: An Evaluation of Tokenizer Inference Methods [4.300681074103876]
We provide a controlled analysis of seven tokenizer inference methods across four different algorithms and three vocabulary sizes.
We show that for the most commonly used tokenizers, greedy inference performs surprisingly well; and that SaGe, a recently-introduced contextually-informed tokenizer, outperforms all others on morphological alignment.
arXiv Detail & Related papers (2024-03-02T19:01:40Z)
- Unsupervised Text Style Transfer via LLMs and Attention Masking with Multi-way Interactions [18.64326057581588]
Unsupervised Text Style Transfer (UTST) has emerged as a critical task within the domain of Natural Language Processing (NLP).
We propose four ways of interaction, including a pipeline framework with tuned orders; knowledge distillation from Large Language Models (LLMs) to an attention masking model; and in-context learning with constructed parallel examples.
We empirically show that these multi-way interactions can improve the baselines in certain respects of style strength, content preservation, and text fluency.
arXiv Detail & Related papers (2024-02-21T09:28:02Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution [2.3429306644730854]
A variety of word-based stylistic markers have been successfully used in deep learning methods to deal with the intrinsic problem of authorship attribution.
The proposed method was experimentally evaluated against numerous state-of-the-art methods on the public corpora CCAT50, IMDb62, Blog50, and Twitter50.
arXiv Detail & Related papers (2023-06-26T11:35:47Z)
- Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves performance competitive with, and sometimes better than, state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- Text Detoxification using Large Pre-trained Neural Models [57.72086777177844]
We present two novel unsupervised methods for eliminating toxicity in text.
The first method guides the generation process with small style-conditional language models.
The second method uses BERT to replace toxic words with their non-offensive synonyms.
arXiv Detail & Related papers (2021-09-18T11:55:32Z)
- Transductive Learning for Unsupervised Text Style Transfer [60.65782243927698]
Unsupervised style transfer models are mainly based on an inductive learning approach.
We propose a novel transductive learning approach based on a retrieval-based context-aware style representation.
arXiv Detail & Related papers (2021-09-16T08:57:20Z)
- TextStyleBrush: Transfer of Text Aesthetics from a Single Example [16.29689649632619]
We present a novel approach for disentangling the content of a text image from all aspects of its appearance.
We learn this disentanglement in a self-supervised manner.
We show results in different text domains which were previously handled by specialized methods.
arXiv Detail & Related papers (2021-06-15T19:28:49Z)
- Unsupervised learning of text line segmentation by differentiating coarse patterns [0.0]
We present an unsupervised deep learning method that embeds document image patches to a compact Euclidean space where distances correspond to a coarse text line pattern similarity.
Text line segmentation can be easily implemented using standard techniques with the embedded feature vectors.
We evaluate the method qualitatively and quantitatively on several variants of text line segmentation datasets to demonstrate its effectiveness.
arXiv Detail & Related papers (2021-05-19T21:21:30Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences of its use.