Semantic Text Compression for Classification
- URL: http://arxiv.org/abs/2309.10809v1
- Date: Tue, 19 Sep 2023 17:50:57 GMT
- Title: Semantic Text Compression for Classification
- Authors: Emrecan Kutay and Aylin Yener
- Abstract summary: We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification.
We propose semantic quantization and compression approaches for text where we utilize sentence embeddings and the semantic distortion metric to preserve the meaning.
- Score: 17.259824817932294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study semantic compression for text where meanings contained in the text
are conveyed to a source decoder, e.g., for classification. The main motivator
to move to such an approach of recovering the meaning without requiring exact
reconstruction is the potential resource savings, both in storage and in
conveying the information to another node. Towards this end, we propose
semantic quantization and compression approaches for text where we utilize
sentence embeddings and the semantic distortion metric to preserve the meaning.
Our results demonstrate that the proposed semantic approaches result in
substantial (orders of magnitude) savings in the required number of bits for
message representation at the expense of very modest accuracy loss compared to
the semantic agnostic baseline. We compare the results of proposed approaches
and observe that resource savings enabled by semantic quantization can be
further amplified by semantic clustering. Importantly, we observe the
generalizability of the proposed methodology which produces excellent results
on many benchmark text classification datasets with a diverse array of
contexts.
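The quantization idea in the abstract can be sketched as follows: embed each sentence, learn a small codebook with k-means, and transmit only the index of the nearest codeword. This is an illustrative sketch, not the authors' code; the embeddings here are synthetic stand-ins for real sentence-encoder outputs, and the codebook size is an arbitrary choice.

```python
# Illustrative sketch of semantic quantization (not the paper's implementation).
# Sentences are mapped to embeddings, a codebook is learned with k-means, and
# each sentence is transmitted as the index of its nearest codeword.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real sentence embeddings (e.g., from a sentence encoder);
# 300 "sentences", 64-dimensional, purely synthetic here.
embeddings = rng.normal(size=(300, 64))

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means: returns (centroids, assignments)."""
    r = np.random.default_rng(seed)
    centroids = x[r.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, assign

K = 16  # codebook size (illustrative)
centroids, codes = kmeans(embeddings, K)

# A proxy for the semantic distortion metric described in the abstract:
# mean Euclidean distance between each embedding and its codeword.
distortion = float(np.mean(np.linalg.norm(embeddings - centroids[codes], axis=-1)))

bits_per_sentence = int(np.ceil(np.log2(K)))  # ceil(log2(16)) = 4 bits
print(bits_per_sentence, round(distortion, 2))
```

Because a sentence is now represented by a few bits instead of its raw characters, the savings grow with sentence length; the trade-off is the residual distortion between an embedding and its assigned codeword.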
Related papers
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-stage working strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z) - BERM: Training the Balanced and Extractable Representation for Matching to Improve Generalization Ability of Dense Retrieval [54.66399120084227]
We propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal.
Dense retrieval has shown promise in the first-stage retrieval process when trained on in-domain labeled datasets.
arXiv Detail & Related papers (2023-05-18T15:43:09Z) - Crossword: A Semantic Approach to Data Compression via Masking [38.107509264270924]
This study places careful emphasis on English text and exploits its semantic aspect to enhance the compression efficiency further.
The proposed masking-based strategy resembles the above game.
In a nutshell, the encoder evaluates the semantic importance of each word according to the semantic loss and then masks the minor ones, while the decoder aims to recover the masked words from the semantic context by means of the Transformer.
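The encoder-side masking described above can be illustrated with a toy scorer. In this sketch, word frequency stands in for the paper's learned semantic loss (rare words are treated as more semantically important and kept, frequent ones are masked); the function name and the `keep_ratio` parameter are illustrative choices, not from the paper.

```python
# Toy sketch of encoder-side semantic masking: score each word's importance
# and mask the minor ones. Importance here is approximated by corpus rarity,
# a crude stand-in for the semantic loss used in the paper.
from collections import Counter

def mask_minor_words(sentence, keep_ratio=0.5, corpus_counts=None):
    """Keep the rarest `keep_ratio` fraction of words; mask the rest."""
    words = sentence.split()
    counts = corpus_counts or Counter(words)
    # Sort word positions by corpus frequency: rarest (most "important") first.
    ranked = sorted(range(len(words)), key=lambda i: counts[words[i]])
    n_keep = max(1, int(len(words) * keep_ratio))
    keep = set(ranked[:n_keep])
    return " ".join(w if i in keep else "[MASK]" for i, w in enumerate(words))

corpus = Counter("the cat sat on the mat the dog".split())
print(mask_minor_words("the cat sat on the mat", corpus_counts=corpus))
```

The decoder side, which recovers the masked words from context with a Transformer, is the other half of the scheme and is not sketched here.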
arXiv Detail & Related papers (2023-04-03T16:04:06Z) - Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and a +2% increase in inference time, decent performance gains are achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z) - Transferring Semantic Knowledge Into Language Encoders [6.85316573653194]
We introduce semantic form mid-tuning, an approach for transferring semantic knowledge from semantic meaning representations into language encoders.
We show that this alignment can be learned implicitly via classification or directly via triplet loss.
Our method yields language encoders that demonstrate improved predictive performance across inference, reading comprehension, textual similarity, and other semantic tasks.
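The triplet loss mentioned above has the standard form L = max(0, d(a, p) − d(a, n) + m): pull the anchor toward a semantically matched positive, push it from a mismatched negative. A minimal sketch with synthetic vectors and an illustrative margin (not the paper's training setup):

```python
# Minimal triplet-loss sketch for aligning encoder outputs with semantic
# representations. Vectors and the margin are illustrative.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """L = max(0, d(a, p) - d(a, n) + margin), with Euclidean d."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.1])   # close to the anchor (semantic match)
n = np.array([-1.0, 0.0])  # far from the anchor (mismatch)
print(triplet_loss(a, p, n))  # 0.0: positive already closer by more than the margin
```

When the positive is already closer than the negative by at least the margin, the loss is zero and the encoder receives no gradient for that triplet; the implicit classification-based alternative mentioned above avoids constructing triplets at all.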
arXiv Detail & Related papers (2021-10-14T14:11:12Z) - EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z) - Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z) - A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
A global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
arXiv Detail & Related papers (2020-03-27T09:19:25Z) - Heavy-tailed Representations, Text Polarity Classification & Data Augmentation [11.624944730002298]
We develop a novel method to learn a heavy-tailed embedding with desirable regularity properties.
A classifier dedicated to the tails of the proposed embedding is obtained whose performance outperforms the baseline.
Numerical experiments on synthetic and real text data demonstrate the relevance of the proposed framework.
arXiv Detail & Related papers (2020-03-25T19:24:05Z) - Revisiting Paraphrase Question Generator using Pairwise Discriminator [25.449902612898594]
We propose a novel method for obtaining sentence-level embeddings.
The proposed method results in semantic embeddings and outperforms the state-of-the-art on the paraphrase generation and sentiment analysis tasks.
arXiv Detail & Related papers (2019-12-31T02:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.