Crossword: A Semantic Approach to Data Compression via Masking
- URL: http://arxiv.org/abs/2304.01106v1
- Date: Mon, 3 Apr 2023 16:04:06 GMT
- Title: Crossword: A Semantic Approach to Data Compression via Masking
- Authors: Mingxiao Li, Rui Jin, Liyao Xiang, Kaiming Shen, Shuguang Cui
- Abstract summary: This study places careful emphasis on English text and exploits its semantic aspect to enhance the compression efficiency further.
The proposed masking-based strategy resembles the crossword game.
In a nutshell, the encoder evaluates the semantic importance of each word according to the semantic loss and then masks the minor ones, while the decoder aims to recover the masked words from the semantic context by means of the Transformer.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional methods for data compression are typically based on
symbol-level statistics, with the information source modeled as a long sequence
of i.i.d. random variables or a stochastic process, thus establishing the
fundamental limit as entropy for lossless compression and as mutual information
for lossy compression. However, the source (including text, music, and speech)
in the real world is often statistically ill-defined because of its close
connection to human perception, and thus the model-driven approach can be quite
suboptimal. This study places careful emphasis on English text and exploits its
semantic aspect to enhance the compression efficiency further. The main idea
stems from the crossword puzzle, in which the hidden words can still be
precisely reconstructed so long as some key letters are provided. The proposed
masking-based strategy resembles the above game. In a nutshell, the encoder
evaluates the semantic importance of each word according to the semantic loss
and then masks the minor ones, while the decoder aims to recover the masked
words from the semantic context by means of the Transformer. Our experiments
show that the proposed semantic approach can achieve much higher compression
efficiency than traditional methods such as Huffman coding and UTF-8 encoding,
while preserving the meaning in the target text to a great extent.
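To make the encoder/decoder loop concrete, below is a minimal Python sketch assuming a pretrained BERT masked language model from Hugging Face `transformers`. The stopword heuristic is only a stand-in for the paper's semantic-loss scoring, and the bitstream coding of the surviving words is omitted; this is an illustration, not the authors' implementation.

```python
# Minimal sketch of masking-based semantic compression (illustrative only):
# the stopword set stands in for the paper's semantic-loss scoring, and a
# real codec would still entropy-code the surviving words and mask positions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

COMMON = {"the", "a", "an", "of", "to", "in", "on", "at", "and", "is", "was", "that"}

def encode(text: str) -> str:
    """Mask words judged semantically minor; keep the content-bearing ones."""
    return " ".join(tok.mask_token if w.lower() in COMMON else w
                    for w in text.split())

def decode(masked: str) -> str:
    """Greedily fill every [MASK] with the model's top prediction."""
    ids = tok(masked, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = mlm(ids).logits[0]
    out = ids[0].clone()
    for pos in (out == tok.mask_token_id).nonzero().flatten():
        out[pos] = logits[pos].argmax()
    return tok.decode(out[1:-1])  # drop [CLS] / [SEP]

packed = encode("the cat sat on the mat near the window")
print(packed)          # "[MASK] cat sat [MASK] [MASK] mat near [MASK] window"
print(decode(packed))  # the MLM usually restores the masked function words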
Related papers
- AlphaZip: Neural Network-Enhanced Lossless Text Compression (arXiv, 2024-09-23)
This paper introduces a lossless text compression approach using a Large Language Model (LLM).
The method involves two key steps: first, prediction using a dense neural network architecture, such as a transformer block; second, compressing the predicted ranks with standard compression algorithms like Adaptive Huffman, LZ77, or Gzip.
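A rough sketch of this two-step recipe, assuming GPT-2 from `transformers` as the predictor and zlib as a stand-in for the backend compressor (the paper itself considers Adaptive Huffman, LZ77, and Gzip):

```python
# Sketch of the rank-then-compress idea: replace each token by the rank the
# LM assigns it, then feed the rank stream to a generic compressor. In a
# real lossless codec the first token would also be sent verbatim.
import zlib
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def text_to_ranks(text: str) -> list[int]:
    """Rank of each actual token under the LM's prediction from its prefix."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]   # (T, vocab)
    ranks = []
    for pos in range(1, len(ids)):
        order = torch.argsort(logits[pos - 1], descending=True)
        ranks.append(int((order == ids[pos]).nonzero().item()))
    return ranks

def compress(ranks: list[int]) -> bytes:
    # Well-predicted text yields mostly tiny ranks, which zlib squeezes well.
    return zlib.compress(b"".join(r.to_bytes(4, "big") for r in ranks), 9)

blob = compress(text_to_ranks("Data compression exploits statistical redundancy."))
print(len(blob), "bytes")
```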
- SMC++: Masked Learning of Unsupervised Video Semantic Compression (arXiv, 2024-06-07)
We propose a Masked Video Modeling (MVM)-powered compression framework that particularly preserves video semantics.
MVM is proficient at learning generalizable semantics through the masked patch prediction task, but it may also encode non-semantic information such as trivial texture details, wasting bits and introducing semantic noise.
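As a generic illustration of the masked patch prediction objective behind MVM (a toy PyTorch sketch under assumed shapes, not the SMC++ architecture):

```python
# Toy masked video model: hide most patches, reconstruct them from the rest.
import torch
import torch.nn as nn

class TinyMVM(nn.Module):
    def __init__(self, patch_dim: int = 768, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, patch_dim)

    def forward(self, patches: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, patch_dim) flattened spatio-temporal patches
        # mask:    (B, N) bool, True where a patch is hidden from the model
        x = self.proj(patches)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        return self.head(self.encoder(x))

patches = torch.randn(2, 16, 768)              # stand-in video patch features
mask = torch.rand(2, 16) < 0.75                # hide 75% of the patches
recon = TinyMVM()(patches, mask)
loss = ((recon - patches) ** 2)[mask].mean()   # loss only on hidden patches
```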
- Perceptual Image Compression with Cooperative Cross-Modal Side Information (arXiv, 2023-11-23)
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
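A hedged sketch of such text-guided fusion, using the CLIP text encoder from `transformers` with plain cross-attention standing in for the paper's Semantic-Spatial Aware block; module names and dimensions here are illustrative:

```python
# Cross-attention fusion of CLIP text features into image latents (a generic
# stand-in for the paper's Semantic-Spatial Aware block; shapes are assumed).
import torch
import torch.nn as nn
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_enc = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").eval()

class TextImageFusion(nn.Module):
    def __init__(self, img_dim: int = 256, txt_dim: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(img_dim, num_heads=4,
                                          kdim=txt_dim, vdim=txt_dim,
                                          batch_first=True)

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, H*W, img_dim) flattened latents from an image encoder
        # txt_feats: (B, T, txt_dim) token features from the CLIP text encoder
        fused, _ = self.attn(img_feats, txt_feats, txt_feats)
        return img_feats + fused  # residual: text refines the image latent

caption = tokenizer(["a cat on a mat"], return_tensors="pt", padding=True)
with torch.no_grad():
    txt = text_enc(**caption).last_hidden_state   # (1, T, 512)
img_latent = torch.randn(1, 64, 256)              # stand-in encoder output
side_informed = TextImageFusion()(img_latent, txt)
```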
- Semantic Text Compression for Classification (arXiv, 2023-09-19)
We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification.
We propose semantic quantization and compression approaches for text, utilizing sentence embeddings and a semantic distortion metric to preserve the meaning.
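One way to realize such semantic quantization, sketched under the assumption of `sentence-transformers` embeddings and a k-means codebook, with plain Euclidean distance in place of the paper's semantic distortion metric:

```python
# Semantic quantization sketch: sentences -> embeddings -> codeword indices.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["the movie was wonderful", "terrible plot and acting",
          "average film, nothing special", "an instant classic"]
vecs = embedder.encode(corpus)

# Codebook: K centroids; every sentence compresses to log2(K) bits.
codebook = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vecs)

def quantize(sentence: str) -> int:
    """Compress a sentence to the index of its nearest centroid."""
    return int(codebook.predict(embedder.encode([sentence]))[0])

def dequantize(index: int) -> str:
    """Decode to the corpus sentence closest to the chosen centroid."""
    center = codebook.cluster_centers_[index]
    return corpus[int(np.argmin(np.linalg.norm(vecs - center, axis=1)))]

idx = quantize("what a fantastic film")
print(idx, "->", dequantize(idx))   # should land in the positive cluster
```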
- EntropyRank: Unsupervised Keyphrase Extraction via Side-Information Optimization for Language Model-based Text Compression (arXiv, 2023-08-25)
We propose an unsupervised method to extract keywords and keyphrases from texts based on a pre-trained language model (LM) and Shannon's information.
Specifically, our method extracts phrases having the highest conditional entropy under the LM.
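A small sketch of the underlying scoring step, assuming GPT-2 from `transformers`: the entropy of the LM's next-token distribution is computed at each position, and high-entropy positions are treated as candidate keyphrase onsets (the paper's phrase segmentation is simplified away):

```python
# Per-token conditional entropy under a causal LM (GPT-2 assumed).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_entropies(text: str):
    """Pair each token with the entropy (bits) of the LM's prediction for it."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0]                     # (T, vocab)
    probs = torch.softmax(logits, dim=-1)
    ent = -(probs * torch.log2(probs.clamp_min(1e-12))).sum(-1)
    tokens = tok.convert_ids_to_tokens(ids[0])
    return list(zip(tokens[1:], ent[:-1].tolist()))

scores = token_entropies("Semantic compression keeps the meaning of the text.")
for token, h in sorted(scores, key=lambda p: -p[1])[:3]:
    print(f"{token!r}: {h:.2f} bits")   # the LM's most surprising positions
```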
- Towards Semantic Communications: Deep Learning-Based Image Semantic Coding (arXiv, 2022-08-08)
We consider semantic communications for image data, which is much richer in semantics and more bandwidth sensitive.
We propose a reinforcement learning based adaptive semantic coding (RL-ASC) approach that encodes images beyond the pixel level.
Experimental results demonstrate that the proposed RL-ASC is noise robust and can reconstruct visually pleasing, semantically consistent images.
- Semantic-Preserving Adversarial Text Attacks (arXiv, 2021-08-23)
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Compared with existing methods, our method achieves the highest attack success and semantic preservation rates while changing the fewest words.
- Text Compression-aided Transformer Encoding (arXiv, 2021-02-11)
We propose explicit and implicit text compression approaches to enhance the Transformer encoding.
In standard Transformer encoding, backbone information, i.e., the gist of the input text, is not specifically focused on.
Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results in comparison to strong baselines.
- Learning Directly from Grammar Compressed Text (arXiv, 2020-02-28)
We propose a method to apply neural sequence models to text data compressed with grammar compression algorithms without decompression.
To encode the unique symbols that appear in compression rules, we introduce composer modules to incrementally encode the symbols into vector representations.
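A toy PyTorch sketch of such a composer module, under the assumption of a binary grammar (e.g., Re-Pair-style rules, where each rule rewrites to exactly two symbols); the paper's actual architecture is more elaborate:

```python
# Toy composer: build a vector for each grammar rule from its two children,
# so compressed text can be encoded without expanding the grammar.
import torch
import torch.nn as nn

class Composer(nn.Module):
    def __init__(self, n_terminals: int, dim: int = 64):
        super().__init__()
        self.term = nn.Embedding(n_terminals, dim)
        self.compose = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def forward(self, rules):
        # rules: list of (left, right); ids >= 0 are terminals, while a
        # negative id -k refers to the k-th previously composed rule.
        table = []
        def vec(sym):
            return (self.term(torch.tensor(sym)) if sym >= 0
                    else table[-sym - 1])
        for left, right in rules:
            table.append(self.compose(torch.cat([vec(left), vec(right)])))
        return torch.stack(table)

# Grammar for "abab": R1 -> a b, R2 -> R1 R1   (a=0, b=1, R1=-1)
rule_vecs = Composer(n_terminals=2)([(0, 1), (-1, -1)])
print(rule_vecs.shape)   # torch.Size([2, 64]) -- one vector per rule
```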
This list is automatically generated from the titles and abstracts of the papers on this site.