Compressibility of Distributed Document Representations
- URL: http://arxiv.org/abs/2110.07595v1
- Date: Thu, 14 Oct 2021 17:56:35 GMT
- Title: Compressibility of Distributed Document Representations
- Authors: Blaž Škrlj and Matej Petkovič
- Abstract summary: CoRe is a representation learner-agnostic framework suitable for representation compression.
We show CoRe's behavior when considering contextual and non-contextual document representations, different compression levels, and 9 different compression algorithms.
Results based on more than 100,000 compression experiments indicate that CoRe offers a very good trade-off between the compression efficiency and performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contemporary natural language processing (NLP) revolves around learning from
latent document representations, generated either implicitly by neural language
models or explicitly by methods such as doc2vec or similar. One of the key
properties of the obtained representations is their dimension. Whilst the
commonly adopted dimensions of 256 and 768 offer sufficient performance on many
tasks, it is often unclear whether the default dimension is the most
suitable choice for the subsequent downstream learning tasks. Furthermore,
representation dimensions are seldom subject to hyperparameter tuning due to
computational constraints. The purpose of this paper is to demonstrate that a
surprisingly simple and efficient recursive compression procedure can be
sufficient both to significantly compress the initial representation and to
potentially improve its performance on the task of text classification.
Smaller and less noisy representations are desirable during deployment, as
models that are orders of magnitude smaller can significantly reduce the
computational overhead and, with it, the deployment
costs. We propose CoRe, a straightforward, representation learner-agnostic
framework suitable for representation compression. CoRe's performance is
showcased and studied on a collection of 17 real-life corpora from biomedical,
news, social media, and literary domains. We explore CoRe's behavior when
considering contextual and non-contextual document representations, different
compression levels, and 9 different compression algorithms. Current results
based on more than 100,000 compression experiments indicate that recursive
Singular Value Decomposition offers a very good trade-off between the
compression efficiency and performance, making CoRe useful in many existing,
representation-dependent NLP pipelines.
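The abstract describes the compression procedure only at a high level. A minimal sketch of the core idea, recursively applying truncated SVD to a matrix of document embeddings until a target dimension is reached, might look as follows; the halving schedule, library choice, and function names are illustrative assumptions, not the authors' exact implementation:

```python
# Minimal sketch of recursive, SVD-based compression of document
# representations (an illustrative assumption, not the paper's exact code).
import numpy as np
from sklearn.decomposition import TruncatedSVD

def recursively_compress(X: np.ndarray, target_dim: int) -> np.ndarray:
    """Repeatedly halve the representation dimension with truncated SVD
    until the target dimension is reached."""
    current = X
    while current.shape[1] > target_dim:
        next_dim = max(target_dim, current.shape[1] // 2)
        svd = TruncatedSVD(n_components=next_dim)
        current = svd.fit_transform(current)
    return current

# Example: compress 768-dimensional document embeddings down to 96 dimensions
# before training a downstream text classifier.
documents = np.random.rand(1000, 768)    # stand-in for doc2vec / BERT embeddings
compressed = recursively_compress(documents, target_dim=96)
print(compressed.shape)                  # (1000, 96)
```

The compressed matrix can then replace the original representation in any downstream classifier, which is the sense in which the framework is representation learner-agnostic.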
Related papers
- Efficient Fairness-Performance Pareto Front Computation [51.558848491038916]
We show that optimal fair representations possess several useful structural properties.
We then show that these approximation problems can be solved efficiently via concave programming methods.
arXiv Detail & Related papers (2024-09-26T08:46:48Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval is to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation [28.432799973328127]
We propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings.
Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations.
arXiv Detail & Related papers (2022-03-15T07:05:43Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- DCT-Former: Efficient Self-Attention with Discrete Cosine Transform [4.622165486890318]
An intrinsic limitation of Transformer architectures arises from the computation of the dot-product attention.
Our idea takes inspiration from the world of lossy data compression (such as the JPEG algorithm) to derive an approximation of the attention module (a simplified sketch of this idea appears after the related-papers list).
An extensive section of experiments shows that our method takes up less memory for the same performance, while also drastically reducing inference time.
arXiv Detail & Related papers (2022-03-02T15:25:27Z)
- Implicit Neural Representations for Image Compression [103.78615661013623]
Implicit Neural Representations (INRs) have gained attention as a novel and effective representation for various data types.
We propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding.
We find that our approach to source compression with INRs vastly outperforms similar prior work.
arXiv Detail & Related papers (2021-12-08T13:02:53Z)
- SDR: Efficient Neural Re-ranking using Succinct Document Representation [4.9278175139681215]
We propose the Succinct Document Representation scheme that computes highly compressed intermediate document representations.
Our method is highly efficient, achieving 4x-11.6x better compression rates for the same ranking quality.
arXiv Detail & Related papers (2021-10-03T07:43:16Z)
- Efficient Inference via Universal LSH Kernel [35.22983601434134]
We propose mathematically provable Representer Sketch, a concise set of count arrays that can approximate the inference procedure with simple hashing computations and aggregations.
Representer Sketch builds upon the popular Representer Theorem from kernel literature, hence the name.
We show that Representer Sketch achieves up to 114x reduction in storage requirement and 59x reduction in complexity without any drop in accuracy.
arXiv Detail & Related papers (2021-06-21T22:06:32Z)
- Revisit Visual Representation in Analytics Taxonomy: A Compression Perspective [69.99087941471882]
We study the problem of supporting multiple machine vision analytics tasks with the compressed visual representation.
By utilizing the intrinsic transferability among different tasks, our framework successfully constructs compact and expressive representations at low bit-rates.
In order to impose compactness in the representations, we propose a codebook-based hyperprior.
arXiv Detail & Related papers (2021-06-16T01:44:32Z)
- Dimensionality Reduction for Sentiment Classification: Evolving for the Most Prominent and Separable Features [4.156782836736784]
In sentiment classification, the enormous amount of textual data, its immense dimensionality, and inherent noise make it extremely difficult for machine learning classifiers to extract high-level and complex abstractions.
In the existing dimensionality reduction techniques, the number of components needs to be set manually which results in loss of the most prominent features.
We have proposed a new framework that consists of two dimensionality reduction techniques, i.e., Sentiment Term Presence Count (SentiTPC) and Sentiment Term Presence Ratio (SentiTPR).
arXiv Detail & Related papers (2020-06-01T09:46:52Z)
- Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different combinations of encoder architectures and linguistic features trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)
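As referenced in the DCT-Former entry above, the JPEG-inspired idea of discarding high-frequency coefficients can be illustrated with a small numpy sketch. This is only an assumption about the general mechanism (compressing keys and values along the sequence axis with a truncated DCT before attention), not the paper's exact formulation; the function and parameter names (compressed_attention, kept_coeffs) are hypothetical:

```python
# Simplified illustration of DCT-based attention approximation in the spirit of
# DCT-Former (not the paper's exact method): keys and values are compressed
# along the sequence axis by keeping only low-frequency DCT coefficients.
import numpy as np
from scipy.fft import dct

def compressed_attention(Q, K, V, kept_coeffs):
    # DCT along the sequence axis, then keep the first `kept_coeffs`
    # coefficients, mimicking JPEG-style truncation of high frequencies.
    K_c = dct(K, axis=0, norm="ortho")[:kept_coeffs]      # (m, d)
    V_c = dct(V, axis=0, norm="ortho")[:kept_coeffs]      # (m, d)
    scores = Q @ K_c.T / np.sqrt(Q.shape[1])              # (n, m) instead of (n, n)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)         # row-wise softmax
    return weights @ V_c                                   # (n, d)

n, d = 512, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = compressed_attention(Q, K, V, kept_coeffs=128)
print(out.shape)  # (512, 64)
```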
This list is automatically generated from the titles and abstracts of the papers on this site.