Related papers: More Effective LLM Compressed Tokens with Uniformly Spread Position Identifiers and Compression Loss

URL: http://arxiv.org/abs/2409.14364v2
Date: Fri, 27 Sep 2024 09:13:19 GMT
Title: More Effective LLM Compressed Tokens with Uniformly Spread Position Identifiers and Compression Loss
Authors: Runsong Zhao, Pengcheng Huang, Xinyu Liu, Chunyang Xiao, Tong Xiao, Jingbo Zhu
Abstract summary: We study the position identifier choices for compressed tokens and also propose a new compression loss. We demonstrate empirically that our proposed methods achieve significantly higher compression ratios (15x compared to 4x for ICAE).
Score: 51.05017281146084
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Compressing Transformer inputs into compressed tokens allows running LLMs with improved speed and cost efficiency. Based on the compression method ICAE, we carefully examine the position identifier choices for compressed tokens and also propose a new compression loss. We demonstrate empirically that our proposed methods achieve significantly higher compression ratios (15x compared to 4x for ICAE), while being able to attain comparable reconstruction performance.

Related papers

EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation [76.72669153805018]
EoRA consistently outperforms previous methods in compensating errors for compressed LLaMA2/3 models on various tasks. EoRA offers a scalable, training-free solution to compensate for compression errors.
arXiv Detail & Related papers (2024-10-28T17:59:03Z)
Perception Compressor: A training-free prompt compression method in long context scenarios [17.720102137585503]
Perception Compressor is a training-free prompt compression method for large language models. It outperforms existing methods by a large margin, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-09-28T07:13:33Z)
Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression. Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively. Our model outperforms all current variable-rate image compression methods in terms of rate-distortion performance and approaches the state-of-the-art fixed-rate image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z)
Lossy and Lossless (L$^2$) Post-training Model Size Compression [12.926354646945397]
We propose a post-training model size compression method that combines lossy and lossless compression in a unified way. Our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time.
arXiv Detail & Related papers (2023-08-08T14:10:16Z)
Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders [89.29256833403169]
We introduce Kullback Leibler Alignment of Embeddings (KALE), an efficient and accurate method for increasing the inference efficiency of dense retrieval methods. KALE extends traditional Knowledge Distillation after bi-encoder training, allowing for effective query encoder compression without full retraining or index generation. Using KALE and asymmetric training, we can generate models which exceed the performance of DistilBERT despite having 3x faster inference.
arXiv Detail & Related papers (2023-03-31T15:44:13Z)
Deep Lossy Plus Residual Coding for Lossless and Near-lossless Image Compression [85.93207826513192]
We propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression. We solve the joint lossy and residual compression problem in the approach of VAEs. In the near-lossless mode, we quantize the original residuals to satisfy a given $\ell_\infty$ error bound.
arXiv Detail & Related papers (2022-09-11T12:11:56Z)
Modeling Image Quantization Tradeoffs for Optimal Compression [0.0]
Lossy compression algorithms target tradeoffs by quantizing high-frequency data to increase compression rates. We propose a new method of optimizing quantization tables using Deep Learning and a minimax loss function.
arXiv Detail & Related papers (2021-12-14T07:35:22Z)
Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which combines channel pruning and tensor decomposition to compress CNN models. We achieve a 52.9% FLOPs reduction by removing 48.4% of the parameters on ResNet-50, with a Top-1 accuracy drop of only 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding [5.687243501594734]
Variational Autoencoders (VAEs) have seen widespread use in learned image compression. We propose a novel method, Relative Entropy Coding (REC), that can directly encode the latent representation with codelength close to the relative entropy for single images.
arXiv Detail & Related papers (2020-10-02T20:23:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.