Semantic Text Transmission via Prediction with Small Language Models:
Cost-Similarity Trade-off
- URL: http://arxiv.org/abs/2403.00290v1
- Date: Fri, 1 Mar 2024 05:20:16 GMT
- Title: Semantic Text Transmission via Prediction with Small Language Models:
Cost-Similarity Trade-off
- Authors: Bhavani A Madhabhavi, Gangadhar Karevvanavar, Rajshekhar V Bhat and
Nikolaos Pappas
- Abstract summary: We exploit language's inherent correlations and predictability to constrain transmission costs by allowing the destination to predict or complete words.
We obtain $(\bar{c}, \bar{s})$ pairs for neural language and first-order Markov chain-based small language models.
We demonstrate that, when communication occurs over a noiseless channel, the threshold policy achieves a higher $\bar{s}$ for a given $\bar{c}$ than the periodic policy.
- Score: 7.666188363531336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the communication of natural language text from a source to a
destination over noiseless and character-erasure channels. We exploit
language's inherent correlations and predictability to constrain transmission
costs by allowing the destination to predict or complete words with potential
dissimilarity with the source text. Concretely, our objective is to obtain
achievable $(\bar{c}, \bar{s})$ pairs, where $\bar{c}$ is the average
transmission cost at the source and $\bar{s}$ is the average semantic
similarity measured via cosine similarity between vector embedding of words at
the source and those predicted/completed at the destination. We obtain
$(\bar{c}, \bar{s})$ pairs for neural language and first-order Markov
chain-based small language models (SLMs) used for prediction, under both a
threshold policy, which transmits a word whenever its cosine similarity with
the word predicted/completed at the destination falls below a threshold, and a
periodic policy, which transmits words at a fixed interval and lets the
destination predict/complete the words in between. We adopt an SLM for word
completion.
We demonstrate that, when communication occurs over a noiseless channel, the
threshold policy achieves a higher $\bar{s}$ for a given $\bar{c}$ than the
periodic policy and that the $\bar{s}$ achieved with the neural SLM is greater
than or equal to that of the Markov chain-based algorithm for the same
$\bar{c}$. The improved performance comes with a higher complexity in terms of
time and computing requirements. However, when communication occurs over a
character-erasure channel, all prediction algorithms and scheduling policies
perform poorly. Furthermore, if character-level Huffman coding is used, the
required $\bar{c}$ to achieve a given $\bar{s}$ is reduced, but the above
observations still apply.
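
As a concrete illustration, the following is a minimal Python sketch, under stated assumptions, of how the two scheduling policies could be simulated over a noiseless channel. Here `predict_next` (a destination-side SLM), `embed` (a word-embedding lookup), and all parameter values are hypothetical placeholders, not the authors' neural or Markov-chain models.

```python
# Illustrative sketch only: simulates the threshold and periodic
# transmission policies described in the abstract, over a noiseless
# channel. `predict_next` and `embed` are hypothetical stand-ins for
# the paper's small language models and word embeddings.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def threshold_policy(words, predict_next, embed, threshold=0.9):
    """Transmit a word only when the destination's prediction is too
    dissimilar; over a noiseless channel the source can run an identical
    copy of the destination's predictor to make this decision."""
    received, sims, cost = [], [], 0
    for w in words:
        guess = predict_next(received)        # destination-side prediction
        sim = cosine(embed(w), embed(guess))
        if sim < threshold:
            received.append(w)                # transmit the true word
            cost += 1
            sims.append(1.0)                  # destination holds the exact word
        else:
            received.append(guess)            # keep the prediction, zero cost
            sims.append(sim)
    return cost / len(words), float(np.mean(sims))    # (c_bar, s_bar)

def periodic_policy(words, predict_next, embed, period=3):
    """Transmit every `period`-th word; predict the rest at the destination."""
    received, sims, cost = [], [], 0
    for i, w in enumerate(words):
        if i % period == 0:
            received.append(w)                # scheduled transmission
            cost += 1
            sims.append(1.0)
        else:
            guess = predict_next(received)    # fill the gap by prediction
            received.append(guess)
            sims.append(cosine(embed(w), embed(guess)))
    return cost / len(words), float(np.mean(sims))
```

Sweeping `threshold` (or `period`) traces out achievable $(\bar{c}, \bar{s})$ pairs. The noiseless channel is what lets the source mirror the destination's predictor exactly, which is why the threshold rule is implementable at the source.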
Related papers
- Coupling without Communication and Drafter-Invariant Speculative Decoding [21.19028671377106]
We show that communication-free protocols yield a variant of speculative decoding that we call Drafter-Invariant Speculative Decoding.
arXiv Detail & Related papers (2024-08-15T06:52:24Z)
- $\textit{Swap and Predict}$ -- Predicting the Semantic Changes in Words across Corpora by Context Swapping [36.10628959436778]
We consider the problem of predicting whether a given target word, $w$, changes its meaning between two different text corpora.
We propose an unsupervised method that randomly swaps contexts between the two corpora $\mathcal{C}_1$ and $\mathcal{C}_2$.
Our method achieves significant performance improvements compared to strong baselines for the English semantic change prediction task.
arXiv Detail & Related papers (2023-10-16T13:39:44Z)
- Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z)
- Truncation Sampling as Language Model Desmoothing [115.28983143361681]
Long samples of text from neural language models can be of poor quality.
Truncation sampling algorithms set some words' probabilities to zero at each step.
We introduce $\eta$-sampling, which truncates words below an entropy-dependent probability threshold (a minimal sketch follows this list).
arXiv Detail & Related papers (2022-10-27T05:52:35Z)
- Supervised Training of Conditional Monge Maps [107.78770597815242]
Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another.
We introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable.
We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells.
arXiv Detail & Related papers (2022-06-28T19:34:44Z)
- Quantum Resources Required to Block-Encode a Matrix of Classical Data [56.508135743727934]
We provide circuit-level implementations and resource estimates for several methods of block-encoding a dense $N \times N$ matrix of classical data to precision $\epsilon$.
We examine resource tradeoffs between the different approaches and explore implementations of two separate models of quantum random access memory (QRAM).
Our results go beyond simple query complexity and provide a clear picture into the resource costs when large amounts of classical data are assumed to be accessible to quantum algorithms.
arXiv Detail & Related papers (2022-06-07T18:00:01Z)
- Communication-efficient SGD: From Local SGD to One-Shot Averaging [16.00658606157781]
We consider speeding up stochastic gradient descent (SGD) by parallelizing it across multiple workers.
We suggest a Local SGD scheme that communicates less overall by communicating less frequently as the number of iterations grows.
arXiv Detail & Related papers (2021-06-09T01:10:34Z)
- Sparse Normal Means Estimation with Sublinear Communication [13.257075075199781]
We consider the problem of normal means estimation in a distributed setting with communication constraints.
We show that once the signal-to-noise ratio (SNR) is slightly higher, the support of $\mu$ can be correctly recovered with much less communication.
arXiv Detail & Related papers (2021-02-05T08:52:25Z)
- Improved Sample Complexity for Incremental Autonomous Exploration in MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z)
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- Capacity Approaching Coding for Low Noise Interactive Quantum Communication, Part I: Large Alphabets [15.078027648304115]
We consider the problem of implementing two-party interactive quantum communication over noisy channels.
For a noiseless qudit channel over a $\mathrm{poly}(n)$ size alphabet, our main result is a simulation method that fails with probability less than $2^{-\Theta(n\epsilon)}$.
We conjecture that it is optimal up to a constant factor in the $\sqrt{\epsilon}$ term.
arXiv Detail & Related papers (2020-01-09T02:48:43Z)
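
For the truncation-sampling entry above, here is a minimal sketch of $\eta$-sampling as that abstract describes it: a word survives only if its probability exceeds an entropy-dependent threshold. The threshold form $\eta = \min(\epsilon, \sqrt{\epsilon}\, e^{-h})$, with $h$ the entropy of the next-word distribution, follows the rule stated in that paper; treat it and the $\epsilon$ value as illustrative.

```python
# Hedged sketch of eta-sampling from "Truncation Sampling as Language
# Model Desmoothing": truncate words whose probability falls below the
# entropy-dependent threshold eta = min(eps, sqrt(eps) * exp(-entropy)).
# The epsilon value below is illustrative, not prescriptive.
import numpy as np

def eta_sample(probs, epsilon=4e-4, rng=None):
    """Sample one token index from `probs` after eta-truncation."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12))      # in nats
    eta = min(epsilon, np.sqrt(epsilon) * np.exp(-entropy))
    keep = probs >= eta
    if not keep.any():                    # degenerate case: keep the argmax
        keep = probs == probs.max()
    truncated = np.where(keep, probs, 0.0)
    truncated /= truncated.sum()          # renormalize the surviving mass
    return int(rng.choice(len(probs), p=truncated))
```

For a peaked, low-entropy distribution $\eta$ approaches $\epsilon$ and prunes the long tail aggressively; for a near-uniform distribution $\eta$ shrinks and almost every word survives.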